From f31e583aa2c20892aca3add26957dee6ab80a534 Mon Sep 17 00:00:00 2001 From: Lars Ellenberg Date: Thu, 20 Dec 2018 17:23:42 +0100 Subject: drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") And also re-enable partial-zero-out + discard aligned. With the introduction of REQ_OP_WRITE_ZEROES, we started to use that for both WRITE_ZEROES and DISCARDS, hoping that WRITE_ZEROES would "do what we want", UNMAP if possible, zero-out the rest. The example scenario is some LVM "thin" backend. While an un-allocated block on dm-thin reads as zeroes, on a dm-thin with "skip_block_zeroing=true", after a partial block write allocated that block, that same block may well map "undefined old garbage" from the backends on LBAs that have not yet been written to. If we cannot distinguish between zero-out and discard on the receiving side, to avoid "undefined old garbage" to pop up randomly at later times on supposedly zero-initialized blocks, we'd need to map all discards to zero-out on the receiving side. But that would potentially do a full alloc on thinly provisioned backends, even when the expectation was to unmap/trim/discard/de-allocate. We need to distinguish on the protocol level, whether we need to guarantee zeroes (and thus use zero-out, potentially doing the mentioned full-alloc), or if we want to put the emphasis on discard, and only do a "best effort zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing only potential unaligned head and tail clippings to at least *try* to avoid "false positives" in an online-verify later), hoping that someone set skip_block_zeroing=false. For some discussion regarding this on dm-devel, see also https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html For backward compatibility, P_TRIM means zero-out, unless the DRBD_FF_WZEROES feature flag is agreed upon during handshake. To have upper layers even try to submit WRITE ZEROES requests, we need to announce "efficient zeroout" independently. We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits(): if we can handle "zeroes" efficiently on the protocol, we want to do that, even if our backend does not announce max_write_zeroes_sectors itself. Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe --- include/linux/drbd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/drbd.h b/include/linux/drbd.h index 2d0259327721..a19d98367f08 100644 --- a/include/linux/drbd.h +++ b/include/linux/drbd.h @@ -51,7 +51,7 @@ #endif extern const char *drbd_buildtag(void); -#define REL_VERSION "8.4.10" +#define REL_VERSION "8.4.11" #define API_VERSION 1 #define PRO_VERSION_MIN 86 #define PRO_VERSION_MAX 101 -- cgit v1.2.3 From a52c5a16cf19d8a85831bb1b915a221dd4ffae3c Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Thu, 20 Dec 2018 17:23:43 +0100 Subject: drbd: Avoid Clang warning about pointless switch statment There are several warnings from Clang about no case statement matching the constant 0: In file included from drivers/block/drbd/drbd_receiver.c:48: In file included from drivers/block/drbd/drbd_int.h:48: In file included from ./include/linux/drbd_genl_api.h:54: In file included from ./include/linux/genl_magic_struct.h:236: ./include/linux/drbd_genl.h:321:1: warning: no case matching constant switch condition '0' GENL_struct(DRBD_NLA_HELPER, 24, drbd_helper_info, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/genl_magic_struct.h:220:10: note: expanded from macro 'GENL_struct' switch (0) { ^ Silence this warning by adding a 'case 0:' statement. Additionally, adjust the alignment of the statements in the ct_assert_unique macro to avoid a checkpatch warning. This solution was originally sent by Arnd Bergmann with a default case statement: https://lore.kernel.org/patchwork/patch/756723/ Link: https://github.com/ClangBuiltLinux/linux/issues/43 Suggested-by: Lars Ellenberg Signed-off-by: Nathan Chancellor Signed-off-by: Jens Axboe --- include/linux/genl_magic_struct.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/genl_magic_struct.h b/include/linux/genl_magic_struct.h index 5972e4969197..eeae59d3ceb7 100644 --- a/include/linux/genl_magic_struct.h +++ b/include/linux/genl_magic_struct.h @@ -191,6 +191,7 @@ static inline void ct_assert_unique_operations(void) { switch (0) { #include GENL_MAGIC_INCLUDE_FILE + case 0: ; } } @@ -209,6 +210,7 @@ static inline void ct_assert_unique_top_level_attributes(void) { switch (0) { #include GENL_MAGIC_INCLUDE_FILE + case 0: ; } } @@ -218,7 +220,8 @@ static inline void ct_assert_unique_top_level_attributes(void) static inline void ct_assert_unique_ ## s_name ## _attributes(void) \ { \ switch (0) { \ - s_fields \ + s_fields \ + case 0: \ ; \ } \ } -- cgit v1.2.3 From 6ab2187992f4b0112852e5a097a2b6c7d167e2e5 Mon Sep 17 00:00:00 2001 From: Dennis Zhou Date: Wed, 19 Dec 2018 16:43:21 -0600 Subject: blkcg: clean up blkg_tryget_closest() The implementation of blkg_tryget_closest() wasn't super obvious and became a point of suspicion when debugging [1]. So let's clean it up so it's obviously not the problem. Also add missing RCU read locking to bio_clone_blkg_association(), which got exposed by adding the RCU read lock held check in blkg_tryget_closest(). [1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/ Signed-off-by: Dennis Zhou Signed-off-by: Jens Axboe --- block/bio.c | 4 ++++ include/linux/blk-cgroup.h | 21 ++++++++++++++++----- 2 files changed, 20 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/block/bio.c b/block/bio.c index c288b9057042..9194d8ad3d5e 100644 --- a/block/bio.c +++ b/block/bio.c @@ -2096,8 +2096,12 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg); */ void bio_clone_blkg_association(struct bio *dst, struct bio *src) { + rcu_read_lock(); + if (src->bi_blkg) __bio_associate_blkg(dst, src->bi_blkg); + + rcu_read_unlock(); } EXPORT_SYMBOL_GPL(bio_clone_blkg_association); #endif /* CONFIG_BLK_CGROUP */ diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index f025fd1e22e6..76c61318fda5 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -499,22 +499,33 @@ static inline void blkg_get(struct blkcg_gq *blkg) */ static inline bool blkg_tryget(struct blkcg_gq *blkg) { - return percpu_ref_tryget(&blkg->refcnt); + return blkg && percpu_ref_tryget(&blkg->refcnt); } /** * blkg_tryget_closest - try and get a blkg ref on the closet blkg * @blkg: blkg to get * - * This walks up the blkg tree to find the closest non-dying blkg and returns - * the blkg that it did association with as it may not be the passed in blkg. + * This needs to be called rcu protected. As the failure mode here is to walk + * up the blkg tree, this ensure that the blkg->parent pointers are always + * valid. This returns the blkg that it ended up taking a reference on or %NULL + * if no reference was taken. */ static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg) { - while (blkg && !percpu_ref_tryget(&blkg->refcnt)) + struct blkcg_gq *ret_blkg = NULL; + + WARN_ON_ONCE(!rcu_read_lock_held()); + + while (blkg) { + if (blkg_tryget(blkg)) { + ret_blkg = blkg; + break; + } blkg = blkg->parent; + } - return blkg; + return ret_blkg; } /** -- cgit v1.2.3