summaryrefslogtreecommitdiffstats
path: root/fs/dlm/lock.c
AgeCommit message (Collapse)AuthorFilesLines
2022-05-02dlm: use kref_put_lock in __put_lkbAlexander Aring1-6/+6
This patch will optimize __put_lkb() by using kref_put_lock(). The function kref_put_lock() will only take the lock if the reference is going to be zero, if not the lock will never be held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02dlm: use kref_put_lock in put_rsbAlexander Aring1-3/+5
This patch will optimize put_rsb() by using kref_put_lock(). The function kref_put_lock() will only take the lock if the reference is going to be zero, if not the lock will never be held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02dlm: remove unnecessary error assignAlexander Aring1-3/+0
This patch removes unnecessary error assigns to 0 at places we know that error is zero because it was checked on non-zero before. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02dlm: fix missing lkb refcount handlingAlexander Aring1-2/+9
We always call hold_lkb(lkb) if we increment lkb->lkb_wait_count. So, we always need to call unhold_lkb(lkb) if we decrement lkb->lkb_wait_count. This patch will add missing unhold_lkb(lkb) if we decrement lkb->lkb_wait_count. In case of setting lkb->lkb_wait_count to zero we need to countdown until reaching zero and call unhold_lkb(lkb). The waiters list unhold_lkb(lkb) can be removed because it's done for the last lkb_wait_count decrement iteration as it's done in _remove_from_waiters(). This issue was discovered by a dlm gfs2 test case which use excessively dlm_unlock(LKF_CANCEL) feature. Probably the lkb->lkb_wait_count value never reached above 1 if this feature isn't used and so it was not discovered before. The testcase ended in a rsb on the rsb keep data structure with a refcount of 1 but no lkb was associated with it, which is itself an invalid behaviour. A side effect of that was a condition in which the dlm was sending remove messages in a looping behaviour. With this patch that has not been reproduced. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: replace usage of found with dedicated list iterator variableJakob Koschel1-28/+25
To move the list iterator variable into the list_for_each_entry_*() macro in the future it should be avoided to use the list iterator variable after the loop body. To *never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: remove usage of list iterator for list_add() after the loop bodyJakob Koschel1-4/+8
In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Before, the code implicitly used the head when no element was found when using &pos->list. Since the new variable is only set if an element was found, the list_add() is performed within the loop and only done after the loop if it is done on the list head directly. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: fix pending remove if msg allocation failsAlexander Aring1-1/+2
This patch unsets ls_remove_len and ls_remove_name if a message allocation of a remove messages fails. In this case we never send a remove message out but set the per ls ls_remove_len ls_remove_name variable for a pending remove. Unset those variable should indicate possible waiters in wait_pending_remove() that no pending remove is going on at this moment. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: fix wake_up() calls for pending removeAlexander Aring1-2/+2
This patch move the wake_up() call at the point when a remove message completed. Before it was only when a remove message was going to be sent. The possible waiter in wait_pending_remove() waits until a remove is done if the resource name matches with the per ls variable ls->ls_remove_name. If this is the case we must wait until a pending remove is done which is indicated if DLM_WAIT_PENDING_COND() returns false which will always be the case when ls_remove_len and ls_remove_name are unset to indicate that a remove is not going on anymore. Fixes: 21d9ac1a5376 ("fs: dlm: use event based wait for pending remove") Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: cleanup lock handling in dlm_master_lookupAlexander Aring1-87/+102
This patch will remove the following warning by sparse: fs/dlm/lock.c:1049:9: warning: context imbalance in 'dlm_master_lookup' - different lock contexts for basic block I tried to find any issues with the current handling and I did not find any. However it is hard to follow the lock handling in this area of dlm_master_lookup() and I suppose that sparse cannot realize that there are no issues. The variable "toss_list" makes it really hard to follow the lock handling because if it's set the rsb lock/refcount isn't held but the ls->ls_rsbtbl[b].lock is held and this is one reason why the rsb lock/refcount does not need to be held. If it's not set the ls->ls_rsbtbl[b].lock is not held but the rsb lock/refcount is held. The indicator of toss_list will be used to store the actual lock state. Another possibility is that a retry can happen and then it's hard to follow the specific code part. I did not find any issues but sparse cannot realize that there are no issues. To make it more easier to understand for developers and sparse as well, we remove the toss_list variable which indicates a specific lock state and move handling in between of this lock state in a separate function. This function can be called now in case when the initial lock states are taken which was previously signalled if toss_list was set or not. The advantage here is that we can release all locks/refcounts in mostly the same code block as it was taken. Afterwards sparse had no issues to figure out that there are no problems with the current lock behaviour. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: remove found label in dlm_master_lookupAlexander Aring1-9/+9
This patch cleanups a not necessary label found which can be replaced by a proper else handling to jump over a specific code block. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: use __le types for dlm messagesAlexander Aring1-133/+143
This patch changes to use __le types directly in the dlm message structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: use __le types for rcom messagesAlexander Aring1-2/+1
This patch changes to use __le types directly in the dlm rcom structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: use __le types for dlm headerAlexander Aring1-43/+52
This patch changes to use __le types directly in the dlm header structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: fix missing check in validate_lock_argsAlexander Aring1-1/+2
This patch adds a additional check if lkb->lkb_wait_count is non zero as it is done in validate_unlock_args() to check if any operation is in progress. While on it add a comment taken from validate_unlock_args() to signal what the check is doing. There might be no changes because if lkb->lkb_wait_type is non zero implies that lkb->lkb_wait_count is non zero. However we should add the check as it does validate_unlock_args(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: use event based wait for pending removeAlexander Aring1-7/+12
This patch will use an event based waitqueue to wait for a possible clash with the ls_remove_name field of dlm_ls instead of doing busy waiting. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: filter user dlm messages for kernel locksAlexander Aring1-0/+9
This patch fixes the following crash by receiving a invalid message: [ 160.672220] ================================================================== [ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370 [ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319 [ 160.681447] [ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399 [ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014 [ 160.685574] Workqueue: dlm_recv process_recv_sockets [ 160.686721] Call Trace: [ 160.687310] dump_stack_lvl+0x56/0x6f [ 160.688169] ? dlm_user_add_ast+0xc3/0x370 [ 160.689116] kasan_report.cold.14+0x116/0x11b [ 160.690138] ? dlm_user_add_ast+0xc3/0x370 [ 160.690832] dlm_user_add_ast+0xc3/0x370 [ 160.691502] _receive_unlock_reply+0x103/0x170 [ 160.692241] _receive_message+0x11df/0x1ec0 [ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.694427] ? lock_acquire+0x175/0x400 [ 160.695058] ? do_purge.isra.51+0x200/0x200 [ 160.695744] ? lock_acquired+0x360/0x5d0 [ 160.696400] ? lock_contended+0x6a0/0x6a0 [ 160.697055] ? lock_release+0x21d/0x5e0 [ 160.697686] ? lock_is_held_type+0xe0/0x110 [ 160.698352] ? lock_is_held_type+0xe0/0x110 [ 160.699026] ? ___might_sleep+0x1cc/0x1e0 [ 160.699698] ? dlm_wait_requestqueue+0x94/0x140 [ 160.700451] ? dlm_process_requestqueue+0x240/0x240 [ 160.701249] ? down_write_killable+0x2b0/0x2b0 [ 160.701988] ? do_raw_spin_unlock+0xa2/0x130 [ 160.702690] dlm_receive_buffer+0x1a5/0x210 [ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0 [ 160.704210] receive_from_sock+0x1c0/0x3b0 [ 160.704886] ? dlm_tcp_shutdown+0x30/0x30 [ 160.705561] ? lock_acquire+0x175/0x400 [ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.707681] process_recv_sockets+0x32/0x40 [ 160.708366] process_one_work+0x55e/0xad0 [ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110 [ 160.709820] worker_thread+0x65/0x5e0 [ 160.710423] ? process_one_work+0xad0/0xad0 [ 160.711087] kthread+0x1ed/0x220 [ 160.711628] ? set_kthread_struct+0x80/0x80 [ 160.712314] ret_from_fork+0x22/0x30 The issue is that we received a DLM message for a user lock but the destination lock is a kernel lock. Note that the address which is trying to derefence is 00000000deadbeef, which is in a kernel lock lkb->lkb_astparam, this field should never be derefenced by the DLM kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua (memory is shared by a union field). The struct lkb_ua will be handled by the DLM kernel stack but on a kernel lock it will contain invalid data and ends in most likely crashing the kernel. It can be reproduced with two cluster nodes. node 2: dlm_tool join test echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters node 1: dlm_tool join test python: foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \ m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE) newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb") newFile.write(bytes(foo)) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: add lkb waiters debugfs functionalityAlexander Aring1-0/+15
This patch adds functionality to put a lkb to the waiters state. It can be useful to combine this feature with the "rawmsg" debugfs functionality. It will bring the DLM lkb into a state that a message will be parsed by the kernel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: add lkb debugfs functionalityAlexander Aring1-0/+46
This patch adds functionality to add an lkb during runtime. This is a highly debugging feature only, wrong input can crash the kernel. It is a early state feature as well. The goal is to provide a user interface for manipulate dlm state and combine it with the rawmsg feature. It is debugfs functionality, we don't care about UAPI breakage. Even it's possible to add lkb's/rsb's which could never be exists in such wat by using normal DLM operation. The user of this interface always need to think before using this feature, not every crash which happens can really occur during normal dlm operation. Future there should be more functionality to add a more realistic lkb which reflects normal DLM state inside the kernel. For now this is enough. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: allow create lkb with specific id rangeAlexander Aring1-2/+8
This patch adds functionality to add a lkb with a specific id range. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: initial support for tracepointsAlexander Aring1-0/+10
This patch adds initial support for dlm tracepoints. It will introduce tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their complete ast() callback or blocking bast() callback. The lock/unlock functionality has a start and end tracepoint, this is because there exists a race in case if would have a tracepoint at the end position only the complete/blocking callbacks could occur before. To work with eBPF tracing and using their lookup hash functionality there could be problems that an entry was not inserted yet. However use the start functionality for hash insert and check again in end functionality if there was an dlm internal error so there is no ast callback. In further it might also that locks with local masters will occur those callbacks immediately so we must have such functionality. I did not make everything accessible yet, although it seems eBPF can be used to access a lot of internal datastructures if it's aware of the struct definitions of the running kernel instance. We still can change it, if you do eBPF experiments e.g. time measurements between lock and callback functionality you can simple use the local lkb_id field as hash value in combination with the lockspace id if you have multiple lockspaces. Otherwise you can simple use trace-cmd for some functionality, e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add union in dlm header for lockspace idAlexander Aring1-4/+4
This patch adds union inside the lockspace id to handle it also for another use case for a different dlm command. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add more midcomms hooksAlexander Aring1-4/+4
This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffers allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. As well these two messages will be used to determine the DLM version per node, because these two messages are per observation the first messages which are exchanged. Cluster manager events for node membership are added to add support for half-closed connections in cases that the peer connection get to an end of file but DLM still holds membership of the node. In this time DLM can still trigger new message which we should allow. After the cluster manager node removal event occurs it safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-03-09fs: dlm: use GFP_ZERO for page bufferAlexander Aring1-2/+0
This patch uses GFP_ZERO for allocate a page for the internal dlm sending buffer allocator instead of calling memset zero after every allocation. An already allocated space will never be reused again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva1-1/+1
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193Thomas Gleixner1-3/+1
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license v 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 45 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-15dlm: memory leaks on error path in dlm_user_request()Vasily Averin1-7/+7
According to comment in dlm_user_request() ua should be freed in dlm_free_lkb() after successful attach to lkb. However ua is attached to lkb not in set_lock_args() but later, inside request_lock(). Fixes 597d0cae0f99 ("[DLM] dlm: user locks") Cc: stable@kernel.org # 2.6.19 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: lost put_lkb on error path in receive_convert() and receive_unlock()Vasily Averin1-0/+2
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages") Cc: stable@kernel.org # 3.5 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: possible memory leak on error path in create_lkb()Vasily Averin1-0/+1
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr") Cc: stable@kernel.org # 3.1 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2017-10-09dlm: remove dlm_send_rcom_lookup_dumpDavid Teigland1-1/+0
This function was only for debugging. It would be called in a condition that should not happen, and should probably have been removed from the final version of the original commit. Remove it because it does mutex lock under spin lock. Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is settsutomu.owa@toshiba.co.jp1-19/+23
When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock was detected, the caller of can_be_granted() was unknown. We change the behavior of can_be_granted() and change it to detect conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag is set or not. And depending on whether the DLM_LKF_NODLCKWT flag is set or not, we change the behavior at the caller of can_be_granted(). This fix has no effect except when using DLM_LKF_NODLCKWT flag. Currently, ocfs2 uses the DLM_LKF_NODLCKWT flag and does not expect a cancel operation from conversion deadlock when calling dlm_lock(). ocfs2 is implemented to perform a cancel operation by requesting BASTs (callback). Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp> Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp> Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07dlm: Delete an error message for a failed memory allocation in ↵Markus Elfring1-3/+1
dlm_recover_waiters_pre() Omit an extra message for a memory allocation failure in this function. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07dlm: Improve a size determination in dlm_recover_waiters_pre()Markus Elfring1-1/+1
Replace the specification of a data structure by a pointer dereference as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07dlm: Use kcalloc() in dlm_scan_waiters()Markus Elfring1-1/+1
A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
2016-12-25ktime: Get rid of ktime_equal()Thomas Gleixner1-3/+2
No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25ktime: Cleanup ktime_set() usageThomas Gleixner1-1/+1
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
2014-11-19dlm: adopt orphan locksDavid Teigland1-2/+74
A process may exit, leaving an orphan lock in the lockspace. This adds the capability for another process to acquire the orphan lock. Acquiring the orphan just moves the lock from the orphan list onto the acquiring process's list of locks. An adopting process must specify the resource name and mode of the lock it wants to adopt. If a matching lock is found, the lock is moved to the caller's 's list of locks, and the lkid of the lock is returned like the lkid of a new lock. If an orphan with a different mode is found, then -EAGAIN is returned. If no orphan lock is found on the resource, then -ENOENT is returned. No async completion is used because the result is immediately available. Also, when orphans are purged, allow a zero nodeid to refer to the local nodeid so the caller does not need to look up the local nodeid. Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-14dlm: use INFO for recovery messagesDavid Teigland1-3/+3
The log messages relating to the progress of recovery are minimal and very often useful. Change these to the KERN_INFO level so they are always available. Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12dlm: silence a harmless use after free warningDan Carpenter1-0/+1
We pass the freed "r" pointer back to the caller. It's harmless but it upsets the static checkers. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-26dlm: Avoid LVB truncationBart Van Assche1-4/+4
For lockspaces with an LVB length above 64 bytes, avoid truncating the LVB while exchanging it with another node in the cluster. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-02-27dlm: convert to idr_alloc()Tejun Heo1-12/+6
Convert to the much saner new idr interface. Error return values from recover_idr_add() mix -1 and -errno. The conversion doesn't change that but it looks iffy. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-07dlm: avoid scanning unchanged toss listsDavid Teigland1-0/+15
Keep track of whether a toss list contains any shrinkable rsbs. If not, dlm_scand can avoid scanning the list for rsbs to shrink. Unnecessary scanning can otherwise waste a lot of time because the toss lists can contain a large number of rsbs that are non-shrinkable (directory records). Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-16dlm: fix lvb invalidation conditionsDavid Teigland1-3/+13
When a node is removed that held a PW/EX lock, the existing master node should invalidate the lvb on the resource due to the purged lock. Previously, the existing master node was invalidating the lvb if it found only NL/CR locks on the resource during recovery for the removed node. This could lead to cases where it invalidated the lvb and shouldn't have, or cases where it should have invalidated and didn't. When recovery selects a *new* master node for a resource, and that new master finds only NL/CR locks on the resource after lock recovery, it should invalidate the lvb. This case was handled correctly (but was incorrectly applied to the existing master case also.) When a process exits while holding a PW/EX lock, the lvb on the resource should be invalidated. This was not happening. The lvb contents and VALNOTVALID flag should be recovered before granting locks in recovery so that the recovered lvb state is provided in the callback. The lvb was being recovered after the lock was granted. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix missing dir removeDavid Teigland1-2/+68
I don't know exactly how, but in some cases, a dir record is not removed, or a new one is created when it shouldn't be. The result is that the dir node lookup returns a master node where the rsb does not exist. In this case, The master node will repeatedly return -EBADR for requests, and the lock requests will be stuck. Until all possible ways for this to happen can be eliminated, a simple and effective way to recover from this situation is for the supposed master node to send a standard remove message to the dir node when it receives a request for a resource it has no rsb for. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix conversion deadlock from recoveryDavid Teigland1-15/+40
The process of rebuilding locks on a new master during recovery could re-order the locks on the convert queue, creating an "in place" conversion deadlock that would not be resolved. Fix this by not considering queue order when granting conversions after recovery. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix race between remove and lookupDavid Teigland1-37/+144
It was possible for a remove message on an old rsb to be sent after a lookup message on a new rsb, where the rsbs were for the same resource name. This could lead to a missing directory entry for the new rsb. It is fixed by keeping a copy of the resource name being removed until after the remove has been sent. A lookup checks if this in-progress remove matches the name it is looking up. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use rsbtbl as resource directoryDavid Teigland1-203/+819
Remove the dir hash table (dirtbl), and use the rsb hash table (rsbtbl) as the resource directory. It has always been an unnecessary duplication of information. This improves efficiency by using a single rsbtbl lookup in many cases where both rsbtbl and dirtbl lookups were needed previously. This eliminates the need to handle cases of rsbtbl and dirtbl being out of sync. In many cases there will be memory savings because the dir hash table no longer exists. Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02dlm: fixes for nodir modeDavid Teigland1-89/+197
The "nodir" mode (statically assign master nodes instead of using the resource directory) has always been highly experimental, and never seriously used. This commit fixes a number of problems, making nodir much more usable. - Major change to recovery: recover all locks and restart all in-progress operations after recovery. In some cases it's not possible to know which in-progess locks to recover, so recover all. (Most require recovery in nodir mode anyway since rehashing changes most master nodes.) - Change the way nodir mode is enabled, from a command line mount arg passed through gfs2, into a sysfs file managed by dlm_controld, consistent with the other config settings. - Allow recovering MSTCPY locks on an rsb that has not yet been turned into a master copy. - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages from a previous, aborted recovery cycle. Base this on the local recovery status not being in the state where any nodes should be sending LOCK messages for the current recovery cycle. - Hold rsb lock around dlm_purge_mstcpy_locks() because it may run concurrently with dlm_recover_master_copy(). - Maintain highbast on process-copy lkb's (in addition to the master as is usual), because the lkb can switch back and forth between being a master and being a process copy as the master node changes in recovery. - When recovering MSTCPY locks, flag rsb's that have non-empty convert or waiting queues for granting at the end of recovery. (Rename flag from LOCKS_PURGED to RECOVER_GRANT and similar for the recovery function, because it's not only resources with purged locks that need grant a grant attempt.) - Replace a couple of unnecessary assertion panics with error messages. Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26dlm: improve error and debug messagesDavid Teigland1-85/+156
Change some existing error/debug messages to collect more useful information, and add some new error/debug messages to address recently found problems. Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26dlm: avoid unnecessary search in search_rsbDavid Teigland1-0/+3
If the rsb is found in the "keep" tree, but is not the right type (i.e. not MASTER), we can return immediately with the result. There's no point in going on to search the "toss" list as if we hadn't found it. Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26dlm: fix waiter recoveryDavid Teigland1-12/+31
An outstanding remote operation (an lkb on the "waiter" list) could sometimes miss being resent during recovery. The decision was based on the lkb_nodeid field, which could have changed during an earlier aborted recovery, so it no longer represents the actual remote destination. The lkb_wait_nodeid is always the actual remote node, so it is the best value to use. Signed-off-by: David Teigland <teigland@redhat.com>