From e6d265e8504ab4a3368b8645d318b344ee88b280 Mon Sep 17 00:00:00 2001 From: Jay Vosburgh Date: Fri, 28 Oct 2011 15:42:50 +0000 Subject: bonding: eliminate bond_close race conditions This patch resolves two sets of race conditions. Mitsuo Hayasaka reported the first, as follows: The bond_close() calls cancel_delayed_work() to cancel delayed works. It, however, cannot cancel works that were already queued in workqueue. The bond_open() initializes work->data, and proccess_one_work() refers get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when work->data has been initialized. Thus, a panic occurs. He included a patch that converted the cancel_delayed_work calls in bond_close to flush_delayed_work_sync, which eliminated the above problem. His patch is incorporated, at least in principle, into this patch. In this patch, we use cancel_delayed_work_sync in place of flush_delayed_work_sync, and also convert bond_uninit in addition to bond_close. This conversion to _sync, however, opens new races between bond_close and three periodically executing workqueue functions: bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon. The race occurs because bond_close and bond_uninit are always called with RTNL held, and these workqueue functions may acquire RTNL to perform failover-related activities. If bond_close or bond_uninit is waiting in cancel_delayed_work_sync, deadlock occurs. These deadlocks are resolved by having the workqueue functions acquire RTNL conditionally. If the rtnl_trylock() fails, the functions reschedule and return immediately. For the cases that are attempting to perform link failover, a delay of 1 is used; for the other cases, the normal interval is used (as those activities are not as time critical). Additionally, the bond_mii_monitor function now stores the delay in a variable (mimicing the structure of activebackup_arp_mon). Lastly, all of the above renders the kill_timers sentinel moot, and therefore it has been removed. Tested-by: Mitsuo Hayasaka Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller --- drivers/net/bonding/bond_3ad.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'drivers/net/bonding/bond_3ad.c') diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index b33c099d65a4..0ae0d7c54ccf 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2110,9 +2110,6 @@ void bond_3ad_state_machine_handler(struct work_struct *work) read_lock(&bond->lock); - if (bond->kill_timers) - goto out; - //check if there are any slaves if (bond->slave_cnt == 0) goto re_arm; @@ -2161,9 +2158,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work) } re_arm: - if (!bond->kill_timers) - queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks); -out: + queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks); + read_unlock(&bond->lock); } -- cgit v1.2.3