summaryrefslogtreecommitdiffstats
path: root/lib/rhashtable.c
AgeCommit message (Collapse)AuthorFilesLines
2015-07-08rhashtable: fix for resize events during table walkPhil Sutter1-2/+2
If rhashtable_walk_next detects a resize operation in progress, it jumps to the new table and continues walking that one. But it misses to drop the reference to it's current item, leading it to continue traversing the new table's bucket in which the current item is sorted into, and after reaching that bucket's end continues traversing the new table's second bucket instead of the first one, thereby potentially missing items. This fixes the rhashtable runtime test for me. Bug probably introduced by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during rehash") although not explicitly tested. Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
2015-06-07rhashtable: add missing import <linux/export.h>Hauke Mehrtens1-0/+1
rhashtable uses EXPORT_SYMBOL_GPL() without importing linux/export.h directly it is only imported indirectly through some other includes. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+11
Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-16rhashtable: Add cap on number of elements in hash tableHerbert Xu1-0/+11
We currently have no limit on the number of elements in a hash table. This is a problem because some users (tipc) set a ceiling on the maximum table size and when that is reached the hash table may degenerate. Others may encounter OOM when growing and if we allow insertions when that happens the hash table perofrmance may also suffer. This patch adds a new paramater insecure_max_entries which becomes the cap on the table. If unset it defaults to max_size * 2. If it is also zero it means that there is no cap on the number of elements in the table. However, the table will grow whenever the utilisation hits 100% and if that growth fails, you will get ENOMEM on insertion. As allowing oversubscription is potentially dangerous, the name contains the word insecure. Note that the cap is not a hard limit. This is done for performance reasons as enforcing a hard limit will result in use of atomic ops that are heavier than the ones we currently use. The reasoning is that we're only guarding against a gross over- subscription of the table, rather than a small breach of the limit. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05rhashtable: Simplify iterator codeThomas Graf1-6/+2
Remove useless obj variable and goto logic. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22rhashtable: Do not schedule more than one rehash if we can't grow furtherThomas Graf1-2/+2
The current code currently only stops inserting rehashes into the chain when no resizes are currently scheduled. As long as resizes are scheduled and while inserting above the utilization watermark, more and more rehashes will be scheduled. This lead to a perfect DoS storm with thousands of rehashes scheduled which lead to thousands of spinlocks to be taken sequentially. Instead, only allow either a series of resizes or a single rehash. Drop any further rehashes and return -EBUSY. Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22rhashtable: Schedule async resize when sync realloc failsThomas Graf1-1/+6
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow to grow. Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure. This fixes a bug where the table would never continue growing once the utilization is above 100%. Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25rhashtable: provide len to obj_hashfnPatrick McHardy1-1/+1
nftables sets will be converted to use so called setextensions, moving the key to a non-fixed position. To hash it, the obj_hashfn must be used, however it so far doesn't receive the length parameter. Pass the key length to obj_hashfn() and convert existing users. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-24rhashtable: Add rhashtable_free_and_destroy()Thomas Graf1-10/+39
rhashtable_destroy() variant which stops rehashes, iterates over the table and calls a callback to release resources. Avoids need for nft_hash to embed rhashtable internals and allows to get rid of the being_destroyed flag. It also saves a 2nd mutex lock upon destruction. Also fixes an RCU lockdep splash on nft set destruction due to calling rht_for_each_entry_safe() without holding bucket locks. Open code this loop as we need know that no mutations may occur in parallel. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24rhashtable: Disable automatic shrinking by defaultThomas Graf1-1/+1
Introduce a new bool automatic_shrinking to require the user to explicitly opt-in to automatic shrinking of tables. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24rhashtable: Use 'unsigned int' consistentlyThomas Graf1-8/+10
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24rhashtable: Add comment on choice of elasticity valueHerbert Xu1-0/+12
This patch adds a comment on the choice of the value 16 as the maximum chain length before we force a rehash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Fix sleeping inside RCU critical section in walk_stopHerbert Xu1-2/+5
The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable: Fix use-after-free in rhashtable_walk_stop") fixed a real bug but created another one because we may end up sleeping inside an RCU critical section. This patch fixes it properly by replacing the mutex with a spin lock that specifically protects the walker lists. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Add immediate rehash during insertionHerbert Xu1-1/+59
This patch reintroduces immediate rehash during insertion. If we find during insertion that the table is full or the chain length exceeds a set limit (currently 16 but may be disabled with insecure_elasticity) then we will force an immediate rehash. The rehash will contain an expansion if the table utilisation exceeds 75%. If this rehash fails then the insertion will fail. Otherwise the insertion will be reattempted in the new hash table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Allow GFP_ATOMIC bucket table allocationHerbert Xu1-11/+15
This patch adds the ability to allocate bucket table with GFP_ATOMIC instead of GFP_KERNEL. This is needed when we perform an immediate rehash during insertion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Add multiple rehash supportHerbert Xu1-15/+72
This patch adds the missing bits to allow multiple rehashes. The read-side as well as remove already handle this correctly. So it's only the rehasher and insertion that need modification to handle this. Note that this patch doesn't actually enable it so for now rehashing is still only performed by the worker thread. This patch also disables the explicit expand/shrink interface because the table is meant to expand and shrink automatically, and continuing to export these interfaces unnecessarily complicates the life of the rehasher since the rehash process is now composed of two parts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Shrink to fitHerbert Xu1-3/+10
This patch changes rhashtable_shrink to shrink to the smallest size possible rather than halving the table. This is needed because with multiple rehashing we will defer shrinking until all other rehashing is done, meaning that when we do shrink we may be able to shrink a lot. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Allow hashfn to be unsetHerbert Xu1-1/+16
Since every current rhashtable user uses jhash as their hash function, the fact that jhash is an inline function causes each user to generate a copy of its code. This function provides a solution to this problem by allowing hashfn to be unset. In which case rhashtable will automatically set it to jhash. Furthermore, if the key length is a multiple of 4, we will switch over to jhash2. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Add barrier to ensure we see new tables in walkerHerbert Xu1-0/+3
The walker is a lockless reader so it too needs an smp_rmb before reading the future_tbl field in order to see any new tables that may contain elements that we should have walked over. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Rip out obsolete out-of-line interfaceHerbert Xu1-284/+0
Now that all rhashtable users have been converted over to the inline interface, this patch removes the unused out-of-line interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Allow hash/comparison functions to be inlinedHerbert Xu1-113/+50
This patch deals with the complaint that we make indirect function calls on the fast paths unnecessarily in rhashtable. We resolve it by moving the fast paths into inline functions that take struct rhashtable_param (which obviously must be the same set of parameters supplied to rhashtable_init) as an argument. The only remaining indirect call is to obj_hashfn (or key_hashfn it obj_hashfn is unset) on the rehash as well as the insert-during- rehash slow path. This patch also extends the support of vairable-length keys to include those where the key is fixed but scattered in the object. For example, in netlink we want to key off the namespace and the portid but they're not next to each other. This patch does this by directly using the object hash function as the indicator of whether the key is accessible or not. It also adds a new function obj_cmpfn to compare a key against an object. This means that the caller no longer needs to supply explicit compare functions. All this is done in a backwards compatible manner so no existing users are affected until they convert to the new interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Make rhashtable_init params argument constHerbert Xu1-3/+4
This patch marks the rhashtable_init params argument const as there is no reason to modify it since we will always make a copy of it in the rhashtable. This patch also fixes a bug where we don't actually round up the value of min_size unless it is less than HASH_MIN_SIZE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19rhashtable: Round up/down min/max_size to ensure we respect limitThomas Graf1-2/+8
Round up min_size respectively round down max_size to the next power of two to make sure we always respect the limit specified by the user. This is required because we compare the table size against the limit before we expand or shrink. Also fixes a minor bug where we modified min_size in the params provided instead of the copy stored in struct rhashtable. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Remove max_shift and min_shiftHerbert Xu1-6/+1
Now that nobody uses max_shift and min_shift, we can safely remove them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Introduce max_size/min_sizeHerbert Xu1-4/+8
This patch adds the parameters max_size and min_size which are meant to replace max_shift and min_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18rhashtable: Remove shift from bucket_tableHerbert Xu1-3/+2
Keeping both size and shift is silly. We only need one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16rhashtable: Avoid calculating hash again to unlockThomas Graf1-6/+5
Caching the lock pointer avoids having to hash on the object again to unlock the bucket locks. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-16rhashtable: Annotate RCU locking of walkersThomas Graf1-0/+2
Fixes the following sparse warnings: lib/rhashtable.c:767:5: warning: context imbalance in 'rhashtable_walk_start' - wrong count at exit lib/rhashtable.c:849:6: warning: context imbalance in 'rhashtable_walk_stop' - unexpected unlock Fixes: f2dba9c6ff0d ("rhashtable: Introduce rhashtable_walk_*") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Fix rhashtable_remove failuresHerbert Xu1-10/+7
The commit 9d901bc05153bbf33b5da2cd6266865e531f0545 ("rhashtable: Free bucket tables asynchronously after rehash") causes gratuitous failures in rhashtable_remove. The reason is that it inadvertently introduced multiple rehashing from the perspective of readers. IOW it is now possible to see more than two tables during a single RCU critical section. Fortunately the other reader rhashtable_lookup already deals with this correctly thanks to c4db8848af6af92f90462258603be844baeab44d ("rhashtable: rhashtable: Move future_tbl into struct bucket_table") so only rhashtable_remove is broken by this change. This patch fixes this by looping over every table from the first one to the last or until we find the element that we were trying to delete. Incidentally the simple test for detecting rehashing to prevent starting another shrinking no longer works. Since it isn't needed anyway (the work queue and the mutex serves as a natural barrier to unnecessary rehashes) I've simply killed the test. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Fix use-after-free in rhashtable_walk_stopHerbert Xu1-3/+4
The commit c4db8848af6af92f90462258603be844baeab44d ("rhashtable: Move future_tbl into struct bucket_table") introduced a use-after- free bug in rhashtable_walk_stop because it dereferences tbl after droping the RCU read lock. This patch fixes it by moving the RCU read unlock down to the bottom of rhashtable_walk_stop. In fact this was how I had it originally but it got dropped while rearranging patches because this one depended on the async freeing of bucket_table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Move future_tbl into struct bucket_tableHerbert Xu1-16/+11
This patch moves future_tbl to open up the possibility of having multiple rehashes on the same table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Add rehash counter to bucket_tableHerbert Xu1-0/+1
This patch adds a rehash counter to bucket_table to indicate the last bucket that has been rehashed. This serves two purposes: 1. Any bucket that has been rehashed can never gain a new object. 2. If the rehash counter reaches the size of the table, the table will forever remain empty. This patch also downsizes bucket_table->size to an unsigned int since we do not support sizes greater than 32 bits yet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Free bucket tables asynchronously after rehashHerbert Xu1-3/+6
There is in fact no need to wait for an RCU grace period in the rehash function, since all insertions are guaranteed to go into the new table through spin locks. This patch uses call_rcu to free the old/rehashed table at our leisure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Move seed init into bucket_table_allocHerbert Xu1-10/+6
It seems that I have already made every rehash redo the random seed even though my commit message indicated otherwise :) Since we have already taken that step, this patch goes one step further and moves the seed initialisation into bucket_table_alloc. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Use SINGLE_DEPTH_NESTINGHerbert Xu1-7/+2
We only nest one level deep there is no need to roll our own subclasses. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-15rhashtable: Fix walker behaviour during rehashHerbert Xu1-23/+46
Previously whenever the walker encountered a resize it simply snaps back to the beginning and starts again. However, this only works if the rehash started and completed while the walker was idle. If the walker attempts to restart while the rehash is still ongoing, we may miss objects that we shouldn't have. This patch fixes this by making the walker walk the old table followed by the new table just like all other readers. If a rehash is detected we will still signal our caller of the fact so they can prepare for duplicates but we will simply continue the walk onto the new table after the old one is finished either by us or by the rehasher. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Fix read-side crash during rehashHerbert Xu1-1/+1
This patch fixes a typo rhashtable_lookup_compare where we fail to recompute the hash when looking up the new table. This causes elements to be missed and potentially a crash during a resize. Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: kill ht->shift atomic operationsDaniel Borkmann1-30/+25
Commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker queue") changed ht->shift to be atomic, which is actually unnecessary. Instead of leaving the current shift in the core rhashtable structure, it can be cached inside the individual bucket tables. There, it will only be initialized once during a new table allocation in the shrink/expansion slow path, and from then onward it stays immutable for the rest of the bucket table liftime. That allows shift to be non-atomic. The patch also moves hash_rnd management into the table setup. The rhashtable structure now consumes 3 instead of 4 cachelines. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Fix reader/rehash raceHerbert Xu1-0/+6
There is a potential race condition between readers and the rehasher. In particular, the rehasher could have started a rehash while the reader finishes a scan of the old table but fails to see the new table pointer. This patch closes this window by adding smp_wmb/smp_rmb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Remove obj_raw_hashfnHerbert Xu1-18/+7
Now that the only caller of obj_raw_hashfn is head_hashfn, we can simply kill it and fold it into the latter. This patch also moves the common shift from head_hashfn/key_hashfn into rht_bucket_index. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Remove key length argument to key_hashfnHerbert Xu1-3/+4
key_hashfn has only one caller and it doesn't really need to supply the key length as an extra parameter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Use head_hashfn instead of obj_raw_hashfnHerbert Xu1-7/+5
Now that we don't have cross-table hashes, we no longer need to keep the entire hash value so all users of obj_raw_hashfn can use head_hashfn instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12rhashtable: Move masking back into key_hashfnHerbert Xu1-2/+3
This patch reverts commit c88455ce50ae4224d84960ce2baa53e61580df27 ("rhashtable: key_hashfn() must return full hash value") because the only user of it always masks the hash value. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11rhashtable: Add annotation to nested lockHerbert Xu1-2/+2
Commit aa34a6cb0478842452bac58edb50d3ef9e178c92 ("rhashtable: Add arbitrary rehash function") killed the annotation on the nested lock which leads to bitching from lockdep. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11rhashtable: Add arbitrary rehash functionHerbert Xu1-332/+174
This patch adds a rehash function that supports the use of any hash function for the new table. This is needed to support changing the random seed value during the lifetime of the hash table. However for now the random seed value is still constant and the rehash function is simply used to replace the existing expand/shrink functions. [ ASSERT_BUCKET_LOCK() and thus debug_dump_table() + debug_dump_buckets() are not longer used, so delete them entirely. -DaveM ] Signed-off-by: Herbert Xu <herbert.xu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11rhashtable: Move hash_rnd into bucket_tableHerbert Xu1-9/+15
Currently hash_rnd is a parameter that users can set. However, no existing users set this parameter. It is also something that people are unlikely to want to set directly since it's just a random number. In preparation for allowing the reseeding/rehashing of rhashtable, this patch moves hash_rnd into bucket_table so that it's now an internal state rather than a parameter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27rhashtable: use cond_resched()Eric Dumazet1-0/+4
If a hash table has 128 slots and 16384 elems, expand to 256 slots takes more than one second. For larger sets, a soft lockup is detected. Holding cpu for that long, even in a work queue is a show stopper for non preemptable kernels. cond_resched() at strategic points to allow process scheduler to reschedule us. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27rhashtable: remove indirection for grow/shrink decision functionsDaniel Borkmann1-39/+17
Currently, all real users of rhashtable default their grow and shrink decision functions to rht_grow_above_75() and rht_shrink_below_30(), so that there's currently no need to have this explicitly selectable. It can/should be generic and private inside rhashtable until a real use case pops up. Since we can make this private, we'll save us this additional indirection layer and can improve insertion/deletion time as well. Reference: http://patchwork.ozlabs.org/patch/443040/ Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27rhashtable: unconditionally grow when max_shift is not specifiedDaniel Borkmann1-1/+1
While commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker queue") rightfully moved part of the decision making of whether we should expand or shrink from the expand/shrink functions themselves into insert/delete functions in order to avoid unnecessary worker wake-ups, it however introduced a regression by doing so. Before that change, if no max_shift was specified (= 0) on rhashtable initialization, rhashtable_expand() would just grow unconditionally and lets the available memory be the limiting factor. After that change, if no max_shift was specified, there would be _no_ expansion step at all. Given that netlink and tipc have a max_shift specified, it was not visible there, but Josh Hunt reported that if nft that starts out with a default element hint of 3 if not otherwise provided, would slow i.e. inserts down trememdously as it cannot grow larger to relax table occupancy. Given that the test case verifies shrinks/expands manually, we also must remove pointer to the helper functions to explicitly avoid parallel resizing on insertions/deletions. test_bucket_stats() and test_rht_lookup() could also be wrapped around rhashtable mutex to explicitly synchronize a walk from resizing, but I think that defeats the actual test case which intended to have explicit test steps, i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object verification after each stage. Reported-by: Josh Hunt <johunt@akamai.com> Fixes: c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker queue") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Cc: Josh Hunt <johunt@akamai.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>