
Commit 5cedf72

Dave Chinner authored and Al Viro committed
list_lru: fix broken LRU_RETRY behaviour
The LRU_RETRY code assumes that the list traversal status is still valid after we have dropped and regained the list lock. Unfortunately, this is not a valid assumption, and it can lead to racing traversals isolating objects that the other traversal expects to be the next item on the list.

This is causing problems with the inode cache shrinker isolation: races result in an inode on a dispose list being "isolated" because a racing traversal still thinks it is on the LRU. The inode is then never reclaimed, which causes hangs if a subsequent lookup on that inode occurs.

Fix it by always restarting the list walk on an LRU_RETRY return from the isolate callback. Avoid the livelock possibility the current code was trying to avoid by always decrementing the nr_to_walk counter on retries, so that even if we keep hitting the same item on the list we will eventually stop trying to walk and exit out of the situation causing the problem.

Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
1 parent 3b1d58a commit 5cedf72

1 file changed: +12, -17 lines

mm/list_lru.c (12 additions, 17 deletions)

@@ -73,19 +73,19 @@ list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
-	/*
-	 * If we don't keep state of at which pass we are, we can loop at
-	 * LRU_RETRY, since we have no guarantees that the caller will be able
-	 * to do something other than retry on the next pass. We handle this by
-	 * allowing at most one retry per object. This should not be altered
-	 * by any condition other than LRU_RETRY.
-	 */
-	bool first_pass = true;
 
 	spin_lock(&nlru->lock);
 restart:
 	list_for_each_safe(item, n, &nlru->list) {
 		enum lru_status ret;
+
+		/*
+		 * decrement nr_to_walk first so that we don't livelock if we
+		 * get stuck on large numbers of LRU_RETRY items
+		 */
+		if (--(*nr_to_walk) == 0)
+			break;
+
 		ret = isolate(item, &nlru->lock, cb_arg);
 		switch (ret) {
 		case LRU_REMOVED:
@@ -100,19 +100,14 @@ list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
 		case LRU_SKIP:
 			break;
 		case LRU_RETRY:
-			if (!first_pass) {
-				first_pass = true;
-				break;
-			}
-			first_pass = false;
+			/*
+			 * The lru lock has been dropped, our list traversal is
+			 * now invalid and so we have to restart from scratch.
+			 */
 			goto restart;
 		default:
 			BUG();
 		}
-
-		if ((*nr_to_walk)-- == 0)
-			break;
-
 	}
 
 	spin_unlock(&nlru->lock);
