
Commit 511c2ab

Lai Jiangshan authored and torvalds committed
mm, memory-hotplug: dynamic configure movable memory and portion memory
Add online_movable and online_kernel for logical memory hotplug. This is the dynamic version of "movablecore" & "kernelcore". It is introduced for the same reasons as "movablecore" & "kernelcore", but it works dynamically, at run time:

o We can configure memory as kernelcore or movablecore after boot.

  When the userspace workload increases and we need more hugepages, we can use "online_movable" to add memory and allow the system to use more THP (transparent huge pages); vice versa when the kernel workload increases.

  It also helps virtualization dynamically configure host/guest memory, to save memory (reduce waste).

  Memory capacity on demand.

o When a new node is physically onlined after boot, we need "online_movable" or "online_kernel" to configure/partition it as expected when we logically online it.

  This configuration also helps physical memory migration.

o All the benefits of the existing "movablecore" & "kernelcore".

o Preparing for movable-node, which is very important for power saving, hardware partitioning and high-availability systems (hardware fault management). (Note: we don't introduce movable-node here.)

Action behavior:

When a memory block/memory section is onlined by "online_movable", the kernel will not have direct references to the pages of the memory block, so we can remove that memory any time it is needed.

When it is onlined by "online_kernel", the kernel can use it.

When it is onlined by "online", the zone type doesn't change.

Current constraints:

Only a memory block that is adjacent to ZONE_MOVABLE can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
[use min_t, cleanups]
Signed-off-by: Lai Jiangshan
Signed-off-by: Wen Congyang
Cc: Yasuaki Ishimatsu
Cc: Lai Jiangshan
Cc: Jiang Liu
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Cc: Mel Gorman
Cc: David Rientjes
Cc: Yinghai Lu
Cc: Rusty Russell
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
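The action behavior and constraint above can be summarized as a small decision function. This is a hypothetical userspace sketch, not kernel code. `zone_after_online()`, `enum model_zone` and the `adjacent` flag are invented for illustration. "Normal" stands for whichever zone sits just below ZONE_MOVABLE: online_pages() in this commit tests `ZONE_MOVABLE - 1`, which is ZONE_NORMAL on many configurations. The adjacency rule for "online_kernel" follows the note this commit adds to Documentation/memory-hotplug.txt.

```c
#include <stdbool.h>

/* Same values as the anonymous ONLINE_* enum this commit adds to
 * include/linux/memory_hotplug.h; the tag "online_type" is this sketch's. */
enum online_type { ONLINE_KEEP, ONLINE_KERNEL, ONLINE_MOVABLE };

/* Hypothetical two-zone model: "normal" is the zone just below ZONE_MOVABLE. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

/*
 * Returns the zone a block ends up in after onlining, or -1 if the request
 * is refused. @adjacent says whether the block borders the zone it would be
 * moved into; it only matters when a move is actually needed.
 */
static int zone_after_online(enum online_type type, enum model_zone cur,
			     bool adjacent)
{
	switch (type) {
	case ONLINE_KERNEL:
		/* Only a ZONE_MOVABLE block needs to move; it must border normal. */
		if (cur == MODEL_ZONE_MOVABLE)
			return adjacent ? MODEL_ZONE_NORMAL : -1;
		return cur;
	case ONLINE_MOVABLE:
		/* Only a normal block needs to move; it must border ZONE_MOVABLE. */
		if (cur == MODEL_ZONE_NORMAL)
			return adjacent ? MODEL_ZONE_MOVABLE : -1;
		return cur;
	case ONLINE_KEEP:
	default:
		/* Plain "online" never changes the zone. */
		return cur;
	}
}
```

In this model, onlining a normal block that does not border ZONE_MOVABLE with "online_movable" yields -1. That matches the "must be adjacent" constraint.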
1 parent fcf07d2 commit 511c2ab


4 files changed: +146, -14 lines changed

Documentation/memory-hotplug.txt

Lines changed: 13 additions & 1 deletion
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
 		  in the memory block.
 'state'	: read-write
 		  at read: contains online/offline state of memory.
-		  at write: user can specify "online", "offline" command
+		  at write: user can specify "online_kernel",
+		  "online_movable", "online", "offline" command
 		  which will be performed on al sections in the block.
 'phys_device'	: read-only: designed to show the name of physical memory
 		  device. This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's state file as:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
+This onlining will not change the ZONE type of the target memory section,
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
 After this, section memoryXXX's state will be 'online' and the amount of
 available memory will be increased.
 
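The `echo` commands above can also be issued from a program. The following is an illustrative sketch, not part of this commit. `write_block_state()` and `block_state_path()` are hypothetical helpers, and the block number must name a memory block that actually exists under /sys/devices/system/memory/. The command is written followed by a newline, which is the same byte sequence `echo` writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the path from the documentation: memoryXXX -> memory<block>.
 * Returns 0, or -EINVAL if @out is too small. */
static int block_state_path(char *out, size_t size, unsigned long block)
{
	int len = snprintf(out, size,
			   "/sys/devices/system/memory/memory%lu/state", block);
	return (len < 0 || (size_t)len >= size) ? -EINVAL : 0;
}

/* Write "<cmd>\n" (e.g. "online_movable\n") to a state file, exactly as
 * `echo <cmd> > .../state` would. Returns 0 or a negative errno. */
static int write_block_state(const char *path, const char *cmd)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%s\n", cmd);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, buf, (size_t)len);
	/* Capture errno before close() can overwrite it. */
	int err = n < 0 ? -errno : (n == len ? 0 : -EIO);
	close(fd);
	return err;
}
```

For example, `block_state_path(path, sizeof(path), 32)` followed by `write_block_state(path, "online_movable")` corresponds to `echo online_movable > /sys/devices/system/memory/memory32/state`. It needs root. If the kernel rejects the request, write() fails and the helper returns the negated errno.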
drivers/base/memory.c

Lines changed: 22 additions & 11 deletions
@@ -254,7 +254,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
 {
 	unsigned long start_pfn;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -269,7 +269,7 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 		if (!pages_correctly_reserved(start_pfn, nr_pages))
 			return -EBUSY;
 
-		ret = online_pages(start_pfn, nr_pages);
+		ret = online_pages(start_pfn, nr_pages, online_type);
 		break;
 	case MEM_OFFLINE:
 		ret = offline_pages(start_pfn, nr_pages);
@@ -284,7 +284,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 }
 
 static int __memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int online_type)
 {
 	int ret = 0;
 
@@ -296,7 +297,7 @@ static int __memory_block_change_state(struct memory_block *mem,
 	if (to_state == MEM_OFFLINE)
 		mem->state = MEM_GOING_OFFLINE;
 
-	ret = memory_block_action(mem->start_section_nr, to_state);
+	ret = memory_block_action(mem->start_section_nr, to_state, online_type);
 
 	if (ret) {
 		mem->state = from_state_req;
@@ -319,12 +320,14 @@ static int __memory_block_change_state(struct memory_block *mem,
 }
 
 static int memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int online_type)
 {
 	int ret;
 
 	mutex_lock(&mem->state_mutex);
-	ret = __memory_block_change_state(mem, to_state, from_state_req);
+	ret = __memory_block_change_state(mem, to_state, from_state_req,
+					  online_type);
 	mutex_unlock(&mem->state_mutex);
 
 	return ret;
@@ -338,10 +341,18 @@ store_mem_state(struct device *dev,
 
 	mem = container_of(dev, struct memory_block, dev);
 
-	if (!strncmp(buf, "online", min((int)count, 6)))
-		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
-	else if(!strncmp(buf, "offline", min((int)count, 7)))
-		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+	if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
+		ret = memory_block_change_state(mem, MEM_ONLINE,
+						MEM_OFFLINE, ONLINE_KERNEL);
+	else if (!strncmp(buf, "online_movable", min_t(int, count, 14)))
+		ret = memory_block_change_state(mem, MEM_ONLINE,
+						MEM_OFFLINE, ONLINE_MOVABLE);
+	else if (!strncmp(buf, "online", min_t(int, count, 6)))
+		ret = memory_block_change_state(mem, MEM_ONLINE,
+						MEM_OFFLINE, ONLINE_KEEP);
+	else if(!strncmp(buf, "offline", min_t(int, count, 7)))
+		ret = memory_block_change_state(mem, MEM_OFFLINE,
+						MEM_ONLINE, -1);
 
 	if (ret)
 		return ret;
@@ -676,7 +687,7 @@ int offline_memory_block(struct memory_block *mem)
 
 	mutex_lock(&mem->state_mutex);
 	if (mem->state != MEM_OFFLINE)
-		ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+		ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
 	mutex_unlock(&mem->state_mutex);
 
 	return ret;
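The order of the strncmp() tests in store_mem_state() matters. "online" is a prefix of both new keywords, so the "online_kernel" and "online_movable" checks must come before the plain "online" check.

The sketch below lifts that parsing into a standalone function so it can be exercised in userspace. `parse_state_cmd()` and `enum parsed_cmd` are hypothetical, and `size_t` arithmetic replaces the kernel's `min_t(int, ...)`.

The sketch also exposes one consequence of capping the comparison length at `count`. By this reading of the code, a write of exactly "online" with no trailing newline (count == 6) already matches the first test, the one for "online_kernel".

```c
#include <string.h>

/* Hypothetical result codes; the first three mirror ONLINE_KEEP,
 * ONLINE_KERNEL and ONLINE_MOVABLE, the last two exist only in this sketch. */
enum parsed_cmd {
	PARSE_ONLINE_KEEP,
	PARSE_ONLINE_KERNEL,
	PARSE_ONLINE_MOVABLE,
	PARSE_OFFLINE,
	PARSE_INVALID,
};

static size_t min_len(size_t count, size_t n)
{
	return count < n ? count : n;
}

/* Same tests, in the same order, as store_mem_state() in this commit.
 * @count is the number of bytes written, including echo's newline. */
static enum parsed_cmd parse_state_cmd(const char *buf, size_t count)
{
	if (!strncmp(buf, "online_kernel", min_len(count, 13)))
		return PARSE_ONLINE_KERNEL;
	if (!strncmp(buf, "online_movable", min_len(count, 14)))
		return PARSE_ONLINE_MOVABLE;
	/* Must follow the two checks above: "online" prefixes both keywords. */
	if (!strncmp(buf, "online", min_len(count, 6)))
		return PARSE_ONLINE_KEEP;
	if (!strncmp(buf, "offline", min_len(count, 7)))
		return PARSE_OFFLINE;
	return PARSE_INVALID;
}
```

With echo's trailing newline, "online\n" (count 7) fails the first two tests at the '\n' versus '_' byte and falls through to the plain "online" case, as intended.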

include/linux/memory_hotplug.h

Lines changed: 12 additions & 1 deletion
@@ -26,6 +26,13 @@ enum {
 	MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
 };
 
+/* Types for control the zone type of onlined memory */
+enum {
+	ONLINE_KEEP,
+	ONLINE_KERNEL,
+	ONLINE_MOVABLE,
+};
+
 /*
  * pgdat resizing functions
  */
@@ -46,6 +53,10 @@ void pgdat_resize_init(struct pglist_data *pgdat)
 }
 /*
  * Zone resizing functions
+ *
+ * Note: any attempt to resize a zone should has pgdat_resize_lock()
+ * zone_span_writelock() both held. This ensure the size of a zone
+ * can't be changed while pgdat_resize_lock() held.
  */
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
@@ -71,7 +82,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);

mm/memory_hotplug.c

Lines changed: 99 additions & 1 deletion
@@ -214,6 +214,88 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
+static void resize_zone(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	zone_span_writelock(zone);
+
+	zone->zone_start_pfn = start_pfn;
+	zone->spanned_pages = end_pfn - start_pfn;
+
+	zone_span_writeunlock(zone);
+}
+
+static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	enum zone_type zid = zone_idx(zone);
+	int nid = zone->zone_pgdat->node_id;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		set_page_links(pfn_to_page(pfn), zid, nid, pfn);
+}
+
+static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are higher than @z2 */
+	if (end_pfn > z2->zone_start_pfn + z2->spanned_pages)
+		goto out_fail;
+	/* the move out part mast at the left most of @z2 */
+	if (start_pfn > z2->zone_start_pfn)
+		goto out_fail;
+	/* must included/overlap */
+	if (end_pfn <= z2->zone_start_pfn)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, end_pfn);
+	resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z1, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
+static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long flags;
+
+	pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+	/* can't move pfns which are lower than @z1 */
+	if (z1->zone_start_pfn > start_pfn)
+		goto out_fail;
+	/* the move out part mast at the right most of @z1 */
+	if (z1->zone_start_pfn + z1->spanned_pages > end_pfn)
+		goto out_fail;
+	/* must included/overlap */
+	if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
+		goto out_fail;
+
+	resize_zone(z1, z1->zone_start_pfn, start_pfn);
+	resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+	fix_zone_id(z2, start_pfn, end_pfn);
+
+	return 0;
+out_fail:
+	pgdat_resize_unlock(z1->zone_pgdat, &flags);
+	return -1;
+}
+
 static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
 				unsigned long end_pfn)
 {
@@ -508,7 +590,7 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 }
 
 
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long onlined_pages = 0;
 	struct zone *zone;
@@ -525,6 +607,22 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	 */
 	zone = page_zone(pfn_to_page(pfn));
 
+	if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
+		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+	if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
+		if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
+			unlock_memory_hotplug();
+			return -1;
+		}
+	}
+
+	/* Previous code may changed the zone of the pfn range */
+	zone = page_zone(pfn_to_page(pfn));
+
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_online(nr_pages, zone, &arg);

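The span arithmetic in move_pfn_range_left() and move_pfn_range_right() can be checked in isolation. The following is a hypothetical userspace model. `struct span`, `model_move_left()` and `model_move_right()` are invented names, and each zone is reduced to its [start, start + spanned) pfn range. Locking (pgdat_resize_lock(), zone_span_writelock()) and the set_page_links() fixup done by fix_zone_id() are left out. The model also assumes both zones are non-empty, as the adjacency constraint implies; it does not handle an empty neighbouring zone.

```c
#include <stdbool.h>

/* A zone reduced to its pfn span [start, start + spanned). */
struct span {
	unsigned long start, spanned;
};

static unsigned long span_end(const struct span *z)
{
	return z->start + z->spanned;
}

/* Model of move_pfn_range_left(): hand [start_pfn, end_pfn) from the low
 * end of @z2 (e.g. ZONE_MOVABLE) to @z1, the zone just below it. */
static bool model_move_left(struct span *z1, struct span *z2,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	if (end_pfn > span_end(z2))	/* can't move pfns above @z2 */
		return false;
	if (start_pfn > z2->start)	/* must start at @z2's left edge */
		return false;
	if (end_pfn <= z2->start)	/* must overlap @z2 */
		return false;

	unsigned long z2_end = span_end(z2);
	z1->spanned = end_pfn - z1->start;	/* resize_zone(z1, z1 start, end_pfn) */
	z2->start = end_pfn;			/* resize_zone(z2, end_pfn, old z2 end) */
	z2->spanned = z2_end - end_pfn;
	return true;
}

/* Model of move_pfn_range_right(): hand [start_pfn, end_pfn) from the high
 * end of @z1 to @z2 (e.g. ZONE_MOVABLE), the zone just above it. */
static bool model_move_right(struct span *z1, struct span *z2,
			     unsigned long start_pfn, unsigned long end_pfn)
{
	if (z1->start > start_pfn)	/* can't move pfns below @z1 */
		return false;
	if (span_end(z1) > end_pfn)	/* must reach @z1's right edge */
		return false;
	if (start_pfn >= span_end(z1))	/* must overlap @z1 */
		return false;

	unsigned long z2_end = span_end(z2);
	z1->spanned = start_pfn - z1->start;	/* resize_zone(z1, z1 start, start_pfn) */
	z2->start = start_pfn;			/* resize_zone(z2, start_pfn, old z2 end) */
	z2->spanned = z2_end - start_pfn;
	return true;
}
```

Take a ZONE_NORMAL span [0x1000, 0x3000) and a ZONE_MOVABLE span [0x3000, 0x4000). Onlining [0x2800, 0x3000) as movable shrinks the first zone to [0x1000, 0x2800) and grows the second to [0x2800, 0x4000). By contrast, [0x2000, 0x2800) is refused, because it does not reach ZONE_NORMAL's right edge. Onlining [0x2800, 0x3000) again with "online_kernel" restores the original spans.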