Description
After upgrading a production environment from .NET 7 with the segment-based libclrgc.so GC heap (due to #86183) to a default .NET 8 configuration, some of the processes experienced sporadic out-of-memory crashes. The environment consists of a pool of processes with a large memory footprint running in Docker containers, one container per Ubuntu 20.04 VM host, with hosts ranging from ~30 GB up to 1 TB of RAM.
Analysis
A typical pattern before the upgrade was a cycle: memory rose to the available limit, a deep gen 2 GC ran, memory rose to the limit again, another gen 2 GC ran, and so on.
Under .NET 8 (standard region-based GC) the pattern changed to an almost straight line ending in out of memory.
Reverting to .NET 7 with the segment-based libclrgc.so GC heap restored the original pattern, and the process became stable.
The heap sizes looked similar under both scenarios:
After taking a full memory dump in both scenarios, the top .NET object usage by type also looked similar:
.NET 7 (segment-based GC heap)
kilobytes |Object count|Type
23,402,813| 272,323,647|Class1
11,726,220| 63,078|Class2[]
5,024,993| 117,549|Class1[]
3,877,060| 173|$.ValueTuple<Class2,Class1>[]
3,214,924| 22,861,688|Class3
850,206| 13,871|$.Collections.Generic.HashSet+Entry<Class4>[]
698,673| 1,527,108|$.Int32[]
664,435| 28,349,262|Class5
664,193| 11,920,171|$.String
608,850| 3,247,205|Class6
....
9,165| 233,684|Free
60,011,021| 453,466,459|TOTAL
.NET 8 (region-based GC heap)
kilobytes |Object count|Type
26,227,185| 305,189,069|Class1
12,293,589| 57,454|Class2[]
3,988,836| 80,069|Class1[]
3,074,227| 21,861,170|Class3
1,613,392| 91|$.ValueTuple<Class2,.Class1>[]
1,500,896| 12,276|$.Collections.Generic.HashSet+Entry<Class4>[]
633,966| 27,049,234|Class5
615,378| 3,282,018|Class6
539,491| 758,663|$.Int32[]
29,343| 23,047|Free
57,974,870| 451,380,242|TOTAL
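To put a number on "similar", the two dumps above can be compared type by type. The following is a minimal sketch, not part of the original analysis: it copies a handful of rows from the tables above, and the names `NET7`, `NET8`, and `percent_change` are made up for illustration.

```python
# Rows copied from the two dumps above: type -> (kilobytes, object count).
# Only a few rows are included; the remaining rows can be added the same way.
NET7 = {
    "Class1": (23_402_813, 272_323_647),
    "Class2[]": (11_726_220, 63_078),
    "Class1[]": (5_024_993, 117_549),
    "Free": (9_165, 233_684),
    "TOTAL": (60_011_021, 453_466_459),
}
NET8 = {
    "Class1": (26_227_185, 305_189_069),
    "Class2[]": (12_293_589, 57_454),
    "Class1[]": (3_988_836, 80_069),
    "Free": (29_343, 23_047),
    "TOTAL": (57_974_870, 451_380_242),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Print the size of each type in both dumps and the relative change.
for name in NET7:
    kb7, _ = NET7[name]
    kb8, _ = NET8[name]
    change = percent_change(kb7, kb8)
    print(f"{name:10} {kb7:>12,} KB -> {kb8:>12,} KB ({change:+.1f}%)")
```

With these figures the managed heap total under .NET 8 is a few percent smaller than under .NET 7, and the Free entry is tiny in both dumps, so the dumps alone do not show where the extra memory goes.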
Scaling the machine up to 1.5x or more of its original size eliminated the OOM crashes, but it also required 1.5x or more CPU cores, which in turn cost 1.5x or more in dollars for the more expensive cloud infrastructure.
To test the theory that the difference was due to the GC heap mode and not to the framework version change from .NET 7 to .NET 8, we switched through the following configurations on another server that had experienced a similar issue:
.NET 7 libclrgc.so
-> .NET 8 default
-> .NET 8 libclrgc.so
-> .NET 8 DOTNET_GCDynamicAdaptationMode
The test confirmed that the pattern depended only on whether the segment-based (libclrgc.so) or the region-based heap was in use. Switching to DATAS for the region-based heap did not affect the pattern.
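For anyone trying to reproduce the comparison, the four configurations can be written down as sets of environment variables passed to the container. This is only a sketch under assumptions: the issue does not list the exact settings used, so the values below (`DOTNET_GCName=libclrgc.so` to load the segment-based standalone GC and `DOTNET_GCDynamicAdaptationMode=1` to turn on DATAS) are the documented runtime settings as I understand them, and `GC_TEST_MATRIX` and `docker_env_args` are hypothetical names.

```python
# Hypothetical test matrix: (label, runtime version, GC-related environment).
# The exact values used in the original test are an assumption.
GC_TEST_MATRIX = [
    (".NET 7 segments", "7.0", {"DOTNET_GCName": "libclrgc.so"}),
    (".NET 8 default (regions)", "8.0", {}),
    (".NET 8 segments", "8.0", {"DOTNET_GCName": "libclrgc.so"}),
    (".NET 8 regions + DATAS", "8.0", {"DOTNET_GCDynamicAdaptationMode": "1"}),
]


def docker_env_args(env):
    """Turn an environment dict into `docker run` style `-e NAME=value` arguments."""
    args = []
    for name, value in sorted(env.items()):
        args += ["-e", f"{name}={value}"]
    return args


# Show which overrides each configuration would pass to the container.
for label, runtime, env in GC_TEST_MATRIX:
    overrides = " ".join(docker_env_args(env)) or "(no GC overrides)"
    print(f"{label} [runtime {runtime}]: {overrides}")
```

Keeping the runtime version and the GC settings as separate columns makes it easy to see that only the second and fourth rows use the region-based heap, which matches the rows where the straight-line memory pattern was reported.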
Configuration
- .NET 7 & 8
- Ubuntu 20.04 x64
Regression?
Feels like one. It could be triggered when a process is already running close to its maximum available memory limit with no space to spare. The region-based GC heap might be optimizing its activity for other factors, for instance minimizing GC pauses or preserving its pools of memory regions, without realizing that there is a bigger issue at hand, insufficient memory, that needs to be dealt with urgently.
Note that this is the second critical issue, after #97316, that we have experienced with the region-based GC heap mode. Both need to be addressed by the team for the new GC mode to deliver on its promise of better performance.