Runtime doesn't utilize available RAM and crashes #102332
Comments
Thanks @talweiss1982 for reporting the issue. Do you have any specific GC config set for your application? I assume this isn't a regression from a previous release? |
I have tried a few settings. Currently I'm trying to set the heap count to 512. I have server GC mode configured, I have the no-affinitize option set, I tried setting the maximum allowed memory to 3 TB, and I tried configuring the sustained low-latency mode (cancelling gen2 and LOH compaction). I tried configuring huge-page mode and it crashes the runtime. I have tried every combination of those settings; the GC heap size doesn't grow, causing OOM.
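For reference, the settings described above map roughly onto the runtime's documented DOTNET_-prefixed environment variables. This is only a sketch; the values are illustrative, not the reporter's exact configuration:

```shell
# Sketch of the GC settings described above, via documented env vars.
# Numeric GC config values must be given in hex.
export DOTNET_gcServer=1                      # server GC mode
export DOTNET_GCHeapCount=0x200               # 512 heaps
export DOTNET_GCNoAffinitize=1                # don't affinitize GC threads to cores
export DOTNET_GCHeapHardLimit=0x30000000000   # ~3 TiB hard limit, in hex bytes
export DOTNET_GCLargePages=1                  # huge pages (crashed the runtime here)
```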
|
According to dotnet-counters my heap fragmentation is below 1%, so it's not fragmentation of the LOH. I have a fixed number of pinned objects due to NLog and the dotnet configuration objects; I don't produce any myself. I don't allocate unmanaged memory and I don't pin objects.
|
Can you try setting DOTNET_GCRegionRange? That sets the initial GC reservation size to ~1 TB. The config should be looking at the available physical memory to make that determination, but I'm wondering if there is something off with that logic. |
Is it possible to do some live debugging together? A crash dump will likely be too big, and traces are unlikely to be useful for solving this crash. |
/offtopic I know this is unrelated, but if you are not using Sep and NonBlocking.ConcurrentDictionary, you might want to try them, as they will reduce the memory footprint per item (NonBlocking has a different but not worse performance profile compared to the out-of-box CDict, and consumes less memory per entry). NonBlocking.CDict: https://github.com/VSadov/NonBlocking Sep: https://github.com/nietras/Sep/ |
I'll try it on Sunday, as my work days are Sunday through Thursday, and will update with the result.
|
Sure, I believe we can set up a time on Monday, as I'm in the GMT+3 time zone.
My work email is ***@***.*** phone: +972-54-8024849
|
I'll give it a try. I was also thinking of using a bloom filter and not putting objects into the dictionary, only duplicates, and handling them separately. This can avoid having to keep a lot of memory, but will introduce a 2nd phase for processing, which my VP of R&D doesn't like.
|
After setting DOTNET_GCRegionRange=10700000000 (the dotnet-counters output and the error message were attached as images and are not captured here). |
SDK is 8.0.105 (remembered something with 5) |
I think that I know what the problem is. Basically, what I think is happening is that you are using a lot of memory regions, not just a lot of memory. At some point, the JIT needs to emit some code, or allocate more memory, and there aren't enough memory regions for that. See also: |
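On Linux, one limit that caps how many memory mappings a single process may create is vm.max_map_count. The comment above doesn't name the exact limit, so treating it as this sysctl is an assumption; checking and raising it would look like:

```shell
# Check the current per-process limit on memory mappings (assumption:
# this is the "memory regions" limit being hit; the comment doesn't name it).
sysctl vm.max_map_count
# Raise it for the current boot; the value here is illustrative only.
sudo sysctl -w vm.max_map_count=1048576
```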
Will try, thanks Oren. |
@ayende it seems to have helped; I'm able to pass a 288 GB working set (the dotnet-counters output was attached as an image and is not captured here). |
That being said, I would say that this is a runtime bug in the error reporting. |
Now it crashed at 1.164 TB. What should the value be if I want to utilize 3 TB? |
The config setting is the hex value in bytes. So you should be able to change it accordingly. |
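Since the config takes the byte count as a hex value, the conversion can be checked quickly in a shell; for example, for 3 TiB:

```shell
# Convert a byte count to the hex value the GC config expects.
# 3 TiB = 3 * 1024^4 bytes.
printf '0x%x\n' $((3 * 1024 ** 4))   # prints 0x30000000000
```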
Thanks, I thought it might be a bit flag.
|
Do you mean that the error message is not pointing to the source of the problem (which I am going to fix) or something else? |
Yes, it should give a clearer error message. |
I'm actually still getting OOM using these settings (attached as an image, not captured here). I didn't get to 3 TB; I think it's the GC allocating a large amount of unmanaged memory. When trying to profile locally on a single file, I saw the GC allocate 400 MB of native memory, and the number was growing rapidly. |
@talweiss1982 is it still crashing with the "Failed to create RW mapping for RX memory." error, or in some other way? Maybe 524240 is still too low for your case. |
It crashes with out of memory. I think the GC is allocating quite a bit of native memory; how can I monitor the amount of memory the GC itself is allocating? |
You can use the CommittedUsage event here. The event will give you a breakdown on the large chunks of native memory the GC acquired from the OS (which may or may not be used by managed objects yet). If you collect a GCCollectOnly trace, you should see this show up on the events view of PerfView. If you get the OOM trapped in a debugger, you can also use the MAddress command to see a breakdown of all the native memory used. |
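Collecting the trace mentioned above can be done from the command line. A sketch, assuming the dotnet-trace global tool (cross-platform) or PerfView (Windows) is installed and <PID> is the target process:

```shell
# Collect a GC-focused trace with dotnet-trace using its built-in
# gc-collect profile (low-overhead, GC collection events only):
dotnet-trace collect --process-id <PID> --profile gc-collect

# Or with PerfView on Windows, using the GCCollectOnly preset:
PerfView.exe /GCCollectOnly /nogui collect
```

The resulting trace can then be opened in PerfView's Events view to look for the CommittedUsage event described above.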
@cshung, I assume the CommittedUsage doesn't include the card tables and similar GC bookkeeping stuff, right? I guess those might be quite large for such a large heap size. |
This time the OOM was due to me exceeding the max capacity of the dictionary. I think I'm good; I was able to load 2.4 TB of data. |
Ok, good to know that you have been able to get things to work. Can you please confirm the config settings you had to apply to get a ~3 TB working set? We will need to look into why the GC isn't defaulting to the higher region range. Thx! |
@mangod9 is right; you shouldn't have to set the GCRegionRange config manually. Can we take a look at what happened around here? This code should run exactly once during process startup, and the value should be set automatically by the formula there. It must be the case that something is wrong there if a manual configuration helped the situation. |
@talweiss1982, it has been a long while, is this issue ready to be closed? |
Believe we need to investigate why, on machines with large available memory, setting the RegionRange is required. |
@talweiss1982, we are trying to investigate why you need to set an extra config. Can you please describe how you set up hugepages, as in your original post. |
This issue will now be closed since it had been marked |
Description
Hi,
I'm trying to load a huge amount of data in a single dotnet process on a machine with 4 TB of RAM and ~128 CPUs, running Ubuntu 22.04, using the latest dotnet 8 SDK (updated today; I think it's 8.0.5). The code reads 464 CSV files, each 3 GB long, line by line, parsing each row and generating an object out of it. Each object is then placed into one of 5 concurrent dictionaries according to its type; each dictionary is initialized with a 500 million capacity, which is enough so it won't need to grow. The dictionaries are there to remove duplicates and merge their data; the deduped objects are then written to CSV files. I don't go over the 2 billion object limit of the dictionary. I don't read entire files or keep any file in memory, only their buffers, which have the default size of 4 KB, and I don't process more than 10 files in parallel.
Yet the machine never utilizes more than 1.5 TB of memory. I investigated the Ubuntu configuration and it doesn't seem to have any limits configured for a process. Moreover, I tried running 4 of those processes and I didn't get above 1.8 TB. I checked dotnet-counters and the GC heap size doesn't seem to grow above 300 GB. I'm using server GC; I tried setting the heap count, but it seems to be capped at 128: I can only reduce it, and setting it to a higher value is ignored. I configured 1 TB of huge pages (2 MB size) and set the dotnet flag to use huge pages, as I read they will be used when the GC reaches its hard limit; as a result, dotnet crashes with a heap initialization error, and I can't even build my project because dotnet build crashes with the same error. I tried running with the GC's low-latency-for-long-duration mode, but after processing 90% of the data a gen2 GC is invoked, and afterwards the process crashes. This means the runtime thinks it has reached its limit, although there are 2.5 TB of free RAM. I also printed the GC's available memory, and it shows 4 TB when the process starts.
I tried asking the GC to retain VM memory; it didn't help. Please advise what voodoo I need to practice to make my dotnet process utilize its God-given RAM pool?
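The hugepage setup described above is not spelled out in the report; a hypothetical sketch of what it might look like on Linux (the exact page count and limit value are assumptions):

```shell
# Hypothetical sketch of the hugepage setup described above:
# reserve 1 TiB of 2 MiB huge pages (1 TiB / 2 MiB = 524288 pages).
sudo sysctl -w vm.nr_hugepages=524288

# Tell the runtime to use large pages; the documented GCLargePages
# setting requires a heap hard limit to also be set (hex bytes).
export DOTNET_GCLargePages=1
export DOTNET_GCHeapHardLimit=0x10000000000   # 1 TiB
```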
Reproduction Steps
Use an Ubuntu 22.04 machine with 4 TB of RAM and dotnet SDK 8.0.5. Read 464 CSV files, each 3 GB in size, parse them line by line, and shove the rows into concurrent dictionaries; make sure to split the data across enough dictionaries so as not to reach 2 billion objects in a single dictionary, and set each dictionary's capacity in advance. Watch as the GC heap size fails to utilize the whole RAM and the process crashes.
Expected behavior
I expect the gc heap to increase in size to utilize all the ram I don't mind gc pauses or slowness
Actual behavior
GC heap size doesn't seem to grow above 300 GB, causing GC pressure and random crashes.
Regression?
Didn't try on other versions.
Known Workarounds
I wish
Configuration
Dotnet 8.0.5 running on Ubuntu 22.04 with x64 architecture
Other information
There are two issues: (1) the GC heap count seems to be capped at 128, and I believe each heap's size is capped too, so I can't get to 4 TB; (2) trying to use huge pages seems to crash the runtime.