
Fix potential memory leak in cpu enumeration on Linux #2008


Merged
merged 4 commits into OpenMathLib:develop on Feb 10, 2019

Conversation

martin-frbg
Collaborator

An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. An incorrect calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003

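For reference, a minimal sketch of the allocate/query/free pattern this fix restores, assuming glibc's CPU_* macros; the function name and fallback behaviour are illustrative only, not the actual OpenBLAS code:

```c
/* Sketch only: free the CPU_ALLOC'd set on the error path and size the
 * sched_getaffinity call with CPU_ALLOC_SIZE rather than sizeof(cpu_set_t). */
#define _GNU_SOURCE
#include <sched.h>

static int count_usable_cpus(int nprocs)
{
    cpu_set_t *cpusetp = CPU_ALLOC(nprocs);
    if (cpusetp == NULL)
        return nprocs;                        /* fall back to the raw count */

    size_t size = CPU_ALLOC_SIZE(nprocs);     /* bytes actually allocated */
    if (sched_getaffinity(0, size, cpusetp) != 0) {
        CPU_FREE(cpusetp);                    /* previously missing: the leak */
        return nprocs;
    }

    int usable = CPU_COUNT_S(size, cpusetp);  /* CPUs usable by this process */
    CPU_FREE(cpusetp);
    return usable;
}
```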

@jbrandmeyer left a comment


Beware that cpusetp has a pointer type. I think you really want sizeof(*cpusetp)
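To illustrate the distinction being pointed out, a hypothetical standalone snippet (not part of the PR):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t *cpusetp = CPU_ALLOC(4);
    /* sizeof(cpusetp) is only the size of the pointer itself,
     * sizeof(*cpusetp) is the size of a full static cpu_set_t. */
    printf("sizeof(cpusetp)  = %zu\n", sizeof(cpusetp));
    printf("sizeof(*cpusetp) = %zu\n", sizeof(*cpusetp));
    CPU_FREE(cpusetp);
    return 0;
}
```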

@martin-frbg
Collaborator Author

Beware that cpusetp has a pointer type. I think you really want sizeof(*cpusetp)

I must admit that I am a bit confused about this. The size of the type evaluates to 512 on a mere dual-core laptop, and using that value it bombs out with a memory management error in an unrelated free() later.

@jbrandmeyer

jbrandmeyer commented Feb 7, 2019

At this point, I'm confused, too. I spent some time reading through the glibc headers and sources.

The cpuset macros clearly operate by allocating some number of unsigned longs. CPU_ALLOC(n) does not necessarily return a whole cpu_set_t, despite the return type being cpu_set_t *.

sched_getaffinity() is just calling into the kernel. From sysdeps/unix/sysv/linux/sched_getaffinity.c, it looks like the kernel returns the number of bytes written into the cpuset output argument. Then, the syscall wrapper clears any additional memory, according to the size passed in by the caller (i.e., OpenBLAS).

That suggests that the right argument to pass to sched_getaffinity is the result of calling CPU_ALLOC_SIZE(), and not sizeof(cpu_set_t). It just so happens that CPU_ALLOC_SIZE(n) for n <= 64 is also sizeof(intptr_t), so everything appears to work on commodity hardware.
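A quick way to check this on a given glibc, as a hypothetical standalone program (not part of the PR):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Print the allocation size glibc would use for various CPU counts,
     * next to the size of the full static cpu_set_t. */
    for (int n = 1; n <= 256; n *= 2)
        printf("CPU_ALLOC_SIZE(%3d) = %zu bytes\n", n, (size_t) CPU_ALLOC_SIZE(n));
    printf("sizeof(cpu_set_t)   = %zu bytes\n", sizeof(cpu_set_t));
    return 0;
}
```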

My suggestion is to just add the CPU_FREE in the error-handling path for the moment. I suspect that it's actually the kernel that's buggy in my case when the size is 8. I'm going source-diving to see if I can figure it out on their end.

@jbrandmeyer

FWIW, tools/testing/selftests/rseq/param_test.c from the kernel sources always allocates a whole cpu_set_t on the stack, rather than using dynamic allocation.
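That static approach looks roughly like the sketch below (assuming the process never sees more than CPU_SETSIZE CPUs; no CPU_FREE is needed on any path):

```c
#define _GNU_SOURCE
#include <sched.h>

static int count_usable_cpus_static(void)
{
    cpu_set_t mask;                               /* whole set on the stack */
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;                                /* let the caller pick a fallback */
    return CPU_COUNT(&mask);
}
```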

@martin-frbg
Collaborator Author

The actual error appears to be in my declaration of cpusetp as a cpu_set_t*, which is appropriate for dynamic allocation but otherwise passed an uninitialized struct to sched_getaffinity. Not sure yet why CPU_ALLOC returned a wrong value for you, but perhaps it should not be called when the number of processors is well below 1024 (although some documentation suggests otherwise).

@martin-frbg martin-frbg changed the title Fix potential memory leak in cpu enumeration with glibc Fix potential memory leak in cpu enumeration on Linux Feb 8, 2019
@martin-frbg
Collaborator Author

@jbrandmeyer could you try the latest iteration please, just to see if sched_getaffinity is still failing with the cpu set returned by CPU_ALLOC?

@martin-frbg
Collaborator Author

Committing after testing on Ryzen2-2700 with and without taskset.

@martin-frbg martin-frbg merged commit 03a2bf2 into OpenMathLib:develop Feb 10, 2019