w64devkit build segfaults at 0xFFFFFFFFFFFFFFFF #2922
Comments
Gdb decided to play nice with me for once, and I got this (this is after I added the debug fprintfs described below): I added a debug fprintf here:
And in the disassembly, segfault happens after fflush. Note that And after I haven't played with x86 assembly since uni, so maybe you can see something in here:
I can reproduce the issue if I replace I can make it crash on both a Ryzen 5 3600 and an Intel i5-3570K.
I did some digging, and the disassembly from w64devkit around the crashing code looks very similar to a native Linux gcc build. The native Linux build also produces identical AVX instructions for what seems to be copying args for I thought this might be a memory alignment issue, but Which makes me think the problem is actually created earlier or elsewhere. In the meantime, I managed to make VS Code work with w64devkit's gdb: if you run VS Code from the w64devkit shell, it inherits the environment automatically, you can set breakpoints from the source text, and the disassembly view can single-step instructions too.
Actually, an alignment issue would explain everything. If the destination operand of vmovdqa from a ymm register is not 32-byte aligned, it will crash with a general protection fault. The rbp values you and I have seen do not appear to be 32-byte aligned when you subtract 0x60. But they must sometimes be aligned, which explains why it doesn't crash consistently. So w64devkit's gcc is trying to copy llama_context_params using AVX registers, but it uses an aligned instruction with an unaligned stack address (where the function arguments are stored on Windows), which fails. It must be a compiler bug. As a workaround, we could disable There is also
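To make the alignment argument concrete, here is a minimal sketch of the arithmetic in Python. It uses the rbp value reported in the issue and the 0x60 offset mentioned in this thread. That rbp - 0x60 is the actual vmovdqa destination is an assumption taken from the comment above, not something verified here, and the helper names are hypothetical.

```python
# A 256-bit (ymm) vmovdqa store requires a 32-byte aligned memory operand.
YMM_ALIGNMENT = 32


def misalignment(address: int, alignment: int = YMM_ALIGNMENT) -> int:
    """Return how many bytes `address` lies past the previous aligned boundary."""
    return address % alignment


def is_aligned(address: int, alignment: int = YMM_ALIGNMENT) -> bool:
    """True if `address` is a multiple of `alignment`."""
    return misalignment(address, alignment) == 0


# rbp as reported in the issue.
rbp = 0x0000007CA91FE0D0
# Assumption: the store targets rbp - 0x60, as suggested in the thread.
slot = rbp - 0x60

print(hex(slot), misalignment(slot), is_aligned(slot))
# prints: 0x7ca91fe070 16 False
```

With this rbp the slot is 16-byte aligned but not 32-byte aligned. If the stack is only guaranteed to be 16-byte aligned, a slot like this would land on a 32-byte boundary only some of the time, which would fit the intermittent crash.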
Idea:
Steps to reproduce:
make LLAMA_DEBUG=1
./main
It does not matter whether you have a model in the default location (I don't). 50% of the time, it will fail. I cannot reproduce it if I build with MSYS2's mingw-w64 toolchain instead.
I bisected it to commit 0c44427, which adds -march=native to CXXFLAGS.
If cv2pdb is to be trusted (confirmed below), the crash happens here: https://github.com/ggerganov/llama.cpp/blob/8afe2280009ecbfc9de2c93b8f41283dc810609a/common/common.cpp#L723
Something is going wrong before that function call:
rbp is 0x0000007CA91FE0D0, so I'm not sure where 0xFFFFFFFFFFFFFFFF comes from. And it's a read violation, but that instruction is only reading from a register.