Regression: "The first main on the moon was " #693
D:\llama.cpp\bin\Release\main.exe -m D:\llama.cpp\models\7B\ggml-model-q4_0.bin -t 16 -n 2048 -p "Tell me very specifically the answer to the question I ask and no additional information. Only output the name. What was the name of the first man on the moon?"

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

Tell me very specifically the answer to the question I ask and no additional information. Only output the name. What was the name of the first man on the moon?
Note the seed line in that output: a random seed is used by default to inject a bit of variation. If you want deterministic reproduction, copy that seed and pass it back with -s. Enjoy!
Thanks @jart, I figured there was something like that, but take a look and see if you can reproduce the output I have using the same seed. I think there is some lurking bug here; not sure if it is because of the mmap change.

D:\llama.cpp\bin\Release\main.exe -s 1680393600 -m D:\llama.cpp\models\7B\ggml-model-q4_0.bin -t 16 -n 128 -p "The first man on the moon was "

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

The first man on the moon was 38-year old Neil Armstrong who was a Lieutenant Colonel in the United States Air Force. He was one of two pilots selected by NASA for this historic spaceflight, with the other pilot being David R R Scott who was an American Air Force Major and a test pilot at Edwards Air Force Base (AFB). They were accompanied in spacecraft Lunar Module (LM) Eagle by Command Pilot Michael Collins who was a Captain in United States Navy. Collins had been selected as Command Pilot for the Gemini IX-A spaceflight, which took place on 18 July 1966
Sorry @simplejackcoder for getting your valid issue closed as invalid. As mentioned in #647, most of the PR does not make any sense. The whole process has been very disingenuous from the beginning, to the point that it was pretty much force-pushed through without any proper input or polling of the community. I'd say in general, regarding anything, to be wary when only the positives are discussed and any negative effects are hushed up or even dismissed as invalid. That being said, don't take my word for it; draw your own conclusions instead. And I'm sure there are also people who didn't get such a sour taste in their mouth from this, so eventually the mess will get cleared up and the bugs sorted. Suppressing bug reports isn't the way to achieve that, however, as it will just push away those who are still on the fence about contributing.
@anzz1 So basically, we were required to change the format to ggjt (though "jt" stands for the initials of jart, it was @slaren who wrote almost all of the code in this PR; shouldn't his initials be included instead?), which led to versioning issues. All of this for a feature that doesn't function properly? It's frustrating to see that @ggerganov isn't addressing this problem. This kind of ego-driven decision could easily lead to the project's collapse!
Comments like that are exactly why FOSS projects collapse. A developer publishes some cool stuff, and then you get hordes of users demanding commercial-level support for it for free, trying to shame the developer for not giving it to them. If you don't like the new feature, fork the project and fix your own problem, or shut up. @ggerganov, llama.cpp is very useful and I have tons of fun with it, thanks for that. The new model format also makes everything faster on my computer, it's awesome.
I can't reproduce it. What Git revision produces this?
I also need to know the sha256 checksum of your model files.
mmap should work better on every single platform.
The Windows problem is just because there is code missing to fault in the pages in the correct (linear) order.
No operating system allows processes to eat memory after they are killed.
The OS previously kept the model behind in its page cache. The model stayed behind before the PR too. The merged PR only made it so that we don't need an extra copy in anonymous private memory.
The only negative fact is that the preload code for Windows is missing. That can be fixed by a small PR later. EDIT: and that PR is here: #734
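For illustration, here is a minimal sketch of what "faulting the pages in the correct (linear) order" can look like once a file has been mapped; this is only a hypothetical example, not the actual change in #734. Touching one byte per page from start to finish makes the OS read the file sequentially rather than in random page-fault order (on Windows, PrefetchVirtualMemory() can achieve a similar effect in a single call).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sequentially touch every page of a mapped region so the
// OS pulls the file in from disk in linear order instead of on random faults.
void prefault_linear(const void * addr, size_t size, size_t page_size = 4096) {
    const uint8_t * p = static_cast<const uint8_t *>(addr);
    volatile uint8_t sink = 0;            // volatile keeps the reads from being optimized away
    for (size_t off = 0; off < size; off += page_size) {
        sink ^= p[off];                   // one read per page is enough to trigger its fault
    }
    (void) sink;
}
```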
The statement that the model will stay behind is true. However, it is true not because of the mmap, but because of the OS disk cache that is active no matter what. Operating systems assume that, if some data on the disk was needed once, it will be needed again. This is true both for mmap and for "normal" reads. The OS is ready to drop this cached data as soon as it needs memory for anything else. What mmap avoids is the need to copy the data from the OS disk cache to the explicitly allocated application buffer, and thus to have it in memory twice (once as a discardable copy in the disk cache, once in the application).
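As a rough sketch of the difference described above (POSIX APIs assumed; error handling and short-read loops omitted for brevity), the read() path copies data out of the kernel's page cache into an application-owned buffer, while the mmap() path simply maps the cached pages into the process:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// read() path: the data ends up in memory twice, once in the OS page cache and
// once in this application buffer.
std::vector<char> load_with_read(const char * path) {
    int fd = open(path, O_RDONLY);
    struct stat st{};
    fstat(fd, &st);
    std::vector<char> buf(static_cast<size_t>(st.st_size));
    read(fd, buf.data(), buf.size());
    close(fd);
    return buf;
}

// mmap() path: the page-cache pages are mapped directly into the process, so
// there is a single copy, which the OS may discard and re-read under memory pressure.
const void * load_with_mmap(const char * path, size_t * out_size) {
    int fd = open(path, O_RDONLY);
    struct stat st{};
    fstat(fd, &st);
    void * addr = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    *out_size = static_cast<size_t>(st.st_size);
    return addr;
}
```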
Everyone, I'm not criticizing ANY change, I'm just asking whether something could have inadvertently been broken.

Commit of llama.cpp: a717cba

@jart here are the sha256sum outputs for all 3 files:
700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d  consolidated.00.pth
The two disparate outputs here have different prompts. The first one is treating the model as instruct-tuned; is it? Additionally, there is no apparent reason to relate this to mmap.
Things are getting stranger. Can someone at least validate what I'm seeing? I tried the 13B model quantized to int4, and it's giving me very weird answers.

D:\llama.cpp>D:\llama.cpp\bin\Release\main.exe -s 1680516479 -m D:\llama.cpp\models\13B\ggml-model-q4_0.bin -t 16 -n 128 -p "The first man on the moon was "

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

The first man on the moon was 17-year old Neil Armstrong.
I have slightly different output for that seed and those parameters. I'm not sure it's of any significance, though, because you seem to be on a Windows environment and I'm on a Linux one; if the seed is used to seed a system random function, it's probably not the same one.
Using different seeds often gives me fanciful ages for Armstrong as well. That being said, I haven't tried that prompt on a previous version of the model, so I can't tell if there is any behavior change. In any case, you should not expect results similar to ChatGPT: this is a (quantized) 13B-parameter model, and ChatGPT is 175B.
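To illustrate the point about the seed (this is just a demonstration, not llama.cpp's actual sampling code): the sequence produced by the C library's rand() for a given seed is implementation-defined, so MSVC's CRT and glibc can legitimately disagree, whereas std::mt19937 is fully specified by the C++ standard and produces the same output on every platform.

```cpp
#include <cstdio>
#include <cstdlib>
#include <random>

int main() {
    const unsigned seed = 1680393600;  // the seed shared earlier in this thread

    // Implementation-defined: different C libraries may print different numbers here.
    srand(seed);
    std::printf("rand():  %d\n", rand());

    // Fully specified by the C++ standard: identical on every conforming platform.
    std::mt19937 rng(seed);
    std::printf("mt19937: %u\n", rng());
    return 0;
}
```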
I saw a blog post where that prompt was used, and now when I try it myself using LLaMA I don't get the same result. It is quite strange.
It keeps telling me the man is 38 years old and then starts going off on a tangent. Could this be a recent regression, @ggerganov?