65B model giving incorrect output #69
|
Same happens with the 30B model:

`ubuntu@ip-x:~/llama.cpp$ ./main -m ./models/30B/ggml-model-q4_0.bin \`

main: prompt: 'The history of humanity starts with the bing bang, then '
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

The history of humanity starts with the bing bang, then derivative Óingusch Mes cinemadated UN impactalogftillingosph médecVERSION possessionanampionțiaět disappearedment PK UN forg derivative trouveance gentlemanIABotIABot soortán boxes médeciumblica'} após Squad бан occas grayеньist whitespace савезнојselectedine cavalley vagueembankaely Cardülés ej clas notify hescaught insgesamtaftnm meses soort prep Easterningists derivativeeriaAG Bundes cinema Mes surrillingftanosphVERSIONliershashamp possessioniana disappearedment PKIABot UNuschumably trouvelinewidthadersː notify gentlemanionxpafán Squad Ram splitting succIABot médecеньět após савезнојium banksotenist банanka^Z |
I'm getting the same result. 7B and 13B work fine, but 30B and 65B produce garbage.
|
I get the same 'nonsense' with the 7B model. My md5sum of consolidated.00.pth is 6efc8dab194ab59e49cd24be5574d85e, which matches the value in checklist.chk. |
I can confirm that 7B and 13B work for me. 30B and 65B are the ones not giving correct output. |
fp16 and 4-bit quantized are both working for me with the 30B and 65B models. I haven't run the smaller models:
|
Can you share the sizes of the files as well, and also a successfully executed example with both models? Thanks! |
I just pulled the latest code and will regression-check the output with all 4-bit models:
|
Note that, as per @ggerganov's correction to my observation in issue #95, the number of threads and other subtleties such as different floating-point implementations may prevent us from reproducing exactly the same output, even with the same random seed:
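(As an aside, not from the thread itself: a quick way to see this effect on one machine is to hold the model, prompt, and seed fixed and vary only the thread count. The sketch below assumes this build accepts `-s` to fix the seed and that the informational log lines go to stderr; strip them first if they do not.)

```sh
# Same model, prompt, and seed; only the thread count differs.  Any divergence
# in the generated text then comes from work partitioning and floating-point
# summation order, not from the sampler's RNG.
SEED=1234
PROMPT='The history of humanity starts with the bing bang, then '

./main -m ./models/7B/ggml-model-q4_0.bin -s "$SEED" -t 4  -n 64 -p "$PROMPT" 2>/dev/null > out_t4.txt
./main -m ./models/7B/ggml-model-q4_0.bin -s "$SEED" -t 16 -n 64 -p "$PROMPT" 2>/dev/null > out_t16.txt

diff out_t4.txt out_t16.txt && echo "outputs identical" || echo "outputs differ"
```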
|
@gjmulder, could you please share your md5sum values for the weights you downloaded (e.g. consolidated.00.pth)? Then I can check whether I am starting from the right files. Thanks. |
OK, now I get sensible results with the 7B model:

./main -m ./models/7B/ggml-model-q4_0.bin -t 16 -n 1000000 -p 'The history of humanity starts with the bing bang, then '

system_info: n_threads = 16 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: prompt: 'The history of humanity starts with the bing bang, then '
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

The history of humanity starts with the bing bang, then 7 generations later there is agriculture and cities are created. Eventually we get to where we are today; technology being our savior!

main: mem per token = 14565444 bytes |
The conversion and quantization should be deterministic, so if the .bin files don't match, the .pth files won't match either:
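(A sketch of my own, not from the thread: one way to act on that is to checksum both the original weights and the converted files so different machines can compare. This assumes checklist.chk is in plain `md5sum -c` format, which it appears to be.)

```sh
# Verify the downloaded weights against the checklist that ships with them,
# then record checksums of the converted/quantized ggml files so they can be
# compared across machines.
cd models/30B

md5sum -c checklist.chk                        # checks the files listed in the checklist

md5sum ggml-model-*.bin* > ggml-checksums.txt  # f16 and q4_0 parts
cat ggml-checksums.txt                         # paste these into the thread for comparison
```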
|
Does reducing top_p to something like 0.3 or even 0.1 provide better output for these larger models? |
A top_p of 0.3 to 0.5 looks better, especially for the smaller models. The "10 simple steps" prompt looks useful for testing each model's ability to count consecutively at different top_p settings:
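(For anyone who wants to repeat that comparison, a rough sweep could look like the sketch below. The prompt is my guess at the "10 simple steps" test, presumably the README's "Building a website can be done in 10 simple steps:" example, and the `--top_p` flag name is assumed from the usage text of that era.)

```sh
# Hypothetical top_p sweep: same model and prompt, only top_p changes.
PROMPT='Building a website can be done in 10 simple steps:'

for TOP_P in 0.1 0.3 0.5 0.95; do
    echo "=== top_p = $TOP_P ==="
    ./main -m ./models/30B/ggml-model-q4_0.bin -t 16 -n 128 \
           --top_p "$TOP_P" -p "$PROMPT"
done
```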
|
I also explored |
I can confirm that the latest branch (March 15, 2023) works for all models. You will have to redo the quantization to make it work if you had problems.
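(For reference, a sketch of redoing the conversion and quantization with the mid-March 2023 tree, based on the README of that time; the trailing arguments are assumed to mean 1 = f16 output and 2 = q4_0.)

```sh
# Rebuild, regenerate the f16 ggml file from the original .pth weights, then
# re-quantize to 4 bits.  For the multi-part models (30B/65B), repeat the
# quantize step for each ggml-model-f16.bin.N part.
make

python3 convert-pth-to-ggml.py models/30B/ 1                  # 1 = f16 output
./quantize ./models/30B/ggml-model-f16.bin \
           ./models/30B/ggml-model-q4_0.bin 2                 # 2 = q4_0
```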
With that done, the 65B model uses 31% of 128 GB of RAM when performing inference.
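(My arithmetic, not from the thread: 0.31 × 128 GB ≈ 40 GB, and 65 × 10^9 parameters at roughly 5 bits each in q4_0 (4-bit quants plus a per-block scale) comes to about 65e9 × 5 / 8 bytes ≈ 41 GB, so resident memory is dominated by the quantized weights themselves.)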
example output:
|