* Initial XTC commit
Adds the XTC sampler. It is not activated by default, but ships with the recommended settings as defaults.
* Cleanup
* Simplified chances calculation
To be more in line with the original implementation, the chance is calculated once at the beginning.
* First fixes by comments
Still need to look into sorting
* Fixed trailing backspaces
* Fixed RNG to be reproducible
Thanks to @slaren for directions
* Fixed forgotten header
* Moved `min_keep`
Moved from conditions to a simple check at the end.
* Fixed broken randomization
Thanks to @slaren for explanation
* Swapped sorting for a custom algorithm
Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.
* Algorithm rework
1. Scan tokens from the top until the first non-penalizable one
2. Remove the last captured token (the least probable above the threshold)
3. Shift all tokens to overwrite the remaining penalizable ones
4. Penalize them and put them at the bottom.
* Added XTC to `test-sampling`
* Simplified algorithm and more tests
* Updated info in common and args
* Merged back lost commits in common and arg
* Update dump info in common
* Fixed incorrect min_keep check
* Added XTC to README
* Renamed parameters, fixed info and defaults
  * probability is at 0 by default, but XTC is included in the sampling queue
  * a threshold higher than 0.5 switches XTC off
* Initial server support
* Added XTC to server UIs
* Fixed labels in old server UI
* Made algorithm safer and more readable
* Removed xtc_threshold_max
* Fixed arg after update
* Quick fixes by comments
* Simplified algorithm since threshold_max is removed
* Renamed random distribution
* Fixed tests and outdated README
* Small fixes
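The four steps listed under "Algorithm rework" can be sketched roughly as below. This is an illustrative Python sketch, not the C++ code from the commits: the function name `penalize_top_choices`, the `(token_id, prob)` tuple layout, and representing "penalized" as a probability of 0.0 are all assumptions made for the example.

```python
def penalize_top_choices(candidates, threshold):
    """candidates: list of (token_id, prob), sorted by descending prob.

    Returns a reordered copy. How tokens are "penalized" is an assumption
    here: their probability is simply set to 0.0.
    """
    # Step 1: scan from the top until the first token below the threshold.
    above = 0
    while above < len(candidates) and candidates[above][1] >= threshold:
        above += 1

    # With fewer than two tokens above the threshold there is nothing to do.
    if above < 2:
        return list(candidates)

    # Step 2: drop the last captured token (the least probable one above
    # the threshold) from the penalized set, so it survives.
    penalized = candidates[:above - 1]

    # Step 3: shift the remaining tokens to the front.
    survivors = candidates[above - 1:]

    # Step 4: penalize the captured tokens and put them at the bottom.
    return survivors + [(tok, 0.0) for tok, _ in penalized]
```

Because the list keeps its length, a later `min_keep` check still has the full set of candidates to work with, which appears to be the motivation given in the commit notes.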
examples/main/README.md: 13 additions & 0 deletions
@@ -241,6 +241,19 @@ The `--mirostat-ent` option sets the Mirostat target entropy (tau), which repres
 
 Example usage: `--mirostat 2 --mirostat-lr 0.05 --mirostat-ent 3.0`
 
+### XTC Sampling
+
+- `--xtc-probability N`: Sets the chance for token removal (checked once on sampler start) (default: 0.0).
+- `--xtc-threshold N`: Sets a minimum probability threshold for tokens to be removed (default: 0.1).
+
+Exclude Top Choices (XTC) is a unique sampler that is designed to remove top tokens from consideration and avoid more obvious and repetitive outputs. With a chance of `xtc-probability` it searches for tokens with probabilities of `xtc-threshold` and above, then removes all such tokens except the least probable one.
+
+By removing top tokens, XTC can improve the variety of answers, break writing clichés and inhibit repetition, since clichés and repeated phrases are usually more likely to appear. By keeping the last token above the threshold, XTC ensures that the answer is still coherent. XTC is meant to be used for creative tasks, but feel free to experiment with different settings for different models.
+
+Being experimental and unique, XTC is disabled by default. The recommended combination of samplers is Min-P followed by XTC with its recommended settings: `--sampling-seq mx --min-p 0.02 --xtc-probability 0.5`.
+
+Example usage: `--xtc-probability 0.5 --xtc-threshold 0.1`
+
 ### Logit Bias
 
 - `-l TOKEN_ID(+/-)BIAS, --logit-bias TOKEN_ID(+/-)BIAS`: Modify the likelihood of a token appearing in the generated text completion.
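The behaviour described in the XTC README text above (one roll against `xtc-probability`, then removal of every token at or above `xtc-threshold` except the least probable of them) might look like this in a minimal Python sketch. The function name `xtc_filter`, the dict-based representation and the choice to leave the surviving probabilities unnormalized are assumptions for illustration; the actual sampler in the PR is written in C++ and may differ in detail.

```python
import random


def xtc_filter(probs, xtc_probability=0.0, xtc_threshold=0.1, rng=None):
    """probs: dict mapping token -> probability. Returns a filtered copy."""
    rng = rng or random.Random()

    # A threshold above 0.5 disables XTC: at most one token can have a
    # probability above 0.5, so there would never be anything to remove.
    if xtc_probability <= 0.0 or xtc_threshold > 0.5 or len(probs) < 2:
        return dict(probs)

    # The chance is checked once per call, not once per token.
    if rng.random() >= xtc_probability:
        return dict(probs)

    above = [tok for tok, p in probs.items() if p >= xtc_threshold]
    if len(above) < 2:
        return dict(probs)

    # Keep the least probable token among those above the threshold.
    keep = min(above, key=lambda tok: probs[tok])
    removed = set(above) - {keep}

    # Probabilities are left unnormalized in this sketch (assumption).
    return {tok: p for tok, p in probs.items() if tok not in removed}
```

For example, with probabilities `{"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}`, a threshold of 0.1 and a probability of 1.0, the tokens `"the"` and `"a"` are removed while `"one"` and `"zebra"` remain.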
examples/server/public/index-new.html: 6 additions & 0 deletions
@@ -43,6 +43,8 @@
   top_k: 0, // <= 0 to use vocab size
   top_p: 1.0, // 1.0 = disabled
   min_p: 0.05, // 0 = disabled; recommended for non-english: ~ 0.4
+  xtc_probability: 0.0, // 0 = disabled;
+  xtc_threshold: 0.1, // > 0.5 disables XTC;
   tfs_z: 1.0, // 1.0 = disabled
   typical_p: 1.0, // 1.0 = disabled
   presence_penalty: 0.0, // 0.0 = disabled
@@ -836,6 +838,8 @@
 ${FloatField({label: "TFS-Z", title: "Activates tail-free sampling, a method used to limit the prediction of tokens that are too frequent. The parameter z controls the strength of this limitation. A value of 1.0 means that this function is deactivated.", max: 1.0, min: 0.0, name: "tfs_z", step: 0.01, value: params.value.tfs_z})}
 ${FloatField({label: "Frequency Penalty", title: "A penalty that is applied based on the frequency with which certain tokens occur in the training data set. A higher value results in rare tokens being favoured.", max: 1.0, min: 0.0, name: "frequency_penalty", step: 0.01, value: params.value.frequency_penalty})}
 ${FloatField({label: "Typical-P", title: "Activates local typical sampling, a method used to limit the prediction of tokens that are atypical in the current context. The parameter p controls the strength of this limitation. A value of 1.0 means that this function is deactivated.", max: 1.0, min: 0.0, name: "typical_p", step: 0.01, value: params.value.typical_p})}
+${FloatField({label: "XTC probability", title: "Sets the chance for token removal (checked once on sampler start)", max: 1.0, min: 0.0, name: "xtc_probability", step: 0.01, value: params.value.xtc_probability})}
+${FloatField({label: "XTC threshold", title: "Sets a minimum probability threshold for tokens to be removed", max: 0.5, min: 0.0, name: "xtc_threshold", step: 0.01, value: params.value.xtc_threshold})}
 ${IntField({label: "Min Keep", title: "If greater than 0, samplers are forced to return N possible tokens at minimum. Default is 0", max: 10, min: 0, name: "min_keep", value: params.value.min_keep})}
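The two new UI fields are sent to the server as request parameters named `xtc_probability` and `xtc_threshold`. As a rough sketch of how a client could pass them, the Python snippet below builds (but does not send) a request. The `/completion` path, the `http://localhost:8080` base URL, the `prompt` and `n_predict` fields and the helper name `build_completion_request` are assumptions for illustration and are not taken from this diff.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8080"):
    """Build a POST request carrying the XTC parameters (hypothetical helper)."""
    payload = {
        "prompt": prompt,          # assumed field name
        "n_predict": 64,           # assumed field name
        "min_p": 0.02,             # Min-P before XTC, as the README recommends
        "xtc_probability": 0.5,    # 0 disables XTC
        "xtc_threshold": 0.1,      # values above 0.5 disable XTC
    }
    return urllib.request.Request(
        f"{base_url}/completion",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server; the values shown mirror the recommended command-line settings from the README hunk.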
/// @details Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
/// @param candidates A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
/// @param tau The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.