Skip to content

Conversation

ikawrakow
Copy link
Contributor

For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) as not multi-threaded as in PR #896.

For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) as not
multi-threaded as in PR #896.
@ikawrakow ikawrakow requested a review from ggerganov April 19, 2023 15:15
Not sure why this makes them fail
@@ -1123,12 +1124,94 @@ static void quantize_row_q4_2_reference(const float * restrict x, block_q4_2 * r
}
}

static inline int nearest_int(float fval) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this inline does not do anything here. the static is all you need.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm actually, after looking at cppref, i am not sure that C and C++ are the same here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants