-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
Avoid temporary allocation in dec_as_long() #129275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
According to the documentation: "If rdata is non-NULL, it MUST be allocated by one of libmpdec’s allocation functions and rlen MUST be correct. If necessary, the function will resize rdata. Resizing is slow and should not occur if rlen has been obtained by a call to mpd_sizeinbase." So, possible resizing in mpd_qexport_u32/16() is for guarding against broken log10() implementations (log10 is used in the mpd_sizeinbase()).
We should look not only at the documentation, but also at what they do in the code to prevent reallocation. Has their code changed since then?
This may not work for ndigits > On other hand, for small integers (but outside of 64-bit range, which is already optimized) we can use a fixed size array allocated at stack. |
If we can't rely on the documentation - I don't think we should add optimizations, based on current code behavior. Docs says: "Resizing is slow and should not occur if rlen has been obtained by a call to mpd_sizeinbase." Are you insist on something more strict like "and must not occur"? If so, we could ask Stefan to adjust docs, I think it's something he actually meant here.
Such cases we can filter out like mpd_sizeinbase does: /* ceil(2711437152599294 / log10(2)) + 4 == 2**53 */
if (digits > 2711437152599294ULL) {
return SIZE_MAX;
} The mpd_qexport_u32() returns now SIZE_MAX in such cases, that means a memory error on our side.
I'll benchmark this. |
@serhiy-storchaka, are you talking about using export functions with In any case, this looks much more complex than proposed approach (which uses dedicated library API). |
The former. This is a pretty small change. |
@serhiy-storchaka, I don't think so. It's essentially a third path in the code flow: you need to estimate size (how, if you don't trust the mpd_sizeinbase?), then use export functions with nonzero rdata, then memcpy that to digits array. I doubt this worth code complications. I'm going to close this, unless someone else want to take over this issue. |
Something like this: diff --git a/Modules/_decimal/_decimal.c b/Modules/_decimal/_decimal.c
index 602b23cfca8..deda32f26d6 100644
--- a/Modules/_decimal/_decimal.c
+++ b/Modules/_decimal/_decimal.c
@@ -3719,21 +3719,29 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
libmpdec’s allocation functions" or "rlen MUST be correct" (to avoid
reallocation). This can be further optimized by using rlen from
mpd_sizeinbase(). See gh-127925. */
- void *tmp_digits = NULL;
- size_t n;
+ char buffer[100];
+ void *tmp_digits = buffer;
+ size_t n, size = mpd_sizeinbase(x, base);
+ if (size >= sizeof(buffer) / layout->digit_size) {
+ tmp_digits = NULL;
+ size = 0;
+ }
status = 0;
if (layout->digit_size == 4) {
- n = mpd_qexport_u32((uint32_t **)&tmp_digits, 0, base, x, &status);
+ n = mpd_qexport_u32((uint32_t **)&tmp_digits, size, base, x, &status);
}
else {
- n = mpd_qexport_u16((uint16_t **)&tmp_digits, 0, base, x, &status);
+ n = mpd_qexport_u16((uint16_t **)&tmp_digits, size, base, x, &status);
}
+ assert(!size || tmp_digits == buffer);
if (n == SIZE_MAX) {
PyErr_NoMemory();
mpd_del(x);
- mpd_free(tmp_digits);
+ if (tmp_digits != buffer) {
+ mpd_free(tmp_digits);
+ }
return NULL;
}
@@ -3742,11 +3750,15 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
mpd_del(x);
if (writer == NULL) {
- mpd_free(tmp_digits);
+ if (tmp_digits != buffer) {
+ mpd_free(tmp_digits);
+ }
return NULL;
}
memcpy(digits, tmp_digits, layout->digit_size*n);
- mpd_free(tmp_digits);
+ if (tmp_digits != buffer) {
+ mpd_free(tmp_digits);
+ }
return PyLongWriter_Finish(writer);
}
We only need to find the threshold. We can trust |
This is something I meant (my version was roughly same code, except for buffer size and naming things). And:
Your version also miss memset (to zero). With it's addition my benchmark results:
So, it's a kinda works, but with more code. # bench_Decimal-to-int.py
import pyperf
from decimal import Decimal
values = ['1<<7', '1<<38', '1<<50', '1<<100', '1<<300',
'1<<500', '1<<1000', '1<<3000']
runner = pyperf.Runner()
for v in values:
d = Decimal(eval(v))
bn = 'int(Decimal('+v+'))'
runner.bench_func(bn, int, d)
I'm not sure. Though, I can imagine that such broken log10 is a much less probable case. @serhiy-storchaka, what do you think on using a special code in mpd_sizeinbase to handle 2-bases? If we will have a correct lower bound for size - we can rely on Stefan's reply: "Resizing is for guarding against broken log10() implementations." Here is the patch for the bundled copy, please note the guard for digits size in CONFIG_64 branch: diff --git a/Modules/_decimal/libmpdec/mpdecimal.c b/Modules/_decimal/libmpdec/mpdecimal.c
index 959934bda7..c7fb18d49a 100644
--- a/Modules/_decimal/libmpdec/mpdecimal.c
+++ b/Modules/_decimal/libmpdec/mpdecimal.c
@@ -8100,23 +8100,29 @@ mpd_sizeinbase(const mpd_t *a, uint32_t base)
#ifdef CONFIG_64
/* ceil(2711437152599294 / log10(2)) + 4 == 2**53 */
if (digits > 2711437152599294ULL) {
return SIZE_MAX;
}
upper_bound = (double)((1ULL<<53)-1);
#else
upper_bound = (double)(SIZE_MAX-1);
#endif
- x = (double)digits / log10(base);
+ if ((base & (base - 1)) == 0) {
+ uint32_t shift = _Py_bit_length(base) - 1;
+ x = 3.321928094887363*((digits + shift - 1)/shift);
+ }
+ else {
+ x = (double)digits / log10(base);
+ }
return (x > upper_bound) ? SIZE_MAX : (size_t)x + 1;
}
/* Space needed to import a base 'base' integer of length 'srclen'. */
static mpd_ssize_t
_mpd_importsize(size_t srclen, uint32_t base)
{
double x;
double upper_bound;
assert(srclen > 0); |
Feature or enhancement
Proposal:
After #127925, new code in the decimal module (using the PEP 757) do temporary allocation with mpdecimal's memory functions (in mpd_qexport_u32/16()), just as before:
cpython/Modules/_decimal/_decimal.c
Lines 3691 to 3706 in f10b7b2
According to the documentation, we can prepare array of digits, which size could be estimated by mpd_sizeinbase(). Given this, the mpd_qexport_u32/16() shouldn't do any resize for this array. Here is Stefan comment on how safe this approach is:
We can expect slight performance boost for small (2-3 digits) integers.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: