Skip to content

Avoid temporary allocation in dec_as_long() #129275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
skirpichev opened this issue Jan 25, 2025 · 7 comments
Open

Avoid temporary allocation in dec_as_long() #129275

skirpichev opened this issue Jan 25, 2025 · 7 comments
Assignees
Labels
extension-modules C modules in the Modules dir pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement

Comments

@skirpichev
Copy link
Member

skirpichev commented Jan 25, 2025

Feature or enhancement

Proposal:

After #127925, new code in the decimal module (using the PEP 757) do temporary allocation with mpdecimal's memory functions (in mpd_qexport_u32/16()), just as before:

uint32_t base = (uint32_t)1 << layout->bits_per_digit;
/* We use a temporary buffer for digits for now, as for nonzero rdata
mpd_qexport_u32/u16() require either space "allocated by one of
libmpdec’s allocation functions" or "rlen MUST be correct" (to avoid
reallocation). This can be further optimized by using rlen from
mpd_sizeinbase(). See gh-127925. */
void *tmp_digits = NULL;
size_t n;
status = 0;
if (layout->digit_size == 4) {
n = mpd_qexport_u32((uint32_t **)&tmp_digits, 0, base, x, &status);
}
else {
n = mpd_qexport_u16((uint16_t **)&tmp_digits, 0, base, x, &status);
}

According to the documentation, we can prepare array of digits, which size could be estimated by mpd_sizeinbase(). Given this, the mpd_qexport_u32/16() shouldn't do any resize for this array. Here is Stefan comment on how safe this approach is:

mpd_sizeinbase() uses log10() from math.h for performance reasons. If log10() is IEEE compliant, the result should be sufficiently large. Resizing is for guarding against broken log10() implementations.

The current code in _decimal.c sets the libmpdec allocation functions to PyMem_Malloc() etc. So if longobject uses PyMem_Free() it is safe even when resizing occurs.

If the new API allows for allocators other that PyMem_Malloc(), it will rely on the IEEE compliance of log10().

Stefan Krah

We can expect slight performance boost for small (2-3 digits) integers.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@skirpichev skirpichev added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Jan 25, 2025
@skirpichev skirpichev self-assigned this Jan 25, 2025
skirpichev added a commit to skirpichev/cpython that referenced this issue Feb 4, 2025
According to the documentation: "If rdata is non-NULL, it MUST be
allocated by one of libmpdec’s allocation functions and rlen MUST be
correct.  If necessary, the function will resize rdata.  Resizing is
slow and should not occur if rlen has been obtained by a call to
mpd_sizeinbase."

So, possible resizing in mpd_qexport_u32/16() is for guarding against
broken log10() implementations (log10 is used in the mpd_sizeinbase()).
@skirpichev skirpichev removed their assignment Feb 4, 2025
@serhiy-storchaka
Copy link
Member

We should look not only at the documentation, but also at what they do in the code to prevent reallocation. Has their code changed since then?

(size_t)(3.321928094887363*((ndigits + shift - 1)/shift))

This may not work for ndigits > 2**53 or close. And your benchmarks show no difference for large ndigits.

On other hand, for small integers (but outside of 64-bit range, which is already optimized) we can use a fixed size array allocated at stack.

@skirpichev
Copy link
Member Author

We should look not only at the documentation, but also at what they do in the code to prevent reallocation.

If we can't rely on the documentation - I don't think we should add optimizations, based on current code behavior.

Docs says: "Resizing is slow and should not occur if rlen has been obtained by a call to mpd_sizeinbase." Are you insist on something more strict like "and must not occur"? If so, we could ask Stefan to adjust docs, I think it's something he actually meant here.

This may not work for ndigits > 2**53 or close.

Such cases we can filter out like mpd_sizeinbase does:

    /* ceil(2711437152599294 / log10(2)) + 4 == 2**53 */
    if (digits > 2711437152599294ULL) {
        return SIZE_MAX;
    }

The mpd_qexport_u32() returns now SIZE_MAX in such cases, that means a memory error on our side.

On other hand, for small integers (but outside of 64-bit range, which is already optimized) we can use a fixed size array allocated at stack.

I'll benchmark this.

@skirpichev skirpichev self-assigned this May 13, 2025
@skirpichev
Copy link
Member Author

skirpichev commented May 17, 2025

we can use a fixed size array allocated at stack.

@serhiy-storchaka, are you talking about using export functions with rdata!=NULL only for small rlen (say, < 15)? Or about using different allocation functions in libmpdec in this case?

In any case, this looks much more complex than proposed approach (which uses dedicated library API).

@serhiy-storchaka
Copy link
Member

The former. This is a pretty small change.

@skirpichev
Copy link
Member Author

This is a pretty small change.

@serhiy-storchaka, I don't think so. It's essentially a third path in the code flow: you need to estimate size (how, if you don't trust the mpd_sizeinbase?), then use export functions with nonzero rdata, then memcpy that to digits array.

I doubt this worth code complications. I'm going to close this, unless someone else want to take over this issue.

@skirpichev skirpichev added the pending The issue will be closed if no feedback is provided label May 17, 2025
@serhiy-storchaka
Copy link
Member

Something like this:

diff --git a/Modules/_decimal/_decimal.c b/Modules/_decimal/_decimal.c
index 602b23cfca8..deda32f26d6 100644
--- a/Modules/_decimal/_decimal.c
+++ b/Modules/_decimal/_decimal.c
@@ -3719,21 +3719,29 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
        libmpdec’s allocation functions" or "rlen MUST be correct" (to avoid
        reallocation).  This can be further optimized by using rlen from
        mpd_sizeinbase().  See gh-127925. */
-    void *tmp_digits = NULL;
-    size_t n;
+    char buffer[100];
+    void *tmp_digits = buffer;
+    size_t n, size = mpd_sizeinbase(x, base);
+    if (size >= sizeof(buffer) / layout->digit_size) {
+        tmp_digits = NULL;
+        size = 0;
+    }
 
     status = 0;
     if (layout->digit_size == 4) {
-        n = mpd_qexport_u32((uint32_t **)&tmp_digits, 0, base, x, &status);
+        n = mpd_qexport_u32((uint32_t **)&tmp_digits, size, base, x, &status);
     }
     else {
-        n = mpd_qexport_u16((uint16_t **)&tmp_digits, 0, base, x, &status);
+        n = mpd_qexport_u16((uint16_t **)&tmp_digits, size, base, x, &status);
     }
+    assert(!size || tmp_digits == buffer);
 
     if (n == SIZE_MAX) {
         PyErr_NoMemory();
         mpd_del(x);
-        mpd_free(tmp_digits);
+        if (tmp_digits != buffer) {
+            mpd_free(tmp_digits);
+        }
         return NULL;
     }
 
@@ -3742,11 +3750,15 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
 
     mpd_del(x);
     if (writer == NULL) {
-        mpd_free(tmp_digits);
+        if (tmp_digits != buffer) {
+            mpd_free(tmp_digits);
+        }
         return NULL;
     }
     memcpy(digits, tmp_digits, layout->digit_size*n);
-    mpd_free(tmp_digits);
+    if (tmp_digits != buffer) {
+        mpd_free(tmp_digits);
+    }
     return PyLongWriter_Finish(writer);
 }
 

We only need to find the threshold. We can trust mpd_sizeinbase() for small values. Or reserve a digit for greater certainty.

@skirpichev
Copy link
Member Author

This is something I meant (my version was roughly same code, except for buffer size and naming things). And:

$ git diff master..dec_as_long-nocopy/129275 --stat  # proposal in my pr
 Modules/_decimal/_decimal.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)
$ git diff master..patch4 --stat  # your version
 Modules/_decimal/_decimal.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

Your version also miss memset (to zero). With it's addition my benchmark results:

Benchmark ref patch patch4
int(Decimal(1<<7)) 487 ns 496 ns: 1.02x slower 497 ns: 1.02x slower
int(Decimal(1<<38)) 559 ns 569 ns: 1.02x slower 577 ns: 1.03x slower
int(Decimal(1<<50)) 562 ns 576 ns: 1.02x slower 573 ns: 1.02x slower
int(Decimal(1<<100)) 1.06 us 945 ns: 1.12x faster 990 ns: 1.07x faster
int(Decimal(1<<300)) 2.25 us 2.15 us: 1.04x faster 2.18 us: 1.03x faster
int(Decimal(1<<500)) 4.45 us 4.32 us: 1.03x faster 4.40 us: 1.01x faster
int(Decimal(1<<1000)) 14.2 us 14.1 us: 1.01x faster 14.3 us: 1.00x slower
int(Decimal(1<<3000)) 116 us 116 us: 1.00x slower 116 us: 1.00x slower
Geometric mean (ref) 1.02x faster 1.00x faster

So, it's a kinda works, but with more code.

# bench_Decimal-to-int.py

import pyperf
from decimal import Decimal

values = ['1<<7', '1<<38', '1<<50', '1<<100', '1<<300',
          '1<<500', '1<<1000', '1<<3000']

runner = pyperf.Runner()
for v in values:
    d = Decimal(eval(v))
    bn = 'int(Decimal('+v+'))'
    runner.bench_func(bn, int, d)

We can trust mpd_sizeinbase() for small values.

I'm not sure. Though, I can imagine that such broken log10 is a much less probable case.

@serhiy-storchaka, what do you think on using a special code in mpd_sizeinbase to handle 2-bases? If we will have a correct lower bound for size - we can rely on Stefan's reply: "Resizing is for guarding against broken log10() implementations." Here is the patch for the bundled copy, please note the guard for digits size in CONFIG_64 branch:

diff --git a/Modules/_decimal/libmpdec/mpdecimal.c b/Modules/_decimal/libmpdec/mpdecimal.c
index 959934bda7..c7fb18d49a 100644
--- a/Modules/_decimal/libmpdec/mpdecimal.c
+++ b/Modules/_decimal/libmpdec/mpdecimal.c
@@ -8100,23 +8100,29 @@ mpd_sizeinbase(const mpd_t *a, uint32_t base)
 #ifdef CONFIG_64
     /* ceil(2711437152599294 / log10(2)) + 4 == 2**53 */
     if (digits > 2711437152599294ULL) {
         return SIZE_MAX;
     }
 
     upper_bound = (double)((1ULL<<53)-1);
 #else
     upper_bound = (double)(SIZE_MAX-1);
 #endif
 
-    x = (double)digits / log10(base);
+    if ((base & (base - 1)) == 0) {
+        uint32_t shift = _Py_bit_length(base) - 1;
+        x = 3.321928094887363*((digits + shift - 1)/shift);
+    }
+    else {
+        x = (double)digits / log10(base);
+    }
     return (x > upper_bound) ? SIZE_MAX : (size_t)x + 1;
 }
 
 /* Space needed to import a base 'base' integer of length 'srclen'. */
 static mpd_ssize_t
 _mpd_importsize(size_t srclen, uint32_t base)
 {
     double x;
     double upper_bound;
 
     assert(srclen > 0);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants