Skip to content

Improve performance of lazy validation message formatting #6709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 29, 2019

Conversation

bluetech
Copy link
Contributor

Replaces #6644.

As a fix for issue #3354, commit 607e4ed made the
evaluation of some validation error messages lazy. To achieve that,
Django's django.utils.functional.lazy() function was used. However, that
function is extremely heavy and slow, and slows down string validation
significantly (lazy() is evaluated each time for each validator for each
field which has one). We noticed this in our production system.

Use a custom lazy_format() object instead which does the formatting
lazily with less overhead.

Using the benchmark attached to the PR (snipped to tottime>100ms):

Before, model serializer:

         9225123 function calls (9200068 primitive calls) in 8.337 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    25000    1.299    0.000    3.534    0.000 functional.py:125(__prepare_class__)
  2415001    0.954    0.000    0.954    0.000 {built-in method builtins.hasattr}
  1350000    0.901    0.000    0.901    0.000 functional.py:145(__promise__)
  1550001    0.521    0.000    0.521    0.000 {built-in method builtins.setattr}
    25000    0.494    0.000    0.735    0.000 {built-in method builtins.__build_class__}
    30000    0.298    0.000    0.385    0.000 fields.py:297(__init__)
    25000    0.289    0.000    5.895    0.000 fields.py:740(__init__)
    25000    0.264    0.000    0.802    0.000 field_mapping.py:66(get_field_kwargs)
    25000    0.241    0.000    0.241    0.000 functional.py:100(__proxy__)
670003/670001    0.203    0.000    0.211    0.000 {built-in method builtins.getattr}
     5000    0.189    0.000    7.722    0.002 serializers.py:990(get_fields)
    25000    0.186    0.000    0.400    0.000 functools.py:186(total_ordering)
    25000    0.158    0.000    0.299    0.000 functional.py:234(wrapper)
     5000    0.129    0.000    0.136    0.000 serializers.py:1066(get_field_names)
    25000    0.104    0.000    1.002    0.000 serializers.py:1195(build_standard_field)

After, model serializer:

         3265096 function calls (3240059 primitive calls) in 2.645 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    30000    0.237    0.000    0.315    0.000 fields.py:295(__init__)
    25000    0.218    0.000    0.639    0.000 field_mapping.py:66(get_field_kwargs)
    25000    0.214    0.000    0.665    0.000 fields.py:743(__init__)
     5000    0.156    0.000    2.086    0.000 serializers.py:988(get_fields)
    25000    0.107    0.000    0.210    0.000 functional.py:234(wrapper)

Before, regular serializer:

         8060003 function calls (7960003 primitive calls) in 7.123 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    25000    1.569    0.000    3.897    0.000 functional.py:125(__prepare_class__)
  1350000    1.013    0.000    1.013    0.000 functional.py:145(__promise__)
  2365000    0.925    0.000    0.925    0.000 {built-in method builtins.hasattr}
  1550000    0.512    0.000    0.512    0.000 {built-in method builtins.setattr}
    25000    0.378    0.000    0.550    0.000 {built-in method builtins.__build_class__}
    25000    0.307    0.000    5.946    0.000 fields.py:740(__init__)
    30000    0.277    0.000    0.360    0.000 fields.py:297(__init__)
80000/5000    0.202    0.000    6.526    0.001 copy.py:132(deepcopy)
    25000    0.172    0.000    0.172    0.000 functional.py:100(__proxy__)
   540000    0.152    0.000    0.152    0.000 {built-in method builtins.getattr}
    25000    0.119    0.000    6.199    0.000 fields.py:604(__deepcopy__)

After, regular serializer:

         2150003 function calls (2050003 primitive calls) in 1.609 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    30000    0.224    0.000    0.293    0.000 fields.py:295(__init__)
    25000    0.181    0.000    0.607    0.000 fields.py:743(__init__)
80000/5000    0.151    0.000    1.074    0.000 copy.py:132(deepcopy)
    25000    0.102    0.000    0.819    0.000 fields.py:607(__deepcopy__)

@bluetech
Copy link
Contributor Author

For future reference, same benchmark was used: https://gist.github.com/bluetech/cdd3b3d6d9d8cfc7b1dfee3c6482679d

Copy link
Member

@rpkilby rpkilby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this approach. Aside from the suggestion, I think this is good to go.

@rpkilby rpkilby added this to the 3.10 Release milestone May 23, 2019
As a fix for issue encode#3354, commit 607e4ed made the
evaluation of some validation error messages lazy. To achieve that,
Django's django.utils.functional.lazy() function was used. However, that
function is extremely heavy and slow, and slows down string validation
significantly (lazy() is evaluated each time for each validator for each
field which has one). We noticed this in our production system.

Use a custom lazy_format() object instead which does the formatting
lazily with less overhead.

Using the benchmark attached to the PR (snipped to tottime>100ms):

Before, model serializer:

         9225123 function calls (9200068 primitive calls) in 8.337 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    25000    1.299    0.000    3.534    0.000 functional.py:125(__prepare_class__)
  2415001    0.954    0.000    0.954    0.000 {built-in method builtins.hasattr}
  1350000    0.901    0.000    0.901    0.000 functional.py:145(__promise__)
  1550001    0.521    0.000    0.521    0.000 {built-in method builtins.setattr}
    25000    0.494    0.000    0.735    0.000 {built-in method builtins.__build_class__}
    30000    0.298    0.000    0.385    0.000 fields.py:297(__init__)
    25000    0.289    0.000    5.895    0.000 fields.py:740(__init__)
    25000    0.264    0.000    0.802    0.000 field_mapping.py:66(get_field_kwargs)
    25000    0.241    0.000    0.241    0.000 functional.py:100(__proxy__)
670003/670001    0.203    0.000    0.211    0.000 {built-in method builtins.getattr}
     5000    0.189    0.000    7.722    0.002 serializers.py:990(get_fields)
    25000    0.186    0.000    0.400    0.000 functools.py:186(total_ordering)
    25000    0.158    0.000    0.299    0.000 functional.py:234(wrapper)
     5000    0.129    0.000    0.136    0.000 serializers.py:1066(get_field_names)
    25000    0.104    0.000    1.002    0.000 serializers.py:1195(build_standard_field)

After, model serializer:

         3265096 function calls (3240059 primitive calls) in 2.645 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    30000    0.237    0.000    0.315    0.000 fields.py:295(__init__)
    25000    0.218    0.000    0.639    0.000 field_mapping.py:66(get_field_kwargs)
    25000    0.214    0.000    0.665    0.000 fields.py:743(__init__)
     5000    0.156    0.000    2.086    0.000 serializers.py:988(get_fields)
    25000    0.107    0.000    0.210    0.000 functional.py:234(wrapper)

Before, regular serializer:

         8060003 function calls (7960003 primitive calls) in 7.123 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    25000    1.569    0.000    3.897    0.000 functional.py:125(__prepare_class__)
  1350000    1.013    0.000    1.013    0.000 functional.py:145(__promise__)
  2365000    0.925    0.000    0.925    0.000 {built-in method builtins.hasattr}
  1550000    0.512    0.000    0.512    0.000 {built-in method builtins.setattr}
    25000    0.378    0.000    0.550    0.000 {built-in method builtins.__build_class__}
    25000    0.307    0.000    5.946    0.000 fields.py:740(__init__)
    30000    0.277    0.000    0.360    0.000 fields.py:297(__init__)
80000/5000    0.202    0.000    6.526    0.001 copy.py:132(deepcopy)
    25000    0.172    0.000    0.172    0.000 functional.py:100(__proxy__)
   540000    0.152    0.000    0.152    0.000 {built-in method builtins.getattr}
    25000    0.119    0.000    6.199    0.000 fields.py:604(__deepcopy__)

After, regular serializer:

         2150003 function calls (2050003 primitive calls) in 1.609 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    30000    0.224    0.000    0.293    0.000 fields.py:295(__init__)
    25000    0.181    0.000    0.607    0.000 fields.py:743(__init__)
80000/5000    0.151    0.000    1.074    0.000 copy.py:132(deepcopy)
    25000    0.102    0.000    0.819    0.000 fields.py:607(__deepcopy__)
@tomchristie
Copy link
Member

@rpkilby Yeah same. I'm going to leave the final okay with @carltongibson since he's managing the 3.10 release, but I'm happy.

Nice one, @bluetech!

@carltongibson
Copy link
Collaborator

Argh! Don't do that. 🙂

I'm quietly getting knocked sideways by life over here, so any help getting the 3.10 issues in is very welcome.

I've got the schema docs to finish, which I'm on, but beyond that I keep seeing the number of things to pull in, vs the time I'm able to make for it, and I'm starting to panic a little.

(Was going to say that in private, but TBH, given how OSS works, I think these things are better off said.)

Quick look at this: it's seems fine, but I haven't looked deeply. Happy to trust @rpkilby's review.

@rpkilby rpkilby changed the title Avoid Django's lazy() when creating validators because it is too slow (v2) Improve performance of lazy validation message formatting May 29, 2019
@rpkilby rpkilby merged commit c2293e9 into encode:master May 29, 2019
@johnthagen
Copy link
Contributor

Just wanted to say that this fix improved the speed of some deeply nested serializers in our project by 60%! ❤️ Huge real world improvement in DRF 3.10.

Thank you all!

fdintino pushed a commit to theatlantic/django-rest-framework that referenced this pull request Aug 3, 2020
pchiquet pushed a commit to pchiquet/django-rest-framework that referenced this pull request Nov 17, 2020
sigvef pushed a commit to sigvef/django-rest-framework that referenced this pull request Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants