Skip to content

Remove NaturalOrderingManager #3799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jeremystretch opened this issue Dec 27, 2019 · 12 comments · Fixed by #4122
Closed

Remove NaturalOrderingManager #3799

jeremystretch opened this issue Dec 27, 2019 · 12 comments · Fixed by #4122
Assignees
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application

Comments

@jeremystretch
Copy link
Member

jeremystretch commented Dec 27, 2019

Proposed Changes

Replace NaturalOrderingManager with a simpler and more efficient approach: ordering first by string length and then by string. This article provides a concise example.

Justification

NaturalOrderingManager was introduced some time ago to effect the natural ordering of various models. (For example, ensure that Router10 appears in a list before Router2; the default alphabetic ordering does not do this). It works by splitting a field into its leading integers (if any), middle part, and trailing integers (if any), and ordering the three discrete values independently.

Although this approach works, it incurs significant performance penalty, which is visible when inspecting SQL queries. Casual testing with the Device model saw a drop from ~66ms to ~7ms per query using the length-based approach. Further, ordering is removed entirely during COUNT() queries as we are no longer modifying the default queryset returned by the manager.

I believe the only drawback to this approach is that we'll lose natural ordering on integer prefixes, which seems reasonable given that it's not a very common requirement, and ordering on trailing integers is presumably a much more common use case.

@jeremystretch jeremystretch added the status: under review Further discussion is needed to determine this issue's scope and/or implementation label Dec 27, 2019
@steffann
Copy link
Contributor

I like the idea, but it doesn't work in some common cases. For example:

sander=# select * from ordering_test1 order by length(alnum), alnum;
   alnum    |  num
------------+--------
 ge-1/0/2   |  10002
 ge-2/0/2   |  20002
 ge-0/10/0  |   1000
 ge-1/0/10  |  10010
 ge-1/10/0  |  11000

Besides: Juniper EX and QFX change the interface name based on the type of transceiver, so there will be a mix of xe-0/0/0', ge-0/0/1` etc. That will not be sorted properly.

For Cisco gear with both GigabitEthernet1/1 and TenGigabitEthernet1/1 the new ordering would be nice though :)

@steffann
Copy link
Contributor

How about creating an extra field called _sortable_name that transforms the interface name to something the database can easily sort on?

For example left-padding every sequence of integers to a length of 4 or 5 digits (anybody has interface numbers >99999?). Or an ArrayField like:

sander=# select * from ordering_test1 order by arr;
   alnum   |  num  |   arr
-----------+-------+----------
 ge-0/0    |     0 | {0,0}
 ge-0/10/0 |  1000 | {0,10,0}
 ge-1/0/2  | 10002 | {1,0,2}
 ge-1/0/10 | 10010 | {1,0,10}
 ge-1/10/0 | 11000 | {1,10,0}
 ge-2/0/2  | 20002 | {2,0,2}
 ge-3/0    |  3000 | {3,0}

It can be automatically populated on save() and after that sorting on it wouldn't cost any performance.

@jeremystretch
Copy link
Member Author

NaturalOrderingManager isn't used for interfaces; the Interface model has its own jumble of regex that it uses (see #3097). This is for generic natural ordering.

@steffann
Copy link
Contributor

/me was confused. Please ignore 🙂

@jeremystretch
Copy link
Member Author

Come up with a solution for #3097 and all is forgiven! 😆

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation labels Jan 3, 2020
@jeremystretch
Copy link
Member Author

I have the code for this done in the 3799-natural-ordering branch. However, I've run into a suspected bug in Django that causes an exception when ordering by a ForeignKey to a model which orders by a function (e.g. Length()). I've refrained from submitting a PR since the code as it currently stands breaks several views.

This is on hold until I hear back regarding the bug report linked above. Worst case, it should be possible to work around this by cooking up a custom wrapper that provides a __getitem__ method to avoid the exception.

@jeremystretch
Copy link
Member Author

jeremystretch commented Jan 3, 2020

After poking at this some more, it unfortunately doesn't seem that we can work around it with a wrapper. However, we can fall back to referencing each parent model's ordering explicitly. For instance:

    ordering = (Length('device__name'), 'device__name', Length('name'), 'name')

However, great would be needed to ensure the child's ordering for the parent never deviates from the parent's native ordering.

@jeremystretch
Copy link
Member Author

After some experimentation, I haven't been able to come up with a suitable workaround. This is a confirmed bug that has been resolved in Django 3.0 but is not eligible for back-porting to Django 2.2.

Marking this as blocked by #3848.

@jeremystretch jeremystretch added status: blocked Another issue or external requirement is preventing implementation and removed status: accepted This issue has been accepted for implementation labels Jan 6, 2020
@candlerb
Copy link
Contributor

(For example, ensure that Router10 appears in a list before Router2; the default alphabetic ordering does not do this)

Presumably this should say "after" not "before".

However, unless I've missed something, I don't think that sorting by length and then by text works at all. Consider the following example:

  • bar1
  • foo1
  • bar10
  • foo10
  • switch1
  • firewall1

That list is sorted by length first, then by text. However, I would expect "firewall1" to become before "switch1", and also before "foo". I would also expect "bar10" to come before "foo1".

To optimise sorting without explicitly splitting the device name into two parts then index expressions are probably the best approach. But I think you'd need an expression which separates the input into (text part, number part) as a composite type rather than just using the length.

@jeremystretch
Copy link
Member Author

I don't recall exactly what I was doing earlier, but it seemed to work at the time. It's likely I didn't have sufficiently diverse test data.

This should be a common enough problem that I'd expect to find plenty of existing solutions for reference. Might just need to dig a bit more.

@jeremystretch
Copy link
Member Author

After poking at this some more, I think our best chance at an efficient, maintainable solution will be to normalize the sort field at write time and store that value in a separate column. For example, Router12 would become router00000012, which would get stored in a field separate from name.

While this is trivial to achieve inside a model's save() method, I would like to preserve create() and bulk_create() functionality to allow for better performance when populating test data. I haven't tried it yet, but we should be able to leverage pre_save() on the sort field to populate the normalized value on demand.

I envision something like this:

class NaturalSortField(CharField):
    def pre_save(model_instance, add):
        return naturalize(getattr(model_instance, self.source_field))

class Device(Model):
    name = models.CharField(max_length=64)
    _sort = models.NaturalSortField(source_field='name')

This would require a one-time migration to calculate all normalized values in bulk, but that should not be a disruptive process (we've performed similar operations in prior releases).

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed status: blocked Another issue or external requirement is preventing implementation labels Feb 4, 2020
@jeremystretch jeremystretch self-assigned this Feb 7, 2020
@jeremystretch
Copy link
Member Author

Testing of the initial implementation is very promising:

$ git checkout develop
$ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null
real	0m8.958s
user	0m0.005s
sys	0m0.005s
$ git checkout 3799-natural-ordering-field
$ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null
real	0m0.896s
user	0m0.006s
sys	0m0.003s

I was also able to fully replicate the existing ordering logic (at least according to the tests we have in place).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants