Skip to content

Optimize performance for queries on recursive relations #6513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
noahsilas opened this issue Mar 15, 2020 · 7 comments
Closed

Optimize performance for queries on recursive relations #6513

noahsilas opened this issue Mar 15, 2020 · 7 comments

Comments

@noahsilas
Copy link
Contributor

While investigating parse-server performance, we're noticing a lot of time spent doing queries against the _Role and _Join:roles:_Role collections. One interesting feature of our application is that many of our users have roles that have several layers of children, which makes this a good candidate for optimization, and I think I've found a target.

In Auth._getAllRolesNamesForRoleIds, we follow a strategy that at a high level looks sort of like this:

class Auth {
  ...
  
  async function _getAllRolesNamesForRoleIds(ids) {
    const seen = new Set();
    const names = new Set();
    let horizon = [ids];
    
    while (horizon.length) {
      horizon.forEach((id) => seen.add(id));

      const relatedRoles = horizon.map(id => Parse.Role.createWithoutData(id));
      const owningRoles = await new Parse.Query(Role)
        .containedIn('roles', relatedRoles)
        .find();

      owningRoles.forEach((role) => {
        names.add(role.get('name'));
      });

      horizon = owningRoles
        .map(role => role.id)
        .filter(id => !seen.has(id));
    }
    
    return [...names];
  }
}

One thing that is somewhat hidden here is that querying the Role collection through the relation actually translates to two queries: one against the _Role collection, and one against the _Join:roles:_Role collection. This means that the number of queries needed for a user with a role that has depth of N is going to be roughly 2N.

We could improve on this by recognizing that we don't actually need to fetch the intermediate roles separately - we can follow the pointers in the join table directly, which would look something like this (hypothetical mongo version inlined here):

class Auth {
  ...
  
  async function _getAllRolesNamesForRoleIds(ids) {
    const seen = new Set();
    let horizon = [ids];
    
    while (horizon.length) {
      horizon.forEach((id) => seen.add(id));
      
      horizon = await mongo['_Join:roles:_Role']
        .find({ relatedId: { $in: horizon } })
        .map(join => join.owningId)
        .filter(id => !seen.has(id));
    }

    const roles = await new Parse.Query(Parse.Role)
      .select('name')
      .containedIn('objectId', [...seen]);

    return [roles.map(r => r.get('name'))];
  }
}

By doing our recursive step directly through the Join table, we can cut the number of queries down to N+1 (N queries against the join table, and one final query against the Roles table).

I'm starting to take a look at how I could implement something like this, and finding that it's a bit tricky to work through the relations query paths. I was wondering if anyone has suggestions about how we could implement this?

My initial thought is that I could add some methods to the database adapters to directly access join tables, and then pass a reference to the database adapter into the auth component to take advantage of those new methods.

Another option would be to try to add support for this kind of query directly to RestQuery and the JS SDK (which the auth component uses to do these queries). This potentially supports client applications that want a similar optimization on tree-like data stores (category trees, for instance). That would be less invasive in the Auth component, but entail larger changes in the public API. I'm sort of imagining something like role.relation("roles").query().recursive().find(), but I haven't thought this through at all yet.

I'd love to hear thoughts about approaches to this from the team, or if you think that this is just a bad idea for some reason I haven't thought of yet. Thanks!

@acinader
Copy link
Contributor

are you bypassing acl's with this approach?

@omairvaiyani
Copy link
Contributor

Are role ACLs always necessary to check? For example, if we're getting the roles for the purpose of gathering their role profile, then I can't imagine we need to filter by ACL, as we're not getting them on behalf of the user?

@noahsilas
Copy link
Contributor Author

It looks to me like all of the queries for Roles in Auth today are using the master key, and so aren't restricted by ACL. I'm still getting acquainted with that code though, would love pointers if I'm making a mistake here!

@omairvaiyani's comment does match my intuition though. It seems to me that adding a user to a Role shouldn't necessitate that the user matches the Role's read ACL. This should be easy to check, I can give it a go later (and maybe write a test case for it if we don't already have one)

@omairvaiyani
Copy link
Contributor

omairvaiyani commented Mar 18, 2020

I think $graphLookup - https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/ might be even more performant here

@noahsilas
Copy link
Contributor Author

$graphLookup seems like it could be a great tool! That sort of suggests adding some kind of method to the storage adapter, because we're getting into pretty specialized territory - translating this to SQL for Postgres seems quite tricky.

I haven't profiled this at all yet, but we also might be able to take advantage of $lookup to get this down to a single query?

// we could probably even get this set of IDs directly from `_Join:users:_Role` table to start our query?
var horizon = ['2GtApWUHe1', 'fu0ckkFVGD']

db.getCollection('_Role').aggregate([
  // Find the Roles that match the horizon, use them as starting points for the graph lookup
  { $match: { _id: { $in: horizon } } },

  // Recursively find Joins from those starting points
  { $graphLookup: {
    from: '_Join:roles:_Role',
    startWith: '$_id',
    connectFromField: 'owningId',
    connectToField: 'relatedId',
    as: 'rolePath',
  }},

  // "Lookup" Role objects based on those role IDs
  { $lookup: {
    from: "_Role",
    localField: "rolePath.owningId",
    foreignField: "_id",
    as: "Roles",
  }},
]).pretty()

I don't see anything in the $graphLookup documentation talking about how it deals with cycles. Testing locally it seems like it stops traversing when it comes to a node that it's already visited? Is there a way to confirm that?

It also looks like it does the recursive search for each input document, so maybe there's something better than this approach of starting by matching a document for each role the user has? If those roles are all situated at different levels of a single tree, that could end up duplicating work along each subtree.

@omairvaiyani
Copy link
Contributor

omairvaiyani commented Mar 19, 2020

I don't see anything in the $graphLookup documentation talking about how it deals with cycles.

MongoDB stops the traversal when it encounters an already visited node, confirmed here

If those roles are all situated at different levels of a single tree, that could end up duplicating work along each subtree.

Not exactly - as soon as a subtree reaches an already visited note, it'll stop there. Follow my reasoning here:

Given a scenario where there are 4 roles ["owner", "manager", "member", "observer"], with a simple ladder-like tree, each role has 1 parent, except "owner" which is the top-level node.

A0 - Get user's direct roles => ["owner", "manager"] : ops 1
A1 - Begin $graphLookup from A0 => ["owner>manager", "manger>member", "member>observer", "observer>null"] => op[0]:false op[1]:true op[2]:true op[3]:true => 3 ops

In the above, the only wasted op would be A1 index 0. However, if I understand $graphLookup correctly, that operation will not be executed, as the "manager' node has already been visited. If we have a slight variation in the user's role like, where the single-tree is entered at non-adjacent levels, let's take a look:

B0 - Get user's direct roles => ["owner", "member"] : ops 1
B1 - Begin $graphLookup from A0 => ["owner>manager", "manger>member", "member>observer", "observer>null"] op[0]:true op[1]:false op[2]:true op[3]:true => 3 ops

Given that $graphLookup avoids following a node that's already visited, there shouldn't be repeat work even if we enter at various levels within the same tree.

@stale
Copy link

stale bot commented May 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants