Performance improvements #8

johanobergman · 2015-10-18T20:17:34Z

I've been messing around with this project a bit lately, and I've had some trouble with performance. The GraphQL executor does quite a bit of unnecessary work when it resolves fields. Since it operates recursively, larger lists have a really bad impact on performance. I have tried to fix some of the bottlenecks by memoizing objects and values that can be re-used.

I'd like to hear what you think about this approach. According to the Blackfire profiler the response was 2x faster for a large query and 1.5x faster for one I use in my app.
You can check out my fork here: https://github.com/johanobergman/graphql-php

vladar · 2015-10-19T07:38:13Z

Performance is definitely a priority. I planned to revisit executor code some day. Will take a look at your implementation when I have a chance.

Feel free to submit it as pull request if you already have working implementation.

The roadmap I had in mind so far is:

Try out breadth-first approach (probably without recursion) - this may also help solving N+1 problem we discussed in Use of map() where parent type is a node #6.
Write optional PHP module for the most expensive areas.
Consider implementing parallel execution for HHVM (and probably ReactPHP)

But I do not want to start optimizing stuff prematurely, until we get clear usage patterns and see all weaknesses of GraphQL in php in general.

Obviously PHP execution model is very different from Node, so this project will diverge a lot from GraphQL JS eventually.

vladar · 2015-10-23T07:46:08Z

Hey, I checked your implementation, but memoization you propose affects the result of query execution. When you memoize results of resolve method for a field, you don't take into account possible arguments or resolveInfo, which alter results for different resolve calls.

Test suite obviously fails hard in such approach.

In other words, the work performed by library that seem useless is actually necessary.

The problem is that resolve and even collectSubFields are context-specific (e.g. resolve depends on arguments and resolveInfo), so you can't memoize them effectively at executor level.

There is also a chance that I am missing something in your idea. If you could make test suite pass with your implementation - that would help me to understand it better.

johanobergman · 2015-10-23T15:31:07Z

I am a bit too unfamiliar with the graphql source to fully understand the tests and make them pass (perhaps my approach is fundamentally wrong as you say). I did the memoization under the assumption that resolveInfo and arguments were the same for every iteration of a given portion of the full query. You can see that I key the memoized data by an id constructed from the start and end location of the fieldAST.

This is a bit of a hack anyway - a different architecture which supports PHP better is by far a better approach.

vladar · 2015-10-23T16:11:00Z

Actually resolve should be called only once per field (for the same value and same position in query). If resolve is somehow called multiple times - this is probably a bug. Did you face a situation when it has been called multiple times?

As for collectSubFields memoization - it is tricky because of complex types (interfaces, unions). Consider following query:

query Test {
  content {
    id
    ... on Image {
      width
      height
      ...A
    }
    ... on Attachment {
      size
      ...B
    }
  }
}

fragment A on Image {
  y: a @include(if: $somevariable)
}

fragment B on Attachment {
  z: b
}

Call to collectSubField for content field will return different results depending on current value and query variables. If value is of Image type - subfields will be (id, width, height, y), for Attachment type it will be (id, size, z)

Directives and aliases may also affect actual subfield structures.

I guess the only place where we could memoize such calls is when processing homogeneous lists (lists of ObjectType)

johanobergman · 2015-10-23T18:01:46Z

Actually resolve should be called only once per field (for the same value and same position in query). If resolve is somehow called multiple times - this is probably a bug. Did you face a situation when it has been called multiple times?

I think this happened when the field owner was not a direct sibling of the other values with the same query position:

query Test {
  todos {
    ...
    category {        <- field owner
      translatedName  <- field
    }
  }
}

Let's say there are 10 categories in the system. The todos field may return several todos which in turn returns one category. It is likely that a given category will end up in a lot of todos, with resolve being called on its fields each time even though it has the same position in the query and the same value. In this case, resolving translatedName may cause significant (unnecessary) overhead since it will perform the translation several times.

vladar · 2015-10-24T17:41:00Z

Yeah. Now I see what you mean. Sorry that I didn't get your idea in the first round :)

To make this optimization work library have to somehow be aware of value identity. I see you use SplObjectStorage with object hash as identity.

Will recheck why exactly this optimization breaks the test suit.

johanobergman · 2015-10-24T18:51:37Z

Remember that it was only one of the optimizations. The memoization of resolveInfo, args, and collectSubFields were the ones that yielded the best result (although they too broke the test suite).

smolinari · 2015-10-25T05:58:23Z

Hi,

I'm new to the project and I like what I see already, both from the concept of GraphQL and the work done here, I'd like to say, I would love to see this happen.

Consider implementing parallel execution for HHVM (and probably ReactPHP)

However, I am working with appserver.io.

Scott

vladar · 2015-10-25T08:29:01Z

@johanobergman Merged your tweaks to code base with some modifications to make test suite pass. CollectSubFields are also memoized now, but per ObjectType to make Abstract types work.

Thanks for these ideas!

Also note that I reverted map feature which added some overhead as well. This could slightly improve performance as well.

vladar · 2015-10-25T08:42:45Z

@smolinari Yeah, that would be great. But it's hard to say when I'll have enough time for these big changes, so no estimates yet.

smolinari · 2015-10-25T08:53:50Z

No problem. It is an emerging technology, so there is some time to "get it going".

I sure wish Facebook would open source their server side implementation. It would be neat to see how they use HHVM (and I would assume async()) to build their Type System.

Scott

johanobergman · 2015-10-25T10:55:30Z

@vladar Good job! Something's not quite right though. Check out this comparision of my fork and graphql-php 0.5 https://blackfire.io/profiles/compare/f5f8ce8c-9ff2-4e9e-bbdf-b673719cbeb9/graph. I believe some methods are still called too many times.

johanobergman · 2015-10-25T10:58:01Z

Here's a detailed view of graphql-php 0.5: https://blackfire.io/profiles/e67292c1-5853-451f-85d5-60e64a77c873/graph

vladar · 2015-10-25T13:11:26Z

Try again on latest commit. Changed the way of handling complex types - now field uid also depends on exact type name.

johanobergman · 2015-10-25T14:46:15Z

That works a lot better. Note that you don't have to do add [$runtimeType->name] in $exeContext->memoized['collectSubFields'][$uid][$runtimeType->name] since it's already in the $uid.

I't still about 30% slower than my fork, getFieldUid is called 18000 times instead of 14000 times. $memoized contains the same values though, so it might be some of the recent changes that caused this (removing map or collectSubFields maybe).
https://blackfire.io/profiles/compare/900c6539-0a2d-4d05-9b3b-bf0682f135dd/graph (it's a bit hard to read since method names have been changed)

vladar · 2015-10-25T15:05:22Z

Yes, that's probably because of map removal. In the version with map you only called getFieldUid once per list. Now it's called once per each list value (as in original graphql-js).

Will probably revisit this once I get to N+1 problem.

johanobergman · 2015-10-25T18:40:03Z

That sounds good. Thank you for the work you put into this. The performance gain has been massive!

vladar · 2015-10-26T07:26:57Z

Closing this for now.

timbrandin · 2016-06-08T16:46:00Z

I have a BIG problem with memoized results, in this example with memoizing on I get faulty result on the leaf node:

Query:

{
  surveys(school_category:SCHOOL) {
    nid
    modules {
      survey
    }
  }
}

Faulty result, survey in each module should match the nid from the survey.

{
  "data": {
    "surveys": [
      {
        "nid": 85,
        "modules": [
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          }
        ]
      },
      {
        "nid": 707,
        "modules": [
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          }
        ]
      }
    ]
  }
}

And if I turn off memoization I get the correct result:

{
  "data": {
    "surveys": [
      {
        "nid": 85,
        "modules": [
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          },
          {
            "survey": 85
          }
        ]
      },
      {
        "nid": 707,
        "modules": [
          {
            "survey": 707
          },
          {
            "survey": 707
          },
          {
            "survey": 707
          },
          {
            "survey": 707
          },
          {
            "survey": 707
          },
          {
            "survey": 707
          },
          {
            "survey": 707
          }
        ]
      }
    ]
  }
}

I'm not sure it's safe to memoize using the AST location and fieldname, my recommendation is to remove this performance optimization now until someone figures out how to look at the parent if that is different than previously memoized, but also that doesn't take into account if say any contextual parameter has changed. Say the user is added in some level of the graph and passed down, although atm it doesn't seem to be possible to reference the context in the resolvers to be able to add stuff (this works in the JavaScript implementation, context is passed as the third parameter). But that seems like another issue.

mcg-web · 2016-06-09T08:46:26Z

hi @timbrandin can you open a new issue for this please? it will be much easier to follow thx

timbrandin · 2016-06-09T09:15:27Z

Sure @mcg-web

vladar added a commit that referenced this issue Oct 25, 2015

Several performance improvements (#8)

3b3da9e

vladar added a commit that referenced this issue Oct 25, 2015

Better approach for latest performance tweaks (#8)

ea4d43f

vladar closed this as completed Oct 26, 2015

Uh oh!

Performance improvements #8

Performance improvements #8

Comments

johanobergman commented Oct 18, 2015

vladar commented Oct 19, 2015

Uh oh!

vladar commented Oct 23, 2015

Uh oh!

johanobergman commented Oct 23, 2015

Uh oh!

vladar commented Oct 23, 2015

Uh oh!

johanobergman commented Oct 23, 2015

Uh oh!

vladar commented Oct 24, 2015

Uh oh!

johanobergman commented Oct 24, 2015

Uh oh!

smolinari commented Oct 25, 2015

Uh oh!

vladar commented Oct 25, 2015

Uh oh!

vladar commented Oct 25, 2015

Uh oh!

smolinari commented Oct 25, 2015

Uh oh!

johanobergman commented Oct 25, 2015

Uh oh!

johanobergman commented Oct 25, 2015

Uh oh!

vladar commented Oct 25, 2015

Uh oh!

johanobergman commented Oct 25, 2015

Uh oh!

vladar commented Oct 25, 2015

Uh oh!

johanobergman commented Oct 25, 2015

Uh oh!

vladar commented Oct 26, 2015

Uh oh!

timbrandin commented Jun 8, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcg-web commented Jun 9, 2016

Uh oh!

timbrandin commented Jun 9, 2016

Uh oh!

timbrandin commented Jun 8, 2016 •

edited

Loading