feat(discover2) Add more aggregate field support #14321

markstory · 2019-08-08T18:27:43Z

This is an alternate approach to modifying the discover endpoints to fit the requirements of discover2/events search. By extending the existing events endpoint with rollup and additional functions/aliases we can get what we need.

The rollup parameter isn't used right now but will be used when we update events-stats which will be one of the next tasks.

This approach will require some shim code in the UI to convert saved discover queries into a compatible field list and query string, but that work now seems simpler than building out the remaining ancillary endpoints in discover.

Refs SEN-810

This is an alternate approach to modifying the discover endpoints to fit the requirements of discover2/events search. By extending the existing events endpoint with rollup and additional functions/aliases we can get what we need. This approach will require some shim code in the UI to convert saved discover queries into a compatible field list and query string, but that work now seems simpler than building out the remaining ancillary endpoints in discover. Refs SEN-811

markstory · 2019-08-08T18:28:51Z

src/sentry/api/bases/organization_events.py

@@ -147,9 +110,6 @@ def get_snuba_query_args_legacy(self, request, organization):
        except InvalidSearchQuery as exc:
            raise OrganizationEventsError(exc.message)

-        # Filter out special aggregates.
-        self._filter_unspecified_special_fields_in_conditions(snuba_args, set())


This was removed as we won't be supporting filtering on aggregate columns in the short term, and with fully dynamic aggregate columns we would need expand the query parser to accomodate aggregate filtering.

~~Can we go ahead and remove the function completely?~~ sorry, just saw you did this

markstory · 2019-08-08T18:29:30Z

src/sentry/api/endpoints/organization_events.py

-            if any(field for field in groupby if field not in ALLOWED_GROUPINGS):
-                message = ('Invalid groupby value requested. Allowed values are ' +
-                           ', '.join(ALLOWED_GROUPINGS))
-                return Response({'detail': message}, status=400)


This was removed because the endpoint no longer allows the end user to define groupby. Instead groupings are implicit based on the fields requested.

markstory · 2019-08-08T18:30:27Z

src/sentry/api/event_search.py

@@ -215,9 +215,9 @@ class SearchVisitor(NodeVisitor):
        'device.battery_level', 'device.charging', 'device.online',
        'device.simulator', 'error.handled', 'issue.id', 'stack.colno',
        'stack.in_app', 'stack.lineno', 'stack.stack_level',
-        # OrganizationEvents aggregations
-        'event_count', 'user_count',


Removing these as they aren't a thing anymore.

src/sentry/static/sentry/app/views/organizationEventsV2/utils.jsx

lynnagara · 2019-08-08T18:45:50Z

src/sentry/api/event_search.py

-        # OrganizationEvents aggregations
-        'event_count', 'user_count',
-
+        'duration',


Is this going to break regular event/issue search if someone is using these field names?

The removed fields were only really available on events-v2, which no customers have access to.

This code is not just used in events-v2 though, it's used for event search generally. Try typing duration:something into issue or event search and you will see what I mean

I doubt anyone knows about these fields since they were never documented

Unfortunately people do use duration as a tag though. And 500ing on a query with this term in it doesn't seem great

I can remove duration for now. Given that some customers are using this as a tag would we want to expose the real duration as event.duration?

lynnagara · 2019-08-08T18:51:08Z

src/sentry/api/event_search.py

+            u'{}_{}'.format(match.group('function'), match.group('column'))
+        ])
+
+    rollup = snuba_args.get('rollup')


What will rollup be used for?

Rollup gets used by timeseries endpoints, which I'm planning to update soon.

Is rollup synonymous with granularity in snuba?

https://github.com/getsentry/snuba/blob/cc0ddc14fc72548e5eca69b7aa7ed50b04060488/README.md#granularity

lynnagara · 2019-08-08T18:51:15Z

src/sentry/api/event_search.py

+        # Ensure fields we require to build a functioning interface
+        # are present. We don't add fields when using a rollup as the additional fields
+        # would be aggregated away.
+        if 'project.id' not in columns:


Similar to id on the line below, won't the aggregations check be needed?

We always need a project id in order to generate a link to the event in the modal. This complicates cross project aggregation. Another possibility is to use argMax(project_id, timestamp) to get the project for the latest event.

What if the logic in this block that is related to building a functioning interface and modal links lived in the user interface instead instead of here? I think the API could be solely responsible for making sure queries are valid and then running them, not modifying queries so that the UI gets the field it needs. Otherwise this behavior would be super confusing if someone is using the API directly and getting back random fields that they didn't request just because our UI needs it.

My concern with that is we'll be storing field lists in the URL and in the database. Storing all the decomposed fields makes URLs more fragile as removing an expanded field could cause incorrect rendering. It also makes future changes harder. If we want to modify how a field is expanded we can't easily do that because the expansion is saved incorrectly in the database.

lynnagara · 2019-08-08T19:04:16Z

src/sentry/static/sentry/app/views/organizationEventsV2/data.jsx

-    sortField: 'event_count',
+  // TODO generalize this.
+  'count(id)': {
+    sortField: 'count_id',


Can you only sort by one column?

The links only add/change a single sort condition.

Ok. For events we use -timestamp, -event_id or timestamp, event_id so we have a stable order, which we don't have here. Not sure if it matters though.

src/sentry/api/endpoints/organization_events.py

lynnagara · 2019-08-08T19:09:21Z

src/sentry/api/event_search.py

+            columns.append(field)
+            continue
+
+        validate_aggregate(field, match)


It could be nice if all the validations were done in one place up front, but i'm guessing this is not as easy?

This function (resolve_field_list) is called pretty early in the organization_events endpoint processing. I would like to use a DRF serializer for this work but other request validation steps get_filter_params() are embedded in endpoint logic.

lynnagara · 2019-08-08T19:10:00Z

src/sentry/api/event_search.py

+        if aggregations and 'latest_event' not in fields:
+            columns.extend(deepcopy(FIELD_ALIASES['latest_event']['fields']))
+
+    basic_fields = [col for col in columns if isinstance(col, six.string_types)]


When is a column name not a string? Wouldn't that be a validation error if that ever happened?

The column name can be a non-string value when a field alias is used. For example the latest_event field gets expanded into a function using argMax.

I've added validation to disallow end users from providing non-string field names.

Gotcha. IMO it's easier to read if you put the special expanded fields (which are actually aggregations) in a separate list since they have a different type and need to be handled differently

All the field aliases that expand into aggregates now update aggregations. I've had to hack in a string function call for latest_event as snuba doesn't support multiple columns in aggregate functions just yet. I've opened a pull to fix that though.

lynnagara · 2019-08-08T19:16:18Z

src/sentry/api/event_search.py

+
+    basic_fields = [col for col in columns if isinstance(col, six.string_types)]
+    if rollup and basic_fields:
+        raise InvalidSearchQuery('You cannot combine rollup with non-aggregate fields')


Isn't that ok since they get added to groupby clause?

Maybe, my understanding of rollup is that it cuts the data up into timeslices. Getting a non-aggregate column in that kind of result set is not helpful as it breaks the rollup slices.

Why won't it be useful? I could see something like Errors grouped by version rolled up by week being pretty useful.

Fair point. I'll take this out.

On second thought, this scenario is an error if there are no aggregates being included. However, basic fields + aggregates should be valid.

lynnagara · 2019-08-08T19:22:15Z

src/sentry/api/event_search.py

+VALID_AGGREGATES = {
+    'count_unique': {
+        'snuba_name': 'uniq',
+        'fields': '*',


I can imagine also wanting to do all fields excluding some small list

@lynnagara im wondering, is there an aggregate function where we would want to exclude certain columns?

I personally would prefer to use safe-lists rather than ban-lists for the acceptable aggregate columns.

The situation where I was thinking we might need a ban list is where we want to support the aggregate on custom user-defined tag columns (can't use a safe list) but might need to exclude some internal stuff

src/sentry/api/event_search.py

Generally not a useful scenario

Don't inject a project.id field when aggregating across non-project fields. Instead use argMax() to get a project.name from the latest event.

wedamija · 2019-08-08T20:29:31Z

src/sentry/api/event_search.py

+    },
+    # This doesn't work yet, but is an illustration of how it could work
+    'p75': {
+        'snuba_name': 'quantileTiming(0.75)',


I think we wanted to stop passing function calls like this explicitly, and try something like
[quantileTiming, [0.75]]. Not sure if this is something that would be directly passed to snuba though?

I'm fine with an array form as well. The quantile functions are a bit funny as they need to end up in clickhouse like quantileTiming(0.75)(duration) I'm not sure we've done that kind of function call in snuba before.

We might need snuba to translate that for us, not sure. I feel like we had something similar to this before, can't remember exactly what it was though. Could potentially do [quantileTiming, [0.75, 'duration']] and have snuba sort it out. I think other patterns we've done were to extract information from the name, but that's probably not a great practice (something like [quantileTiming_75, ['duration']])

Yeah the implementation of the quantile functions is still imcomplete, but we can figure that out once storage for transactions is sorted.

wedamija · 2019-08-08T21:21:59Z

src/sentry/api/event_search.py

-        # OrganizationEvents aggregations
-        'event_count', 'user_count',
-
+        'duration',


I doubt anyone knows about these fields since they were never documented

* Remove non-deterministic results. * Remove condition test for fields that are no longer supported.

dashed

Looks good to me.

I'll defer to @wedamija and @lynnagara for approval.

We need to use string function names in the short term as snuba's json schema validation prevents aggregates on more than one column. I've also changed the logic to allow rollup with basic fields as long as at least one aggregate is also specified.

We have a use-case in sentry where we want to use `argMax()` which requires multiple columns to work. While this function is allowed in `selected_columns` it is actually an aggregate value and should probably be in `aggregations`. This expands the schema and query builder to allow lists of columns in aggregations functions. Refs getsentry/sentry#14321

Duration is currently in use as a tag by a few customers.

wedamija

lgtm, probably worth lyn having final signoff

lynnagara

Looks good!

We have a use-case in sentry where we want to use `argMax()` which requires multiple columns to work. While this function is allowed in `selected_columns` it is actually an aggregate value and should probably be in `aggregations`. This expands the schema and query builder to allow lists of columns in aggregations functions. Refs getsentry/sentry#14321

markstory added 2 commits August 8, 2019 12:00

Update the UI code to use the new aggregate functions/field aliases.

bf09008

markstory commented Aug 8, 2019

View reviewed changes

markstory requested review from lynnagara, a team and dashed August 8, 2019 18:31

lynnagara reviewed Aug 8, 2019

View reviewed changes

src/sentry/api/endpoints/organization_events.py Outdated Show resolved Hide resolved

lynnagara reviewed Aug 8, 2019

View reviewed changes

src/sentry/api/event_search.py Outdated Show resolved Hide resolved

markstory added 4 commits August 8, 2019 15:42

Don't allow aggregates on issue.id

89be14f

Generally not a useful scenario

Remove redundant conditions

f52a75d

Enable cross project aggregations

d4dc8a8

Don't inject a project.id field when aggregating across non-project fields. Instead use argMax() to get a project.name from the latest event.

Merge branch 'master' into sen-811-server-field-aliases

c1fdee0

wedamija reviewed Aug 8, 2019

View reviewed changes

markstory added 2 commits August 9, 2019 10:08

Fix tests

a897c35

* Remove non-deterministic results. * Remove condition test for fields that are no longer supported.

Remove print

89a7ee8

markstory requested review from lynnagara and wedamija August 9, 2019 16:12

dashed reviewed Aug 9, 2019

View reviewed changes

dashed mentioned this pull request Aug 9, 2019

chore(ts): Initial typescript conversion in preparation of discover v2 #14345

Merged

8 tasks

markstory mentioned this pull request Aug 12, 2019

feat(api) Add support for column list in aggregate functions getsentry/snuba#408

Merged

Remove duration as a field

0fb0235

Duration is currently in use as a tag by a few customers.

wedamija reviewed Aug 12, 2019

View reviewed changes

lynnagara approved these changes Aug 12, 2019

View reviewed changes

markstory merged commit 0605f0e into master Aug 13, 2019

evanpurkhiser deleted the sen-811-server-field-aliases branch August 22, 2019 22:52

github-actions bot locked and limited conversation to collaborators Dec 20, 2020

feat(discover2) Add more aggregate field support #14321

feat(discover2) Add more aggregate field support #14321

Conversation

markstory commented Aug 8, 2019 • edited Loading

Choose a reason for hiding this comment

lynnagara Aug 8, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dashed Aug 9, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lynnagara Aug 8, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lynnagara Aug 8, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wedamija Aug 9, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dashed left a comment

Choose a reason for hiding this comment

wedamija left a comment

Choose a reason for hiding this comment

lynnagara left a comment

Choose a reason for hiding this comment

markstory commented Aug 8, 2019 •

edited

Loading

lynnagara Aug 8, 2019 •

edited

Loading

dashed Aug 9, 2019 •

edited

Loading

lynnagara Aug 8, 2019 •

edited

Loading

lynnagara Aug 8, 2019 •

edited

Loading

wedamija Aug 9, 2019 •

edited

Loading