-
Notifications
You must be signed in to change notification settings - Fork 116
feat: build search_clients_daily_v9 #7744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: build search_clients_daily_v9 #7744
Conversation
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this. It is great to see this moving along. Sorry for the delay in responding, there was lots of thinking and checking to do.
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
num_ads_notshowing, | ||
ad_blocker_inferred, | ||
ad_components.component as ad_click_target | ||
FROM `mozdata.firefox_desktop.serp_events` e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise the current serp_events
doesn't have it, but we could really do with partner_code
here as well (can we get it added there?
This would allow for better breakdowns by partner code, something that we can't analyse today with the existing tables, yet we should really be taking it into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have something like search_engine as provider_id
. Similar to the sap_source
field, the user may be searching different providers which aren't triggered via the normal search access points. Hence we need to include those as well.
Though could we also change "duckduckgo" to "ddg" within the field so that it matches the provider_id
field for SAP and the search_engine_default_provider_id
for the default search engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to also add the normalized_engine field in here as well - but only filled in from search_engine
(there's no name field for SERPs, since there's a limited amount.
This would make it easier to search for both tagged and organic SERPs using the normalized_engine field, which you might want to do if you're also including SAP data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Standard8: with your first comment, are we understanding correctly that you want partner_code
added to the serp_events
table itself (not just us pulling it in through another CTE)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I realise I wasn't too clear here. Yes, the provider and partner_code data need pulling in from the serp.impression
event (from serp_events
). They're dedicated to that event, you can't pull them from the sap.counts
event, since they're different events with potentially different totals per provider/partner code.
So, in summary, I think this serp_events
cte needs the following adding:
CASE
WHEN search_engine = "duckduckgo" then "ddg"
ELSE search_engine
END as provider_id,
partner_code,
(This will need partner_code
also adding to the mozdata.firefox_desktop.serp_events
table...).
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
normalized_engine, | ||
provider_name, | ||
provider_id, | ||
partner_code, | ||
overridden_by_third_party, | ||
search_access_point, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, provider_id
, partner_code
, search_access_point
should be from both sap & serp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Standard8: we don't see the provider_id
and partner_code
fields in the serp_events
table. Do you want us to pull them in from serp_impressions
ping (it has provider
and partner_code
)? Or are you saying they should be added to the serp_events
table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe provider_id
(aka provider
) is in the serp_events
table as search_engine
.
I think the partner_code
should also be added to serp_events
table if possible - That's an important aspect that we're currently missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Standard8 DE can add partner_code to the serp_events table. That is not a big lift. Does it need to be backfilled? I understand that will take a much longer time if needed but that they could add it to their backlog. @kbammarito or @alekhyamoz can correct me as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably we can request backfill at a later time? If so, I'd suggest leaving it for now, and if we need a bit more history when we start using the new table, then we can request it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the plan for "backfill" was to create another versioned table. We'd do a small backfill to ensure metric trends haven't changed, but the final table will concatenate the two tables: v1 if < date and v2 if > date. Stakeholders will keep using legacy data for past results, but future results will be transitioned to the new metrics.
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few responses. I'm assuming the "done" comments are ones that have been done locally, and not yet pushed to this PR.
num_ads_notshowing, | ||
ad_blocker_inferred, | ||
ad_components.component as ad_click_target | ||
FROM `mozdata.firefox_desktop.serp_events` e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I realise I wasn't too clear here. Yes, the provider and partner_code data need pulling in from the serp.impression
event (from serp_events
). They're dedicated to that event, you can't pull them from the sap.counts
event, since they're different events with potentially different totals per provider/partner code.
So, in summary, I think this serp_events
cte needs the following adding:
CASE
WHEN search_engine = "duckduckgo" then "ddg"
ELSE search_engine
END as provider_id,
partner_code,
(This will need partner_code
also adding to the mozdata.firefox_desktop.serp_events
table...).
normalized_engine, | ||
provider_name, | ||
provider_id, | ||
partner_code, | ||
overridden_by_third_party, | ||
search_access_point, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe provider_id
(aka provider
) is in the serp_events
table as search_engine
.
I think the partner_code
should also be added to serp_events
table if possible - That's an important aspect that we're currently missing.
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
Outdated
Show resolved
Hide resolved
Integration report for "feat: update query to fix exploding joins and reorganize aggregates"
|
Description
This PR builds
search_clients_daily_v9
from Glean (migrating from legacy telemetry).Related Tickets & Documents
Reviewer, please follow this checklist