Skip to content

Conversation

kbammarito
Copy link
Contributor

@kbammarito kbammarito commented Jul 8, 2025

Description

This PR builds search_clients_daily_v9 from Glean (migrating from legacy telemetry).

Related Tickets & Documents

Reviewer, please follow this checklist

Copy link
Member

@Standard8 Standard8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this. It is great to see this moving along. Sorry for the delay in responding, there was lots of thinking and checking to do.

num_ads_notshowing,
ad_blocker_inferred,
ad_components.component as ad_click_target
FROM `mozdata.firefox_desktop.serp_events` e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise the current serp_events doesn't have it, but we could really do with partner_code here as well (can we get it added there?

This would allow for better breakdowns by partner code, something that we can't analyse today with the existing tables, yet we should really be taking it into account.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have something like search_engine as provider_id. Similar to the sap_source field, the user may be searching different providers which aren't triggered via the normal search access points. Hence we need to include those as well.

Though could we also change "duckduckgo" to "ddg" within the field so that it matches the provider_id field for SAP and the search_engine_default_provider_id for the default search engine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to also add the normalized_engine field in here as well - but only filled in from search_engine (there's no name field for SERPs, since there's a limited amount.

This would make it easier to search for both tagged and organic SERPs using the normalized_engine field, which you might want to do if you're also including SAP data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Standard8: with your first comment, are we understanding correctly that you want partner_code added to the serp_events table itself (not just us pulling it in through another CTE)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I realise I wasn't too clear here. Yes, the provider and partner_code data need pulling in from the serp.impression event (from serp_events). They're dedicated to that event, you can't pull them from the sap.counts event, since they're different events with potentially different totals per provider/partner code.

So, in summary, I think this serp_events cte needs the following adding:

CASE
  WHEN search_engine = "duckduckgo" then "ddg"
  ELSE search_engine
END as provider_id,
partner_code,

(This will need partner_code also adding to the mozdata.firefox_desktop.serp_events table...).

Comment on lines 279 to 284
normalized_engine,
provider_name,
provider_id,
partner_code,
overridden_by_third_party,
search_access_point,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above, provider_id, partner_code, search_access_point should be from both sap & serp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Standard8: we don't see the provider_id and partner_code fields in the serp_events table. Do you want us to pull them in from serp_impressions ping (it has provider and partner_code)? Or are you saying they should be added to the serp_events table?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe provider_id (aka provider) is in the serp_events table as search_engine.

I think the partner_code should also be added to serp_events table if possible - That's an important aspect that we're currently missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Standard8 DE can add partner_code to the serp_events table. That is not a big lift. Does it need to be backfilled? I understand that will take a much longer time if needed but that they could add it to their backlog. @kbammarito or @alekhyamoz can correct me as needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we can request backfill at a later time? If so, I'd suggest leaving it for now, and if we need a bit more history when we start using the new table, then we can request it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the plan for "backfill" was to create another versioned table. We'd do a small backfill to ensure metric trends haven't changed, but the final table will concatenate the two tables: v1 if < date and v2 if > date. Stakeholders will keep using legacy data for past results, but future results will be transitioned to the new metrics.

Copy link
Member

@Standard8 Standard8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few responses. I'm assuming the "done" comments are ones that have been done locally, and not yet pushed to this PR.

num_ads_notshowing,
ad_blocker_inferred,
ad_components.component as ad_click_target
FROM `mozdata.firefox_desktop.serp_events` e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I realise I wasn't too clear here. Yes, the provider and partner_code data need pulling in from the serp.impression event (from serp_events). They're dedicated to that event, you can't pull them from the sap.counts event, since they're different events with potentially different totals per provider/partner code.

So, in summary, I think this serp_events cte needs the following adding:

CASE
  WHEN search_engine = "duckduckgo" then "ddg"
  ELSE search_engine
END as provider_id,
partner_code,

(This will need partner_code also adding to the mozdata.firefox_desktop.serp_events table...).

Comment on lines 279 to 284
normalized_engine,
provider_name,
provider_id,
partner_code,
overridden_by_third_party,
search_access_point,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe provider_id (aka provider) is in the serp_events table as search_engine.

I think the partner_code should also be added to serp_events table if possible - That's an important aspect that we're currently missing.

@dataops-ci-bot
Copy link

Integration report for "feat: update query to fix exploding joins and reorganize aggregates"

sql.diff

Click to expand!
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived: search_clients_daily_v9
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/metadata.yaml	2025-08-07 03:25:29.000000000 +0000
@@ -0,0 +1,17 @@
+friendly_name: Search Clients Daily V9
+description: |-
+  Please provide a description for the query
+owners: []
+labels: {}
+bigquery: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  query.sql:
+  - moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1
+  - moz-fx-data-shared-prod.revenue.monetization_blocking_addons
+  - mozdata.firefox_desktop.events_stream
+  - mozdata.firefox_desktop.serp_events
+require_column_descriptions: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v9/query.sql	2025-08-07 03:21:36.000000000 +0000
@@ -0,0 +1,571 @@
+-- Improved timestamp parsing function with better performance
+CREATE TEMP FUNCTION safe_parse_timestamp(ts STRING) AS (
+  COALESCE(
+    SAFE.PARSE_TIMESTAMP("%FT%T%Ez", ts),
+    SAFE.PARSE_TIMESTAMP("%F%Ez", ts),
+    SAFE.PARSE_TIMESTAMP("%FT%T*E%Ez", ts)
+  )
+);
+
+WITH
+-- list of ad blocking addons produced using this logic: https://github.com/mozilla/search-adhoc-analysis/tree/master/monetization-blocking-addons
+adblocker_addons_cte AS (
+  SELECT
+    addon_id,
+    addon_name
+  FROM
+    `moz-fx-data-shared-prod.revenue.monetization_blocking_addons`
+  WHERE
+    blocks_monetization
+),
+-- this table is the new glean adblocker addons metric ping (used to be legacy telemetry)
+clients_with_adblocker_addons_cte AS (
+  SELECT
+    client_info.client_id,
+    DATE(submission_timestamp) AS submission_date,
+    TRUE AS has_adblocker_addon
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1`,
+    UNNEST(JSON_QUERY_ARRAY(metrics.object.addons_active_addons)) AS addons
+  INNER JOIN
+    adblocker_addons_cte -- it's an inner join in https://github.com/mozilla/bigquery-etl/blob/main/sql/moz-fx-data-shared-prod/search_derived/search_clients_daily_v8/query.sql
+    -- left join adblocker_addons_cte
+    ON adblocker_addons_cte.addon_id = JSON_VALUE(addons, '$.id')
+  WHERE
+    -- date(submission_timestamp) = @submission_date
+    -- this is for test runs
+    DATE(submission_timestamp) = '2025-06-07'
+    -- and sample_id = 0
+    AND NOT BOOL(JSON_QUERY(addons, '$.userDisabled'))
+    AND NOT BOOL(JSON_QUERY(addons, '$.appDisabled'))
+    AND NOT BOOL(JSON_QUERY(addons, '$.blocklisted'))
+  GROUP BY
+    client_id,
+    DATE(submission_timestamp)
+),
+-- we still need this because of document_id is not null
+is_enterprise_cte AS (
+  SELECT
+    client_info.client_id,
+    DATE(submission_timestamp) AS submission_date,
+    mozfun.stats.mode_last(
+      ARRAY_AGG(metrics.boolean.policies_is_enterprise ORDER BY submission_timestamp)
+    ) AS policies_is_enterprise
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1`
+  WHERE
+    -- date(submission_timestamp) = @submission_date
+    -- this is for test runs
+    DATE(submission_timestamp) = '2025-06-07'
+    -- and sample_id = 0
+    AND document_id IS NOT NULL
+  GROUP BY
+    client_id,
+    DATE(submission_timestamp)
+),
+-- this cte gets client_info and sap counts from the sap.counts events ping and aggregates at client_id, day
+sap_clients_events_cte AS (
+  SELECT
+    -- distinct
+    DATE(submission_timestamp) AS submission_date,
+    client_id AS client_id,
+    CASE
+      WHEN JSON_VALUE(event_extra.provider_id) = 'other'
+        THEN `moz-fx-data-shared-prod.udf.normalize_search_engine`(
+            JSON_VALUE(event_extra.provider_name)
+          )
+      ELSE `moz-fx-data-shared-prod.udf.normalize_search_engine`(
+          JSON_VALUE(event_extra.provider_id)
+        )
+    END AS normalized_engine, -- this is "engine" in v8
+    CASE
+      WHEN JSON_VALUE(event_extra.source) = 'urlbar-handoff'
+        THEN 'urlbar_handoff'
+      ELSE JSON_VALUE(event_extra.source)
+    END AS source,
+    -- end as search_access_point, -- rename
+    normalized_country_code AS country,
+    normalized_app_name AS app_version,
+    client_info.app_channel AS channel,
+    ping_info.parsed_start_time AS subsession_start_date,
+    client_info.locale,
+    client_info.os,
+    client_info.os_version,
+    normalized_os_version,
+    UNIX_DATE(DATE(safe_parse_timestamp(client_info.first_run_date))) AS profile_creation_date,
+    [
+      STRUCT(
+        json_keys(experiments)[OFFSET(0)] AS key,
+        STRUCT(
+          REPLACE(
+            TO_JSON_STRING(experiments[json_keys(experiments)[OFFSET(0)]].branch),
+            '"',
+            ''
+          ) AS branch,
+          STRUCT(
+            REPLACE(
+              TO_JSON_STRING(experiments[json_keys(experiments)[OFFSET(0)]].extra.type),
+              '"',
+              ''
+            ) AS type,
+            REPLACE(
+              TO_JSON_STRING(experiments[json_keys(experiments)[OFFSET(0)]].extra.enrollment_id),
+              '"',
+              ''
+            ) AS enrollment_id
+          ) AS extra
+        ) AS value
+      )
+    ] AS experiments,
+    normalized_channel,
+    normalized_os,
+    sample_id,
+    profile_group_id,
+    legacy_telemetry_client_id, -- adding this for now so people can join to it if needed
+    JSON_VALUE(event_extra.provider_name) AS provider_name,
+    JSON_VALUE(event_extra.provider_id) AS provider_id,
+    JSON_VALUE(event_extra.partner_code) AS partner_code,
+    JSON_VALUE(event_extra.overridden_by_third_party) AS overridden_by_third_party
+  FROM
+    `mozdata.firefox_desktop.events_stream`
+  WHERE
+    -- date(submission_timestamp) = @submission_date
+    -- this is for test runs
+    DATE(submission_timestamp) = '2025-06-07'
+    -- and sample_id = 0
+    AND event = 'sap.counts'
+),
+sap_clients_events_adblocker_is_enterprise_cte AS (
+  SELECT
+    sap_clients_events_cte.subsession_start_date,
+    sap_clients_events_cte.submission_date,
+    sap_clients_events_cte.client_id,
+    sap_clients_events_cte.normalized_engine,
+    sap_clients_events_cte.source,
+    -- sap_clients_events_cte.source as search_access_point, -- rename
+    sap_clients_events_cte.country,
+    sap_clients_events_cte.app_version,
+    sap_clients_events_cte.channel,
+    sap_clients_events_cte.normalized_channel,
+    sap_clients_events_cte.locale,
+    sap_clients_events_cte.os,
+    sap_clients_events_cte.normalized_os,
+    sap_clients_events_cte.os_version,
+    sap_clients_events_cte.normalized_os_version,
+    sap_clients_events_cte.profile_creation_date,
+    sap_clients_events_cte.sample_id,
+    sap_clients_events_cte.profile_group_id,
+    sap_clients_events_cte.legacy_telemetry_client_id,
+    sap_clients_events_cte.experiments,
+    sap_clients_events_cte.provider_name,
+    sap_clients_events_cte.provider_id,
+    sap_clients_events_cte.partner_code,
+    sap_clients_events_cte.overridden_by_third_party,
+    clients_with_adblocker_addons_cte.has_adblocker_addon,
+    is_enterprise_cte.policies_is_enterprise
+  FROM
+    sap_clients_events_cte
+  LEFT JOIN
+    clients_with_adblocker_addons_cte
+    USING (client_id, submission_date)
+  LEFT JOIN
+    is_enterprise_cte
+    USING (client_id, submission_date)
+),
+sap_aggregates_cte AS (
+  SELECT
+    DATE(submission_timestamp) AS submission_date,
+    client_id,
+    CASE
+      WHEN JSON_VALUE(event_extra.provider_id) = 'other'
+        THEN `moz-fx-data-shared-prod.udf.normalize_search_engine`(
+            JSON_VALUE(event_extra.provider_name)
+          )
+      ELSE `moz-fx-data-shared-prod.udf.normalize_search_engine`(
+          JSON_VALUE(event_extra.provider_id)
+        )
+    END AS normalized_engine,
+    CASE
+      WHEN JSON_VALUE(event_extra.source) = 'urlbar-handoff'
+        THEN 'urlbar_handoff'
+      ELSE JSON_VALUE(event_extra.source)
+    END AS source,
+  -- end as search_access_point, -- rename
+    MAX(UNIX_DATE(DATE((ping_info.parsed_start_time)))) - MAX(
+      UNIX_DATE(DATE(safe_parse_timestamp(client_info.first_run_date)))
+    ) AS profile_age_in_days,
+    COUNT(*) AS sap,
+    COUNTIF(ping_info.seq = 1) AS sessions_started_on_this_day
+  FROM
+    `mozdata.firefox_desktop.events_stream`
+  WHERE
+  -- date(submission_timestamp) = @submission_date
+  -- this is for test runs
+    DATE(submission_timestamp) = '2025-06-07'
+  -- and sample_id = 0
+    AND event = 'sap.counts'
+  GROUP BY
+    submission_date,
+    client_id,
+    normalized_engine,
+    source
+  -- search_access_point
+),
+final_sap_cte AS (
+  SELECT
+    sap_clients_events_adblocker_is_enterprise_cte.subsession_start_date,
+    sap_clients_events_adblocker_is_enterprise_cte.submission_date,
+    sap_clients_events_adblocker_is_enterprise_cte.client_id,
+    sap_clients_events_adblocker_is_enterprise_cte.normalized_engine,
+    sap_clients_events_adblocker_is_enterprise_cte.source,
+  -- sap_clients_events_adblocker_is_enterprise_cte.search_access_point, -- rename
+    sap_clients_events_adblocker_is_enterprise_cte.country,
+    sap_clients_events_adblocker_is_enterprise_cte.app_version,
+    sap_clients_events_adblocker_is_enterprise_cte.locale,
+    sap_clients_events_adblocker_is_enterprise_cte.os,
+    sap_clients_events_adblocker_is_enterprise_cte.os_version,
+    sap_clients_events_adblocker_is_enterprise_cte.profile_creation_date,
+    sap_clients_events_adblocker_is_enterprise_cte.channel,
+    sap_clients_events_adblocker_is_enterprise_cte.normalized_channel,
+    sap_clients_events_adblocker_is_enterprise_cte.normalized_os,
+    sap_clients_events_adblocker_is_enterprise_cte.normalized_os_version,
+    sap_clients_events_adblocker_is_enterprise_cte.sample_id,
+    sap_clients_events_adblocker_is_enterprise_cte.profile_group_id,
+    sap_clients_events_adblocker_is_enterprise_cte.legacy_telemetry_client_id,
+    sap_clients_events_adblocker_is_enterprise_cte.experiments,
+    sap_clients_events_adblocker_is_enterprise_cte.provider_name,
+    sap_clients_events_adblocker_is_enterprise_cte.provider_id,
+    sap_clients_events_adblocker_is_enterprise_cte.partner_code,
+    sap_clients_events_adblocker_is_enterprise_cte.overridden_by_third_party,
+    sap_clients_events_adblocker_is_enterprise_cte.has_adblocker_addon,
+    sap_clients_events_adblocker_is_enterprise_cte.policies_is_enterprise,
+    sap_aggregates_cte.profile_age_in_days,
+    sap_aggregates_cte.sap,
+    sap_aggregates_cte.sessions_started_on_this_day
+  FROM
+    sap_clients_events_adblocker_is_enterprise_cte
+  LEFT JOIN
+    sap_aggregates_cte
+    USING (client_id, submission_date, normalized_engine, source)
+    -- using(client_id, submission_date, normalized_engine, search_access_point) -- rename
+),
+serp_array_aggs_cte AS (
+  SELECT
+    submission_date,
+    glean_client_id AS client_id,
+    sap_source AS serp_search_access_point,
+    `moz-fx-data-shared-prod.udf.normalize_search_engine`(search_engine) AS serp_provider_id,
+    ARRAY_AGG(partner_code IGNORE NULLS) AS serp_partner_code,
+    ARRAY_AGG(ad_blocker_inferred) AS serp_ad_blocker_inferred
+  FROM
+    `mozdata.firefox_desktop.serp_events`
+  WHERE
+    -- date(submission_timestamp) = @submission_date
+    -- this is for test runs
+    submission_date = '2025-06-07'
+    -- and sample_id = 0
+  GROUP BY
+    submission_date,
+    client_id,
+    serp_provider_id,
+    serp_search_access_point
+),
+serp_aggregates_cte AS (
+  SELECT
+    submission_date,
+    glean_client_id AS client_id,
+    sap_source AS serp_search_access_point,
+    `moz-fx-data-shared-prod.udf.normalize_search_engine`(search_engine) AS serp_provider_id,
+    COUNTIF(
+      (is_tagged IS TRUE)
+      AND (
+        sap_source = 'follow_on_from_refine_on_incontent_search'
+        OR sap_source = 'follow_on_from_refine_on_SERP'
+      )
+    ) AS serp_follow_on_searches_tagged_count,
+    COUNTIF(is_tagged IS TRUE) AS serp_searches_tagged_count,
+    COUNTIF(is_tagged IS FALSE) AS serp_searches_organic_count,
+    COUNTIF(is_tagged IS FALSE AND num_ads_visible > 0) AS serp_with_ads_organic_count,
+    SUM(CASE WHEN is_tagged IS TRUE THEN num_ad_clicks ELSE 0 END) AS serp_ad_clicks_tagged_count,
+    SUM(CASE WHEN is_tagged IS FALSE THEN num_ad_clicks ELSE 0 END) AS serp_ad_clicks_organic_count,
+    SUM(num_ad_clicks) AS num_ad_clicks,
+    SUM(num_non_ad_link_clicks) AS num_non_ad_link_clicks,
+    SUM(num_other_engagements) AS num_other_engagements,
+    SUM(num_ads_loaded) AS num_ads_loaded,
+    SUM(num_ads_visible) AS num_ads_visible,
+    SUM(num_ads_blocked) AS num_ads_blocked,
+    SUM(num_ads_notshowing) AS num_ads_notshowing
+  FROM
+    `mozdata.firefox_desktop.serp_events`
+  WHERE
+  -- date(submission_timestamp) = @submission_date
+  -- this is for test runs
+    submission_date = '2025-06-07'
+  -- and sample_id = 0
+  GROUP BY
+    submission_date,
+    client_id,
+    serp_provider_id,
+    serp_search_access_point
+),
+serp_ad_click_target_cte AS (
+  SELECT
+    submission_date,
+    glean_client_id AS client_id,
+    sap_source AS serp_search_access_point,
+    `moz-fx-data-shared-prod.udf.normalize_search_engine`(search_engine) AS serp_provider_id,
+    ARRAY_AGG(DISTINCT ad_components.component) AS ad_click_target
+  FROM
+    `mozdata.firefox_desktop.serp_events`
+  CROSS JOIN
+    UNNEST(ad_components) AS ad_components
+  WHERE
+    submission_date = '2025-06-07'
+  -- and sample_id = 0
+  GROUP BY
+    submission_date,
+    client_id,
+    serp_provider_id,
+    serp_search_access_point
+),
+final_serp_cte AS (
+  SELECT
+    serp_array_aggs_cte.submission_date,
+    serp_array_aggs_cte.client_id,
+    serp_array_aggs_cte.serp_provider_id,
+    serp_array_aggs_cte.serp_search_access_point,
+    serp_array_aggs_cte.serp_partner_code,
+    serp_array_aggs_cte.serp_ad_blocker_inferred,
+    serp_aggregates_cte.serp_searches_tagged_count,
+    serp_aggregates_cte.serp_searches_organic_count,
+    serp_aggregates_cte.serp_with_ads_organic_count,
+    serp_aggregates_cte.serp_ad_clicks_tagged_count,
+    serp_aggregates_cte.serp_ad_clicks_organic_count,
+    serp_aggregates_cte.num_ad_clicks,
+    serp_aggregates_cte.num_non_ad_link_clicks,
+    serp_aggregates_cte.num_other_engagements,
+    serp_aggregates_cte.num_ads_loaded,
+    serp_aggregates_cte.num_ads_visible,
+    serp_aggregates_cte.num_ads_blocked,
+    serp_aggregates_cte.num_ads_notshowing,
+    serp_ad_click_target_cte.ad_click_target
+  FROM
+    serp_array_aggs_cte
+  LEFT JOIN
+    serp_aggregates_cte
+    USING (client_id, submission_date, serp_provider_id, serp_search_access_point)
+  LEFT JOIN
+    serp_ad_click_target_cte
+    USING (client_id, submission_date, serp_provider_id, serp_search_access_point)
+),
+glean_metrics_cte AS (
+  SELECT
+    client_info.client_id,
+    DATE(submission_timestamp) AS submission_date,
+    sap.normalized_engine,
+    sap.source,
+  -- sap.search_access_point, -- rename
+    client_info.distribution.name AS distribution_id,
+    client_info.windows_build_number AS windows_build_number,
+    metrics.string.region_home_region AS user_pref_browser_search_region,
+    metrics.labeled_counter.browser_is_user_default AS is_default_browser,
+  -- metrics.labeled_counter.browser_is_user_default as browser_is_user_default -- rename
+    metrics.string.search_engine_default_display_name AS default_search_engine,
+  -- metrics.string.search_engine_default_display_name as default_search_engine_display_name, -- rename
+    metrics.string.search_engine_default_load_path AS default_search_engine_data_load_path,
+  -- metrics.string.search_engine_default_load_path as default_search_engine_load_path, -- rename
+    metrics.url2.search_engine_default_submission_url AS default_search_engine_data_submission_url,
+  -- metrics.url2.search_engine_default_submission_url as default_search_engine_submission_url, -- rename
+    metrics.string.search_engine_private_display_name AS default_private_search_engine,
+  -- metrics.string.search_engine_private_display_name as default_private_search_engine_display_name, -- rename
+    metrics.string.search_engine_private_load_path AS default_private_search_engine_data_load_path,
+  -- metrics.string.search_engine_private_load_path as default_private_search_engine_load_path, -- rename
+    metrics.url2.search_engine_private_submission_url AS default_private_search_engine_data_submission_url,
+  -- metrics.url2.search_engine_private_submission_url as default_private_search_engine_submission_url, -- rename
+    metrics.string.search_engine_default_provider_id AS default_search_engine_provider_id,
+    metrics.string.search_engine_default_partner_code AS default_search_engine_partner_code,
+    metrics.boolean.search_engine_default_overridden_by_third_party AS default_search_engine_overridden
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1` AS metrics_v1
+  LEFT JOIN
+    sap_clients_events_cte AS sap
+    ON metrics_v1.client_info.client_id = sap.client_id
+    AND DATE(metrics_v1.submission_timestamp) = sap.submission_date
+  WHERE
+  -- date(submission_timestamp) = @submission_date
+  -- this is for test runs
+    DATE(metrics_v1.submission_timestamp) = '2025-06-07'
+  -- and metrics_v1.sample_id = 0
+  QUALIFY
+    ROW_NUMBER() OVER (
+      PARTITION BY
+        client_info.client_id
+      ORDER BY
+        submission_timestamp DESC
+    ) = 1 -- this is to get the last instance
+),
+glean_aggregates_cte AS (
+  SELECT
+    client_info.client_id AS client_id,
+    DATE(submission_timestamp) AS submission_date,
+    sap.normalized_engine,
+    sap.source,
+  -- sap.search_access_point, -- rename
+    SUM(metrics.counter.browser_engagement_active_ticks / (3600 / 5)) AS active_hours_sum,
+    SUM(
+      metrics.counter.browser_engagement_tab_open_event_count
+    ) AS scalar_parent_browser_engagement_tab_open_event_count_sum,
+    SUM(
+      metrics.counter.browser_engagement_uri_count
+    ) AS scalar_parent_browser_engagement_total_uri_count_sum,
+    MAX(
+      metrics.quantity.browser_engagement_max_concurrent_tab_count
+    ) AS max_concurrent_tab_count_max
+  FROM
+    `moz-fx-data-shared-prod.firefox_desktop_stable.metrics_v1` AS metrics_v1
+  LEFT JOIN
+    sap_clients_events_cte AS sap
+    ON metrics_v1.client_info.client_id = sap.client_id
+    AND DATE(metrics_v1.submission_timestamp) = sap.submission_date
+  WHERE
+  -- date(submission_timestamp) = @submission_date
+  -- this is for test runs
+    DATE(submission_timestamp) = '2025-06-07'
+  -- and metrics_v1.sample_id = 0
+  GROUP BY
+    client_id,
+    submission_date,
+    sap.normalized_engine,
+    sap.source
+  -- sap.search_access_point -- rename
+),
+final_glean_cte AS (
+  SELECT
+    glean_metrics_cte.client_id,
+    glean_metrics_cte.submission_date,
+    glean_metrics_cte.normalized_engine,
+    glean_metrics_cte.source,
+  -- glean_metrics_cte.search_access_point, -- rename
+    glean_metrics_cte.windows_build_number,
+    glean_metrics_cte.distribution_id,
+    glean_metrics_cte.user_pref_browser_search_region,
+    glean_metrics_cte.is_default_browser,
+  -- glean_metrics_cte.browser_is_user_default, -- rename
+    glean_metrics_cte.default_search_engine,
+  -- glean_metrics_cte.default_search_engine_display_name, -- rename
+    glean_metrics_cte.default_search_engine_data_load_path,
+  -- glean_metrics_cte.default_search_engine_load_path, -- rename
+    glean_metrics_cte.default_search_engine_data_submission_url,
+  -- glean_metrics_cte.default_search_engine_submission_url, -- rename
+    glean_metrics_cte.default_private_search_engine,
+  -- glean_metrics_cte.default_private_search_engine_display_name, -- rename
+    glean_metrics_cte.default_private_search_engine_data_load_path,
+  -- glean_metrics_cte.default_private_search_engine_load_path, -- rename
+    glean_metrics_cte.default_private_search_engine_data_submission_url,
+  -- glean_metrics_cte.default_private_search_engine_submission_url, -- rename
+    glean_metrics_cte.default_search_engine_provider_id,
+    glean_metrics_cte.default_search_engine_partner_code,
+    glean_metrics_cte.default_search_engine_overridden,
+    glean_aggregates_cte.active_hours_sum,
+    glean_aggregates_cte.scalar_parent_browser_engagement_tab_open_event_count_sum,
+    glean_aggregates_cte.scalar_parent_browser_engagement_total_uri_count_sum,
+    glean_aggregates_cte.max_concurrent_tab_count_max
+  FROM
+    glean_metrics_cte
+  LEFT JOIN
+    glean_aggregates_cte
+    USING (client_id, submission_date, normalized_engine, source)
+    -- using(client_id, submission_date, normalized_engine, search_access_point) -- rename
+),
+final_joined_cte AS (
+  SELECT
+    final_sap_cte.sample_id,
+    final_sap_cte.submission_date,
+    final_sap_cte.client_id,
+    final_sap_cte.legacy_telemetry_client_id,
+    final_sap_cte.normalized_engine,
+    final_sap_cte.source,
+  -- final_sap_cte.search_access_point, -- rename
+    final_serp_cte.serp_search_access_point,
+    final_sap_cte.provider_name,
+    final_sap_cte.provider_id,
+    final_serp_cte.serp_provider_id,
+    final_sap_cte.partner_code,
+    final_serp_cte.serp_partner_code,
+    final_sap_cte.country,
+    final_sap_cte.app_version,
+    final_glean_cte.windows_build_number,
+    final_glean_cte.distribution_id,
+    final_sap_cte.locale,
+    final_glean_cte.user_pref_browser_search_region,
+    final_sap_cte.os,
+    final_sap_cte.normalized_os,
+    final_sap_cte.os_version,
+    final_sap_cte.normalized_os_version,
+    final_sap_cte.channel,
+    final_sap_cte.normalized_channel,
+    final_glean_cte.is_default_browser,
+  -- final_glean_cte.browser_is_user_default, -- rename
+    final_sap_cte.profile_creation_date,
+    final_sap_cte.profile_age_in_days,
+    final_sap_cte.profile_group_id,
+    final_glean_cte.default_search_engine,
+  -- final_glean_cte.default_search_engine_display_name, -- rename
+    final_glean_cte.default_search_engine_data_load_path,
+  -- final_glean_cte.default_search_engine_load_path, -- rename
+    final_glean_cte.default_search_engine_data_submission_url,
+  -- final_glean_cte.default_search_engine_submission_url, -- rename
+    final_glean_cte.default_private_search_engine,
+  -- final_glean_cte.default_private_search_engine_display_name, -- rename
+    final_glean_cte.default_private_search_engine_data_load_path,
+  -- final_glean_cte.default_private_search_engine_load_path, -- rename
+    final_glean_cte.default_private_search_engine_data_submission_url,
+  -- final_glean_cte.default_private_search_engine_submission_url, -- rename
+    final_sap_cte.sessions_started_on_this_day,
+    final_glean_cte.max_concurrent_tab_count_max,
+    final_glean_cte.scalar_parent_browser_engagement_tab_open_event_count_sum,
+    final_glean_cte.active_hours_sum,
+    final_glean_cte.scalar_parent_browser_engagement_total_uri_count_sum,
+    final_sap_cte.experiments,
+    final_sap_cte.policies_is_enterprise,
+    final_glean_cte.default_search_engine_provider_id,
+    final_glean_cte.default_search_engine_partner_code,
+    final_glean_cte.default_search_engine_overridden,
+    final_sap_cte.overridden_by_third_party,
+    final_sap_cte.has_adblocker_addon,
+    final_sap_cte.sap,
+    final_serp_cte.serp_ad_blocker_inferred,
+    final_serp_cte.serp_searches_tagged_count,
+    final_serp_cte.serp_searches_organic_count,
+    final_serp_cte.serp_with_ads_organic_count,
+    final_serp_cte.serp_ad_clicks_tagged_count,
+    final_serp_cte.serp_ad_clicks_organic_count,
+    final_serp_cte.num_ad_clicks,
+    final_serp_cte.num_non_ad_link_clicks,
+    final_serp_cte.num_other_engagements,
+    final_serp_cte.num_ads_loaded,
+    final_serp_cte.num_ads_visible,
+    final_serp_cte.num_ads_blocked,
+    final_serp_cte.num_ads_notshowing,
+    final_serp_cte.ad_click_target
+  FROM
+    final_sap_cte
+  LEFT JOIN
+    final_serp_cte
+    ON final_sap_cte.client_id = final_serp_cte.client_id
+    AND final_sap_cte.submission_date = final_serp_cte.submission_date
+    AND final_sap_cte.normalized_engine = final_serp_cte.serp_provider_id
+    AND final_sap_cte.source = final_serp_cte.serp_search_access_point
+    -- and final_sap_cte.search_access_point = final_serp_cte.serp_search_access_point -- rename
+  LEFT JOIN
+    final_glean_cte
+    ON final_sap_cte.client_id = final_glean_cte.client_id
+    AND final_sap_cte.submission_date = final_glean_cte.submission_date
+    AND final_sap_cte.normalized_engine = final_glean_cte.normalized_engine
+    AND final_sap_cte.source = final_glean_cte.source
+    -- and final_sap_cte.search_access_point = final_glean_cte.search_access_point -- rename
+)
+SELECT
+  *
+FROM
+  final_joined_cte

Link to full diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants