dbt model definitions for use with Twilio-Segment's Profiles Sync feature.
Profiles Sync lands _update tables, the complete historical record of your
identity graph. The models in this repo are designed to help you produce your
identity graph's most recent state.
This repo is tested for
- Snowflake
- BigQuery
- Redshift
Profiles Sync lands data non-destructively in your warehouse, giving you access to the complete history of merges, traits, and external ID associations. This allows you to monitor identity health, and reconstruct a prior state of a profile (e.g. prior to a merge).
For many practical use cases, you will want to materialize and use your identity graph's current state.
Specifically, we recommend you create:
| Materialized Table | Source Table | Description | 
|---|---|---|
| id_graph | id_graph_updated | a lookup table for identities - i.e. where you can join using segment_idto combine events that then resolve to a singlecanonical_segment_id | 
| external_id_mapping | external_id_mapping_updates | a lookup table for external identifiers such as user_id,email, etc. | 
| profile_traits | profile_traits_updates | a table of people and traits - one row per canonical_segment_id. | 
| profile_traits | identifies | As of August 1st, 2023, profile_traitsis superceded by a materlization that sources fromprofile_traits_updates | 
profile_traits - Columns in this table are automatically populated from the columns of its source table profile_traits_updates.
As we do not delete any rows, this table also includes merged-away profiles with their trait values set to NULL.
To identify an older profile, its merged_to column will be populated with its latest segment_id.
For convenience, this package will produce (or "materialize") all three of those models.
dbt_project.yml contains 4 main configuration variables:
- 
profiles: the name dbt parses to fetch warehouse credentials fromprofiles.yml. Refer dbt docs on how to configure your warehouse connection profilesprofiles.yml.
- 
schema_name: name of the Personas space/schema where the base tables land. Defaults toidentified_events.
- 
etl_overlap: frequency (in hours) of materialization runs. We recommend this number to span 1-2 previous cycles to account for different tables landing at different times. (e.g. if you build every24hrs, we might suggest setting this to24*2+1=49hrs).
- 
materialization: means of materializing above views. Refer dbt docs: materializations for more details.- incremental(default): most efficient use of computation, incrementally-materialized views
- table: less efficient, but also less-complicated materialized tables - each table will be rebuilt from scratch each time.
- view: (non-materialized) view definitions which can also be leveraged if you want to avoid dbt orchestration entirely - simply- dbt run(locally) a single time to establish the definitions (note that you will need to re-run if new traits are instrumented, to add those traits as new columns)
 
You can download this package, standalone, update your configuration as per the section above, and dbt run.
You can also import this dbt package as a module in an existing project. dbt offers excellent (albeit slightly outdated) documentation on how that works:
The steps for module import are:
- 
Specify the package URL in your packages.ymlfile (should be in the main directory of your dbt project; you may need to create the file if it doesn't exist) OR Clone the directory and specify the local directory where you have cloned it (since the repo isn't currently public, this may save some headache)
- 
Copy the requisite bits of configuration from this package's dbt_profiles.ymlto your project'sdbt_profiles.yml. Specifically, you'll want to add theprofiles_syncsections below to thevarsandmodelssections of your owndbt_profiles.yml.
vars:
  profiles_sync:
    schema_name: profiles_v1 # replace this with the name of the schema where Profiles Sync is landing its source tables
    etl_overlap: 28 # should be set to an interval (in hours) that's a bit larger than your materialization cadence
models:
  profiles_sync:
    +on_schema_change: "append_new_columns"
    profile_materializations:
      +materialized: incremental