Jayanth kumar structuredlabs patch 1 #17


Open
wants to merge 2 commits into main

Conversation

jayanth-kumar-structuredlabs
Owner

No description provided.

@structured-bot-beta

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (BigQuery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this creates `jaffle_shop` and `logs` dirs in `dbt-tutorial`
- dbt has models (which are SELECT SQL statements)
- These models can be composed of other models, hence a DAG structure. Each node can be run independently (provided its dependencies have been run)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in Git (GitHub)
- commands
	- `dbt run` - run the SQL queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to run (or test) only specific models
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test the `~/.dbt/profiles.yml` configuration (where the BigQuery connection information is stored)
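As a sketch of the model-composition point above (a hypothetical model, not one in this repo): a downstream model references upstream models with `ref()`, and dbt turns each reference into a dependency edge in the DAG.

```sql
-- models/customer_orders.sql (illustrative example, not part of this repo)
-- Each ref() call becomes an edge in dbt's DAG, so dbt knows to build
-- stg_customers and stg_orders before this model.
select
    c.customer_id,
    count(o.order_id) as order_count
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
    on c.customer_id = o.customer_id
group by 1
```

Running `dbt run --select +customer_orders` would then build this model together with its upstream dependencies.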

What's missing is **metrics**. Lightdash takes the dbt models, and each column of a dbt model becomes a dimension of the table.

How are Lightdash tables different from dbt models? Is the mapping 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run Lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql inside the Postgres container on my local machine and inspect the Postgres tables

`lightdash preview` lets me update `schema.yml` and see the change in preview mode - so I don't always have to push to GitHub and refresh

Lightdash defines its own metrics via the `schema.yml` file in the dbt project; they are extra metadata, like dimensions. The other way to add metrics is via dbt's semantic layer (available in dbt Cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this is awkward - it mixes SQL with YAML
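A sketch of what that `meta` approach looks like (the model, column, and metric names here are hypothetical, and the exact shape should be checked against Lightdash's dbt-integration docs):

```yaml
# models/schema.yml (illustrative fragment)
models:
  - name: customers
    columns:
      - name: customer_id
        description: "Primary key"
        meta:
          metrics:
            total_customers:
              type: count_distinct   # aggregation applied to this column
```

The SQL-in-YAML friction noted above shows up as soon as metrics need expressions rather than simple column aggregations.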


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

6 similar comments
@jayanth-kumar-structuredlabs
Copy link
Owner Author

@structured

1 similar comment
9 similar comments
@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? Is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run Lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` lets me update `schema.yml` and see the change in preview mode - so I don't always have to push to GitHub and refresh

Lightdash defines its own metrics via the `schema.yml` file in the dbt project - they sit alongside dimensions as extra `meta` entries. The other way to add metrics is via dbt's semantic layer (available in dbt Cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing SQL w/ YAML
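
A minimal sketch of what such a Lightdash metric looks like in the `meta` tag (model, column, and metric names are illustrative; Lightdash reads these out of the dbt `schema.yml`):

```yaml
# models/schema.yml - Lightdash metric via dbt's meta tag (illustrative sketch)
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        meta:
          metrics:
            total_customers:
              type: count_distinct   # becomes a metric on the Lightdash table
```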


MetricFlow
- `dbt-metricflow` is the package that bundles MetricFlow with dbt-core
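
For contrast, a rough sketch of how the dbt semantic layer / MetricFlow declares metrics in dedicated YAML instead of `meta` (names illustrative; syntax should be checked against the MetricFlow docs):

```yaml
# MetricFlow-style semantic model + metric (illustrative sketch)
semantic_models:
  - name: orders
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: status
        type: categorical
    measures:
      - name: order_count
        agg: count
        expr: order_id

metrics:
  - name: order_count
    type: simple
    type_params:
      measure: order_count
```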

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@jayanth-kumar-structuredlabs (Owner Author)

@structured
5 similar comments
@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are other metas, like dimensions. The other way to add metrics is via dbt's semantic layer (availabe in dbt cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md




@structured-bot-beta

Modularity Analysis

The changes to the dbt_project.yml file do not significantly impact modularity. However, reviewing the existing models (customers.sql, stg_customers.sql, stg_orders.sql) shows potential for improvement. Consider breaking down the customers.sql model into smaller, reusable components, particularly the customer_orders CTE, which could be extracted into a separate intermediate model.


Versioning Analysis

The changes in dbt_project.yml and notes.md are minor and don't affect model versioning. However, it's good practice to consider adding version tags or comments to models (e.g., customers.sql) when making significant changes to track iterations and maintain clarity.


Grouping And Folder Structure Analysis

The project structure follows DBT best practices with staging models in the staging/ directory and the final customers model in the root models/ directory. However, consider creating a marts/ directory for the final customers model to better separate staging and transformed data.


Access Control Analysis

The changes in the dbt_project.yml and notes.md files do not directly impact access control. However, it's important to ensure that any new models or changes to existing models, especially those dealing with customer data in customers.sql and stg_customers.sql, have appropriate access controls in place to protect sensitive information.


Naming Conventions Analysis

The naming conventions appear to be consistent across the project, following snake_case for model and column names. Staging models are prefixed with 'stg_', which is a good practice. No significant issues or improvements are needed regarding naming conventions in these changes.


Testing Coverage Analysis

The changes in the dbt_project.yml and notes.md files don't impact test coverage. However, the existing models (customers.sql, stg_customers.sql, stg_orders.sql) lack sufficient tests for critical fields. Consider adding not_null and unique tests for customer_id in customers.sql, and additional tests for important fields like order_date and status in stg_orders.sql to ensure data quality.


Config Best Practices Analysis

The current configuration uses table materialization for all models in jaffle_shop. Consider using view materialization for staging models (stg_customers, stg_orders) and incremental materialization for the customers model to optimize performance and resource usage.
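A hedged sketch of how that split could be expressed in `dbt_project.yml` (the `staging` key assumes the existing models/staging/ directory; incremental config would go in the model file itself):

```yaml
models:
  jaffle_shop:
    +materialized: table
    staging:
      +materialized: view
```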


SQL Performance And Efficiency Analysis

The use of SELECT * in the customers.sql model can potentially impact performance. Consider explicitly listing only the required columns. Also, ensure that appropriate indexes are in place, particularly on the customer_id column used in the LEFT JOIN, to optimize query execution.


Avoiding Anti-Patterns In SQL Analysis

The models in customers.sql and staging files (stg_customers.sql, stg_orders.sql) use SELECT * or select all columns, which is an anti-pattern. Consider explicitly listing required columns to improve performance and maintainability. Also, ensure transformations are done in the appropriate layer (staging vs modeling).


Adherence To Data Contracts Analysis

The removal of 'customers.last_name' field in customers.sql may break existing data contracts. Ensure this change is communicated to downstream teams and update relevant documentation. Consider the impact on data quality and completeness before finalizing this change.


Data Lineage Tracking Analysis

The changes do not introduce new transformations or key metrics. However, it's worth noting that the existing models (customers, stg_customers, stg_orders) could benefit from additional data lineage documentation. Consider adding comments or descriptions to track the flow of key metrics like customer_id, order_date, and number_of_orders from source to final output.


Handling Nulls And Defaults Analysis

The changes in the customers.sql file introduce proper null handling for the number_of_orders field using COALESCE. However, other fields like first_order_date and most_recent_order_date might still contain null values. Consider using COALESCE or IFNULL for these fields as well to ensure consistent null handling across the model.
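For instance, the date fields could be defaulted in the same style as `number_of_orders` - a sketch only; the sentinel fallback values are assumptions, not part of the model:

```sql
select
    customer_id,
    coalesce(number_of_orders, 0) as number_of_orders,
    -- fall back to a sentinel date when the customer has no orders
    coalesce(first_order_date, date '1900-01-01') as first_order_date,
    coalesce(most_recent_order_date, date '1900-01-01') as most_recent_order_date
from customer_orders
```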


Jinja And Macro Reusability Analysis

No significant changes related to Jinja and macro reusability were observed in this update. The existing models (customers, stg_customers, stg_orders) appear to be using basic Jinja templating for table references, but there's potential to extract common logic into macros for better maintainability.


Managing Data Freshness And Validity Analysis

The source models (stg_customers and stg_orders) lack freshness checks. Consider adding freshness configurations in the schema.yml file for these models to ensure data is up-to-date. For example, add a daily freshness check for the orders table as it's likely time-sensitive.


Incremental Model Optimization Analysis

The changes do not appear to introduce or modify any incremental models. However, it's worth noting that the existing models (customers, stg_customers, stg_orders) are configured as views, which are recomputed on each query. Consider using incremental models with appropriate filters for large datasets to improve performance.


Dependency Management Analysis

The changes appear to be minor and don't directly affect dependency management. However, it's worth noting that the existing models (customers.sql, stg_customers.sql, stg_orders.sql) are already using the ref() function correctly, which is a good practice for managing dependencies in DBT.


Documentation And Descriptions Analysis

The changes in the dbt_project.yml file added empty lines, which don't impact functionality. However, the models in schema.yml lack comprehensive descriptions for some columns (e.g., first_name, most_recent_order_date, number_of_orders). Consider adding detailed descriptions to improve documentation and clarity for other team members.


Semantic Layer Consistency Analysis

The changes in dbt_project.yml and notes.md are minor and do not affect the semantic layer. However, it's important to note that the total_order_count metric is defined only for the stg_orders model. Consider standardizing this metric across relevant models for consistency in the semantic layer.


@structured-bot-beta

Modularity Analysis

The changes in dbt_project.yml and notes.md appear to be minor whitespace adjustments and don't affect modularity. However, the existing models (customers.sql, stg_customers.sql, stg_orders.sql) could benefit from further modularization. Consider breaking down the customers.sql model into smaller, reusable components for improved maintainability.


Versioning Analysis

The changes in this PR do not directly affect model versioning. However, it's a good practice to consider adding version tags or comments to models, especially when making significant changes. This helps track model evolution over time.


Grouping And Folder Structure Analysis

The current folder structure follows DBT best practices by separating staging models into a dedicated 'staging' subdirectory. However, consider creating a 'marts' or 'core' directory for the 'customers' model, which appears to be a more refined, business-facing model. This separation will improve scalability and clarity as the project grows.


Access Control Analysis

The changes in the project configuration and notes file don't directly impact data access control. However, it's important to ensure that any sensitive customer or order data in the models (customers.sql, stg_customers.sql, stg_orders.sql) have appropriate access controls. Consider implementing column-level security or row-level security if needed.


Naming Conventions Analysis

The naming conventions are generally consistent, using snake_case for model and column names. The 'stg_' prefix is appropriately used for staging models. However, consider standardizing capitalization in comments and descriptions for improved readability.


Testing Coverage Analysis

The customer_id field in stg_customers.sql and stg_orders.sql lacks a not_null test. Consider adding this test to ensure data integrity. Additionally, the status field in stg_orders.sql could benefit from a not_null test to validate order status consistency.


Config Best Practices Analysis

The current configuration uses 'table' materialization for all models in the jaffle_shop project. Consider using 'view' for staging models and 'incremental' for the customers model to optimize performance. Evaluate each model's update frequency and query patterns to determine the most appropriate materialization strategy.


SQL Performance And Efficiency Analysis

The use of SELECT * in the customers CTE of the customers.sql model is not recommended for performance. Consider explicitly listing required columns. Additionally, the LEFT JOIN in the final CTE could potentially be optimized if an inner join is sufficient based on the business logic.


Avoiding Anti-Patterns In SQL Analysis

The customers.sql model uses SELECT * in its CTE definitions, which is an anti-pattern that can impact performance and readability. Consider explicitly listing needed columns in the CTEs. Also, ensure that any transformations are done in the appropriate layer (staging vs. modeling).


Adherence To Data Contracts Analysis

The removal of customers.last_name field in customers.sql might break existing data contracts. Ensure this change is communicated to downstream teams and update any dependent models or data contracts accordingly. Also, verify if the added blank lines in dbt_project.yml and notes.md have any unintended consequences.


Data Lineage Tracking Analysis

The changes in the dbt_project.yml and notes.md files don't introduce any new transformations or key metrics. However, it's a good practice to document data lineage for existing models like customers.sql and stg_orders.sql. Consider adding comments or descriptions in schema.yml to track the flow of key metrics such as number_of_orders and total_order_count from source to final output.


Handling Nulls And Defaults Analysis

The changes don't directly impact null handling, but it's worth noting that the customers.sql model uses COALESCE for number_of_orders, which is a good practice. However, other fields like first_order_date and most_recent_order_date might benefit from similar null handling to prevent potential issues in downstream transformations.


Jinja And Macro Reusability Analysis

No significant changes related to Jinja and macro reusability were observed in this update. The existing models and schema files don't show any duplicate logic that could be extracted into macros. Consider reviewing the entire project for opportunities to implement reusable macros for common calculations or transformations.


Managing Data Freshness And Validity Analysis

The changes don't address data freshness or validity checks. Consider adding freshness checks to the staging models (stg_customers and stg_orders) to ensure data is up-to-date. For example, you could add a 'loaded_at' timestamp column and set up freshness tests in the schema.yml file.


Incremental Model Optimization Analysis

The changes do not directly impact incremental model optimization. However, it's worth noting that none of the models shown (customers, stg_customers, stg_orders) are configured as incremental. Consider implementing incremental models with appropriate filters for large datasets to improve performance.


Dependency Management Analysis

The changes appear to be minor and do not directly impact dependency management. However, it's worth noting that the existing models (customers.sql, stg_customers.sql, and stg_orders.sql) are already using the ref() function appropriately, which is a good practice for managing dependencies in DBT.


Documentation And Descriptions Analysis

The changes don't address documentation issues. The 'customers' model in schema.yml lacks descriptions for 'first_name' and 'number_of_orders' columns. Consider adding meaningful descriptions to improve model clarity and maintainability.


Semantic Layer Consistency Analysis

The total_order_count metric in stg_orders.sql is defined as a simple count, which may not align with standard definitions across other models. It's recommended to review and standardize this metric definition to ensure consistency in the semantic layer.

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to run (or test) only specific models
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

lightdash defines its own metrics - via the `schema.yml` file in the dbt project - these are additional `meta` entries, like dimensions. The other way to add metrics is via dbt's semantic layer (available in dbt Cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml


MetricFlow
`dbt-metricflow`

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta
Copy link

Modularity Analysis

The changes in the dbt_project.yml file are minor and don't affect modularity. However, the existing customers.sql model could be refactored into smaller components. Consider creating separate models for customer_orders and final calculations to improve modularity and reusability.
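
As a rough sketch of that suggestion (the model name `customer_orders` and its columns are assumptions based on the standard jaffle_shop tutorial, not code from this PR), the aggregation could be pulled into its own model:

```sql
-- models/customer_orders.sql (hypothetical intermediate model)
-- Pulls the per-customer order aggregation out of customers.sql,
-- so the final model only has to join and rename.
select
    customer_id,
    min(order_date) as first_order_date,
    max(order_date) as most_recent_order_date,
    count(order_id) as number_of_orders
from {{ ref('stg_orders') }}
group by customer_id
```

`customers.sql` would then `ref('customer_orders')` instead of re-deriving these columns.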


Versioning Analysis

The changes in dbt_project.yml and notes.md are minor and don't require versioning. However, it's a good practice to consider adding version tags or comments to models (e.g., customers.sql) when making significant changes to track iterations and maintain clarity between versions.


Grouping And Folder Structure Analysis

The current folder structure follows DBT best practices by separating staging models (stg_customers.sql, stg_orders.sql) from the final model (customers.sql). This organization promotes clarity and scalability. Consider creating a 'marts' directory for the customers.sql model to further improve the structure.


Access Control Analysis

The changes in dbt_project.yml and notes.md appear to be minor and do not directly impact access control. However, it's important to review the existing models (customers.sql, stg_customers.sql, stg_orders.sql) to ensure they don't expose sensitive customer information without proper access controls. Consider adding appropriate GRANT statements or role-based access controls to these models if they contain sensitive data.


Naming Conventions Analysis

The naming conventions in the project are generally consistent, using snake_case for model and column names. The 'stg_' prefix is appropriately used for staging models. However, consider adding a prefix like 'int_' or 'fct_' to the 'customers' model to indicate its role in the project structure.


Testing Coverage Analysis

The customer_id field in stg_orders.sql lacks a unique test, which is important for ensuring data integrity. Consider adding a unique test for this field. Additionally, the status field in stg_orders.sql could benefit from a not_null test to ensure all orders have a valid status.
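
If these are the standard dbt generic tests, the additions would look roughly like this in `models/schema.yml` (column names assumed from the jaffle_shop tutorial):

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: customer_id
        tests: [not_null]   # a unique test here would fail once a customer places a second order
      - name: status
        tests: [not_null]
```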


Config Best Practices Analysis

The current configuration uses table materialization for all models in the jaffle_shop project. Consider using view materialization for staging models (stg_customers and stg_orders) to improve performance and reduce storage costs. For the customers model, evaluate if incremental materialization would be beneficial based on data volume and update frequency.
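
One way to express that split in `dbt_project.yml`, assuming the staging models stay under `models/staging/`:

```yaml
models:
  jaffle_shop:
    +materialized: table    # final models (e.g. customers) stay as tables
    staging:
      +materialized: view   # staging models become cheap views
```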


Sql Performance And Efficiency Analysis

The use of SELECT * in the customers CTE of the customers.sql model can potentially impact performance. Consider explicitly listing only the required columns. Additionally, ensure that appropriate indexes are in place on the customer_id column in both customers and orders tables to optimize join performance.


Avoiding Anti-Patterns In SQL Analysis

The customers.sql model uses SELECT * in CTEs, which can lead to performance issues and make the code less maintainable. Consider explicitly listing only the required columns in the CTEs. Also, ensure that the final SELECT * is necessary; if not, replace it with specific column selections.
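
A sketch of that fix in `customers.sql` (the column list is an assumption based on the jaffle_shop tutorial, not taken from this PR):

```sql
with customers as (
    -- was: select * from {{ ref('stg_customers') }}
    select
        customer_id,
        first_name,
        last_name
    from {{ ref('stg_customers') }}
)
select
    customer_id,
    first_name,
    last_name
from customers
```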


Adherence To Data Contracts Analysis

The removal of customers.last_name from the final select statement in customers.sql may break existing data contracts. Ensure this change is communicated to downstream teams and update any dependent processes. Consider adding a deprecation notice if this field is being phased out.


Data Lineage Tracking Analysis

The changes don't introduce new transformations or key metrics. However, it's a good practice to document data lineage for existing models. Consider adding comments in the SQL files (e.g., customers.sql, stg_orders.sql) to describe the data flow and any important transformations.


Handling Nulls And Defaults Analysis

The changes do not directly impact null handling or default values. However, in the customers.sql file, there's already good use of COALESCE for number_of_orders. Consider applying similar null handling to other fields like first_order_date and most_recent_order_date to ensure consistency across the model.


Jinja And Macro Reusability Analysis

The changes don't introduce any new Jinja templates or macros. However, consider creating a macro for date-related operations (e.g., min, max) used in the customers.sql model to improve reusability across the project.


Managing Data Freshness And Validity Analysis

The current changes do not address data freshness and validity checks. Consider adding freshness configurations to the source models, especially for time-sensitive data like orders. For example, add a freshness check to stg_orders.sql with an appropriate warning or error threshold based on the expected update frequency of the source data.
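
Note that dbt configures freshness on sources rather than on models like stg_orders.sql; a sketch, assuming the raw tables carry a load-timestamp column (file and column names below are hypothetical):

```yaml
# models/staging/sources.yml (hypothetical file)
version: 2

sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at   # assumed load-timestamp column on the raw tables
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```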


Incremental Model Optimization Analysis

The changes do not directly affect incremental model optimization. However, it's worth noting that none of the models (customers, stg_customers, stg_orders) are configured as incremental. Consider implementing incremental strategies for these models if they process large datasets to improve efficiency.
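
An incremental version of such a model could be sketched as follows (the columns and the `order_date` watermark are assumptions from the tutorial schema, not the author's code):

```sql
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_date,
    status
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what this table already holds
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```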


Dependency Management Analysis

The changes appear minimal and don't affect dependency management. However, it's worth noting that the existing models (stg_customers.sql and stg_orders.sql) use direct table references. Consider using the ref() function for better dependency tracking and flexibility.
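
The difference, sketched with hypothetical raw-table names:

```sql
-- Direct reference: dbt cannot see this dependency in the DAG
-- select * from raw.jaffle_shop.customers

-- Declared reference: dbt tracks lineage and resolves the target schema
select
    customer_id,
    first_name,
    last_name
from {{ source('jaffle_shop', 'customers') }}   -- or {{ ref('stg_customers') }} between models
```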


Documentation And Descriptions Analysis

The changes lack documentation updates. Consider adding descriptions for new models or columns in schema.yml, especially for 'stg_customers' and 'stg_orders'. Also, improve existing descriptions like "This model cleans up customer data" to provide more context on the data transformations performed.
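
For example, the `models/schema.yml` entries could be fleshed out along these lines (the wording is illustrative, not from the repo):

```yaml
version: 2

models:
  - name: stg_customers
    description: "One row per customer, with raw column names standardized."
    columns:
      - name: customer_id
        description: "Primary key; joins to stg_orders.customer_id."
```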


Semantic Layer Consistency Analysis

The total_order_count metric is defined only in the stg_orders model. For consistency across the semantic layer, consider defining this metric in a centralized location or ensuring it's consistently applied across relevant models. Also, review if this metric aligns with any existing total order count definitions in other models.

@structured-bot-beta
Copy link

Thanks for opening this PR!

Total commits: 2
Files changed: 2
Additions: 6
Deletions: 0

Commits:
e0b3ddc: Update dbt_project.yml
0c1ec81: Update notes.md

Changes:
File: jaffle_shop/dbt_project.yml

Original Content:


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'jaffle_shop'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'jaffle_shop'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  jaffle_shop:
    # Config indicated by + and applies to all files under models/example/
    +materialized: table

Changes:

@@ -14,6 +14,9 @@ profile: 'jaffle_shop'
 model-paths: ["models"]
 analysis-paths: ["analyses"]
 test-paths: ["tests"]
+
+
+
 seed-paths: ["seeds"]
 macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]

File: notes.md

Original Content:

Setting up dbt

- used dbt-core, not dbt-cloud
	- warehouse (bigquery in this case) - structured-app-test
	- github repo - shivam-singhal/dbt-tutorial
		- ran `dbt init jaffle_shop` - this create a `jaffle_shop` and `logs` dir in `dbt-tutorial`
- dbt has models (which are select sql statements)
- These models can be composed of other models - hence a DAG structure. Each node can be run independently (given its dependencies are run too)
- Testing is built-in
- Version control is "built-in" via storing dbt configs in git (Github)
- commands
	- `dbt run` - run the sql queries against the data in the warehouse
		- `dbt run --full-refresh`
		- `dbt run --select <>` - to only run (or test specific models)
	- `dbt test` - validate that data has certain properties (e.g. non-null, unique, consists of certain values, etc.)
	- `dbt debug` - test .dbt/profiles.yml configuration (where bigquery connection information is stored)

What's missing are **metrics**. Lightdash takes the dbt models, and each column of the dbt model becomes a dimension of the table. 

How are Lightdash tables different from dbt models? is it 1:1?

`docker-compose -f docker-compose.yml --env-file .env up --detach --remove-orphans` to run lightdash locally from the repo

`docker exec -it <container_id> psql -U postgres -d postgres` to run psql from inside the postgres container on my local machine to inspect the postgres table

`lightdash preview` allows me to update `schema.yml` and have it updated in preview mode - so I don't always have to push to github and refresh

Lightdash defines its own metrics via the `schema.yml` file in the dbt project - they live under `meta`, alongside dimensions. The other way to add metrics is via dbt's semantic layer (available in dbt Cloud).
- This is what we'd be replacing with our own metrics layer
- This is done using the `meta` tag in the `schema.yml` file that dbt uses.
    - this kinda sucks - it's mixing sql w/ yaml
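A sketch of what the `meta`-tag approach looks like in practice (the model, column, and metric names here are illustrative, not taken from this repo):

```yaml
# Hypothetical fragment of models/schema.yml
models:
  - name: stg_orders
    columns:
      - name: amount
        description: "Order amount"
        meta:
          metrics:
            total_revenue:
              type: sum   # Lightdash aggregates the column into a metric
```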


MetricFlow
`dbt-metricflow`
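For comparison, a rough, untested sketch of the shape MetricFlow expects - a semantic model plus a metric defined on one of its measures (all names are placeholders):

```yaml
# Hedged sketch of MetricFlow-style YAML; verify against current dbt docs
semantic_models:
  - name: orders
    model: ref('stg_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: order_id

metrics:
  - name: total_orders
    type: simple
    type_params:
      measure: order_count
```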

Changes:

@@ -1,5 +1,8 @@
 Setting up dbt
 
+
+
+
 - used dbt-core, not dbt-cloud
 	- warehouse (bigquery in this case) - structured-app-test
 	- github repo - shivam-singhal/dbt-tutorial

All filepaths in the repo:
README.md
jaffle_shop/.gitignore
jaffle_shop/README.md
jaffle_shop/analyses/.gitkeep
jaffle_shop/dbt_project.yml
jaffle_shop/macros/.gitkeep
jaffle_shop/models/customers.sql
jaffle_shop/models/schema.yml
jaffle_shop/models/staging/stg_customers.sql
jaffle_shop/models/staging/stg_orders.sql
jaffle_shop/seeds/.gitkeep
jaffle_shop/snapshots/.gitkeep
jaffle_shop/tests/.gitkeep
logs/dbt.log
notes.md

@structured-bot-beta

Modularity Analysis

The changes appear minor and don't significantly impact modularity. However, the existing 'customers.sql' model could be refactored into smaller components. Consider creating intermediate models for customer_orders and final customer data to improve modularity and reusability.


Versioning Analysis

The changes in the dbt_project.yml file don't include any versioning updates. Consider adding version tags or comments to new models or significant changes to existing models to track iterations and maintain clarity between versions.


Grouping And Folder Structure Analysis

The current folder structure follows DBT best practices by separating staging models (stg_customers.sql, stg_orders.sql) from the final model (customers.sql). This organization promotes clarity and scalability. Consider creating a 'marts' folder for the final customers.sql model to further improve the structure.


Access Control Analysis

No explicit access control measures observed in the changes. Consider adding GRANT statements or role-based access controls to the customer and order models, especially for sensitive fields like customer_id and order details.


Naming Conventions Analysis

The naming conventions appear to be consistent across the project, following snake_case for model and column names. Staging models use the 'stg_' prefix, which is a good practice. No significant naming issues were detected in the changes.


Testing Coverage Analysis

The stg_orders model lacks tests for critical fields like customer_id and order_date. Consider adding not_null tests for these fields. Additionally, the customers model could benefit from a unique test on the customer_id field to ensure data integrity.
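The missing tests called out above could be declared in `schema.yml` roughly as follows (a sketch; the column names are assumed from the standard jaffle_shop tutorial):

```yaml
models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - not_null
      - name: order_date
        tests:
          - not_null
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
```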


Config Best Practices Analysis

The current configuration uses table materialization for all models in jaffle_shop. Consider using view materialization for staging models (stg_customers, stg_orders) to improve performance and resource usage. For the customers model, evaluate if incremental materialization would be beneficial based on data volume and update frequency.


Sql Performance And Efficiency Analysis

The use of SELECT * in the customers CTE of the customers.sql model can negatively impact query performance. Consider explicitly listing the required columns instead. Additionally, ensure that appropriate indexes are in place on the customer_id column in both customers and orders tables to optimize join operations.


Avoiding Anti Patterns In SQL Analysis

The customers.sql model uses SELECT * in its CTEs, which can impact performance and readability. Consider explicitly selecting only the required columns. Also, ensure that the final SELECT * is necessary; if not, replace it with specific column selection.
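A minimal sketch of the explicit-column rewrite suggested here; the column names are assumed from the standard jaffle_shop tutorial, not confirmed from this diff:

```sql
-- Instead of `select *`, list only the columns the model needs
with customers as (
    select
        customer_id,
        first_name,
        last_name
    from {{ ref('stg_customers') }}
)

select
    customer_id,
    first_name,
    last_name
from customers
```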


Adherence To Data Contracts Analysis

The removal of customers.last_name in the customers.sql model may break existing data contracts. This change could impact downstream dependencies that rely on this field. Please ensure all stakeholders are notified and update any affected data contracts accordingly.


Data Lineage Tracking Analysis

The changes in the project structure don't directly impact data lineage. However, it's recommended to add lineage documentation for key metrics in models like 'customers' and 'stg_orders', especially for fields like 'number_of_orders' and 'total_order_count' to trace their origin and transformations.


Handling Nulls And Defaults Analysis

The changes don't directly affect null handling, but it's worth noting that the customers.sql model uses COALESCE for number_of_orders, which is a good practice. Consider applying similar null handling to other fields like first_order_date and most_recent_order_date to ensure consistency across the model.


Jinja And Macro Reusability Analysis

The changes appear to be minor spacing updates in dbt_project.yml and notes.md. There are no significant changes to SQL logic or Jinja templates. No opportunities for macro creation or template reuse were identified in this update.


Managing Data Freshness And Validity Analysis

The changes do not address data freshness and validity checks. Consider adding freshness configurations to source models, especially for time-sensitive data like orders. Implement appropriate validity windows for critical data sources to ensure data accuracy and timeliness.
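A freshness configuration of the kind suggested could look like this (a sketch; the source name, database, and `_etl_loaded_at` timestamp column are assumptions, not taken from this repo):

```yaml
sources:
  - name: jaffle_shop
    database: raw                      # assumed raw-data location
    loaded_at_field: _etl_loaded_at    # assumed load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```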


Incremental Model Optimization Analysis

The changes do not appear to directly affect incremental model optimization. However, it's worth noting that none of the models shown (customers, stg_customers, stg_orders) are configured as incremental. Consider implementing incremental models with appropriate filters for large datasets to improve performance.
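An incremental model of the kind mentioned would be configured roughly like this (a sketch; `order_id` and `order_date` are assumed columns):

```sql
-- Hypothetical incremental variant of an orders model
{{ config(materialized='incremental', unique_key='order_id') }}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what's in the table
  where order_date > (select max(order_date) from {{ this }})
{% endif %}
```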


Dependency Management Analysis

The changes made to the project structure and configuration files are minimal and do not affect the dependency management. The existing models (customers.sql, stg_customers.sql, and stg_orders.sql) already use the ref() function appropriately, which is a good practice for managing dependencies in DBT.


Documentation And Descriptions Analysis

The changes to dbt_project.yml and notes.md appear to be minor formatting adjustments. However, the models (customers.sql, schema.yml, stg_customers.sql, stg_orders.sql) lack comprehensive documentation. Consider adding descriptions for models and key fields to improve clarity and maintainability.


Semantic Layer Consistency Analysis

The total_order_count metric is defined in the stg_orders model using the count type. It's important to ensure this definition is consistent with other models that may use a similar metric. Consider standardizing the metric name and definition across all relevant models for consistency in the semantic layer.

@structured-bot-beta

Modularity Analysis

While the changes themselves don't directly impact modularity, the existing models (customers.sql and stg_orders.sql) could be improved. Consider breaking down the 'customers' model into smaller, reusable components. For example, create separate models for customer details and order summaries, then join them in the final model.


Versioning Analysis

The changes in this PR do not include explicit versioning updates. Consider adding version tags or comments to the modified files, especially in the dbt_project.yml and schema.yml, to track iterations and changes over time.


Grouping And Folder Structure Analysis

The current folder structure follows DBT best practices by separating staging models (stg_customers.sql, stg_orders.sql) from the final model (customers.sql). This organization promotes clarity and scalability. Consider creating a 'marts' directory for the customers.sql model to further improve the structure.


Access Control Analysis

No significant access control changes detected in this update. However, it's important to ensure that sensitive customer data in the 'customers' and 'stg_customers' models have appropriate access controls. Consider adding GRANT statements or role-based restrictions to protect customer information.


Naming Conventions Analysis

The naming conventions are generally consistent, using snake_case for model and column names. The 'stg_' prefix is appropriately used for staging models. However, consider adding a prefix (e.g., 'int_' or 'fct_') to the 'customers' model to indicate its role in the project structure.


Testing Coverage Analysis

The customer_id field in stg_customers.sql and stg_orders.sql lacks not_null and unique tests. Adding these tests to both models will enhance data quality assurance. Additionally, consider adding a test for the status field in stg_orders.sql to ensure it only contains valid values.


Config Best Practices Analysis

The current configuration uses table materialization for all models in the jaffle_shop project. Consider using view materialization for staging models (stg_customers, stg_orders) and evaluate if incremental materialization is suitable for the customers model to improve performance.


Sql Performance And Efficiency Analysis

The use of SELECT * in the customers CTE in customers.sql can potentially impact query performance. Consider explicitly listing required columns instead. Additionally, ensure that appropriate indexes are in place on join columns (customer_id) to optimize the left join operation.


Avoiding Anti Patterns In SQL Analysis

The query in customers.sql uses SELECT * in the CTE, which can impact performance and readability. Consider explicitly selecting only the required columns. Also, the final SELECT * from final could be optimized by listing specific columns needed.


Adherence To Data Contracts Analysis

The removal of customers.last_name field in customers.sql may break existing data contracts. Ensure this change is communicated to downstream teams and update any affected data contracts. Consider the impact on data integrity and existing queries that may rely on this field.


Data Lineage Tracking Analysis

The changes in jaffle_shop/dbt_project.yml and notes.md are minor and do not impact data lineage. However, it's important to ensure that any future changes to models or transformations, especially in customers.sql and stg_orders.sql, include proper documentation of data lineage for key metrics like total_order_count.


Handling Nulls And Defaults Analysis

The changes do not directly impact null handling or default values. However, in the customers.sql file, the COALESCE function is used to handle potential null values for number_of_orders, which is a good practice. Consider applying similar null handling to other fields if necessary.


Jinja And Macro Reusability Analysis

The changes do not introduce any new Jinja templates or DBT macros. However, it's worth noting that the existing models (customers.sql, stg_customers.sql, stg_orders.sql) could benefit from using Jinja templates or macros to centralize common logic, such as date formatting or status categorization, if these patterns are repeated across models.


Managing Data Freshness And Validity Analysis

The current changes do not address data freshness and validity checks. Consider adding freshness configurations to the stg_customers and stg_orders models in the schema.yml file. For example, you could add a 'freshness' block under each model to ensure data is updated within an appropriate timeframe.


Incremental Model Optimization Analysis

The changes do not appear to introduce or modify any incremental models. However, it's worth noting that the existing models (customers, stg_customers, stg_orders) are materialized as views, which are always rebuilt and do not require incremental logic.


Dependency Management Analysis

The changes appear to be minor and don't affect the dependency management. However, it's worth noting that the existing models (stg_customers.sql and stg_orders.sql) are using hardcoded table names. Consider using the {{ source() }} function to reference source tables for better dependency management.
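The `{{ source() }}` pattern recommended here would look roughly like this, assuming a `jaffle_shop` source has been declared in a sources YAML file (the source and column names are placeholders):

```sql
-- staging model referencing a declared source instead of a hardcoded table
select
    id as customer_id,
    first_name
from {{ source('jaffle_shop', 'customers') }}
```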


Documentation And Descriptions Analysis

The changes to dbt_project.yml and notes.md are minor and don't affect documentation. However, the existing models (customers, stg_customers, stg_orders) lack comprehensive descriptions for some columns. Consider adding detailed descriptions for all columns to improve code clarity and maintainability.


Semantic Layer Consistency Analysis

The metric 'total_order_count' is defined in stg_orders.sql using the 'count' type. However, it's important to ensure this definition aligns with any existing global total_order_count metric across other models. Consider standardizing this metric definition in the semantic layer for consistency.
