V4 schema revisions candidate #903

Merged

melange396
Collaborator

addresses #805

summary:

  • removed notion of wip signals
  • added/updated to new schema w/ history, latest, and [non-temporary] load tables, as well as views for simplifying some operations
  • changes to insert_or_update_batch in csv loading / acquisition code can be summarized as:
    • changed from temporary load table to permanent one, and from covidcast table to latest and history views.
    • no longer creating/truncating/dropping load table(s), but checking table is empty at start for sanity.
    • altered behavior for is_latest_issue processing during load to accommodate new schemata.
  • acquisition now needs dbjobs stuff to run before newly ingested data is available for public consumption
  • API server no longer guesses at indexes to use, and uses historical or latest data views depending on query
  • metadata computation also no longer uses index
  • widened git-ignore filespecs
  • made tests compatible, simplified some where applicable, removed or neutered some others that were obsolete
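The load/publish split described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names `load_tbl`, `history_tbl`, and `latest_tbl`, their columns, and the functions `insert_batch` and `run_dbjobs` are hypothetical stand-ins, not the definitions in src/ddl/v4_schema.sql or the code in database.py, and sqlite3 is used only so the sketch runs on its own.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the v4 tables; the real
# definitions live in src/ddl/v4_schema.sql and have many more columns.
DDL = """
CREATE TABLE load_tbl    (signal_key TEXT, time_value INT, issue INT, value REAL);
CREATE TABLE history_tbl (signal_key TEXT, time_value INT, issue INT, value REAL,
                          PRIMARY KEY (signal_key, time_value, issue));
CREATE TABLE latest_tbl  (signal_key TEXT, time_value INT, issue INT, value REAL,
                          PRIMARY KEY (signal_key, time_value));
"""

def insert_batch(conn, rows):
    """Acquisition step: stage rows in the permanent load table."""
    # Sanity check: the load table is no longer created/dropped per run,
    # so it is expected to be empty when a new batch starts.
    (pending,) = conn.execute("SELECT COUNT(*) FROM load_tbl").fetchone()
    if pending:
        raise RuntimeError(f"load table not empty: {pending} rows pending")
    conn.executemany("INSERT INTO load_tbl VALUES (?, ?, ?, ?)", rows)

def run_dbjobs(conn):
    """Publish step: copy staged rows to history and latest, then clear load."""
    # Every issue is kept in history.
    conn.execute("INSERT OR REPLACE INTO history_tbl SELECT * FROM load_tbl")
    # Latest keeps only the newest issue per (signal_key, time_value).
    conn.execute("""
        INSERT OR REPLACE INTO latest_tbl
        SELECT h.signal_key, h.time_value, h.issue, h.value
        FROM history_tbl h
        WHERE h.issue = (SELECT MAX(issue) FROM history_tbl
                         WHERE signal_key = h.signal_key
                           AND time_value = h.time_value)
    """)
    conn.execute("DELETE FROM load_tbl")

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
insert_batch(conn, [("sig", 20220101, 20220102, 1.0),
                    ("sig", 20220101, 20220105, 2.0)])
# Until run_dbjobs() runs, nothing staged is visible in history or latest.
run_dbjobs(conn)
latest = conn.execute("SELECT issue, value FROM latest_tbl").fetchall()
```

In this toy version, staged rows stay invisible to readers of the latest and history tables until the publish step runs, which mirrors the note that acquisition now depends on the dbjobs step.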

change details

  • the core of the changes is in 3 main files:
    • src/ddl/v4_schema.sql <--- new schema definitions
    • src/acquisition/covidcast/database.py <--- changes to insert into new schema and SQL (also some wip removals)
    • src/server/endpoints/covidcast.py <--- differentiations on when/where to use the history or the latest table
  • then some helpery or cleanup stuff in:
    • src/server/_query.py <--- small addition of QueryBuilder.retable() method
    • src/acquisition/covidcast/dbjobs_runner.py <--- just some boilerplate for making it runnable
    • src/ddl/v4_schema_aliases.sql <--- just some cross-schema aliasing
    • src/ddl/covidcast.sql <--- removed
    • src/acquisition/covidcast/data_dir_readme.md <--- just removal of wip mention in documentation
    • src/acquisition/covidcast/csv_to_database.py <--- just wip removals
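As a rough illustration of how a `retable()`-style method could pair with choosing between the latest and history views, here is a minimal sketch. Everything in it is an assumption made for illustration: `MiniQuery` is a toy class rather than the project's `QueryBuilder`, the view names are invented, and the rule in `choose_view` (queries that pin a version through hypothetical `issues`, `lag`, or `as_of` arguments go to history) is a guess at what "depending on query" means.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

LATEST_VIEW = "latest_view"    # hypothetical view names
HISTORY_VIEW = "history_view"

@dataclass(frozen=True)
class MiniQuery:
    """Toy query builder; not the project's QueryBuilder."""
    table: str
    alias: str = "t"
    conditions: Tuple[str, ...] = ()

    def where(self, condition: str) -> "MiniQuery":
        # Return a copy with one more AND-ed condition.
        return replace(self, conditions=self.conditions + (condition,))

    def retable(self, new_table: str) -> "MiniQuery":
        # Return the same query pointed at a different table or view.
        return replace(self, table=new_table)

    def __str__(self) -> str:
        sql = f"SELECT * FROM {self.table} {self.alias}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

def choose_view(issues: Optional[list] = None,
                lag: Optional[int] = None,
                as_of: Optional[int] = None) -> str:
    """Versioned queries need the full history; all others can use latest."""
    if issues is not None or lag is not None or as_of is not None:
        return HISTORY_VIEW
    return LATEST_VIEW

query = MiniQuery(LATEST_VIEW).where("t.source = 'src'")
versioned = query.retable(choose_view(as_of=20220501))
```

Here `str(query)` targets `latest_view`, while `str(versioned)` carries the same conditions against `history_view`. The real method may well mutate the builder in place instead of returning a copy.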

This PR includes the sum of changes from https://github.com/cmu-delphi/delphi-epidata/compare/7fab1fb..e0fb58b
(I didn't want to lose the commit history from the branch joe/v4-schema-revisions, but also wanted to split the full changeset into a few smaller pieces for easier review.)

still TODOs:

  • look for TODOs in the code just added and evaluate priorities thereof
  • test in integrations/acquisition/covidcast/test_delete_batch.py does not currently pass, but I will fix it as soon as I can have a quick discussion with Katie
  • verify SQL schema matches in prod/qa vs git
  • work on proposed new schema changes for merged-key dimension table

@melange396 melange396 added the v4 Big covidcast schema redesign label May 3, 2022
@melange396 melange396 requested a review from xavier-xia-99 May 3, 2022 19:33
@krivard
Contributor

krivard commented May 6, 2022

Failing test is the delete-batch from dev, which was left out of the initial testfix push because the v4 dev branch wasn't kept up-to-date during development. Rough timeline was:

  • In January, joe/v4-schema-revisions was branched from dev. 61 commits added for v4 functionality.
  • From January-March, 126 commits were added to dev, including the delete-batch feature.
  • In March, v4-schema-revisions-release-prep was branched from dev. 2 commits added for v4 builds in CI.
  • From March-May, 49 commits were added to dev, mostly API docs fixes.
  • In May, v4-schema-revisions-release-prep-prep was branched from joe/v4-schema-revisions. The 61 commits for v4 functionality were organized into 4 thematic commits and rebased onto v4-schema-revisions-release-prep for this PR.

You can view detailed aheads and behinds using a branches search, which will also link to all/any open PRs for details on what commits were added. There is also a compare view showing details on what was added to dev since release-prep was branched.

a few notes:
composite key definitions could be made part of Database class.
i used the variable name "ell" because the single lowercase letter "l" looks ambiguous.
could do away with update_latest column and just use "WHERE d.delete_latest_id IS NOT NULL" instead of "WHERE d.update_latest=1".
could do away with delete_latest_id column if we assume signal_data_id's are synced properly between latest+history tables.
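One detail matters if the update_latest flag is replaced by a null check on delete_latest_id: under standard SQL three-valued logic, a comparison such as `<> NULL` evaluates to NULL and never selects a row, so the filter has to be written with `IS NOT NULL`. A small sketch of the difference, using a made-up table `d` and sqlite3 only so the example is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the staging rows; only the relevant column is kept.
conn.execute("CREATE TABLE d (id INTEGER, delete_latest_id INTEGER)")
conn.executemany("INSERT INTO d VALUES (?, ?)", [(1, 10), (2, None), (3, 30)])

# Comparing with NULL via <> evaluates to NULL, so no row passes the filter.
never = conn.execute(
    "SELECT id FROM d WHERE delete_latest_id <> NULL").fetchall()

# IS NOT NULL is the predicate that actually selects the populated rows.
populated = conn.execute(
    "SELECT id FROM d WHERE delete_latest_id IS NOT NULL ORDER BY id").fetchall()
```

With this data, `never` is an empty list and `populated` contains the rows with ids 1 and 3.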
@melange396 melange396 removed the request for review from jgreene1959 May 11, 2022 22:12
@chinandrew
Contributor

I haven't kept up with this work and don't have bandwidth to cover every line, so only looked at this superficially. Main thing that stands out to me is the high volume of TODOs that should be tracked somewhere (or resolved immediately if possible, like guess_index_to_use() - we always have commit history if we need it back).

Contributor

@krivard krivard left a comment

The bones are good, needs some paint

Also found a few items we should discuss together to figure out what to do about them

@dshemetov
Contributor

dshemetov commented Jun 1, 2022

(VSCode took my inline comments fine, but posted my overall comment to a different repo 🙄)

overall looks good to me, mostly adding comments about minor nits, possible refactors, and a few cautions (biggest one is making sure the load table is empty before deleting). seconding Andrew's comment that a lot of the TODOs should be factored out into issues.

also I made a PR into this branch that removes some other wip stuff i saw #917

krivard added 3 commits June 15, 2022 11:10

Contributor

@krivard krivard left a comment

🚀

@krivard krivard merged commit 83b6988 into v4-schema-revisions-release-prep Jun 15, 2022
@krivard krivard deleted the v4-schema-revisions-release-prep-prep branch June 15, 2022 20:14