v4 changes #805

Closed
@krivard

Description

The v4 changes are necessary to handle the additional data load from DSEW-CPR and several other planned indicator additions. They complete the scalability requirements of the data scaling PRD, but not the expressiveness requirements; we’ll layer on expressiveness later. This release will include only the following changes:

  • split off signal and geo dimension tables
  • split off a latest-only table
  • create views which query like the existing table (minimizing necessary API server changes)
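The three changes above can be sketched as a toy schema. This is only an illustration under stated assumptions: every table, column, and view name here is hypothetical (the real definitions belong in covidcast.sql), and Python's built-in `sqlite3` stands in for the production database.

```python
import sqlite3

# Hypothetical names throughout; sqlite3 is a stand-in for the real database.
SCHEMA = """
CREATE TABLE signal_dim (
    signal_key_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    signal TEXT NOT NULL,
    UNIQUE (source, signal)
);
CREATE TABLE geo_dim (
    geo_key_id INTEGER PRIMARY KEY,
    geo_type TEXT NOT NULL,
    geo_value TEXT NOT NULL,
    UNIQUE (geo_type, geo_value)
);
-- Every issue of every observation.
CREATE TABLE signal_history (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue)
);
-- Only the most recent issue of each observation.
CREATE TABLE signal_latest (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value)
);
-- Views that expose the flat column layout of a single wide table.
CREATE VIEW signal_history_v AS
    SELECT s.source, s.signal, g.geo_type, g.geo_value,
           h.time_value, h.issue, h.value
    FROM signal_history h
    JOIN signal_dim s ON s.signal_key_id = h.signal_key_id
    JOIN geo_dim g ON g.geo_key_id = h.geo_key_id;
CREATE VIEW signal_latest_v AS
    SELECT s.source, s.signal, g.geo_type, g.geo_value,
           l.time_value, l.issue, l.value
    FROM signal_latest l
    JOIN signal_dim s ON s.signal_key_id = l.signal_key_id
    JOIN geo_dim g ON g.geo_key_id = l.geo_key_id;
"""


def build_demo_db():
    """Create an in-memory database with one observation issued twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO signal_dim VALUES (1, 'src', 'sig')")
    conn.execute("INSERT INTO geo_dim VALUES (1, 'state', 'pa')")
    conn.executemany(
        "INSERT INTO signal_history VALUES (1, 1, 20210101, ?, ?)",
        [(20210102, 1.0), (20210103, 2.0)],
    )
    conn.execute("INSERT INTO signal_latest VALUES (1, 1, 20210101, 20210103, 2.0)")
    return conn
```

Because the views return the same flat columns a single wide table would, a query written against that layout only needs its table name swapped for a view name.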

Four parts:

  1. DDL
  2. Acquisition
  3. API server
  4. Stress test

1: ddl - @jgreene1959

  • Changes to covidcast.sql
  • 0.3 -> 0.4 migration: unlike previous migrations, this one will involve substantial data mutation; Joe has a draft of it in Python

2: acquisition - @jgreene1959 + @melange396 to pair

  • convert csv_import and database.py to do initial import into a load table instead of the main fact tables
  • add a new acquisition phase, "loader", which updates the dimension tables, puts new rows from the load table into the latest and history fact tables, and records some stats in a job log, metadata, etc.
  • new loader unit tests
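A minimal sketch of what the loader phase might do, assuming a flat load table and the dimension/fact split described earlier. All table and column names, the `new_db` and `run_loader` helpers, and the shape of the job log are hypothetical; `sqlite3` (3.24 or newer, for the upsert syntax) stands in for the production database, and this is not the draft Joe has written.

```python
import sqlite3

# Hypothetical schema; sqlite3 is a stand-in for the real database.
SCHEMA = """
CREATE TABLE signal_load (
    source TEXT, signal TEXT, geo_type TEXT, geo_value TEXT,
    time_value INTEGER, issue INTEGER, value REAL
);
CREATE TABLE signal_dim (
    signal_key_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL, signal TEXT NOT NULL,
    UNIQUE (source, signal)
);
CREATE TABLE geo_dim (
    geo_key_id INTEGER PRIMARY KEY,
    geo_type TEXT NOT NULL, geo_value TEXT NOT NULL,
    UNIQUE (geo_type, geo_value)
);
CREATE TABLE signal_history (
    signal_key_id INTEGER, geo_key_id INTEGER,
    time_value INTEGER, issue INTEGER, value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue)
);
CREATE TABLE signal_latest (
    signal_key_id INTEGER, geo_key_id INTEGER,
    time_value INTEGER, issue INTEGER, value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value)
);
CREATE TABLE job_log (job_id INTEGER PRIMARY KEY, rows_loaded INTEGER);
"""

# Resolves each load row to its dimension keys.
KEYED_LOAD = """
SELECT s.signal_key_id, g.geo_key_id, l.time_value, l.issue, l.value
FROM signal_load l
JOIN signal_dim s ON s.source = l.source AND s.signal = l.signal
JOIN geo_dim g ON g.geo_type = l.geo_type AND g.geo_value = l.geo_value
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def run_loader(conn):
    """Move load-table rows into the dimension and fact tables; return the row count."""
    with conn:  # one transaction for the whole batch
        n = conn.execute("SELECT COUNT(*) FROM signal_load").fetchone()[0]
        # 1. Dimension updates: add any unseen signals and locations.
        conn.execute(
            "INSERT OR IGNORE INTO signal_dim (source, signal) "
            "SELECT DISTINCT source, signal FROM signal_load"
        )
        conn.execute(
            "INSERT OR IGNORE INTO geo_dim (geo_type, geo_value) "
            "SELECT DISTINCT geo_type, geo_value FROM signal_load"
        )
        # 2. History keeps every issue.
        conn.execute("INSERT OR REPLACE INTO signal_history " + KEYED_LOAD)
        # 3. Latest keeps a row only if its issue is at least as new as the stored one.
        conn.execute(
            "INSERT INTO signal_latest " + KEYED_LOAD + " WHERE true "
            "ON CONFLICT (signal_key_id, geo_key_id, time_value) DO UPDATE SET "
            "issue = excluded.issue, value = excluded.value "
            "WHERE excluded.issue >= signal_latest.issue"
        )
        # 4. Record stats, then empty the load table for the next batch.
        conn.execute("INSERT INTO job_log (rows_loaded) VALUES (?)", (n,))
        conn.execute("DELETE FROM signal_load")
    return n
```

The guard on the upsert means a late-arriving older issue still lands in history but does not overwrite a newer row in the latest table, which is the kind of case the new loader unit tests could cover.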

3: api server - @melange396

  • remove USE INDEX
  • point latest queries to latest view
  • point as-of, issue, and lag queries to history view
  • possibly reformulate the metadata query; address this after acquisition is settled
  • update api server unit tests
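The routing rule in the list above could look roughly like this. The view names and the `pick_view` and `base_select` helpers are hypothetical, and the version filters themselves are deliberately left out; the sketch only shows which view a request would be sent to.

```python
from typing import Optional

# Hypothetical view names; the real ones would come from the DDL changes.
LATEST_VIEW = "signal_latest_v"
HISTORY_VIEW = "signal_history_v"


def pick_view(
    as_of: Optional[int] = None,
    issues: Optional[list] = None,
    lag: Optional[int] = None,
) -> str:
    """Return the view a request should query.

    Requests with no version parameter only need the most recent issue,
    so they go to the latest view; as-of, issue, and lag requests need
    past issues, so they go to the history view.
    """
    if as_of is None and issues is None and lag is None:
        return LATEST_VIEW
    return HISTORY_VIEW


def base_select(view: str) -> str:
    """Build the start of a query against the chosen view, with no index hint."""
    return f"SELECT geo_value, time_value, issue, value FROM {view}"
```

For example, `base_select(pick_view(as_of=20210101))` starts a query against the history view, while `base_select(pick_view())` targets the latest view.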

4: stress test in qa environment with replication

  • a day's worth of CSV imports - Katie to collect
  • play back a day's worth of traffic - George to collect
  • a batch issue upload - Katie to collect
  • metadata - we get this for free in Loader, but we should time it anyway
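For timing each stress-test step, a small helper like the one below may be enough. The `timed` helper and the step labels are hypothetical and not part of the existing codebase.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs still leave a timing.
        results[label] = time.perf_counter() - start
```

Usage would be along the lines of `with timed("metadata", results): ...`, wrapped around whichever call performs that step, with `results` being a plain dict collected across the run.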
