Closed
Description
This will be necessary to handle the additional data load from DSEW-CPR and several other planned indicator additions. These changes complete the scalability requirements of the data scaling PRD, but not the expressiveness requirements -- we’ll layer on expressiveness later. This release will include only the following changes:
- split off signal and geo dimension tables
- split off a latest-only table
- create views which query like the existing table (minimizing necessary API server changes)
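As a rough illustration of the three changes above, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. Everything named here is an assumption for illustration only: the table, view, and column names (`signal_dim`, `geo_dim`, `signal_history`, `signal_latest`, `signal_history_v`, `signal_latest_v`) are hypothetical, and the real DDL in covidcast.sql targets the production database rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each (source, signal) and (geo_type, geo_value) pair is stored once.
CREATE TABLE signal_dim (
    signal_key_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    signal TEXT NOT NULL,
    UNIQUE (source, signal)
);
CREATE TABLE geo_dim (
    geo_key_id INTEGER PRIMARY KEY,
    geo_type TEXT NOT NULL,
    geo_value TEXT NOT NULL,
    UNIQUE (geo_type, geo_value)
);

-- History fact table: one row per issue of each observation.
CREATE TABLE signal_history (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue)
);

-- Latest-only fact table: at most one row per observation.
CREATE TABLE signal_latest (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value)
);

-- Views that expose the flat, pre-split column layout to callers.
CREATE VIEW signal_history_v AS
SELECT s.source, s.signal, g.geo_type, g.geo_value, h.time_value, h.issue, h.value
FROM signal_history h
JOIN signal_dim s ON s.signal_key_id = h.signal_key_id
JOIN geo_dim g ON g.geo_key_id = h.geo_key_id;

CREATE VIEW signal_latest_v AS
SELECT s.source, s.signal, g.geo_type, g.geo_value, l.time_value, l.issue, l.value
FROM signal_latest l
JOIN signal_dim s ON s.signal_key_id = l.signal_key_id
JOIN geo_dim g ON g.geo_key_id = l.geo_key_id;

-- Made-up sample data: two issues of one observation.
INSERT INTO signal_dim VALUES (1, 'src', 'sig');
INSERT INTO geo_dim VALUES (1, 'state', 'pa');
INSERT INTO signal_history VALUES
    (1, 1, 20220101, 20220102, 1.0),
    (1, 1, 20220101, 20220103, 2.0);
INSERT INTO signal_latest VALUES (1, 1, 20220101, 20220103, 2.0);
""")

# A caller can keep filtering on the flat columns, as it did against the old table.
latest_rows = conn.execute(
    "SELECT source, signal, geo_value, issue, value "
    "FROM signal_latest_v WHERE source = 'src' AND signal = 'sig'"
).fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM signal_history_v").fetchone()[0]
```

The point of the views is that queries written against flat `source`/`signal`/`geo_type`/`geo_value` columns should need little more than a change of table name.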
Four parts:
- ddl
- acquisition
- API server
- Stress test
1: ddl - @jgreene1959
- Changes to covidcast.sql
- 0.3 -> 0.4 migration -- unlike previous migrations, this one will involve substantial data mutation; Joe has a draft of it in Python
2: acquisition - @jgreene1959 + @melange396 to pair
- convert csv_import and database.py to do initial import into a load table instead of the main fact tables
- add a new acquisition phase "loader" which updates the dimension tables, puts new rows from the load table into the latest and history fact tables, and records some stats in a job log, metadata, etc.
- new loader unit tests
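The loader phase described above could be sketched roughly as follows. This is not Joe's draft or the actual acquisition code: it is a minimal illustration using Python's built-in `sqlite3`, and the schema, the `signal_load` table, the `run_loader` function, and the returned stats dictionary are all hypothetical. It assumes a row should reach the latest table only if its issue is at least as new as the one already stored.

```python
import sqlite3

# Hypothetical, simplified schema (SQLite syntax, not the production DDL).
SCHEMA = """
CREATE TABLE signal_load (source TEXT, signal TEXT, geo_type TEXT, geo_value TEXT,
                          time_value INTEGER, issue INTEGER, value REAL);
CREATE TABLE signal_dim (signal_key_id INTEGER PRIMARY KEY, source TEXT, signal TEXT,
                         UNIQUE (source, signal));
CREATE TABLE geo_dim (geo_key_id INTEGER PRIMARY KEY, geo_type TEXT, geo_value TEXT,
                      UNIQUE (geo_type, geo_value));
CREATE TABLE signal_history (signal_key_id INTEGER, geo_key_id INTEGER,
                             time_value INTEGER, issue INTEGER, value REAL,
                             PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue));
CREATE TABLE signal_latest (signal_key_id INTEGER, geo_key_id INTEGER,
                            time_value INTEGER, issue INTEGER, value REAL,
                            PRIMARY KEY (signal_key_id, geo_key_id, time_value));
"""

# Shared FROM clause: resolve each load row to its dimension keys.
KEYED_LOAD = """
FROM signal_load l
JOIN signal_dim s ON s.source = l.source AND s.signal = l.signal
JOIN geo_dim g ON g.geo_type = l.geo_type AND g.geo_value = l.geo_value
"""
KEYED_COLUMNS = "SELECT s.signal_key_id, g.geo_key_id, l.time_value, l.issue, l.value"


def run_loader(conn):
    """Move everything in the load table into the dimension and fact tables."""
    with conn:  # commit on success, roll back on error
        loaded = conn.execute("SELECT COUNT(*) FROM signal_load").fetchone()[0]

        # 1. Add any dimension values not seen before.
        conn.execute("INSERT OR IGNORE INTO signal_dim (source, signal) "
                     "SELECT DISTINCT source, signal FROM signal_load")
        conn.execute("INSERT OR IGNORE INTO geo_dim (geo_type, geo_value) "
                     "SELECT DISTINCT geo_type, geo_value FROM signal_load")

        # 2. Every loaded row goes into the history table.
        conn.execute("INSERT OR REPLACE INTO signal_history " + KEYED_COLUMNS + KEYED_LOAD)

        # 3. Only rows at least as new as the stored issue go into the latest table.
        #    Ordering by issue makes the newest issue in the batch win.
        conn.execute(
            "INSERT OR REPLACE INTO signal_latest " + KEYED_COLUMNS + KEYED_LOAD +
            """LEFT JOIN signal_latest cur
                 ON cur.signal_key_id = s.signal_key_id
                AND cur.geo_key_id = g.geo_key_id
                AND cur.time_value = l.time_value
               WHERE cur.issue IS NULL OR l.issue >= cur.issue
               ORDER BY l.issue""")

        # 4. Empty the load table and report a simple stat for a job log.
        conn.execute("DELETE FROM signal_load")
    return {"rows_loaded": loaded}
```

Keeping the import step (writing to the load table) separate from this step means the fact tables are only touched inside one short transaction per batch.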
3: api server - @melange396
- remove USE INDEX
- point latest queries to latest view
- point as-of, issue, and lag queries to history view
- possibly reformulate metadata query? address this after acquisition is settled
- update api server unit tests
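The view routing in the list above amounts to a small decision rule, sketched here in Python. The function name `choose_view`, its parameters, and the view names are hypothetical placeholders, not the API server's actual code; the sketch only assumes that a request carrying none of as-of, issue, or lag is a "latest" query.

```python
from typing import Optional, Sequence

# Hypothetical view names; the real ones come from the ddl work.
LATEST_VIEW = "signal_latest_v"
HISTORY_VIEW = "signal_history_v"


def choose_view(as_of: Optional[int] = None,
                issues: Optional[Sequence[int]] = None,
                lag: Optional[int] = None) -> str:
    """Pick the view a request should query.

    Requests with no as-of, issue, or lag filter only need the most recent
    issue of each observation, so they can read the smaller latest view.
    Anything that asks about a specific version needs the full history.
    """
    if as_of is None and not issues and lag is None:
        return LATEST_VIEW
    return HISTORY_VIEW
```

For example, `choose_view()` selects the latest view, while `choose_view(as_of=20220101)` or `choose_view(lag=0)` selects the history view.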
4: stress test in qa environment with replication
- a day's worth of CSV imports - Katie to collect
- play back a day's worth of traffic - George to collect
- a batch issue upload - Katie to collect
- metadata - we get this for free in Loader, but we should time it anyway