diff --git a/docs/covidcast_examples.rst b/docs/covidcast_examples.rst new file mode 100644 index 0000000..5d46838 --- /dev/null +++ b/docs/covidcast_examples.rst @@ -0,0 +1,62 @@ +Basic examples +-------------- + +To obtain all available sources of epidemiological data, we can use the following command: + +>>> from delphi_epidata.request import CovidcastEpidata, EpiRange +>>> epidata = CovidcastEpidata() +>>> print(list(epidata.source_names)) +['chng-cli', 'chng-covid', 'covid-act-now', 'doctor-visits', 'fb-survey', 'google-symptoms', 'hhs', 'hospital-admissions', 'indicator-combination-cases-deaths', 'jhu-csse', 'quidel-covid-ag', 'safegraph-weekly', 'usa-facts', 'ght', 'google-survey', 'indicator-combination-nmf', 'quidel-flu', 'safegraph-daily', 'nchs-mortality'] + + +To obtain smoothed estimates of COVID-like illness from our symptom survey, +distributed through Facebook (`fb-survey`), for every county in the United States between +2020-05-01 and 2020-05-07: + +>>> from delphi_epidata.request import EpiRange +>>> apicall = epidata[("fb-survey", "smoothed_cli")].call( +... 'county', "*", EpiRange(20200501, 20200507), +... 
) +EpiDataCall(endpoint=covidcast, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'time_type': 'day', 'time_values': '20200501-20200507', 'geo_type': 'county', 'geo_values': '*'}) +>>> data = apicall.df() +>>> data.head() + source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size +0 fb-survey smoothed_cli county 01000 day 2020-05-01 2020-09-03 125 0.825410 0.136003 1722 NaN 0 0 0 +1 fb-survey smoothed_cli county 01001 day 2020-05-01 2020-09-03 125 1.299425 0.967136 115 NaN 0 0 0 +2 fb-survey smoothed_cli county 01003 day 2020-05-01 2020-09-03 125 0.696597 0.324753 584 NaN 0 0 0 +3 fb-survey smoothed_cli county 01015 day 2020-05-01 2020-09-03 125 0.428271 0.548566 122 NaN 0 0 0 +4 fb-survey smoothed_cli county 01031 day 2020-05-01 2020-09-03 125 0.025579 0.360827 114 NaN 0 0 0 + + +Each row represents one observation in one county on one day. The county FIPS +code is given in the ``geo_value`` column, the date in the ``time_value`` +column. Here ``value`` is the requested signal---in this case, the smoothed +estimate of the percentage of people with COVID-like illness, based on the +symptom surveys. ``stderr`` is its standard error. The ``issue`` column +indicates when this data was reported; in this case, the survey estimates for +May 1st were updated on September 3rd based on new data, giving a ``lag`` of 125 days. +See the `Delphi Epidata API `_ documentation for details on all fields of the returned data frame. + +The API documentation lists each available signal and provides technical details +on how it is estimated and how its standard error is calculated. In this case, +for example, the `symptom surveys documentation page +`_ +explains the definition of "COVID-like illness", links to the exact survey text, +and describes the mathematical derivation of the estimates. 
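The ``lag`` value described above can be reproduced from the ``time_value`` and ``issue`` columns. The following is a minimal sketch using only the Python standard library, with the two dates taken from the first row of the output shown above. The helper name ``lag_in_days`` is illustrative and is not part of the package.

```python
from datetime import date


# Hypothetical helper for illustration; not part of the package API.
def lag_in_days(time_value: date, issue: date) -> int:
    """Days between the date an estimate describes and the date it was issued."""
    return (issue - time_value).days


# Estimate for May 1st 2020, issued on September 3rd 2020.
print(lag_in_days(date(2020, 5, 1), date(2020, 9, 3)))  # 125
```

The result matches the ``lag`` column of the returned data frame for that row.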
+ +Using the ``geo_values`` argument, we can request data for a specific geography, +such as the state of Pennsylvania for the month of September 2021: + +>>> pa_data = epidata[("fb-survey", "smoothed_cli")].call( +... 'state', "pa", EpiRange(20210901, 20210930) +... ).df() +>>> pa_data.head() + source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size +0 fb-survey smoothed_cli state pa day 2021-09-01 2021-09-06 5 0.928210 0.088187 9390 NaN 0 0 0 +1 fb-survey smoothed_cli state pa day 2021-09-02 2021-09-07 5 0.894603 0.087308 9275 NaN 0 0 0 +2 fb-survey smoothed_cli state pa day 2021-09-03 2021-09-08 5 0.922847 0.088324 9179 NaN 0 0 0 +3 fb-survey smoothed_cli state pa day 2021-09-04 2021-09-09 5 0.984799 0.092566 9069 NaN 0 0 0 +4 fb-survey smoothed_cli state pa day 2021-09-05 2021-09-10 5 1.010306 0.093357 9016 NaN 0 0 0 + +We can request multiple states by providing a list, such as ``["pa", "ny", +"mo"]``. \ No newline at end of file diff --git a/docs/getting_started.rst b/docs/getting_started.rst index bc66233..ab6b099 100644 --- a/docs/getting_started.rst +++ b/docs/getting_started.rst @@ -1,75 +1,303 @@ -.. _getting-started: - Getting Started =============== Overview -------------- -Data Sources +This package provides access to data from various Epidata API endpoints including COVIDcast, +which provides numerous COVID-related data streams, updated daily. + +.. _epidata-endpoints: + +Epidata Data Sources -------------- +The parameters available for each source data are documented in each linked source-specific API page. +| +**COVID-19 Data** +.. list-table:: + :widths: 20 20 40 + :header-rows: 1 -Basic examples --------------- + * - Endpoint + - Name + - Description + * - `covidcast `_ + - COVIDcast + - Delphi’s COVID-19 surveillance streams. + * - `covidcast_meta `_ + - COVIDcast metadata + - Metadata for Delphi's COVID-19 surveillance streams. 
+ * - `covid_hosp_facility `_ + - COVID-19 Hospitalization by Facility + - COVID-19 Reported Patient Impact and Hospital Capacity - Facility Lookup + * - `covid_hosp `_ + - COVID-19 Hospitalization + - COVID-19 Reported Patient Impact and Hospital Capacity. + +| +**Influenza Data** + +.. list-table:: + :widths: 20 20 40 + :header-rows: 1 + + * - Endpoint + - Name + - Description + * - `afhsb `_ + - AFHSB + - ... + * - `meta_afhsb `_ + - AFHSB Metadata + - ... + * - `cdc `_ + - CDC Page Hits + - ... + * - `delphi `_ + - Delphi’s Forecast + - ... + * - `ecdc_ili `_ + - ECDC ILI + - ECDC ILI data from the ECDC website. + * - `flusurv `_ + - FluSurv + - FluSurv-NET data (flu hospitalization rates) from CDC. + * - `fluview `_ + - FluView + - Influenza-like illness (ILI) from the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). + * - `fluview_meta `_ + - FluView Metadata + - Summary data about ``fluview``. + * - `fluview_clinical `_ + - FluView Clinical + - ... + * - `gft `_ + - Google Flu Trends + - Estimate of influenza activity based on volume of certain search queries. This is now a static endpoint due to discontinuation. + * - `ght `_ + - Google Health Trends + - Estimate of influenza activity based on volume of certain search queries. + * - `kcdc_ili `_ + - KCDC ILI + - KCDC ILI data from the KCDC website. + * - `meta `_ + - API Metadata + - Metadata for ``fluview``, ``twitter``, ``wiki``, and ``delphi``. + * - `nidss_flu `_ + - NIDSS Flu + - Outpatient ILI from Taiwan's National Infectious Disease Statistics System (NIDSS). + * - `nowcast `_ + - ILI Nearby + - A nowcast of U.S. national, regional, and state-level (weighted) percent ILI, available seven days (regionally) or five days (state-level) before the first ILINet report for the corresponding week. + * - `quidel `_ + - Quidel + - Data provided by Quidel Corp., which contains flu lab test results. + * - `sensors `_ + - Delphi's Digital Surveillance Sensors + - ... 
+ * - `twitter `_ + - Twitter Stream + - Estimate of influenza activity based on analysis of language used in tweets from HealthTweets. + * - `wiki `_ + - Wikipedia Access Logs + - Number of page visits for selected English-language, influenza-related Wikipedia articles. +| + +**Dengue Data** + +.. list-table:: + :widths: 20 20 40 + :header-rows: 1 + + * - Endpoint + - Name + - Description + * - `dengue_nowcast `_ + - Delphi's Dengue Nowcast + - ... + * - `dengue_sensors `_ + - Delphi’s Dengue Digital Surveillance Sensors + - ... + * - `nidss_dengue `_ + - NIDSS Dengue + - Counts of confirmed dengue cases from Taiwan's NIDSS. + * - `paho_dengue `_ + - PAHO Dengue + - ... +| + +**Norovirus Data** + +.. list-table:: + :widths: 20 20 40 + :header-rows: 1 + + * - Endpoint + - Name + - Description + * - `meta_norostat `_ + - NoroSTAT Metadata + - ... + * - `norostat `_ + - NoroSTAT + - Suspected and confirmed norovirus outbreaks reported by state health departments to the CDC. + +| + +Epiweeks and Dates +------------------ +Epiweeks use the U.S. definition. That is, the first epiweek each year is the week, starting on a Sunday, +containing January 4. See `this page `_ for more information. + +Epiweeks are formatted as YYYYWW and dates as YYYYMMDD. + +Use individual values, comma-separated lists, or a hyphenated range of values to specify one or several epiweeks or dates. +An ``EpiRange`` object can also be used to construct a range of epiweeks or dates. 
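The epiweek rule stated above can be sketched with the standard library alone. This is an illustration under the stated definition only (weeks start on Sunday, and week 1 is the week containing January 4). The helper names ``epiweek_start`` and ``epiweek_of`` are hypothetical and are not part of this package.

```python
from datetime import date, timedelta


# Hypothetical helpers for illustration; not part of the package API.
def _sunday_on_or_before(d: date) -> date:
    # date.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 is the
    # number of days elapsed since the most recent Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def epiweek_start(epiweek: int) -> date:
    """First day (a Sunday) of an epiweek given as the integer YYYYWW."""
    year, week = divmod(epiweek, 100)
    # Week 1 is the Sunday-based week that contains January 4.
    return _sunday_on_or_before(date(year, 1, 4)) + timedelta(weeks=week - 1)


def epiweek_of(d: date) -> int:
    """Epiweek, as the integer YYYYWW, that contains the date d."""
    # Late-December dates can belong to week 1 of the next year, so start
    # from the following year and step back until week 1 begins on or before d.
    year = d.year + 1
    while d < epiweek_start(year * 100 + 1):
        year -= 1
    week = (d - epiweek_start(year * 100 + 1)).days // 7 + 1
    return year * 100 + week


print(epiweek_start(202001))                     # 2019-12-29
print(epiweek_of(date(2020, 5, 1)))              # 202018
print(int(date(2020, 5, 1).strftime("%Y%m%d")))  # 20200501
```

The first printed date agrees with the ``epiweek`` value returned for ``202001`` in the Wikipedia example in this file. An integer such as ``202018`` is a single value in the YYYYWW format described above.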
Examples include: + +- ``param = 201530`` (A single epiweek) +- ``param = '201401,201501,201601'`` (Several epiweeks) +- ``param = '200501-200552'`` (A range of epiweeks) +- ``param = '201440,201501-201510'`` (Several epiweeks, including a range) +- ``param = EpiRange(20070101, 20071231)`` (A range of dates) -To obtain all available sources of epidemiological data, we can use the following command: +| ->>> from epidatpy.request import CovidcastEpidata, EpiRange ->>> epidata = CovidcastEpidata() ->>> print(list(epidata.source_names)) -['chng-cli', 'chng-covid', 'covid-act-now', 'doctor-visits', 'fb-survey', 'google-symptoms', 'hhs', 'hospital-admissions', 'indicator-combination-cases-deaths', 'jhu-csse', 'quidel-covid-ag', 'safegraph-weekly', 'usa-facts', 'ght', 'google-survey', 'indicator-combination-nmf', 'quidel-flu', 'safegraph-daily', 'nchs-mortality'] +.. _getting-started: +Basic examples +-------------- + +**COVIDcast** To obtain smoothed estimates of COVID-like illness from our symptom survey, -distributed through Facebook (`fb-survey`), for every county in the United States between +distributed through Facebook, for every county in the United States between 2020-05-01 and 2020-05-07: ->>> from epidatpy.request import EpiRange ->>> apicall = epidata[("fb-survey", "smoothed_cli")].call( -... 'county', "*", EpiRange(20200501, 20200507), -... ) -EpiDataCall(endpoint=covidcast, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'time_type': 'day', 'time_values': '20200501-20200507', 'geo_type': 'county', 'geo_values': '*'}) +>>> from epidatpy.request import Epidata, EpiRange +>>> apicall = Epidata.covidcast("fb-survey", "smoothed_cli", +... "day", "county", +... 
EpiRange(20200501, 20200507), "*") >>> data = apicall.df() >>> data.head() - source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size -0 fb-survey smoothed_cli county 01000 day 2020-05-01 2020-09-03 125 0.825410 0.136003 1722 NaN 0 0 0 -1 fb-survey smoothed_cli county 01001 day 2020-05-01 2020-09-03 125 1.299425 0.967136 115 NaN 0 0 0 -2 fb-survey smoothed_cli county 01003 day 2020-05-01 2020-09-03 125 0.696597 0.324753 584 NaN 0 0 0 -3 fb-survey smoothed_cli county 01015 day 2020-05-01 2020-09-03 125 0.428271 0.548566 122 NaN 0 0 0 -4 fb-survey smoothed_cli county 01031 day 2020-05-01 2020-09-03 125 0.025579 0.360827 114 NaN 0 0 0 - - -Each row represents one observation in one county on one day. The county FIPS -code is given in the ``geo_value`` column, the date in the ``time_value`` -column. Here ``value`` is the requested signal---in this case, the smoothed -estimate of the percentage of people with COVID-like illness, based on the -symptom surveys. ``stderr`` is its standard error. The ``issue`` column -indicates when this data was reported; in this case, the survey estimates for + source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size +0 fb-survey smoothed_cli county 01000 day 2020-05-01 2020-09-03 125 0.825410 0.136003 1722 None 0 0 0 +1 fb-survey smoothed_cli county 01001 day 2020-05-01 2020-09-03 125 1.299425 0.967136 115 None 0 0 0 +2 fb-survey smoothed_cli county 01003 day 2020-05-01 2020-09-03 125 0.696597 0.324753 584 None 0 0 0 +3 fb-survey smoothed_cli county 01015 day 2020-05-01 2020-09-03 125 0.428271 0.548566 122 None 0 0 0 +4 fb-survey smoothed_cli county 01031 day 2020-05-01 2020-09-03 125 0.025579 0.360827 114 None 0 0 0 + +Each row represents one observation in one county per day. 
The county FIPS +code is given in the ``geo_value`` column, and the date is given in the ``time_value`` +column. The ``value`` is the requested signal - the smoothed +estimate of the percentage of people with COVID-like illness based on the +symptom surveys. The ``issue`` column indicates when this data was reported; in this case, the survey estimates for May 1st were updated on September 3rd based on new data, giving a ``lag`` of 125 days. -See the `Delphi Epidata API `_ documentation for details on all fields of the returned data frame. - -The API documentation lists each available signal and provides technical details -on how it is estimated and how its standard error is calculated. In this case, -for example, the `symptom surveys documentation page -`_ -explains the definition of "COVID-like illness", links to the exact survey text, -and describes the mathematical derivation of the estimates. - -Using the ``geo_values`` argument, we can request data for a specific geography, -such as the state of Pennsylvania for the month of September 2021: - ->>> pa_data = epidata[("fb-survey", "smoothed_cli")].call( -... 'state', "pa", EpiRange(20210901, 20210930) -... ).df() ->>> pa_data.head() - source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size -0 fb-survey smoothed_cli state pa day 2021-09-01 2021-09-06 5 0.928210 0.088187 9390 NaN 0 0 0 -1 fb-survey smoothed_cli state pa day 2021-09-02 2021-09-07 5 0.894603 0.087308 9275 NaN 0 0 0 -2 fb-survey smoothed_cli state pa day 2021-09-03 2021-09-08 5 0.922847 0.088324 9179 NaN 0 0 0 -3 fb-survey smoothed_cli state pa day 2021-09-04 2021-09-09 5 0.984799 0.092566 9069 NaN 0 0 0 -4 fb-survey smoothed_cli state pa day 2021-09-05 2021-09-10 5 1.010306 0.093357 9016 NaN 0 0 0 - -We can request multiple states by providing a list, such as ``["pa", "ny", -"mo"]``. 
\ No newline at end of file +See the :py:func:`epidatpy.request.Epidata.covidcast` documentation for further details on the returned +columns. + +In the above code, calling ``.df()`` on ``apicall`` returned a pandas DataFrame. We can use +other :ref:`output functions ` to obtain the response in different formats. To parse the data +into JSON format, we can use the following command: + +>>> data = apicall.json() +>>> data +[{'geo_value': '01000', + 'signal': 'smoothed_cli', + 'source': 'fb-survey', + 'geo_type': 'county', + 'time_type': 'day', + 'time_value': datetime.date(2020, 5, 1), + 'direction': None, + 'issue': datetime.date(2020, 9, 3), + 'lag': 125, + 'missing_value': 0, + 'missing_stderr': 0, + 'missing_sample_size': 0, + 'value': 0.8254101, + 'stderr': 0.1360033, + 'sample_size': 1722.4551}, + {'geo_value': '01001', + 'signal': 'smoothed_cli', + 'source': 'fb-survey', + 'geo_type': 'county', + 'time_type': 'day', + 'time_value': datetime.date(2020, 5, 1), + 'direction': None, + 'issue': datetime.date(2020, 9, 3), + 'lag': 125, + 'missing_value': 0, + 'missing_stderr': 0, + 'missing_sample_size': 0, + 'value': 1.2994255, + 'stderr': 0.9671356, + 'sample_size': 115.8025}, + . + . + . + }] + +Note that all of the :ref:`output functions ` accept a ``fields`` parameter, which takes any iterable of field names +and restricts the output to those fields (e.g. specifying which fields or columns to output). 
Similar to the previous example, +to parse the data in JSON format, but restrict the output to only ``geo_value`` and ``value``, we would use the following +command: + +>>> data = apicall.json(fields=['geo_value', 'value']) +>>> data +[{'geo_value': '01000', 'value': 0.8254101}, + {'geo_value': '01001', 'value': 1.2994255}, + {'geo_value': '01003', 'value': 0.6965968}, + {'geo_value': '01015', 'value': 0.4282713}, + {'geo_value': '01031', 'value': 0.0255788}, + {'geo_value': '01045', 'value': 1.0495589}, + {'geo_value': '01051', 'value': 1.5783991}, + {'geo_value': '01069', 'value': 1.6789546}, + {'geo_value': '01071', 'value': 2.1313118}, + . + . + . + }] + + +| + +**Wikipedia access counts for the article "influenza" on 2020w01** + +>>> apicall_wiki = Epidata.wiki(articles='influenza', epiweeks='202001') +>>> data = apicall_wiki.json() +>>> print(data) +[{'article': 'influenza', 'count': 6516, 'total': 663604044, 'hour': -1, 'epiweek': datetime.date(2019, 12, 29), 'value': 9.81910834}] + +| + +**FluView on 2019w01 (national)** + +>>> apicall_fluview = Epidata.fluview(regions='nat', epiweeks='201901') +>>> data = apicall_fluview.classic() +>>> data +{'epidata': [{'release_date': '2020-10-02', + 'region': 'nat', + 'issue': datetime.date(2020, 3, 9), + 'epiweek': datetime.date(2018, 12, 30), + 'lag': 90, + 'num_ili': 42135, + 'num_patients': 1160440, + 'num_providers': 2630, + 'num_age_0': 11686, + 'num_age_1': 9572, + 'num_age_2': None, + 'num_age_3': 11413, + 'num_age_4': 5204, + 'num_age_5': 4260, + 'wili': 3.45972, + 'ili': 3.63095}], + 'result': 1, + 'message': 'success'} + +| + +Other examples (TODO) +--------------------- + +(TODO) \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index 6fc903c..6fc4ee9 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,4 +1,4 @@ -Delphi Epi Data +Delphi Epidata =============== This package provides Python access to the `Delphi Epidata API diff --git a/docs/signals_covid.rst b/docs/signals_covid.rst index 
87fef9a..2d43e9b 100644 --- a/docs/signals_covid.rst +++ b/docs/signals_covid.rst @@ -1,26 +1,107 @@ Fetching Data ============= +>>> from delphi_epidata.request import Epidata -Signals -------- +This package provides various functions that can be called on the ``Epidata`` object to obtain signals from any :ref:`Epidata endpoint `. -This package provides a key function to obtain any signal of interest as a -Pandas data frame. Detailed examples are provided in the :ref:`usage examples -`. +Each function below returns an ``EpiDataCall`` object, which contains the appropriate URL +and parameters required to make an API request. The signal of interest can then be obtained in five different :ref:`output formats `. +Detailed examples are provided in the :ref:`usage examples `. +COVIDcast Signals +----------------- -Sometimes you would like to work with multiple signals -- for example, to obtain -several signals at every location, as part of building models of features at -each location. For convenience, the package provides a function to produce a -single data frame containing multiple signals at each location. +.. autofunction:: delphi_epidata.request.Epidata.covidcast +| +.. autofunction:: delphi_epidata.request.Epidata.covidcast_meta +| +.. autofunction:: delphi_epidata.request.Epidata.covid_hosp_facility +| +.. autofunction:: delphi_epidata.request.Epidata.covid_hosp_facility_lookup +| +.. autofunction:: delphi_epidata.request.Epidata.covid_hosp_state_timeseries +| +Other Epidata Signals +--------------------- +.. autofunction:: delphi_epidata.request.Epidata.pvt_afhsb +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_meta_afhsb +| +.. autofunction:: delphi_epidata.request.Epidata.cdc +| +.. autofunction:: delphi_epidata.request.Epidata.delphi +| +.. autofunction:: delphi_epidata.request.Epidata.ecdc_ili +| +.. autofunction:: delphi_epidata.request.Epidata.flusurv +| +.. autofunction:: delphi_epidata.request.Epidata.fluview +| +.. 
autofunction:: delphi_epidata.request.Epidata.fluview_meta +| +.. autofunction:: delphi_epidata.request.Epidata.fluview_clinical +| +.. autofunction:: delphi_epidata.request.Epidata.gft +| +.. autofunction:: delphi_epidata.request.Epidata.ght +| +.. autofunction:: delphi_epidata.request.Epidata.kcdc_ili +| +.. autofunction:: delphi_epidata.request.Epidata.meta +| +.. autofunction:: delphi_epidata.request.Epidata.nidss_flu +| +.. autofunction:: delphi_epidata.request.Epidata.nowcast +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_quidel +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_sensors +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_twitter +| +.. autofunction:: delphi_epidata.request.Epidata.wiki +| +.. autofunction:: delphi_epidata.request.Epidata.dengue_nowcast +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_dengue_sensors +| +.. autofunction:: delphi_epidata.request.Epidata.nidss_dengue +| +.. autofunction:: delphi_epidata.request.Epidata.paho_dengue +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_meta_norostat +| +.. autofunction:: delphi_epidata.request.Epidata.pvt_norostat +.. _output-data: - -Metadata +Output Functions -------- +The following functions can be called on an ``EpiDataCall`` object to make an API request and parse the signal in +5 different formats: + - Classic + - JSON + - Pandas DataFrame + - CSV + - Iterator +| +.. autofunction:: delphi_epidata.request.EpiDataCall.classic +| +.. autofunction:: delphi_epidata.request.EpiDataCall.json +| +.. autofunction:: delphi_epidata.request.EpiDataCall.df +| +.. autofunction:: delphi_epidata.request.EpiDataCall.csv +| +.. autofunction:: delphi_epidata.request.EpiDataCall.iter + + +More on COVIDcast (TODO) +------------------------ + Many data sources and signals are available, so one can also obtain a data frame of all signals and their associated metadata: