|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |
| - "# Getting started with epidatpy\n", |
| 7 | + "# Getting started\n", |
8 | 8 | "\n",
|
9 | 9 | "The epidatpy package provides access to all the endpoints of the [Delphi Epidata\n",
|
10 | 10 | "API](https://cmu-delphi.github.io/delphi-epidata/), and can be used to make\n",
|
11 | 11 | "requests for specific signals on specific dates and in select geographic\n",
|
12 | 12 | "regions.\n",
|
13 | 13 | "\n",
|
14 |
| - "## Setup\n", |
15 |
| - "\n", |
16 |
| - "### Installation\n", |
17 |
| - "\n", |
18 |
| - "You can install the stable version of this package from PyPi:\n", |
19 |
| - "\n", |
20 |
| - "```\n", |
21 |
| - "pip install epidatpy\n", |
22 |
| - "```\n", |
23 |
| - "\n", |
24 |
| - "Or if you want the development version, install from GitHub:\n", |
25 |
| - "\n", |
26 |
| - "```\n", |
27 |
| - "pip install -e \"git+https://github.com/cmu-delphi/epidatpy.git#egg=epidatpy\"\n", |
28 |
| - "```\n", |
29 |
| - "\n", |
30 |
| - "\n", |
31 |
| - "### API keys\n", |
32 |
| - "\n", |
33 |
| - "The Delphi API requires a (free) API key for full functionality. While most\n", |
34 |
| - "endpoints are available without one, there are\n", |
35 |
| - "[limits on API usage for anonymous users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html),\n", |
36 |
| - "including a rate limit.\n", |
37 |
| - "\n", |
38 |
| - "To generate your key,\n", |
39 |
| - "[register for a pseudo-anonymous account](https://api.delphi.cmu.edu/epidata/admin/registration_form).\n", |
40 |
| - "\n", |
41 |
| - "*Note* that private endpoints (i.e. those prefixed with `pvt_`) require a\n", |
42 |
| - "separate key that needs to be passed as an argument. These endpoints require\n", |
43 |
| - "specific data use agreements to access.\n", |
44 |
| - "\n", |
45 | 14 | "## Basic usage\n",
|
46 | 15 | "\n",
|
47 | 16 | "Fetching data from the Delphi Epidata API is simple. Suppose we are\n",
|
|
52 | 21 | "a data source name, a signal name, a geographic level, a time resolution, and\n",
|
53 | 22 | "the location and times of interest.\n",
|
54 | 23 | "\n",
|
55 |
| - "The `pub_covidcast` function lets us access the `covidcast` endpoint:" |
| 24 | + "The `pub_covidcast` function lets us access the `covidcast` endpoint. Here we\n", |
| 25 | + "demonstrate how to fetch the most up-to-date version of the confirmed cumulative COVID cases\n", |
| 26 | + "from the JHU CSSE data source at the national level." |
56 | 27 | ]
|
57 | 28 | },
|
58 | 29 | {
|
59 | 30 | "cell_type": "code",
|
60 | 31 | "execution_count": null,
|
61 |
| - "metadata": {}, |
| 32 | + "metadata": { |
| 33 | + "nbsphinx": "hidden" |
| 34 | + }, |
62 | 35 | "outputs": [],
|
63 | 36 | "source": [
|
64 |
| - "from epidatpy import EpiDataContext, EpiRange\n", |
| 37 | + "# Hidden cell (set in the metadata for this cell)\n", |
65 | 38 | "import pandas as pd\n",
|
66 | 39 | "\n",
|
67 | 40 | "# Set common options and context\n",
|
68 |
| - "pd.set_option('display.max_columns', None)\n", |
69 |
| - "pd.set_option('display.max_rows', None)\n", |
70 |
| - "pd.set_option('display.width', 1000)\n", |
71 |
| - "\n", |
72 |
| - "epidata = EpiDataContext(use_cache=False)\n", |
73 |
| - "\n", |
74 |
| - "# Obtain the most up-to-date version of the smoothed covid-like illness (CLI)\n", |
75 |
| - "# signal from the COVID-19 Trends and Impact survey for the US\n", |
76 |
| - "apicall = epidata.pub_covidcast(\n", |
77 |
| - " data_source = \"fb-survey\",\n", |
78 |
| - " signals = \"smoothed_cli\",\n", |
79 |
| - " geo_type = \"nation\",\n", |
80 |
| - " time_type = \"day\",\n", |
81 |
| - " geo_values = \"us\",\n", |
82 |
| - " time_values = EpiRange(20210405, 20210410))\n", |
83 |
| - "\n", |
84 |
| - "print(apicall)" |
| 41 | + "pd.set_option(\"display.max_columns\", None)\n", |
| 42 | + "pd.set_option(\"display.max_rows\", 10)\n", |
| 43 | + "pd.set_option(\"display.width\", 1000)" |
85 | 44 | ]
|
86 | 45 | },
|
87 | 46 | {
|
88 |
| - "cell_type": "markdown", |
| 47 | + "cell_type": "code", |
| 48 | + "execution_count": null, |
89 | 49 | "metadata": {},
|
| 50 | + "outputs": [], |
90 | 51 | "source": [
|
91 |
| - "`pub_covidcast` returns an `EpiDataCall`, which is a not-yet-executed query that can be inspected. The query can be executed and converted to a DataFrame by using the `.df()` method:\n" |
| 52 | + "from epidatpy import CovidcastEpidata, EpiDataContext, EpiRange\n", |
| 53 | + "\n", |
| 54 | + "# Create the client object. Note that due to the arguments below all results\n", |
| 55 | + "# will be cached to your disk for 7 days, which helps avoid making repeated\n", |
| 56 | + "# downloads.\n", |
| 57 | + "epidata = EpiDataContext(use_cache=True, cache_max_age_days=7)\n", |
| 58 | + "\n", |
| 59 | + "# `pub_covidcast` returns an `EpiDataCall`, which is a not-yet-executed query\n", |
| 60 | + "# that can be inspected.\n", |
| 61 | + "apicall = epidata.pub_covidcast(\n", |
| 62 | + " data_source=\"jhu-csse\",\n", |
| 63 | + " signals=\"confirmed_cumulative_num\",\n", |
| 64 | + " geo_type=\"nation\",\n", |
| 65 | + " time_type=\"day\",\n", |
| 66 | + " geo_values=\"us\",\n", |
| 67 | + " time_values=EpiRange(20210405, 20210410),\n", |
| 68 | + ")\n", |
| 69 | + "print(apicall)\n", |
| 70 | + "# The query can be executed and converted to a DataFrame by using the `.df()`\n", |
| 71 | + "# method:\n", |
| 72 | + "apicall.df()" |
92 | 73 | ]
|
93 | 74 | },
|
94 | 75 | {
|
|
97 | 78 | "metadata": {},
|
98 | 79 | "outputs": [],
|
99 | 80 | "source": [
|
100 |
| - "data = apicall.df()\n", |
101 |
| - "print(data.head())" |
| 81 | + "# Create the pub_covidcast-specific client object. This lets you find what sources\n", |
| 82 | + "# and signals are available without leaving your REPL.\n", |
| 83 | + "covidcast = CovidcastEpidata(use_cache=True, cache_max_age_days=7)\n", |
| 84 | + "# Get a list of all the sources available in the pub_covidcast endpoint.\n", |
| 85 | + "print(covidcast.source_names())\n", |
| 86 | + "print(covidcast.signal_names(\"jhu-csse\"))\n", |
| 87 | + "# Obtain the same data as above with a different interface.\n", |
| 88 | + "covidcast[\"jhu-csse\", \"confirmed_cumulative_num\"].call(\n", |
| 89 | + " \"nation\",\n", |
| 90 | + " \"us\",\n", |
| 91 | + " EpiRange(20210405, 20210410),\n", |
| 92 | + ").df()\n", |
| 93 | + "# See the \"Finding data of interest\" notebook for more features of this interface." |
102 | 94 | ]
|
103 | 95 | },
|
104 | 96 | {
|
|
125 | 117 | "metadata": {},
|
126 | 118 | "outputs": [],
|
127 | 119 | "source": [
|
128 |
| - "apicall = epidata.pub_covidcast(\n", |
129 |
| - " data_source = \"fb-survey\",\n", |
130 |
| - " signals = \"smoothed_cli\",\n", |
131 |
| - " geo_type = \"state\",\n", |
132 |
| - " time_type = \"day\",\n", |
133 |
| - " geo_values = \"*\",\n", |
134 |
| - " time_values = EpiRange(20210405, 20210410))\n", |
135 |
| - "\n", |
136 |
| - "print(apicall)\n", |
137 |
| - "print(apicall.df().head())" |
| 120 | + "epidata.pub_covidcast(\n", |
| 121 | + " data_source=\"fb-survey\",\n", |
| 122 | + " signals=\"smoothed_cli\",\n", |
| 123 | + " geo_type=\"state\",\n", |
| 124 | + " time_type=\"day\",\n", |
| 125 | + " geo_values=\"*\",\n", |
| 126 | + " time_values=EpiRange(20210405, 20210410),\n", |
| 127 | + ").df()" |
138 | 128 | ]
|
139 | 129 | },
|
140 | 130 | {
|
|
152 | 142 | "metadata": {},
|
153 | 143 | "outputs": [],
|
154 | 144 | "source": [
|
155 |
| - "apicall = epidata.pub_covidcast(\n", |
156 |
| - " data_source = \"fb-survey\",\n", |
157 |
| - " signals = \"smoothed_cli\",\n", |
158 |
| - " geo_type = \"state\",\n", |
159 |
| - " time_type = \"day\",\n", |
160 |
| - " geo_values = \"pa,ca,fl\",\n", |
161 |
| - " time_values = EpiRange(20210405, 20210410))\n", |
162 |
| - "\n", |
163 |
| - "print(apicall)\n", |
164 |
| - "print(apicall.df().head())" |
| 145 | + "epidata.pub_covidcast(\n", |
| 146 | + " data_source=\"fb-survey\",\n", |
| 147 | + " signals=\"smoothed_cli\",\n", |
| 148 | + " geo_type=\"state\",\n", |
| 149 | + " time_type=\"day\",\n", |
| 150 | + " geo_values=\"pa,ca,fl\",\n", |
| 151 | + " time_values=\"*\",\n", |
| 152 | + ").df()" |
165 | 153 | ]
|
166 | 154 | },
|
167 | 155 | {
|
|
182 | 170 | "metadata": {},
|
183 | 171 | "outputs": [],
|
184 | 172 | "source": [
|
185 |
| - "apicall = epidata.pub_covidcast(\n", |
186 |
| - " data_source = \"fb-survey\",\n", |
187 |
| - " signals = \"smoothed_cli\",\n", |
188 |
| - " geo_type = \"state\",\n", |
189 |
| - " time_type = \"day\",\n", |
190 |
| - " geo_values = \"pa\",\n", |
191 |
| - " time_values = EpiRange(20210405, 20210410),\n", |
192 |
| - " as_of = \"2021-06-01\")\n", |
193 |
| - "\n", |
194 |
| - "print(apicall)\n", |
195 |
| - "print(apicall.df().head())" |
| 173 | + "epidata.pub_covidcast(\n", |
| 174 | + " data_source=\"fb-survey\",\n", |
| 175 | + " signals=\"smoothed_cli\",\n", |
| 176 | + " geo_type=\"state\",\n", |
| 177 | + " time_type=\"day\",\n", |
| 178 | + " geo_values=\"pa\",\n", |
| 179 | + " time_values=EpiRange(20210405, 20210410),\n", |
| 180 | + " as_of=\"2021-06-01\",\n", |
| 181 | + ").df()" |
196 | 182 | ]
|
197 | 183 | },
|
198 | 184 | {
|
|
213 | 199 | "source": [
|
214 | 200 | "import matplotlib.pyplot as plt\n",
|
215 | 201 | "\n",
|
216 |
| - "plt.rcParams['figure.dpi'] = 300\n", |
| 202 | + "plt.rcParams[\"figure.dpi\"] = 300\n", |
217 | 203 | "\n",
|
218 | 204 | "apicall = epidata.pub_covidcast(\n",
|
219 |
| - " data_source = \"fb-survey\",\n", |
220 |
| - " signals = \"smoothed_cli\", \n", |
221 |
| - " geo_type = \"state\",\n", |
222 |
| - " geo_values = \"pa,ca,fl\",\n", |
223 |
| - " time_type = \"day\",\n", |
224 |
| - " time_values = EpiRange(20210405, 20210410))\n", |
225 |
| - "\n", |
226 |
| - "data = apicall.df()\n", |
| 205 | + " data_source=\"fb-survey\",\n", |
| 206 | + " signals=\"smoothed_cli\",\n", |
| 207 | + " geo_type=\"state\",\n", |
| 208 | + " geo_values=\"pa,ca,fl\",\n", |
| 209 | + " time_type=\"day\",\n", |
| 210 | + " time_values=EpiRange(20210405, 20210410),\n", |
| 211 | + ")\n", |
227 | 212 | "\n",
|
228 | 213 | "fig, ax = plt.subplots(figsize=(6, 5))\n",
|
229 | 214 | "ax.spines[\"right\"].set_visible(False)\n",
|
230 | 215 | "ax.spines[\"left\"].set_visible(False)\n",
|
231 | 216 | "ax.spines[\"top\"].set_visible(False)\n",
|
232 | 217 | "\n",
|
233 |
| - "data.pivot_table(values = \"value\", index = \"time_value\", columns = \"geo_value\").plot(\n", |
234 |
| - " xlabel=\"Date\",\n", |
235 |
| - " ylabel=\"CLI\",\n", |
236 |
| - " ax = ax,\n", |
237 |
| - " linewidth = 1.5\n", |
| 218 | + "(\n", |
| 219 | + " apicall.df()\n", |
| 220 | + " .pivot_table(values=\"value\", index=\"time_value\", columns=\"geo_value\")\n", |
| 221 | + " .plot(xlabel=\"Date\", ylabel=\"CLI\", ax=ax, linewidth=1.5)\n", |
238 | 222 | ")\n",
|
239 | 223 | "\n",
|
240 | 224 | "plt.title(\"Smoothed CLI from Facebook Survey\", fontsize=16)\n",
|
241 |
| - "plt.subplots_adjust(bottom=.2)\n", |
| 225 | + "plt.subplots_adjust(bottom=0.2)\n", |
242 | 226 | "plt.show()"
|
243 | 227 | ]
|
244 | 228 | },
|
|
279 | 263 | "\n",
|
280 | 264 | "## Epiweeks and dates\n",
|
281 | 265 | "\n",
|
282 |
| - "Epiweeks use the U.S. definition. That is, the first epiweek each year is the\n", |
283 |
| - "week, starting on a Sunday, containing January 4. See [this page](https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2021)\n", |
284 |
| - "for more information.\n", |
285 |
| - "\n", |
286 | 266 | "Formatting for epiweeks is YYYYWW and for dates is YYYYMMDD.\n",
|
287 | 267 | "\n",
|
288 |
| - "Use individual values, comma-separated lists or, a hyphenated range of values to specify single or several dates.\n", |
289 |
| - "An `EpiRange` object can be also used to construct a range of epiweeks or dates. Examples include:\n", |
| 268 | + "Epiweeks use the U.S. CDC definition: the first epiweek of each year is the\n", |
| 269 | + "week containing January 4th, and each week starts on Sunday. See [this\n", |
| 270 | + "page](https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2021)\n", |
| 271 | + "for a fuller explanation.\n", |
| 273 | + "\n", |
| 274 | + "When specifying the time_values argument, you can use individual values,\n", |
| 275 | + "comma-separated lists, or a hyphenated range of values to specify single or\n", |
| 276 | + "several dates (or epiweeks). An `EpiRange` object can also be used to construct\n", |
| 277 | + "a range of epiweeks or dates. Examples include:\n", |
290 | 278 | "\n",
|
291 | 279 | "- `param = 201530` (A single epiweek)\n",
|
292 | 280 | "- `param = '201401,201501,201601'` (Several epiweeks)\n",
|
293 | 281 | "- `param = '200501-200552'` (A range of epiweeks)\n",
|
294 | 282 | "- `param = '201440,201501-201510'` (Several epiweeks, including a range)\n",
|
295 |
| - "- `param = EpiRange(20070101, 20071231)` (A range of dates)" |
| 283 | + "- `param = EpiRange(20070101, 20071231)` (A range of dates)\n" |
296 | 284 | ]
|
297 | 285 | }
|
298 | 286 | ],
|
|
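The epiweek rule described in the notebook text (week 1 is the Sunday-to-Saturday week containing January 4, formatted as YYYYWW) can be checked with a small standard-library sketch. The helper names `first_epiweek_start` and `date_to_epiweek` are hypothetical and are not part of epidatpy; this only illustrates the definition as stated above.

```python
from datetime import date, timedelta


def first_epiweek_start(year: int) -> date:
    """Return the Sunday that begins epiweek 1 of `year` (hypothetical helper)."""
    jan4 = date(year, 1, 4)
    # date.weekday() is Monday=0 ... Sunday=6, so this is days since the last Sunday.
    days_since_sunday = (jan4.weekday() + 1) % 7
    return jan4 - timedelta(days=days_since_sunday)


def date_to_epiweek(d: date) -> int:
    """Convert a calendar date to an integer in YYYYWW form (hypothetical helper)."""
    year = d.year
    start = first_epiweek_start(year)
    next_start = first_epiweek_start(year + 1)
    if d >= next_start:
        # Late-December dates can already belong to week 1 of the next year.
        year, start = year + 1, next_start
    elif d < start:
        # Early-January dates can still belong to the last week of the previous year.
        year -= 1
        start = first_epiweek_start(year)
    week = (d - start).days // 7 + 1
    return year * 100 + week


print(first_epiweek_start(2021))
print(date_to_epiweek(date(2021, 4, 5)))
```

Under this definition, the date range used in the examples (2021-04-05 through 2021-04-10) falls within a single epiweek, so a weekly signal queried over that span would be expected to cover one week.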