<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Nicholas Reich: Biostatistics and Infectious Disease Epidemiology</title>
    <description>Reich Lab @ UMassAmherst</description>
    <link>http://reichlab.io//</link>
    <atom:link href="http://reichlab.io//feed.xml" rel="self" type="application/rss+xml" />
    
      <item>
        <title>A modeler&apos;s primer for working with SARS-CoV-2 genomic data</title>
        <description>&lt;p&gt;SARS-CoV-2 is the name of the virus that causes COVID-19. There have been a lot of really interesting analyses of SARS-CoV-2 genomic data that look at patterns over time in the circulation of different variants. When we started writing this post, we didn’t know how to use or obtain these data, many of which are publicly available. So we wrote this primer for accessing genomic data on SARS-CoV-2 that are available for the US. These data are made public through &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/genbank/&quot;&gt;GenBank&lt;/a&gt; and are pre-processed and served publicly by &lt;a href=&quot;https://nextstrain.org/&quot;&gt;the Nextstrain team&lt;/a&gt;. We hope you find it useful!&lt;/p&gt;

&lt;p&gt;&lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/variants-in-2021.png&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;background-on-genomic-data&quot;&gt;Background on genomic data&lt;/h2&gt;

&lt;h3 id=&quot;gisaid-and-genbank&quot;&gt;GISAID and GenBank&lt;/h3&gt;

&lt;p&gt;Over the last two years, COVID-19 genomic data have been collected and
stored in two central data repositories,
&lt;a href=&quot;https://www.gisaid.org/&quot;&gt;GISAID&lt;/a&gt; and
&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/genbank/&quot;&gt;GenBank&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;GISAID is a global science initiative that provides open-access genomic
data of influenza viruses and the SARS-CoV-2 virus. With over 6 million
submissions, GISAID currently hosts the largest repository for
SARS-CoV-2 sequences. Because GISAID data are not in the public domain,
GISAID data can only be accessed after creating a username and agreeing
to a Database Access Agreement.&lt;/p&gt;

&lt;p&gt;GenBank is an annotated collection of publicly available DNA sequences.
It is the genetic sequence database of the US NIH. GenBank currently has
over 3 million sequence read archive (SRA) runs and over 3.8 million
nucleotide records for SARS-CoV-2. This database was created to provide
access to the most up-to-date and comprehensive DNA sequence
information; therefore, there are no restrictions on the use or
distribution of these data.&lt;/p&gt;

&lt;p&gt;Though GISAID contains more variant data for SARS-CoV-2 globally,
projects focused on data from the US may find that GenBank
is sufficient. This is because much US sequencing is conducted through
CDC contracts that stipulate a GenBank submission.&lt;/p&gt;

&lt;h3 id=&quot;clade-and-variant-nomenclature&quot;&gt;Clade and variant nomenclature&lt;/h3&gt;

&lt;p&gt;There are a number of naming systems in use for labeling variants for
the SARS-CoV-2 virus.&lt;/p&gt;

&lt;p&gt;Two naming systems relevant for tracking variants are
&lt;a href=&quot;https://nextstrain.org/blog/2021-01-06-updated-SARS-CoV-2-clade-naming&quot;&gt;Nextstrain&lt;/a&gt;
and &lt;a href=&quot;https://cov-lineages.org/&quot;&gt;PANGO&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The Nextstrain naming system for COVID-19 involves labeling major clades
by the year they first emerged, and a letter. For example, 19A was the
first major clade detected in 2019.&lt;/p&gt;
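
&lt;p&gt;As a rough illustration of this pattern, the sketch below splits a
clade label into its year and letter. The helper name
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_clade&lt;/code&gt; is made up for this example and is not part of any
Nextstrain tooling; it assumes every label starts with two digits
followed by one capital letter.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(stringr)

# Hypothetical helper: split a clade label such as &quot;20I (Alpha, V1)&quot;
# into the two-digit year and the letter that follows it.
parse_clade &amp;lt;- function(label) {
  # Column 1 of the match matrix is the full match; columns 2 and 3 are the groups.
  parts &amp;lt;- str_match(label, &quot;^(\\d{2})([A-Z])&quot;)
  data.frame(
    label = label,
    year = 2000 + as.integer(parts[, 2]),
    letter = parts[, 3]
  )
}

parse_clade(c(&quot;19A&quot;, &quot;20I (Alpha, V1)&quot;, &quot;21K (Omicron)&quot;))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;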

&lt;p&gt;A new major clade is named if it is at least 2 nucleotides away from its
parent clade and if it reaches 20% frequency in the global population
for at least 2 months, or if it reaches 30% regional frequency for at
least 2 months. A major clade may also be named if it is a variant of
concern, even if it has not reached the required population frequency.&lt;/p&gt;

&lt;p&gt;To show where a major clade sits in the hierarchy, a clade
may be written as a string of its parent clades. For example,
19A.20A.20C for the major clade 20C. In the Nextstrain notation,
emerging clades are listed with a parent clade and their defining
nucleotide mutation or amino acid mutation. For example, 19A/28688C or
20B/S.484K. To note variants of concern, this nomenclature also labels
them with their relevant spike mutation and a version number. For
example, 20I/501Y.V1.&lt;/p&gt;

&lt;p&gt;Additional information on the original naming conventions
of the Nextstrain naming structure can be found at
&lt;a href=&quot;https://nextstrain.org/blog/2020-06-02-SARSCoV2-clade-naming&quot;&gt;https://nextstrain.org/blog/2020-06-02-SARSCoV2-clade-naming&lt;/a&gt; and
the updated nomenclature can be found at
&lt;a href=&quot;https://nextstrain.org/blog/2021-01-06-updated-SARS-CoV-2-clade-naming&quot;&gt;https://nextstrain.org/blog/2021-01-06-updated-SARS-CoV-2-clade-naming&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The rules of Pango lineage naming are complex. Information describing
the naming rules for Pango lineages can be found
&lt;a href=&quot;https://www.pango.network/the-pango-nomenclature-system/statement-of-nomenclature-rules/&quot;&gt;here&lt;/a&gt;.
In the Pango naming system, lineages are named to signify clusters of
infections with shared ancestry and to highlight
epidemiologically-relevant events.&lt;/p&gt;

&lt;p&gt;For modelers conducting a variant-level analysis, we recommend using
Nextstrain naming conventions. For a sub-variant analysis, Nextstrain
clades may not be specific enough, and Pango lineages may be more
useful.&lt;/p&gt;

&lt;h3 id=&quot;data-file-types&quot;&gt;Data file types&lt;/h3&gt;

&lt;p&gt;Genomic data are stored in a number of different file types. For
variant-level analyses, the “metadata” file
described below is a flat tab-separated dataset that includes one row
per sample and has information such as the sample’s origin, cell line,
and preparation method, as well as which variant was identified in the
sample.&lt;/p&gt;

&lt;p&gt;The FASTA file contains the detailed nucleic acid or amino acid sequence
information in a text file format for a single sample.&lt;/p&gt;
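
&lt;p&gt;To make the format concrete, here is a minimal sketch of reading a
FASTA file in base R. A FASTA record is a header line that starts with a
greater-than sign, followed by one or more lines of sequence. The
function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_fasta_simple&lt;/code&gt; and the toy records are made up for
illustration, and the sketch assumes every header is followed by at
least one sequence line. For real work, a dedicated package is likely a
better choice.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical helper: read a FASTA file into a named character vector.
read_fasta_simple &amp;lt;- function(path) {
  lines &amp;lt;- readLines(path)
  lines &amp;lt;- lines[nzchar(lines)]
  is_header &amp;lt;- startsWith(lines, &quot;&amp;gt;&quot;)
  # Each header starts a new record; sequence lines inherit its index.
  record_id &amp;lt;- cumsum(is_header)
  seqs &amp;lt;- tapply(lines[!is_header], record_id[!is_header],
                 paste, collapse = &quot;&quot;)
  setNames(as.vector(seqs), sub(&quot;^&amp;gt;&quot;, &quot;&quot;, lines[is_header]))
}

# Toy two-record file, written to a temporary location.
tmp &amp;lt;- tempfile(fileext = &quot;.fasta&quot;)
writeLines(c(&quot;&amp;gt;sample_1&quot;, &quot;ACGT&quot;, &quot;TTGA&quot;, &quot;&amp;gt;sample_2&quot;, &quot;GGCC&quot;), tmp)
read_fasta_simple(tmp)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;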

&lt;h3 id=&quot;limitations-in-using-genomic-data&quot;&gt;Limitations in using genomic data&lt;/h3&gt;

&lt;p&gt;While genomic data is a potentially useful data source, there are flaws
in the data that are worth noting at this time. One issue with using
genomic data is that in order to capture emerging variants in a
geographic region, experts recommend sequencing at least 5% of available
samples. However, as of December 2021, there were 10 states in the US
that had sequenced fewer than 2% of available samples. Thus, it is
possible that within the US, variants and emerging strains are not being
adequately detected. Information about the heterogeneous testing rates in
the US can be found in the &lt;a href=&quot;https://www.nature.com/articles/d41586-021-03698-7&quot;&gt;Nature news article linked
here&lt;/a&gt;. Additionally,
in April of 2021, genomic data were available for only 1.6% of all
positive cases, so it is possible that variant information before this
date is inaccurate at the national level (&lt;a href=&quot;https://www.nature.com/articles/d41586-021-00908-0&quot;&gt;see another Nature
article&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Another issue that may bias population estimates of variant prevalence
is that there are not clear guidelines on how positive samples are
selected for genotyping. If the more severe or more surprising cases are
sent out to be genotyped, it may influence the results. This is
problematic because without knowing how samples are selected for
testing, we cannot control for this issue in models.&lt;/p&gt;

&lt;p&gt;When using genomic data in models, it is important to consider factors
such as sampling density, the timing of sample collection, the portion
of the viral genome sequenced, quality of sequencing data and the
mutation rate of the virus itself. All of these factors may impact the
validity of variant prevalence values (&lt;a href=&quot;https://www.nature.com/articles/s41564-020-0738-5&quot;&gt;see Villabona-Arenas et
al.&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;downloading-nextstrain-curated-genbank-data&quot;&gt;Downloading Nextstrain-curated GenBank data&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://nextstrain.org/&quot;&gt;Nextstrain&lt;/a&gt; project makes &lt;a href=&quot;https://docs.nextstrain.org/projects/ncov/en/latest/reference/remote_inputs.html#summary-of-available-genbank-open-files&quot;&gt;daily snapshots
of GenBank data available for the US and the
world&lt;/a&gt;.
Specifically, the flat tab-separated file available at
&lt;a href=&quot;https://data.nextstrain.org/files/ncov/open/metadata.tsv.gz&quot;&gt;https://data.nextstrain.org/files/ncov/open/metadata.tsv.gz&lt;/a&gt; is updated
daily, typically around 11am or noon US Pacific time. This file is large
(&amp;gt;350MB as of 2022-02-09). For the below, we assume that you have
downloaded this file and unzipped it so the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.tsv&lt;/code&gt; file can be read in
directly.&lt;/p&gt;
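
&lt;p&gt;One possible way to fetch and load the file from R is sketched below.
The helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetch_metadata&lt;/code&gt; is a made-up name, and the sketch assumes the
URL above is still live and that a local &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;../data/&lt;/code&gt; directory exists.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_tsv()&lt;/code&gt; can read a gzipped file directly, so unzipping is optional,
and its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;col_select&lt;/code&gt; argument (in recent readr releases) limits which
columns are read.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(readr)

# Hypothetical helper: download the snapshot unless a local copy exists.
fetch_metadata &amp;lt;- function(url, dest) {
  if (!file.exists(dest)) {
    # The file is large, so raise the default 60-second timeout for this call.
    old &amp;lt;- options(timeout = 3600)
    on.exit(options(old))
    download.file(url, destfile = dest, mode = &quot;wb&quot;)
  }
  dest
}

# Usage (not run here, since the file is several hundred MB):
# path &amp;lt;- fetch_metadata(
#   &quot;https://data.nextstrain.org/files/ncov/open/metadata.tsv.gz&quot;,
#   &quot;../data/metadata.tsv.gz&quot;
# )
# genbank_subset &amp;lt;- read_tsv(path, col_select = c(strain, country, date))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;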

&lt;p&gt;A codebook for the fields in the dataset is available
&lt;a href=&quot;https://docs.nextstrain.org/projects/ncov/en/latest/reference/metadata-fields.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;initial-genbank-data-exploration&quot;&gt;Initial GenBank data exploration&lt;/h2&gt;

&lt;p&gt;Let’s start by loading some tidyverse packages that will be useful for
us and then by reading in the dataset.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tidyverse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lubridate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theme_set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theme_bw&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;genbank_global&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_tsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;../data/metadata.tsv&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a large file, with 3707198 rows. For starters, we create a
version of these data that contains only US samples and retains only the
columns we are interested in. We also reformat the date columns and
compute a reporting lag.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;genbank_global&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;USA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ymd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_submitted&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ymd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_submitted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reporting_lag&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_submitted&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strain&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;virus&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Nextstrain_clade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## info on the virus&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;division&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## info on location&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sampling_strategy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## info about the sample&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_submitted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reporting_lag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## dates&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Nextstrain_clade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt; # A tibble: 25 × 2&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;    Nextstrain_clade n_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;    &amp;lt;chr&amp;gt;                &amp;lt;int&amp;gt;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  1 19A                   4445&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  2 19B                   3966&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  3 20A                  50509&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  4 20B                  23174&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  5 20C                  57723&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  6 20D                    613&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  7 20E (EU1)              134&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  8 20G                  62703&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;  9 20H (Beta, V2)        2188&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt; 10 20I (Alpha, V1)     184305&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt; # … with 15 more rows&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sampling_strategy&lt;/code&gt; field is empty for all US data.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sampling_strategy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt; # A tibble: 1 × 2&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;   sampling_strategy n_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt;   &amp;lt;chr&amp;gt;                 &amp;lt;int&amp;gt;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;#&amp;gt; 1 ?                   1838603&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The following figure shows the reporting lags by state as computed in
the data as the difference between the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; the sample was taken and
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date_submitted&lt;/code&gt;, which is the date the sample was submitted to the
GenBank system. There appears to be substantial variation by location
(note the shorter lags in MA, CA and VT), with 75% of samples typically
reported within 1 month of collection. Note that the figure includes only
samples collected after August 1, 2021. Some analyses, for example &lt;a href=&quot;https://github.com/blab/rt-from-frequency-dynamics&quot;&gt;the
&lt;em&gt;R&lt;/em&gt;&lt;sub&gt;&lt;em&gt;t&lt;/em&gt;&lt;/sub&gt; analysis that the Bedford Lab has
done&lt;/a&gt;, remove all
samples from the last 10 days, to try to remove small sample effects in
the early reporting.&lt;/p&gt;
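
&lt;p&gt;To put numbers on these lags, one option is to summarize the lag
distribution by state and to drop the most recent days of data before
estimating frequencies. The sketch below uses a small made-up data frame
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toy_dat&lt;/code&gt;) in place of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us_dat&lt;/code&gt;, and it treats the latest submission
date in the data as a stand-in for the analysis date. Both are
assumptions made for illustration; the 10-day cutoff is the value
mentioned above.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(dplyr)
library(lubridate)

# Made-up rows standing in for us_dat.
toy_dat &amp;lt;- tibble(
  division = c(&quot;Massachusetts&quot;, &quot;Massachusetts&quot;, &quot;Texas&quot;, &quot;Texas&quot;),
  date = ymd(c(&quot;2022-01-03&quot;, &quot;2022-02-01&quot;, &quot;2022-01-05&quot;, &quot;2022-01-10&quot;)),
  date_submitted = ymd(c(&quot;2022-01-12&quot;, &quot;2022-02-04&quot;, &quot;2022-02-08&quot;, &quot;2022-02-01&quot;))
) %&amp;gt;%
  mutate(reporting_lag = as.numeric(date_submitted - date))

# Median and 75th percentile of the reporting lag (in days), by state.
toy_dat %&amp;gt;%
  group_by(division) %&amp;gt;%
  summarize(median_lag = median(reporting_lag),
            q75_lag = quantile(reporting_lag, 0.75))

# Keep only samples collected at least 10 days before the latest submission date.
cutoff &amp;lt;- max(toy_dat$date_submitted) - days(10)
toy_dat %&amp;gt;% filter(date &amp;lt;= cutoff)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;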

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ymd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2021-08-01&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;division&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reporting_lag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_boxplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scale_y_log10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;lag (days, log scale)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;breaks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;90&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;360&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;720&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theme&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;axis.text.x&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;element_text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;angle&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vjust&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hjust&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/samples-by-location.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The next plot shows the number of samples by collection date, with a smoothed trend line.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_samples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_bar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;.3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;identity&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_smooth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;span&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;.1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;se&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;FALSE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/samples-over-time.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here is a plot of the prevalence of each clade by week across 2021.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;strains_of_interest&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;20I (Alpha, V1)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;20J (Gamma, V3)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;21A (Delta)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;21C (Epsilon)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;21I (Delta)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;21J (Delta)&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;21K (Omicron)&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;by_clade&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;us_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;2021&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;   &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## focus only on 2021 &lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Nextstrain_clade&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strains_of_interest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;53&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;        &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## values of 53 are at the start of 2021 &lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Nextstrain_clade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clade_total&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_samples&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clade_total&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
         &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pct_clade_in_epiweek&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clade_total&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_samples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;by_clade&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pct_clade_in_epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Nextstrain_clade&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;epiweek in 2021&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ylab&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;% of samples by clade&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/variants-in-2021.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
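
&lt;p&gt;One caveat about the code above: because the data are filtered to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strains_of_interest&lt;/code&gt; before the weekly totals are computed, each value is a share of the samples from those seven clades (on a 0 to 1 scale), not of all samples sequenced that week. If you want shares of all samples instead, one option is to compute the weekly denominator before filtering. The sketch below assumes that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;us_dat&lt;/code&gt; has the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Nextstrain_clade&lt;/code&gt; columns used above and that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lubridate&lt;/code&gt; package is installed; the helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clade_share_by_week&lt;/code&gt; is made up for this illustration.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(dplyr)

## Hypothetical helper: weekly share of each clade among ALL samples in `yr`.
## Assumes `dat` has columns `date` (Date) and `Nextstrain_clade` (character).
clade_share_by_week &amp;lt;- function(dat, clades, yr = 2021) {
  weekly &amp;lt;- dat %&amp;gt;%
    filter(lubridate::year(date) == yr) %&amp;gt;%
    mutate(epiweek = lubridate::epiweek(date)) %&amp;gt;%
    filter(epiweek &amp;lt; 53)          ## same week-53 exclusion as above

  ## denominator: every sample in the week, whatever its clade
  totals &amp;lt;- count(weekly, epiweek, name = &quot;total_samples&quot;)

  weekly %&amp;gt;%
    filter(Nextstrain_clade %in% clades) %&amp;gt;%
    count(Nextstrain_clade, epiweek, name = &quot;clade_total&quot;) %&amp;gt;%
    left_join(totals, by = &quot;epiweek&quot;) %&amp;gt;%
    mutate(pct_clade_in_epiweek = 100 * clade_total / total_samples)
}

## usage with the objects defined earlier in this post
by_clade_all &amp;lt;- clade_share_by_week(us_dat, strains_of_interest)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The result has the same column names as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;by_clade&lt;/code&gt;, so it can be passed to the same plotting code, with values on a 0 to 100 scale.&lt;/p&gt;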

&lt;h2 id=&quot;acknowledgments&quot;&gt;Acknowledgments&lt;/h2&gt;

&lt;p&gt;We wish to thank all the labs whose effort and time are essential to
making these important data available. Also, thanks to &lt;a href=&quot;https://bedford.io/&quot;&gt;Trevor
Bedford&lt;/a&gt; and &lt;a href=&quot;https://nextstrain.org/&quot;&gt;the Nextstrain
team&lt;/a&gt; for providing the processed version of
the metadata files and providing some pointers and insights about these
data as we were writing this post.&lt;/p&gt;

</description>
        <pubDate>Tue, 15 Feb 2022 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2022/02/15/genbank-data.html</link>
        <guid isPermaLink="true">http://reichlab.io//2022/02/15/genbank-data.html</guid>
      </item>
    
      <item>
        <title>Forecasting tools in development</title>
        <description>&lt;p&gt;As I’ve been writing up a progress report for &lt;a href=&quot;https://projectreporter.nih.gov/project_info_description.cfm?aid=9553816&quot;&gt;my NIGMS R35 MIRA award&lt;/a&gt;, I’ve been reminded of how much of the work that we’ve been doing is focused on forecasting infrastructure. A common theme in the &lt;a href=&quot;http://reichlab.io/&quot;&gt;Reich Lab&lt;/a&gt; is making &lt;em&gt;operational&lt;/em&gt; forecasts of infectious disease outbreaks. The operational aspect means that we focus on everything from &lt;a href=&quot;https://doi.org/10.1002/sim.7488&quot;&gt;developing&lt;/a&gt; and &lt;a href=&quot;https://doi.org/10.1371/journal.pcbi.1005910&quot;&gt;adapting&lt;/a&gt; statistical methods to be used in forecasting applications to thinking about the data science toolkit that you need to store, evaluate, and visualize forecasts. To that end, in addition to working closely with the CDC in their &lt;a href=&quot;https://www.cdc.gov/flu/weekly/flusight/index.html&quot;&gt;FluSight initiative&lt;/a&gt;, we’ve been doing a lot of collaborative work on new R packages and data repositories that I hope will be useful beyond the confines of our lab. Some of these projects are fully operational, used in our &lt;a href=&quot;http://flusightnetwork.io/&quot;&gt;production flu forecasts for CDC&lt;/a&gt;, and some have even gone through &lt;a href=&quot;https://doi.org/10.21105/joss.00231&quot;&gt;a level of code peer review&lt;/a&gt;. Others are in earlier stages of development. My hope is that in putting this list out there (see below the fold) we will generate some interest (and possibly find some new open-source collaborators) for these projects.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://flusightnetwork.io/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/flusight/2_flusight.JPG&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Here is a partial list of in-progress software that we’ve been working on:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/reichlab/sarimaTD&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sarimaTD&lt;/code&gt;&lt;/a&gt; is an R package that serves as a wrapper to some of the ARIMA modeling functionality in the &lt;a href=&quot;http://pkg.robjhyndman.com/forecast/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;forecast&lt;/code&gt;&lt;/a&gt; R package. We found that we consistently wanted to specify some transformations (T) and differencing (D) in particular ways that we have found useful in modeling infectious disease time-series data, so we made it easy for us and others to use these specifications.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://cran.r-project.org/package=ForecastFramework&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;&lt;/a&gt; is an R package that we have collaborated on with our colleagues at the &lt;a href=&quot;http://www.iddynamics.jhsph.edu/&quot;&gt;Johns Hopkins Infectious Disease Dynamics lab&lt;/a&gt;. We’ve &lt;a href=&quot;http://reichlab.io/2019/01/29/forecast-framework-demo.html&quot;&gt;blogged about this before&lt;/a&gt;, and we see a lot of potential in this object-oriented framework for standardizing both how datasets are specified/accessed and how models are generated. That said, there is still a long way to go to document this infrastructure and make it usable by a wide audience. The most success I’ve had using it so far was having PhD students write &lt;a href=&quot;https://github.com/reichlab/german-flu-forecasting&quot;&gt;forecast models for a seminar I taught this spring&lt;/a&gt;. I used &lt;a href=&quot;https://github.com/reichlab/german-flu-forecasting/blob/master/code/evaluation-code.R&quot;&gt;a single script&lt;/a&gt; that could run and score forecasts from each model, with a very simple plug-and-play interface to the models because they had been specified appropriately.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://zoltardata.com/&quot;&gt;Zoltar&lt;/a&gt; is a new repository (in alpha-ish release right now) for forecasts that we have been working on over the last year. It was initially designed with our CDC flu forecast use-case in mind, although the forecast structure is quite general, and with &lt;a href=&quot;http://cdcepi.github.io/predx&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;predx&lt;/code&gt;&lt;/a&gt; integration on the way (see next bullet) we are hoping that this will broaden the scope of possible use cases for Zoltar. To help facilitate our and others’ use of Zoltar, we are working on two interfaces to the Zoltar API, &lt;a href=&quot;https://github.com/reichlab/zoltpy&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zoltpy&lt;/code&gt;&lt;/a&gt; for python and &lt;a href=&quot;http://reichlab.io/zoltr/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zoltr&lt;/code&gt;&lt;/a&gt; for R. Check out the documentation, especially for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zoltr&lt;/code&gt;. There is quite a bit of data available!&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://cdcepi.github.io/predx&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;predx&lt;/code&gt;&lt;/a&gt; is an R package designed by my colleague and friend Michael Johansson of &lt;a href=&quot;https://cdc.gov/&quot;&gt;the US Centers for Disease Control and Prevention&lt;/a&gt; and &lt;a href=&quot;http://outbreakscience.org/&quot;&gt;OutbreakScience&lt;/a&gt;. &lt;a href=&quot;https://www.mtholyoke.edu/~eray/&quot;&gt;Evan Ray&lt;/a&gt;, from the Reich Lab team, has contributed to it as well. The goal of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;predx&lt;/code&gt; is to define some general classes of data for both probabilistic and point forecasts, to better standardize ways that we might want to store and operate on these data.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/d3-foresight/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d3-foresight&lt;/code&gt;&lt;/a&gt; is the main engine behind our &lt;a href=&quot;http://flusightnetwork.io/&quot;&gt;interactive forecast visualizations for flu in the US&lt;/a&gt;. We have also integrated it with Zoltar, so that you can &lt;a href=&quot;https://zoltardata.com/project/4/visualizations&quot;&gt;view forecasts stored in Zoltar&lt;/a&gt; (note, kind of a long load time for that last link) using some of the basic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d3-foresight&lt;/code&gt; functionality.&lt;/li&gt;
&lt;/ul&gt;
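
&lt;p&gt;To give a flavor of what the T and D in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sarimaTD&lt;/code&gt; refer to, here is a rough base-R sketch of transforming and seasonally differencing a series before fitting an ARIMA model. It does not use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sarimaTD&lt;/code&gt; interface, the weekly series is simulated, and the log transform, the lag of 52, and the AR(1) order are arbitrary choices made only for illustration.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;## Hypothetical weekly incidence series (simulated, for illustration only)
set.seed(1)
weeks &amp;lt;- 1:(52 * 4)
y &amp;lt;- ts(exp(2 + sin(2 * pi * weeks / 52) + rnorm(length(weeks), sd = 0.2)),
        frequency = 52)

## T: log transformation; D: seasonal differencing at a lag of 52 weeks
y_td &amp;lt;- diff(log(y), lag = 52)

## fit a simple non-seasonal ARIMA model to the transformed, differenced series
fit &amp;lt;- arima(y_td, order = c(1, 0, 0))
fit
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;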

&lt;p&gt;The lion’s share of the credit for all of the above is due to some combination of &lt;a href=&quot;http://www.matthewcornell.org/&quot;&gt;Matthew Cornell&lt;/a&gt;, &lt;a href=&quot;https://lepisma.xyz/about&quot;&gt;Abhinav Tushar&lt;/a&gt;, &lt;a href=&quot;http://katie-house.com/&quot;&gt;Katie House&lt;/a&gt;, and &lt;a href=&quot;https://www.mtholyoke.edu/~eray/&quot;&gt;Evan Ray&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Mon, 17 Jun 2019 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2019/06/17/forecast-tools.html</link>
        <guid isPermaLink="true">http://reichlab.io//2019/06/17/forecast-tools.html</guid>
      </item>
    
      <item>
        <title>Forecast Framework Demo</title>
        <description>&lt;p&gt;Want to learn how to do some forecasting with R? Here’s your chance to try out a new time-series forecasting package for R whose aim is to standardize and simplify the process of making and evaluating forecasts!&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://reichlab.io/&quot;&gt;Reich Lab&lt;/a&gt; uses an R package called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; to implement forecasting models. There are many benefits to using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; in a forecasting pipeline, including: standardized and simplified rapid model development and performance evaluation. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; was created by Joshua Kaminsky of the &lt;a href=&quot;http://www.iddynamics.jhsph.edu/&quot;&gt;Infectious Disease Dynamics Group at Johns Hopkins University&lt;/a&gt;. The package is open source and can be found &lt;a href=&quot;https://github.com/HopkinsIDD/ForecastFramework&quot;&gt;on Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After watching students in the lab working on learning how to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;, I decided to create a step-by-step demonstration of the primary use cases of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;. The complete demo lives at &lt;a href=&quot;http://reichlab.io/forecast-framework-demos/&quot;&gt;reichlab.io/forecast-framework-demos/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/ff-demo.PNG&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;As the resident CS grad-student programmer in the lab, I wanted to write these demonstrations to make &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; accessible. However, the demo only scratches the surface of the many ways that we (and others, we hope) will use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;. We have incorporated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; into our workflow for generating &lt;a href=&quot;https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0004761&quot;&gt;real-time dengue forecasts for the Ministry of Public Health in Thailand&lt;/a&gt;, and students have found it useful in generating small model comparison projects.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/&quot;&gt;The online demo&lt;/a&gt; is separated into five sections. Each section builds on the previous one and gradually increases in difficulty. However, the demos work as standalone scripts as well. The demos are organized as follows:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/#the-data-1&quot;&gt;&lt;strong&gt;The Data&lt;/strong&gt;&lt;/a&gt; - This section will examine the raw data used in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; models ahead.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/#defining-inputs-incidence-matrix-1&quot;&gt;&lt;strong&gt;Defining Inputs&lt;/strong&gt;&lt;/a&gt; - This section will define what an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IncidenceMatrix&lt;/code&gt; is, show how to format your data to be used as an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IncidenceMatrix&lt;/code&gt;, and exemplify functions of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IncidenceMatrix&lt;/code&gt; class.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/#fitting-and-forecasting&quot;&gt;&lt;strong&gt;Fitting and Forecasting&lt;/strong&gt;&lt;/a&gt; - This section will focus on fitting data to a SARIMA model with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;, using the &lt;a href=&quot;https://github.com/reichlab/sarimaTD&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sarimaTD&lt;/code&gt; package&lt;/a&gt; developed by &lt;a href=&quot;http://www.mtholyoke.edu/~eray/&quot;&gt;Evan Ray&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/#evaluating-multiple-models&quot;&gt;&lt;strong&gt;Evaluating Complex Models&lt;/strong&gt;&lt;/a&gt; - This section will demonstrate evaluation metrics and techniques by comparing two complex models in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/#creating-your-own-model&quot;&gt;&lt;strong&gt;Creating your own Model&lt;/strong&gt;&lt;/a&gt; - This section will use object-oriented R programming to demonstrate how to create your own model with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href=&quot;http://reichlab.io/forecast-framework-demos/&quot;&gt;Try using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForecastFramework&lt;/code&gt; today&lt;/a&gt;! I hope you find the tutorials interesting and instructive. If you have any questions or find any bugs, please let me know! I can be found at khouse [at] umass.edu or through &lt;a href=&quot;http://katie-house.com/&quot;&gt;my personal website&lt;/a&gt;.&lt;/p&gt;

</description>
        <pubDate>Tue, 29 Jan 2019 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2019/01/29/forecast-framework-demo.html</link>
        <guid isPermaLink="true">http://reichlab.io//2019/01/29/forecast-framework-demo.html</guid>
      </item>
    
      <item>
        <title>Data needs for forecasting influenza pandemics</title>
        <description>&lt;p&gt;Last week, I attended a Pandemic Influenza Exercise at the US CDC. To be
clear, there is NOT a pandemic occurring right now, but the CDC ran this
exercise where hundreds of staff members and outside observers and
participants came together to practice going through the motions of a
public health response to a major pandemic. As someone who is usually
sheltered from this everyday aspect of public health decision-making,
this was a fascinating window into understanding the careful, if
time-pressured, scientific deliberation that underlies the response to
public health emergencies.&lt;/p&gt;

&lt;p&gt;&lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/20180917-pneumo-deaths-total.png&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;forecasting-in-a-pandemic&quot;&gt;Forecasting in a pandemic&lt;/h3&gt;

&lt;p&gt;I was sitting with a team called Risk Assessment and Forecasting, and
was one of three academic forecasters in this group. (The others were
&lt;a href=&quot;http://www.columbia.edu/~jls106/&quot;&gt;Jeff Shaman&lt;/a&gt; and &lt;a href=&quot;https://delphi.midas.cs.cmu.edu/&quot;&gt;Roni
Rosenfeld&lt;/a&gt;.) When I was invited, I
wasn’t sure whether I was going to spend these three days scrambling on
my computer to actually run data analyses or not. As it turned out, our
job was more to listen, watch, and occasionally give some input about
what kind of data would be helpful if we needed to produce
real-time pandemic forecasts. Jeff was the only one of us who had
actively worked in producing real-time pandemic forecasts before (e.g.,
&lt;a href=&quot;http://currents.plos.org/outbreaks/article/inference-and-forecast-of-the-current-west-african-ebola-outbreak-in-guinea-sierra-leone-and-liberia/&quot;&gt;for the Ebola
outbreak&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;There are lots of open questions about what kinds of models would be
useful and how you might go about setting up forecasting models
(especially more statistical models) in a real pandemic. Fortunately for
public health (although unfortunately for the science of forecasting),
we don’t get many opportunities to practice. There were a
&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S1755436517301275&quot;&gt;handful&lt;/a&gt;
of &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234409/&quot;&gt;good&lt;/a&gt;
&lt;a href=&quot;http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&amp;amp;id=25254986&amp;amp;retmode=ref&amp;amp;cmd=prlinks&quot;&gt;forecasting&lt;/a&gt;
&lt;a href=&quot;http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&amp;amp;id=25480136&amp;amp;retmode=ref&amp;amp;cmd=prlinks&quot;&gt;studies&lt;/a&gt;
during Ebola, and these provide some of our best data on the models and
experience of creating forecasts in real-time during large global
outbreaks. Never before had a global pandemic occurred at a time when a
number of different research groups had the capacity and technical
knowledge to implement reasonable attempts at forecasting the trajectory
of a pandemic.&lt;/p&gt;

&lt;p&gt;Over the course of the past week, our team at the CDC exercise talked a
bit about data that would be useful in real-time for pandemic
forecasting. One thread of this conversation focused on real-time
epidemiological data that would be collected during the outbreak. For
example, consistently sampled surveillance data, especially
laboratory-confirmed case counts and strain-specific positivity rates,
would provide valuable measures of infection rates in a population at a
given time that could be used in models.&lt;/p&gt;

&lt;p&gt;However, a second thread focused on data that exist right now in various
forms that may also be useful in pandemic settings, especially if they
were made more readily and easily available. I am a forecaster and
statistician whose team builds mostly “statistical” models, that is,
models that rely strongly on previously observed empirical data and
patterns in similar prior situations. This is an uncomfortable position
to be in when forecasting new emerging pandemics, as there are so few
“comparable” events on record. This approach contrasts with more
“mechanistic” model-building approaches that rely heavily on a
structured set of assumptions about susceptibility in the underlying
population, how transmissible the pathogen is, and so on.&lt;/p&gt;

&lt;p&gt;So the statistician/data-scientist in me argues that there would be not
insignificant value in having some curated datasets on pandemics.
Certainly things have changed dramatically since 1918, 1957, 1968, and
even 2009, so what we can learn from those settings in a new influenza
pandemic would be limited. But having both &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5198166/&quot;&gt;a historical understanding
of pandemics&lt;/a&gt; as
well as some data from those outbreaks &lt;a href=&quot;https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005749&quot;&gt;can provide useful insights into
potential future dynamics of
pandemics&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Some circulating questions during the recent CDC exercise revolved
around whether there would be a hypothetical second wave and if so, what
it would look like. Mechanistic models could start to answer that
question quantitatively early on in an outbreak, but I’d argue that for
a while they’d be so uncertain as to be almost useless in a hypothetical
pandemic response. To start to answer the question about a second wave,
I wanted to look at past data. It is not obvious what the best source of
data like this is, and there does not appear to be a single
authoritative repository or archive of data on pandemic flu. That’s what
set me off on a hunt to make some plots of influenza pandemic dynamics
over the past century.&lt;/p&gt;

&lt;h3 id=&quot;finding-data-from-project-tycho&quot;&gt;Finding data from Project Tycho&lt;/h3&gt;

&lt;p&gt;After a brief and unsuccessful round of Googling for public datasets on
pandemic flu, I turned to &lt;a href=&quot;https://www.tycho.pitt.edu/data/&quot;&gt;Project
Tycho&lt;/a&gt;, a carefully curated public
repository of infectious disease data. As it turns out (as you will see
below), the data they have on influenza (and pneumonia) is not
particularly helpful or exhaustive, but it was at least a place to
start. So I’ll walk through an edited and simplified version of some of
the back-of-the-envelope descriptive analyses I ran for myself this
week. In particular, I’m going to walk through the creation of two
different plots of public pandemic data from two different Project Tycho
data sources that I tried to use to help me think about the potential of
a second wave.&lt;/p&gt;

&lt;p&gt;I’m going to walk through this analysis in R, so first let’s get some
preliminaries out of the way to get my R session set up…&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data.table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dplyr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theme_set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;theme_bw&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;## because I don&apos;t like the ggplot default theme :-)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;the-1918-influenza-pandemic-plotting-pneumonia-deaths&quot;&gt;The 1918 influenza pandemic: plotting pneumonia deaths&lt;/h3&gt;

&lt;p&gt;Let’s start by using the &lt;a href=&quot;https://www.tycho.pitt.edu/dataset/api/&quot;&gt;Project Tycho
API&lt;/a&gt; to access data through R.
To replicate this analysis, you will need to create an account (if you
don’t have one already), find your API key on your profile page, and
insert it in the code below.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;APIKEY&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;put-your-key-here&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I toggled back and forth for a while between my code and the &lt;a href=&quot;https://www.tycho.pitt.edu/dataset/api/&quot;&gt;Project Tycho
API site&lt;/a&gt; to build an API query. It took some time to figure out exactly which
“Condition Name” I should use, but after some hunting and pecking it
turned out that the best data for either pneumonia or influenza during the
1918 pandemic are the case and death data for pneumonia. The following code
sets up the query:&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;baseAPIlink&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://www.tycho.pitt.edu/api/query?&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ConditionName&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Pneumonia&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountryISO&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;US&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;1918-01-01&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodEndDate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;1921-12-31&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;API_link&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;paste0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;baseAPIlink&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;apikey=&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;APIKEY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;amp;ConditionName=&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ConditionName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;amp;CountryISO=&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountryISO&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;amp;Fatalities=0&amp;amp;PartOfCumulativeCountSeries=0&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;amp;PeriodStartDate&amp;gt;=&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;amp;PeriodEndDate&amp;lt;=&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodEndDate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, I downloaded the data and made a few small data transformations
for easier management later, like converting the date fields to dates and expanding
the data frame to have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NA&lt;/code&gt; for missing values instead of just having
entirely missing rows.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;pneumo_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read.csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;API_link&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;header&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;TRUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as.Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodEndDate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as.Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodEndDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tidyr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
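
&lt;p&gt;To see what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tidyr::complete()&lt;/code&gt; is doing there, here is a minimal sketch on a made-up data frame. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toy&lt;/code&gt; object and its values are invented for illustration and are not Project Tycho records; only the column names mirror the ones used above.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(tidyr)

## hypothetical toy data: city B has no record for the second week
toy &amp;lt;- data.frame(
    CityName = c(&quot;A&quot;, &quot;A&quot;, &quot;B&quot;),
    PeriodStartDate = as.Date(c(&quot;1918-10-05&quot;, &quot;1918-10-12&quot;, &quot;1918-10-05&quot;)),
    CountValue = c(10, 25, 7)
)

## complete() adds a row for every CityName x PeriodStartDate combination
## that is absent from the data, filling the other columns with NA
toy_full &amp;lt;- complete(toy, CityName, PeriodStartDate)
toy_full
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The result has four rows instead of three: the new row for city B in the second week carries an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NA&lt;/code&gt; count, which is what lets a line plot show a gap rather than silently connecting across the missing week.&lt;/p&gt;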

&lt;p&gt;Since not all cities reported many cases, I did some subsetting to
identify just the cities that had both a large total count (more than 1,000)
and plenty of weeks with reported data (more than 60 non-missing records
after the end of 1919).&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;cities_with_1918&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pneumo_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodEndDate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as.Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;1919-12-31&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_records&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;is.na&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_records&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;big_cities&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pneumo_dat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_count&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;na.rm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;TRUE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once we have all that done, it’s not that difficult to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; to
construct a plot of pneumonia deaths during and after the 1918 pandemic.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pneumo_dat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;big_cities&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cities_with_1918&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PeriodStartDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CountValue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;facet_grid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CityName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scales&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;free_y&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/20180917-pneumo-deaths-city.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-1968-influenza-pandemic-pneumonia-and-influenza-deaths&quot;&gt;The 1968 influenza pandemic: pneumonia and influenza deaths&lt;/h3&gt;

&lt;p&gt;The dataset I used above ends in the early 1950s, so it doesn’t cover the
other influenza pandemics of the second half of the 20th century.
However, the “Level 2” data from Project Tycho (available, as best I
could tell, via a zip file download but not through the API) does include
data covering the 1968 pandemic. If you’re trying to
run this code, you will need to download the Level 2 data and insert
the path to your CSV file below.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;fname&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;your-path-to-download/ProjectTycho_Level2_v1.1.0.csv&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data.table&lt;/code&gt; to read it in, since it’s a big ol’ file (~450MB).&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;DT&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fread&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fludat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DT&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;disease&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;INFLUENZA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;PNEUMONIA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;PNEUMONIA AND INFLUENZA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as.Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as.Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;paste&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tidyr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;complete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As was necessary in the first example, this needed a bit of cleaning up
to prepare for plotting.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;aggpidat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fludat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;196700&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;197100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;DEATHS&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;disease&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;PNEUMONIA AND INFLUENZA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mutate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek2Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
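
&lt;p&gt;The code above treats &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;epi_week&lt;/code&gt; as a six-digit code, with the year in the first four digits and the week number in the last two. Here is a small sketch of that conversion for a single value; the value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;196850&lt;/code&gt; is a hypothetical example I picked, not a row pulled from the file.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;library(MMWRweek)

## hypothetical epi_week code: year 1968, week 50
epi_week &amp;lt;- 196850

## split the code into its year and week parts
## (as.character() makes the numeric-to-string step explicit)
yr &amp;lt;- as.numeric(substr(as.character(epi_week), 1, 4))
wk &amp;lt;- as.numeric(substr(as.character(epi_week), 5, 6))

## with no MMWRday supplied, MMWRweek2Date() returns the first day
## (Sunday) of that MMWR week as a Date
week_start &amp;lt;- MMWRweek2Date(MMWRyear = yr, MMWRweek = wk)
week_start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Having a real &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Date&lt;/code&gt; column, rather than the raw week code, means &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ggplot&lt;/code&gt; spaces the weeks correctly across year boundaries.&lt;/p&gt;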

&lt;p&gt;Again, I tried to focus on cities where there were higher counts and
more weeks with observed data.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;big_pi_cities&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aggpidat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_count&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;2000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;

&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cities_with_1968&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aggpidat&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;196800&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epi_week&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;197000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;group_by&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;summarize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_records&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tot_records&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&amp;gt;%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And, here’s the plot:&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aggpidat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;big_pi_cities&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%in%&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cities_with_1968&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;facet_grid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;city_state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scales&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;free_y&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/20180917-p-and-i-deaths-city.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-is-there-to-learn-from-pandemic-data&quot;&gt;What is there to learn from pandemic data?&lt;/h3&gt;

&lt;p&gt;These few plots tell us just a little bit about these historical
pandemics. In 1918, the best data from Tycho seemed to be about
pneumonia, but even then, as is evident from the city-level data, there
are lots of gaps in the observations. This is not as evident in the
overall aggregate data shown in the first plot above the fold. In 1918,
the data clearly show a first peak in late 1918 and then a second peak
in early 1920, although most cities are missing data for almost all of 1919. 
In the 1968 pandemic, there seems to be a small first wave (or
maybe just a larger seasonal wave) in the 1967/1968 season followed by a
larger peak right around December 1968 and January 1969. But this
analogy to the flu pandemic is dependent on the assumption that reported
pneumonia deaths serve as a reasonable proxy for incident flu cases.&lt;/p&gt;

&lt;p&gt;Certainly there must be more and better data out there on influenza
pandemics. For the 2009 H1N1 pandemic, we could retrieve data from 2009
from the &lt;a href=&quot;https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html&quot;&gt;CDC
FluView&lt;/a&gt;
application (which has a nice interface from R in the &lt;a href=&quot;https://github.com/hrbrmstr/cdcfluview&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdcfluview&lt;/code&gt;
package&lt;/a&gt;). But I’m sure there is
more data on these older pandemics out there as well. It might require
heading into the depths of library stacks, or poring through academic
papers, or finding other less well-established data repositories. But
having this data systematically organized in one place would be useful,
and not all that hard, at least to get it started.&lt;/p&gt;

&lt;p&gt;Once again, this is just one very small piece of a huge puzzle of
pandemic response. Clearly assembling some historical data isn’t going
to prevent the next pandemic, nor will it directly inform
countermeasures or interventions that are part of the response.
Especially given how difficult it is to generalize from these
once-in-a-generation type experiences, it is unclear how directly
applicable these data are to a hypothetical emerging outbreak. That
said, we have precious little real data about pandemics to go on. And
there’s no reason to drive blind without having better and more
accessible versions of what could be a valuable source of information
about dynamics of pandemic second waves, spatial variation in outbreaks,
and more.&lt;/p&gt;
</description>
        <pubDate>Mon, 17 Sep 2018 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2018/09/17/pandemic-flu-data.html</link>
        <guid isPermaLink="true">http://reichlab.io//2018/09/17/pandemic-flu-data.html</guid>
      </item>
    
      <item>
        <title>Building a collaborative ensemble to forecast influenza</title>
        <description>&lt;p&gt;In March 2017, a group of influenza forecasters who have participated in the &lt;a href=&quot;https://predict.phiresearchlab.org&quot;&gt;CDC FluSight challenge&lt;/a&gt; in past seasons established the FluSight Network, a multi-institution and multi-disciplinary consortium of forecasting teams. This group worked throughout 2017 to create a public, real-time collaborative ensemble forecasting model that provides &lt;a href=&quot;http://flusightnetwork.io/&quot;&gt;updated forecasts of influenza in the US each week&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://flusightnetwork.io&quot;&gt;&lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/collaborative-ensemble-overview.png&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h3 id=&quot;background&quot;&gt;Background&lt;/h3&gt;
&lt;p&gt;Every flu season since 2013/2014, the CDC has organized a challenge where, each week from early November through mid April, teams submit forecasts about the flu season in the US.
For every week in a season, each submission contains forecasts for seven targets of public health interest specified by the CDC for each of the 11 HHS regions. The region-level targets are: the weighted fraction of doctor’s visits that were due to influenza-like illness (wILI) in each of the next four weeks of the season, the week of season onset, the week in which the peak wILI occurs, and the level of the peak wILI.
The forecasts themselves are text files containing, in a specified format, data that encode predictive distributions for these targets.&lt;/p&gt;

&lt;p&gt;Throughout this project, the central question has been &lt;strong&gt;can we provide better information to decision makers by combining forecasting models&lt;/strong&gt;, and specifically by using past performance of the component models to inform the ensemble approach. All previous participants in FluSight challenges were invited to join the FluSight Network. Four groups decided to participate and contributed &lt;a href=&quot;https://github.com/FluSightNetwork/cdc-flusight-ensemble/tree/master/model-forecasts&quot;&gt;21 models in total&lt;/a&gt; using a diverse array of methodologies, including kernel conditional density estimation, Bayesian state-space models, simple seasonal models, auto-regressive models for time-series, and susceptible-infectious-recovered-susceptible compartmental models, to name just a few.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Institution&lt;/th&gt;
      &lt;th&gt;No. of models&lt;/th&gt;
      &lt;th&gt;Team leaders&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Delphi team at Carnegie Mellon&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;Logan Brooks, Roni Rosenfeld&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Columbia University&lt;/td&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;Teresa Yamana, Sasikiran Kandula, Jeff Shaman&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Los Alamos National Laboratory&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;Dave Osthus&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Reich Lab at UMass-Amherst&lt;/td&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;Nicholas Reich, Abhinav Tushar, Evan Ray&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Prior to the start of the 2017/2018 influenza season in the US (first submissions were due on November 6), we assembled these 21 distinct forecasting models for influenza, each with forecasts from the last seven influenza seasons in the US. (To the extent possible, these forecasts were only allowed to use data available at the time the forecasts were made.)&lt;/p&gt;

&lt;p&gt;Subsequently, we conducted a cross-validation study to compare five different methods for combining these models into a single ensemble forecast. Specifically, this was done by leaving one season out at a time, fitting each ensemble model based on the remaining seasons’ data, and generating ensemble forecasts for each week of the left-out season. Then, we evaluated and compared the performance of the ensemble models.&lt;/p&gt;

&lt;h3 id=&quot;ensemble-specifications&quot;&gt;Ensemble specifications&lt;/h3&gt;
&lt;p&gt;All of our ensemble models are built by taking weighted averages of the component models. We examined the performance of five different possible ensemble specifications (see table below). The “equal weights” model takes a simple average of all of the models, with no consideration of past performance. The other four approaches estimated weights for models based on past performance, using the degenerate EM algorithm.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Model&lt;/th&gt;
      &lt;th&gt;No. of weights&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Equal weights (EW)&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;Every model gets same weight.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Constant weights (CW)&lt;/td&gt;
      &lt;td&gt;21&lt;/td&gt;
      &lt;td&gt;Every model gets a single weight, not necessarily the same.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Target-type-based weights (TTW)&lt;/td&gt;
      &lt;td&gt;42&lt;/td&gt;
      &lt;td&gt;Two sets of weights, one for seasonal targets and one for weekly wILI targets.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Target-based weights (TW)&lt;/td&gt;
      &lt;td&gt;147&lt;/td&gt;
      &lt;td&gt;Seven sets of weights, one for each target separately.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Target-and-region-based weights (TRW)&lt;/td&gt;
      &lt;td&gt;1,617&lt;/td&gt;
      &lt;td&gt;Target-based weights estimated separately for each region.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
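
&lt;p&gt;As a rough illustration of how such weights can be estimated, here is a minimal sketch of the EM update for a mixture whose components are held fixed, which is the idea behind the degenerate EM algorithm. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;degenerate_em&lt;/code&gt;, the matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dens&lt;/code&gt;, and the simulated scores are hypothetical and chosen for illustration only; this is not the code used by the FluSight Network.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical helper (not the FluSight Network code): estimate weights with
# the degenerate EM algorithm. dens[i, m] is the probability that component
# model m assigned to the value that was eventually observed for forecast i.
degenerate_em &amp;lt;- function(dens, tol = 1e-8, max_iter = 1000) {
  n_models &amp;lt;- ncol(dens)
  w &amp;lt;- rep(1 / n_models, n_models)           # start from equal weights
  for (iter in seq_len(max_iter)) {
    weighted &amp;lt;- sweep(dens, 2, w, &quot;*&quot;)     # w_m * p_m(y_i)
    resp &amp;lt;- weighted / rowSums(weighted)      # share of forecast i credited to model m
    w_new &amp;lt;- colMeans(resp)                   # updated weights; they still sum to 1
    converged &amp;lt;- max(abs(w_new - w)) &amp;lt; tol
    w &amp;lt;- w_new
    if (converged) break
  }
  w
}

# Simulated scores for two made-up component models over 50 forecasts
set.seed(1)
dens &amp;lt;- cbind(good = runif(50, 0.3, 0.6), poor = runif(50, 0.01, 0.1))
w &amp;lt;- degenerate_em(dens)

# The ensemble probability for each forecast is the weighted average
ensemble &amp;lt;- as.vector(dens %*% w)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In this toy example the “good” model assigns higher probability to every observation, so the estimated weights concentrate on it. Under the same assumptions, the CW specification would amount to one such fit over all forecasts, while the TTW, TW, and TRW specifications would repeat the fit on the subset of forecasts for each target type, target, or target-region pair.&lt;/p&gt;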

&lt;h3 id=&quot;forecast-evaluation&quot;&gt;Forecast Evaluation&lt;/h3&gt;
&lt;p&gt;We measured performance by computing the average score across all targets and all relevant weeks in the last seven seasons. (Including only “relevant weeks” means, e.g., that for evaluating season onset we exclude weeks after the onset has clearly occurred, because at this point, the forecasts are no longer informative.) The ensemble models generally showed better average scores than any of the component models, and there was little difference between the CW, TTW, TW, and TRW models.&lt;/p&gt;

&lt;p&gt;For submitting in real-time in 2017-2018, we selected the ensemble model that achieved the best overall score in the cross-validation experiment over the last seven seasons. This was the target-type-based model (TTW) that assigned one set of weights to each component model for the weekly incidence targets and another set of weights for the seasonal targets (onset timing, peak timing, and peak incidence).&lt;/p&gt;

&lt;p&gt;&lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/collaborative-ensemble-comparison.jpeg&quot; /&gt;&lt;/p&gt;

</description>
        <pubDate>Tue, 28 Nov 2017 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2017/11/28/flusight-ensemble.html</link>
        <guid isPermaLink="true">http://reichlab.io//2017/11/28/flusight-ensemble.html</guid>
      </item>
    
      <item>
        <title>Slides for Ensemble Talk at MIDAS</title>
        <description>&lt;p&gt;Here are the slides for my presentation today at the annual MIDAS conference in Atlanta, GA. The talk summarizes recent work led by post-doc Evan Ray on creating interpretable “feature-weighted density ensembles” for infectious disease forecasting. The paper is currently under review, but &lt;a href=&quot;https://arxiv.org/abs/1703.10936&quot;&gt;the preprint is available on arXiv&lt;/a&gt;. Check out the 2017-2018 real-time influenza forecasts from this model available on our &lt;a href=&quot;http://reichlab.io/flusight/&quot;&gt;flusight app&lt;/a&gt;. And here are some slices of the feature-dependent weighting functions for predicting peak incidence for influenza in the U.S.&lt;/p&gt;

&lt;p&gt;&lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/ensemble-model-weights.png&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Here is the full slide deck for the talk:
&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;5c7b7fc721c44d4db80dae250db0a704&quot; data-ratio=&quot;1.33333333333333&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 24 May 2017 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2017/05/24/MIDAS-slides.html</link>
        <guid isPermaLink="true">http://reichlab.io//2017/05/24/MIDAS-slides.html</guid>
      </item>
    
      <item>
        <title>Machine Learning and Clinical Decision Making</title>
        <description>&lt;p&gt;I wrote a response to &lt;a href=&quot;http://www.newyorker.com/magazine/2017/04/03/ai-versus-md&quot;&gt;Siddhartha Mukherjee’s article “A.I. vs. M.D.”&lt;/a&gt; that appeared in the &lt;em&gt;New Yorker&lt;/em&gt; last month. While I submitted it as a letter to the editor, they didn’t publish it. In retrospect, perhaps it was a bit long-winded for their curt and pithy letters section. Mukherjee’s article was published on the heels of Evan submitting &lt;a href=&quot;https://arxiv.org/abs/1703.10936&quot;&gt;his latest work on improving the consistency of infectious disease prediction using interpretable model averaging methods&lt;/a&gt;. What follows is the letter I submitted.&lt;/p&gt;

&lt;!--more--&gt;

&lt;blockquote&gt;
  &lt;p&gt;Siddhartha Mukherjee nicely characterizes both sides in a looming paradigm shift in clinical practice: how to incorporate the promises of “big data” into decision-making and diagnoses that impact real people (“A.I. vs. M.D.”, April 3rd). In doing so, he exposes a rift not just between machine learners (who cling to their data and models) and clinicians (who often trust in anecdote and experience), but a rift in the machine learning community itself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;Many “deep learning” algorithms are, as Mukherjee describes, black boxes whose promise to “replace dermatologists and radiologists”  would understandably rankle even very forward-thinking diagnosticians. When life-and-death decisions are being made, it is a lot to ask anyone to trust the black box.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;However, not all predictive models are or need to be opaque to the experts they are designed to aid. Many predictive models (for example, certain “model averaging” approaches) can provide more information to clinicians and patients about how they work, and why they succeed and fail.  Development and investigation of transparent approaches can and should be emphasized over other more opaque approaches that thwart interpretability.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;While data-driven approaches deserve to make their way into clinical and public-health decision-making, the benefits for patients will only be maximized if clinicians, biostatisticians, and computer scientists partner closely. To succeed, future efforts must focus on enhancing existing intuitive, interpretable predictive models and finding ways to peel back the layers of complexity of others. With transparency will come trust, and improved clinical care.&lt;/p&gt;
&lt;/blockquote&gt;
</description>
        <pubDate>Sun, 14 May 2017 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2017/05/14/ai-vs-md.html</link>
        <guid isPermaLink="true">http://reichlab.io//2017/05/14/ai-vs-md.html</guid>
      </item>
    
      <item>
        <title>U.S. Influenza Forecast updates (Nov 29 edition)</title>
        <description>&lt;p&gt;We updated our &lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;U.S. influenza forecasts&lt;/a&gt; on Tuesday, November 29th. (We tend to update the forecasts on Mondays, but the CDC data release was delayed this week due to Thanksgiving last week.) Overall, the data and the short-term forecasts for flu are showing regional circulation of flu that is a bit below the CDC-defined baseline levels. The two exceptions are in HHS Region 2 (NY and NJ) which is right at its baseline level, according to the most recent data from the CDC (reported through November 19th), and HHS Region 4 (the southeastern corner of the US) which already has risen above its baseline. Region 4 has historically had somewhat earlier seasons than the rest of the US. Check out our interactive &lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;FluSight app&lt;/a&gt; for more details on each region.&lt;/p&gt;

&lt;p&gt;Reported U.S. regional influenza incidence in Nov 13-19 (MMWR week 46), 2016. Colors show percent above or below baseline:
&lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/20161130-us-flu-map.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Last week, I gave a quick &lt;a href=&quot;https://reichlab.github.io/2016/11/23/introducing-flusight.html&quot;&gt;under-the-hood look at how we make our forecasts&lt;/a&gt;. This week, I’m going to interpret more directly our forecasts for this week.&lt;/p&gt;

&lt;p&gt;Our model is currently combining predictions from three component models to deliver a single forecast. Depending on the region, it is combining the predictions with different weights. One thing that we’ve noticed is that the seasonal auto-regressive integrated moving average (SARIMA) model is a bit more “jumpy”, or willing to predict a rapid increase, than the other models. This year, it has shown mixed performance. For example, last week it predicted upticks in Regions 2 and 4 that were almost exactly right. However, it also predicted upticks in Regions 3 and 6 that were wrong.&lt;/p&gt;

&lt;p&gt;These are early assessments of prediction accuracy, and they may change, as the CDC often updates its data over the course of a few subsequent weeks. See, for example, the data reported in week 52 of 2015 (dark line with dots in the figure below) that were subsequently adjusted down (solid green line):
&lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/20161130-backfill-issue.png&quot; /&gt;
&lt;/a&gt;
This “backfill” issue (adjustment of reported data in weeks following the initial report) is not something that we have yet accounted for in our forecasts, although it is on the short list of issues to add to our development model that will be updated throughout the season.&lt;/p&gt;

&lt;p&gt;In most regions, our ensemble model is sticking closely to the &lt;a href=&quot;https://github.com/reichlab/article-disease-pred-with-kcde/raw/master/inst/article/infectious-disease-prediction-with-kcde.pdf&quot;&gt;KCDE model&lt;/a&gt; for its weekly forecasts. The figure below, showing our current forecasts for Region 4, highlights how the red predictions (ensemble model) overlay the blue predictions (KCDE) almost exactly. For now, the ensemble is resisting the urge to follow the urgency of the green SARIMA model or the conservatism of the orange model which is pulling the trajectory back towards a seasonal average:
&lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/20161130-region4-forecast.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stay tuned for periodic updates throughout the season!&lt;/p&gt;
</description>
        <pubDate>Wed, 30 Nov 2016 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2016/11/30/flu-forecasts.html</link>
        <guid isPermaLink="true">http://reichlab.io//2016/11/30/flu-forecasts.html</guid>
      </item>
    
      <item>
        <title>Under the hood of our real-time flu predictions</title>
        <description>&lt;p&gt;For the second year in a row, the Reich Lab is participating in the &lt;a href=&quot;https://predict.phiresearchlab.org/post/57f3f440123b0f563ece2576&quot;&gt;CDC FluSight challenge&lt;/a&gt;, a project where teams from around the country submit real-time predictions of influenza to the CDC. The teams use a variety of different models and methods to generate these predictions, from &lt;a href=&quot;http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004382&quot;&gt;an empirical Bayes method that uses Google search data&lt;/a&gt; to &lt;a href=&quot;http://www.nature.com/articles/ncomms3837&quot;&gt;an extended Kalman-filter method that uses humidity data&lt;/a&gt; to &lt;a href=&quot;https://github.com/reichlab/article-disease-pred-with-kcde/raw/master/inst/article/infectious-disease-prediction-with-kcde.pdf&quot;&gt;our kernel conditional density estimation method using recent incidence&lt;/a&gt;, and there are many others!&lt;/p&gt;

&lt;p&gt;This year, we – well, mostly &lt;a href=&quot;https://github.com/elray1&quot;&gt;Evan&lt;/a&gt; – have developed a new ensemble method that combines predictions from different models. We – mostly &lt;a href=&quot;https://github.com/lepisma&quot;&gt;Abhinav&lt;/a&gt; – also created a visualizer for our predictions. Check it out &lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;here&lt;/a&gt;! It’s still early in the season, and we’re not seeing much data to suggest that this will be an unusually high or low year, but that’s largely because there just isn’t much information in the early-season data.
In this post, I’m going to give you a quick tour under the hood of our ensemble forecasting methodology. At some point, we’ll have an article up on GitHub or arXiv, but for now, this explanation will have to suffice.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;
    &lt;img class=&quot;img-responsive&quot; width=&quot;700&quot; src=&quot;/images/blog/flusight-wide.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;details-of-the-challenge&quot;&gt;Details of the challenge&lt;/h2&gt;
&lt;p&gt;We call our team the Kernel of Truth (left over from our KCDE methods last year, although we hope the name still is appropriate). The contest is based on predicting a measure of influenza incidence that represents the percentage of all doctor’s visits that are for influenza-like-illness (ILI), weighted by population. The measure is called “weighted ILI” and its units are percentage points. Per contest rules, all teams have to submit full predictive distributions each week from November through April for seven different targets of interest, for each of the HHS regions in the U.S. (and for the country as a whole). Here are the targets:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;incidence for each of the next four weeks&lt;/li&gt;
  &lt;li&gt;onset week: the first week of the first sequence of three weeks to be above the regional CDC baseline for weighted ILI&lt;/li&gt;
  &lt;li&gt;peak week: the week in which peak incidence will occur&lt;/li&gt;
  &lt;li&gt;peak incidence: the actual value of weighted ILI at the peak week&lt;/li&gt;
&lt;/ul&gt;
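
&lt;p&gt;To make the onset definition concrete, here is a minimal sketch of how the onset week could be computed from a series of weekly weighted ILI values. The function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;onset_week&lt;/code&gt;, the example values, and the baseline of 2.2 are hypothetical; the sketch follows the wording above (strictly above baseline for three weeks in a row) and is not the official scoring code.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical helper: return the index of the first week that starts a run of
# `run_length` consecutive weeks with weighted ILI above the baseline,
# or NA if no such run exists.
onset_week &amp;lt;- function(wili, baseline, run_length = 3) {
  above &amp;lt;- wili &amp;gt; baseline
  n &amp;lt;- length(above)
  if (n &amp;lt; run_length) return(NA_integer_)
  for (start in seq_len(n - run_length + 1)) {
    # isTRUE() treats a run containing missing values as not yet above baseline
    if (isTRUE(all(above[start:(start + run_length - 1)]))) return(start)
  }
  NA_integer_
}

# Made-up weekly values: week 2 is above baseline but week 3 is not,
# so the first qualifying run starts in week 4
wili &amp;lt;- c(1.0, 2.3, 1.9, 2.4, 2.6, 3.1, 2.8)
onset &amp;lt;- onset_week(wili, baseline = 2.2)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;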

&lt;h2 id=&quot;component-models&quot;&gt;Component models&lt;/h2&gt;
&lt;p&gt;For our submissions this year, we obtain the final predictive distributions as a weighted average of predictions from three component models:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A “fixed” model using either Kernel Density Estimation (for onset week, peak week, and peak incidence), or a generalized additive model (for predictions of incidence at horizons 1 to 4 weeks).  The raw predictions from this model do not change as new data are observed over the course of the season (though the predictions for incidence in individual weeks do depend on the week being predicted), and can be interpreted as a representation of “everything that we have seen in the past”.  A separate model fit is obtained for each region. In week 3 of the competition (Nov 21, 2016), we modified this method to truncate the predictive distributions for onset timing, peak timing, and peak incidence so that values that have been eliminated by previously observed incidence are assigned low probability. Currently, this is done in a very ad hoc manner.&lt;/li&gt;
  &lt;li&gt;A model combining Kernel Conditional Density Estimation (KCDE) and copulas. This method is described in more detail &lt;a href=&quot;https://github.com/reichlab/article-disease-pred-with-kcde/raw/master/inst/article/infectious-disease-prediction-with-kcde.pdf&quot;&gt;here&lt;/a&gt;. In brief, KCDE is used to obtain separate predictive densities for each future week in the season.  In order to predict seasonal quantities (onset, peak timing, and peak incidence), we use a copula to model dependence among those individual predictive densities, thereby obtaining a joint predictive density for incidence in all future weeks.  Predictive densities for the seasonal quantities can be obtained as appropriate integrals of this joint density.  A separate model fit is obtained for each region.&lt;/li&gt;
  &lt;li&gt;A seasonal auto-regressive integrated moving average (SARIMA) model. This model is fit to seasonally differenced log(weighted_ili) using a stepwise procedure to select the model specification. A separate model fit is obtained for each region.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;ensemble-model&quot;&gt;Ensemble model&lt;/h2&gt;
&lt;p&gt;The final predictions are obtained as a linear combination of the predictions from these component models using a method known as “stacking” or model averaging.  The model weights depend on the week of the season in which the predictions are made. There is a lot of gnarly math and computation that we’re leaving out here, but if you’d like to see it, let us know in the comments section and we’ll post some more details.  We estimate the weights via gradient tree boosting, optimizing leave-one-season-out cross-validated log scores (using the definition of log scores specified for this competition). Currently we are using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xgboost&lt;/code&gt; package in R to implement this, although there have been some rumblings about moving to another method, as this one is giving us some problems when the curvature of our loss function is negative. I’ll spare you the details for now.&lt;/p&gt;

&lt;p&gt;We are submitting two variations on the ensemble model:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;KoTstable is a stable version, in which we will hold fixed all details and model fits for the component models as well as the model weights throughout the season.  Because this model will not be updated, we will be able to learn about model performance over the course of the season. These are the predictions currently shown on the &lt;a href=&quot;https://reichlab.github.io/flusight/&quot;&gt;FluSight visualizer&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;KoTdev is a development version, which we will update over the course of the season.  We have plans for tweaks to all three of the existing component models, the addition of new component models, and changes to computation of the model weights.  This model provides a sandbox for development of new features and continuous improvement of our prediction methodology.  In the first submission week, the predictions from KoTstable and KoTdev were&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;There is a lot more work to do on this to get it to where we want it to be, but one of the “advantages” of these challenges is that they force you to get stuff out there and just try it out. Some of the things that we are thinking about doing are improving the estimation methodology for the weights (including perhaps some kind of smoothing or regularization of the weights), adding a more mechanistic model that incorporates some biological features of flu, and incorporating the uncertainty in recent observations (as you can see in the app, there can be adjustments to reported cases, especially in the most recently reported weeks). So, there’s lots to do, and we’re hopefully just getting started.&lt;/p&gt;
</description>
        <pubDate>Wed, 23 Nov 2016 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2016/11/23/introducing-flusight.html</link>
        <guid isPermaLink="true">http://reichlab.io//2016/11/23/introducing-flusight.html</guid>
      </item>
    
      <item>
        <title>Using the DELPHI API to access infectious disease data</title>
        <description>&lt;p&gt;This week I attended a workshop at the CDC about last year’s &lt;a href=&quot;https://predict.phiresearchlab.org/flu/index.html&quot;&gt;FluSight challenge&lt;/a&gt;, a competition that scores weekly real-time predictions about the course of the influenza season. They are planning another round this year and are hoping to increase the number of teams participating. Stay tuned to &lt;a href=&quot;https://predict.phiresearchlab.org/flu/index.html&quot;&gt;this site&lt;/a&gt; for more info.&lt;/p&gt;

&lt;p&gt;At the workshop, I learned about &lt;a href=&quot;http://delphi.midas.cs.cmu.edu/&quot;&gt;DELPHI’s&lt;/a&gt; real-time epidemiological &lt;a href=&quot;https://github.com/undefx/delphi-epidata&quot;&gt;data API&lt;/a&gt;. The API is linked to various data sources on influenza and dengue, including US CDC flu data, Google Flu Trends, and Wikipedia data. There is &lt;a href=&quot;https://github.com/undefx/delphi-epidata#the-api&quot;&gt;some documentation&lt;/a&gt; and &lt;a href=&quot;https://github.com/undefx/delphi-epidata#code-samples&quot;&gt;minimal examples&lt;/a&gt;, and this post documents a more robust and complete example for using the API via R. I’ll note that the CDC’s influenza data can also be accessed via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdcfluview&lt;/code&gt; R package, which I’m not going to discuss here; instead, I will focus on accessing some of the other data sources. Here’s a teaser of this data that you can also interactively explore on the &lt;a href=&quot;http://delphi.midas.cs.cmu.edu/epivis/epivis.html&quot;&gt;DELPHI EpiVis website&lt;/a&gt;:
&lt;img class=&quot;img-responsive&quot; width=&quot;600&quot; src=&quot;/images/blog/epivis.png&quot; /&gt;&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;setup&quot;&gt;Setup&lt;/h2&gt;

&lt;p&gt;Let’s start by loading the R script containing the relevant methods needed to access the API.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://raw.githubusercontent.com/undefx/delphi-epidata/master/code/delphi_epidata.R&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, load in two packages that we will use to tidy and plot the data.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;dengue-data-from-taiwan-cdc&quot;&gt;Dengue data from Taiwan CDC&lt;/h2&gt;

&lt;p&gt;Here is some code that pulls data from &lt;a href=&quot;http://nidss.cdc.gov.tw/en/&quot;&gt;Taiwan’s NIDSS&lt;/a&gt;, specifically asking for nationwide data and data from the central region. Complete lists of &lt;a href=&quot;https://github.com/undefx/delphi-epidata/blob/master/labels/nidss_regions.txt&quot;&gt;regions&lt;/a&gt; and &lt;a href=&quot;https://github.com/undefx/delphi-epidata/blob/master/labels/nidss_locations.txt&quot;&gt;locations&lt;/a&gt; are available. Also, I’ve specified a range of weeks from the first week of 2010 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;201001&lt;/code&gt;) to the last week of 2016 (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;201653&lt;/code&gt;).&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Epidata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nidss.dengue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;locations&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;nationwide&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;central&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
                            &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweeks&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Epidata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;201001&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;201653&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above command should pull data down into your current session, but it will be a little bit ‘list-y’, so here is some code I wrote to clean it up and make it a bit more of a workable dataset in R.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data.frame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;byrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;colnames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]])[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sapply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is.null&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.character&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;week&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek2Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRyear&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
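
&lt;p&gt;The tidying steps above can also be wrapped in a small helper. What follows is only a sketch in base R: the function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;epidata_to_df()&lt;/code&gt; and the mock response &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mock_res&lt;/code&gt; are made up for illustration, the counts are invented, and the code assumes that every record in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;res$epidata&lt;/code&gt; is a flat named list with the same fields.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical helper: turn a list of flat, named records into a data frame.
epidata_to_df &amp;lt;- function(records) {
  rows &amp;lt;- lapply(records, function(rec) {
    # Replace NULL fields with NA so every record keeps the same columns.
    rec[vapply(rec, is.null, logical(1))] &amp;lt;- NA
    as.data.frame(rec, stringsAsFactors = FALSE)
  })
  df &amp;lt;- do.call(rbind, rows)
  # Epiweeks are coded as YYYYWW, so integer arithmetic splits them.
  df$epiweek &amp;lt;- as.integer(df$epiweek)
  df$year &amp;lt;- df$epiweek %/% 100
  df$week &amp;lt;- df$epiweek %% 100
  df
}

# Mock response with made-up counts, shaped like res$epidata above.
mock_res &amp;lt;- list(epidata = list(
  list(location = &quot;nationwide&quot;, epiweek = 201001, count = 12),
  list(location = &quot;central&quot;, epiweek = 201001, count = 3)
))
df_mock &amp;lt;- epidata_to_df(mock_res$epidata)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Building one row per record keeps numeric fields numeric, so the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;as.numeric(as.character())&lt;/code&gt; round trip is not needed in this version.&lt;/p&gt;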

&lt;p&gt;Note the use of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MMWRweek2Date()&lt;/code&gt; function, which converts each MMWR year and week into a calendar date and gives us a date column in our data frame. And here is a plot of the resulting data.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scale_y_log10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/nidss-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
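
&lt;p&gt;For reference, the week-to-date conversion used above can be approximated without the package. The sketch below assumes the usual MMWR convention (weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year) and returns the Sunday that starts each week. The name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;epiweek_start()&lt;/code&gt; is hypothetical and is not part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MMWRweek&lt;/code&gt; package.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical base-R sketch of the MMWR week-to-date rule.
epiweek_start &amp;lt;- function(year, week) {
  # January 4 always falls in MMWR week 1 under the four-day rule.
  jan4 &amp;lt;- as.Date(paste0(year, &quot;-01-04&quot;))
  # %w gives the day of the week as 0 (Sunday) through 6 (Saturday).
  days_since_sunday &amp;lt;- as.integer(format(jan4, &quot;%w&quot;))
  week1_start &amp;lt;- jan4 - days_since_sunday
  week1_start + 7 * (week - 1)
}

# Works on single values and on vectors of years and weeks.
epiweek_start(2016, 1)
epiweek_start(c(2010, 2016), c(1, 35))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;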

&lt;h2 id=&quot;wikipedia-data&quot;&gt;Wikipedia data&lt;/h2&gt;

&lt;p&gt;Let’s try loading some of the Wikipedia data on influenza and other related terms. I think the counts reflect the number of hits on the pages of the requested articles, although I’m not sure.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Epidata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wiki&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;articles&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;influenza&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;common_cold&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;cough&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                    &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweeks&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Epidata&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;201101&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;201553&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data.frame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;byrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;colnames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]])[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sapply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epidata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is.null&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.character&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;week&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;as.numeric&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epiweek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek2Date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRyear&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MMWRweek&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;week&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ggplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;article&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; 
  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;geom_smooth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;span&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;.1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;se&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;FALSE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/blog/wiki-data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
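
&lt;p&gt;Both requests above assume that the API call succeeded. Here is a minimal sketch of a guard, assuming the response is a list that carries a numeric &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;result&lt;/code&gt; field (1 on success) and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;message&lt;/code&gt; field alongside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;epidata&lt;/code&gt;. The helper name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;check_epidata()&lt;/code&gt; and the mock responses are illustrative only, so check the field names against the API documentation before relying on them.&lt;/p&gt;

&lt;div class=&quot;language-r highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical guard: stop early if a response does not look successful.
check_epidata &amp;lt;- function(res) {
  # Assumption: result equal to 1 signals success; a missing field counts as failure.
  ok &amp;lt;- isTRUE(res$result == 1)
  if (!ok) {
    stop(&quot;Epidata request failed: &quot;, res$message)
  }
  invisible(res$epidata)
}

# Mock responses for illustration; real ones come from the Epidata calls above.
good &amp;lt;- list(result = 1, message = &quot;success&quot;, epidata = list(list(count = 5)))
bad &amp;lt;- list(result = -2, message = &quot;no results&quot;)

records &amp;lt;- check_epidata(good)
outcome &amp;lt;- tryCatch(check_epidata(bad), error = function(e) conditionMessage(e))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Calling the guard right after each request means a failed call stops with the server’s message instead of producing an empty or misshapen data frame later on.&lt;/p&gt;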

&lt;p&gt;Happy data exploring!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;: (2 Sept 2016) Roni Rosenfeld, the head of the DELPHI group at CMU, pointed out and asked me to mention that David Farrow was the force behind the creation of the epidata API and the epivis tool.&lt;/p&gt;
</description>
        <pubDate>Thu, 01 Sep 2016 00:00:00 +0000</pubDate>
        <link>http://reichlab.io//2016/09/01/epidata-api.html</link>
        <guid isPermaLink="true">http://reichlab.io//2016/09/01/epidata-api.html</guid>
      </item>
    
  </channel>
</rss>
