Talks from innovators around the world
Regular seminar series where partner institutions and global thought leaders can present cutting-edge research
When we speak of the next pandemic, it's no longer a question of "if" but "when". The NSF research community is uniquely placed to ensure that multidisciplinary, cutting-edge science is applied to real-time, real-world situations. Join us as we talk to the scientists at the heart of these innovations - and through our special "researcher match" episodes, we hope to spark new ideas and collaborations among researchers.
This is Science before the Storm.
A special podcast series from researchers at the Biocomplexity Institute (NSSAC Division), University of Virginia on supporting COVID-19 response in the US over the past year.
Introduction to Epidemiology // CDC
Epidemiology is the “study of distribution and determinants of health-related states among specified populations and the application of that study to the control of health problems.” — A Dictionary of Epidemiology
These materials provide an overview of epidemiology investigations, methods, and data collection.
- Key concepts and terms
- Calculating rates
- Approach and methodology
- Data sources and study design
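As a minimal sketch of the "Calculating rates" topic above, the following Python computes an incidence rate per 100,000 people and a case fatality ratio using the standard textbook definitions. The function names and the county figures are hypothetical and chosen only for illustration; they are not taken from the CDC materials.

```python
def rate_per(events: int, population: int, per: int = 100_000) -> float:
    """Return the number of events per `per` people in the population at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


def case_fatality_ratio(deaths: int, cases: int) -> float:
    """Return deaths as a proportion of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical county: 250 new cases and 5 deaths among 50,000 residents.
incidence = rate_per(250, 50_000)
cfr = case_fatality_ratio(5, 250)
print(f"{incidence:.1f} cases per 100,000; CFR {cfr:.1%}")
```

Scaling to a fixed denominator such as 100,000 is what makes rates comparable between populations of different sizes.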
Dependencies: R (various R packages) and Stan.
This is a graduate-level Computer Science (CS) course on computational epidemiology, which is the study and development of computational techniques and tools for modeling, simulating, predicting, forecasting, surveilling, mitigating, and visualizing the spread of disease. In this course, we will use techniques from different areas of CS including algorithms, data mining, discrete-event simulations, machine learning, and network science. The course is organized into four parts: (i) Disease-spread models and analysis of disease dynamics, (ii) Inference, prediction, and forecasting problems related to disease-spread, (iii) Infection control and disease surveillance problems, (iv) Additional topics including a discussion of disease-related datasets and the use of technology for gathering contact data.
Explainable AI // Su-In Lee and Ian Covert
This course is about explainable artificial intelligence (XAI), a subfield of machine learning that provides transparency for complex models. Modern machine learning relies heavily on black-box models like tree ensembles and deep neural networks; these models provide state-of-the-art accuracy, but they make it difficult to understand the features, concepts, and data examples that drive their predictions. As a consequence, it's difficult for users, experts, and organizations to trust such models, and it's challenging to learn about the underlying processes we're modeling.
In response, some argue that we should rely on inherently interpretable models in high-stakes applications, such as medicine and consumer finance. Others advocate for post-hoc explanation tools that provide a degree of transparency even for complex models. This course explores both perspectives, and we'll discuss a wide range of tools that address different questions about how models make predictions. We'll cover many active research areas in the field, including feature attribution, counterfactual explanations, instance explanations, and human-AI collaboration.
Synthetic Pandemic Outbreak Data
Synthetic Pandemic Outbreak Data for the US-UK Privacy-Enhancing Technologies Challenge // UVA Biocomplexity Institute & Emory University
The COVID-19 pandemic has emphasized the need for a more robust disease surveillance infrastructure. However, the development of such an infrastructure runs into a double bind. Assurances of privacy require adversarial testing against realistic systems. Likewise, it is difficult to build realistic systems without access to data. To get around this problem, we created synthetic datasets that reflect a realistic disease outbreak. This data was used as a component in the US-UK Prize Challenge on Privacy-Enhancing Technologies. In the challenge, participants were tasked with creating personalized risk forecasts of infection in a privacy-preserving manner. The challenge was organized by the U.K.’s Centre for Data Ethics and Innovation (CDEI) and Innovate UK, as well as by the U.S. National Institute of Standards and Technology (NIST) and the National Science Foundation (NSF), in cooperation with the White House Office of Science and Technology Policy (OSTP).
COVID-19 Open Data // Daily global time-series data // Google Cloud Platform
This repository attempts to assemble the largest COVID-19 epidemiological database in addition to a powerful set of expansive covariates. It includes open, publicly sourced, licensed data relating to demographics, economy, epidemiology, geography, health, hospitalizations, mobility, government response, weather, and more. Moreover, the dataset merges daily time series from more than 20,000 global sources at a fine spatial resolution, using a consistent set of region keys.
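To illustrate why a consistent set of region keys matters, here is a minimal sketch that joins a time-series table to a covariate table on a shared key. The field names, key strings, and all values are made up for illustration and should not be read as the repository's actual schema or data.

```python
# Hypothetical tables sharing one region key per location (values are invented).
epidemiology = [
    {"key": "US_CA", "date": "2021-01-01", "new_confirmed": 100},
    {"key": "US_NY", "date": "2021-01-01", "new_confirmed": 80},
]
demographics = {
    "US_CA": {"population": 1_000_000},
    "US_NY": {"population": 500_000},
}


def join_on_key(rows, lookup):
    """Attach lookup fields to every row whose region key appears in lookup."""
    joined = []
    for row in rows:
        extra = lookup.get(row["key"])
        if extra is None:
            continue  # skip regions that have no covariate record
        joined.append({**row, **extra})
    return joined


merged = join_on_key(epidemiology, demographics)
for row in merged:
    per_100k = row["new_confirmed"] * 100_000 / row["population"]
    print(row["key"], row["date"], per_100k)
```

Because every table refers to a location by the same key, covariates can be attached to daily counts without fuzzy matching on place names.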
COVID Data Tracker is CDC's home for COVID-19 data. COVID Data Tracker combines data from across the COVID-19 response in one location. Data is grouped by category to make it easier to find the data you need. The statistics bar at the top brings you the latest data on cases, vaccines, and deaths at a glance. Each category offers data visualizations and crosslinks to relevant pages, and many offer data downloads. New categories of data are added regularly, and most data are updated every day.
COVID-19 open-access data and computational resources are being provided by federal agencies, including NIH, public consortia, and private entities. These resources are freely available to researchers, and this page will be updated as more information becomes available.
HHS Protect Public Data Hub // US Department of Health & Human Services
The whole-of-America response to the COVID-19 pandemic demands data sharing in near-real time. At HHS, four principles drive this work: transparency, sharing, privacy, and security. This site provides information on the current state of the American health care system. HHS recognizes the importance of providing high-quality, accessible, and timely information for entrepreneurs, researchers, and policymakers to help drive insights and better health outcomes for all. On this site, you can explore data visualizations on hospitalizations, testing, therapeutics, and more.
COVID-19 Data in the United States // NY Times
This is an ongoing repository of data on coronavirus cases and deaths in the U.S. released by the New York Times. Find visualizations of the data here.
The COVID Tracking Project // The Atlantic Monthly Group
The COVID Tracking Project is a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. The data collected included testing and outcomes, race and ethnicity, long-term care, vaccine metadata, and city data. As of March 7, 2021, they are no longer collecting new data. Find the US federal COVID-19 data sources most comparable to the data compiled by The COVID Tracking Project here.
COVID-19 Community // Reference data
This project is a community effort to build a Neo4j Knowledge Graph (KG) that integrates heterogeneous biomedical and environmental datasets to help researchers analyze the interplay between host, pathogen, the environment, and COVID-19.
COVID-19 GIS Hub // ESRI
ESRI provides maps, datasets, applications, and more for COVID-19.
COVID-19 Data Repository // CSSE, Johns Hopkins University
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
COVID Information Commons // Northeast Big Data Innovation Hub, Midwest Big Data Innovation Hub, South Big Data Innovation Hub, & West Big Data Innovation Hub
The COVID Information Commons (CIC) is an open website to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. They have collected COVID-19 datasets spanning the globe.
Coronavirus (Covid-19) Data in the Russian Federation // Polymatica
The dataset contains the latest available public data on COVID-19 in the regions of the Russian Federation, including a daily situation update in cases and deaths. The file is in xlsx format and can be used in Microsoft Excel, Polymatica, and other analytical systems.
Non-pharmaceutical Intervention Data
Virginia County-level NPIs // Biocomplexity Institute, UVA
The dataset contains non-pharmaceutical interventions (NPIs) against COVID-19 from counties and independent cities in Virginia. NPIs are methods for reducing the spread of a disease that do not involve vaccines or drug treatments. Specifically, this dataset focuses on dates when closures or mandates were implemented or lifted in the following five categories: masks, businesses, pre-K-12 schools, colleges, and religious organizations.
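A dataset of implementation and lift dates can be queried as a set of date intervals. The sketch below assumes a hypothetical record layout (county, category, date implemented, date lifted) and invented dates; it does not reflect the dataset's real columns or values. It also assumes an intervention is no longer in force on the day it is lifted.

```python
from datetime import date

# Hypothetical records: (county, category, date implemented, date lifted or None).
npi_records = [
    ("Example County", "masks", date(2020, 5, 29), date(2021, 5, 14)),
    ("Example County", "businesses", date(2020, 3, 24), None),
]


def active_npis(records, county, on):
    """Return the sorted NPI categories in force in `county` on the date `on`."""
    return sorted(
        category
        for rec_county, category, start, end in records
        if rec_county == county
        and start <= on
        and (end is None or on < end)  # None means the NPI was never lifted
    )


print(active_npis(npi_records, "Example County", date(2020, 6, 1)))
```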
US COVID-19 State and County Policy Orders // U.S. Department of Health & Human Services
This data is a manually curated dataset that provides a standardized view into state and county policy orders (executive orders, ordinances, etc.) from the following sources:
- BU COVID-19 State Policy Database - Raifman J, Nocka K, Jones D, Bor J, Lipson S, Jay J, and Chan P. (2020). "COVID-19 US state policy database."
- Wikidata - Stay-at-home policies queried from Wikidata
- Manual curation by a dedicated group of Virtual Student Federal Service Interns - Summer 2020: T Adler, J Bastian, L Beckett, M Cohen, K Honey, C Kennedy, E Nudell
NCBI Virus is a resource designed to support the retrieval, display, and analysis of a curated collection of virus sequences and large sequence datasets. The SARS-CoV-2 Data Hub allows researchers to search, retrieve, and analyze SARS-CoV-2 sequences and other content in the NCBI Virus SARS-CoV-2 Data Hub Interactive Dashboard. Find more NCBI Resources here.
Real-time tracking of pathogen evolution // Nextstrain
Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. We are incorporating SARS-CoV-2 genomes as soon as they are shared and providing analyses and situation reports. In addition, we have developed a number of resources and tools, and are facilitating independent groups to run their own analyses.
API for Internet Archive TV News chyrons // Third Eye
TV cable news channels display chyrons on the "lower thirds" of screens to show breaking news and other highlights. Using the Internet Archive TV News, TV architect Tracey Jaquith built the Third Eye to scan the lower parts of the screen and apply OCR, or optical character recognition, to turn the words into text. Third Eye captures four TV cable news channels: BBC News, CNN, Fox News, and MSNBC.
Coronavirus (COVID-19) Tweets Dataset // IEEE DataPort
This dataset includes CSV files that contain IDs and sentiment scores of the tweets related to the COVID-19 pandemic. The real-time Twitter feed is monitored for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. The oldest tweets in this dataset date back to October 01, 2019. This dataset was wholly redesigned on March 20, 2020, to comply with the content redistribution policy set by Twitter. Twitter's policy restricts the sharing of Twitter data other than IDs; therefore, only the tweet IDs are released through this dataset. You need to hydrate the tweet IDs in order to get complete data. For detailed instructions on the hydration of tweet IDs, please read this article.
CORD-19 // COVID-19 Open Research Dataset // Semantic Scholar team at the Allen Institute for AI
CORD-19 is a corpus of academic papers about COVID-19 and related coronavirus research. It's curated and maintained by the Semantic Scholar team at the Allen Institute for AI to support text mining and NLP research. Please read our paper for an in-depth description of how it was created. The final version of CORD-19 was released on June 2, 2022.
Documenting COVID-19 // History Lab & MuckRock & Columbia's Brown Institute for Media Innovation
Documenting COVID-19 is a repository of searchable documents related to the COVID-19 pandemic obtained through state open-records laws and the Freedom of Information Act.
LitCOVID // Curated literature hub // NCBI - NLM - NIH
A literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus. LitCovid is the most comprehensive resource on the subject, providing central access to 281,104 (and growing) relevant articles in PubMed. The articles are updated daily and are further categorized by different research topics (e.g. transmission) and geographic locations.
COVID-19 Open Research Map // Interactively explore research output // Kenedict Innovation Analytics
This map allows you to interactively explore research output related to COVID-19, coronavirus, and SARS-CoV-2 published since January 1, 2020. Each colored data point is a published document. Documents are connected when they use similar words and terminology in their abstracts.
Knowledge Hub: Coronavirus // Frontiers
The Frontiers Coronavirus Knowledge Hub provides an up-to-date source of trusted information and analysis on COVID-19 and coronaviruses, including the latest research articles, information, and commentary from our world-class scientific community.
COVID-19 Surveillance Dashboard // Biocomplexity Institute, UVA
In an effort to support the planning and response efforts for the recent Coronavirus pandemic, the Network Systems Science and Advanced Computing (NSSAC) division of the Biocomplexity Institute and Initiative at the University of Virginia has prepared a visualization tool that provides a unique way of examining data curated by different data sources.
COVID-19 Clinical Cases // Figure 1
This resource library is a compilation of clinical knowledge and first-hand experiences of COVID shared in real-time by healthcare professionals.
SPIKE-Search over cord19 // Allen Institute for AI
This is a tool for performing an extractive search, using various query modes. It allows a level of query expressivity and control that is substantially more powerful than existing search solutions. Learn more in our blog post.
This tool searches over the COVID-19 Open Research Dataset (CORD-19), a free resource of over 50K scholarly articles about COVID-19 and related coronaviruses provided by AI2's Semantic Scholar project (last update: 12/16/2020).
SciSight // Semantic Scholar team at the Allen Institute for AI
SciSight is a tool for exploring the evolving network of science in the COVID-19 Open Research Dataset, from Semantic Scholar at the Allen Institute for AI. Its goal is to help accelerate scientific research, with tools to visualize the emerging literature network around COVID-19. Use the exploratory search tools to find out what groups are working on what directions, see how biomedical concepts interact and evolve over time, and discover new connections.
SciFact // CORD-19 Claim Verification // Allen Institute for AI
Due to the rapid growth in the scientific literature, there is a need for automated systems to assist researchers and the public in assessing the veracity of scientific claims. To facilitate the development of systems for this task, we introduce SciFact, a dataset of 1.4K expert-written claims, paired with evidence-containing abstracts annotated with veracity labels and rationales.
Live Visualizations for Public Use // Datawrapper
Covering the coronavirus is a challenge. We’d like to help. Here are more than 20 charts, maps and tables that show the latest coronavirus numbers. You can embed any of them on your own website. Since we know that lots of you use this blog post to actually inform yourselves, you can find visualizations on top.
COVID-19 Resources // University of Virginia Library
A guide for updates, information, and scholarly content about the coronavirus outbreak.
COVID-19 Medical Resource Demand Dashboard - US National // Biocomplexity Institute, UVA
One major concern as the pandemic got underway was whether hospitals could handle the influx of COVID-19 patients. The US Medical Resource Demand Dashboard was developed to allow public health officials to identify where and when hospitalizations are likely to peak. In an effort to support the planning and response efforts for the recent Coronavirus outbreak, the Network Systems Science and Advanced Computing (NSSAC) division of the Biocomplexity Institute and Initiative at the University of Virginia has prepared a visualization tool that couples epidemic simulations produced by NSSAC with hospital resource counts to project when different Virginia Hospital Preparedness Program (VA HPP) regions (also called Virginia Hospital Alerting & Status System (VHASS) regions) might hit the crisis stage due to the COVID-19 pandemic.