Millions Of People Are Missing From CDC COVID Data As States Fail To Report Cases

The new online Health Equity Tracker features colorful maps that show how the COVID-19 pandemic affected different races and ages in the United States. But you can see something is wrong.

A few states are grayed out, but that doesn't mean they've been immune to the pandemic.

Josh Zarrabi, a software engineer at Atlanta's Morehouse School of Medicine who recently launched the tracking portal, says that there is no data coming from Texas, and that this should make a lot of Americans unhappy. People should be saying something like, "Wow, we need the data. We're missing a lot of the puzzle."

And it's more than just the Texas piece of the puzzle that is missing.

The Centers for Disease Control and Prevention has tallied more than 39 million COVID-19 cases in the U.S.

NPR's analysis revealed that about one in five known cases, or 7 million people, are missing from the more detailed CDC data. A further problem is that about two-thirds of the available data aren't usable because health care providers have marked certain fields as "unknown" or left them blank.

While most states have provided all records, a few have not. Texas, Missouri and West Virginia have each submitted less than a tenth of their cases. A handful of other states, such as Kentucky and Michigan, have significant gaps in their data, each missing more than 30% of its known cases.

While more than 3 million Texans have been diagnosed with COVID-19, only 81,000 of them are included in the data. This is not even 3%.


"It is absurd. It is shameful. It is wrong," says Nancy Krieger, a Harvard University social epidemiologist. She says good data is essential for proper planning and for understanding the risks and how they are changing, and that it has to be real data that is public and easy to access.

This data set contains standardized details from states that help the CDC monitor COVID-19's spread and evaluate demographic trends. It also helps the CDC develop health guidance for at-risk groups and for the entire country. But these records are not always complete.

Only 1% of records are missing the patient's age and sex, but 36% have race and ethnicity left blank or marked as "unknown." The CDC asked states to indicate whether patients had any of 15 symptoms (such as fever, chills or muscle aches), but most states have not done so. While health departments can update these details when they complete case investigations, more than 90% of patient records are still blank or marked "unknown" for most of the fields.

Paula Yoon, a director of surveillance at the CDC, oversees the tracking of approximately 120 infectious diseases, including COVID-19. Her epidemiologists use lab reports and field studies to fill in the gaps as best they can. Yoon believes their job would be easier if they had all the data.

"Yes, we would be in a better place," Yoon says. She adds that the problem is not that states refuse to share those data with the CDC, nor that the states don't have these data.

Why aren't states submitting data?

There are many reasons that some states may not have submitted their data. One of the main ones is that public health has been severely underfunded for many years. This has led to a patchwork of outdated disease-tracking systems connecting hospitals and public health departments in each county.

Many counties have their own tracking systems, which can't automatically transfer records from one county to another.

Chip Cohlmia, who manages communicable diseases at the Jackson County Health Department in the Kansas City region, jokes that public health is keeping fax machines alive. In his county, like many others, hospitals fax records to the county health department, where workers enter the data by hand.

"It's almost like driving an old car," Cohlmia says. "You have to drive the car at about 100 miles per hour. But you don't know that you haven't changed the oil. You haven't checked the tires. You have the check-engine light on."

Rebecca Roesslet, public health planning supervisor in Columbia, says that her department still has to manually transfer 12,000 records, more than half of the COVID-19 cases the county has received so far. It is a tedious process of copying and pasting data points field by field.

"That's not what we prioritize," Roesslet says. Right now, she says, the priority is contacting people who have tested positive for COVID.

The state of Texas launched a COVID-19 tracking system in May 2020. However, most large counties already had their own systems. The result was yet another piece in public health's patchwork of tracking systems that don't automatically communicate with one another.

Janet Pichette, chief epidemiologist at Austin Public Health, says that her department wanted more data than the state was collecting. It also didn't want to depend on an outside system that could go down unexpectedly.

"Once you're a data scientist or epidemiologist, you can become very territorial. Right?" Pichette says.

The CDC cannot dictate what state health departments should do. States can't tell cities or counties what they should do.

"I wouldn't touch that with a 10-foot pole," Diana Cervantes says. Although she now teaches epidemiology at the University of North Texas, she managed a 49-county area with the state's health department in 2018.

"We prefer this approach of being hands-off," Cervantes says. The state, she says, doesn't want to get involved in power struggles with local officials, and local officials have little reason to worry about the state because they aren't accountable to it.

Local health officials report directly to county leaders, such as Ellis County Judge Todd Little, south of Dallas. His county had a full-time staff member working to reconcile the county data with "unreliable" state data, but that effort ended in June.

"We have done a good job of reducing the spread as a suburban county," Little says. "We're now ready to move on and live the freedom all Texans enjoy every day."

Can the data gaps be patched?

Even in an ideal world with all the technology and workers needed, it can be difficult to fill in the missing data. Some patients might not fit into the CDC's standard boxes for sex or race, while others might not wish to share their personal information.

Cohlmia describes his case investigation calls with residents who have COVID-19 in the Kansas City region as "loud, angry, violent screaming, and that kind of thing." "There were death threats against our office," he says. "Protests were held outside."

NPR heard from several states that the pandemic exceeded their technical capabilities, overwhelmed their staff and led to staff losses. However, there are some signs that solutions are possible.

Lisa Cox, spokesperson for the Missouri Department of Health and Senior Services, says that the department is trying to find a way to eliminate duplicate records before sending data to the CDC. She expects this to happen by September.

According to a spokesperson for Texas' health department, the department plans to transfer case data from different counties into the CDC's system in October. The spokesperson attributes the delay to Texas' decentralized public health system.

No state has perfect COVID-19 data. Fortunately, most states, including those with decentralized public health systems, have figured out how to communicate what they have to the CDC. Researchers, nonprofit organizations and any interested citizen can access the data to compare the effects of COVID-19 across states.

Krieger, the Harvard epidemiologist, says that "we should have these data at this point." "If you don't have enough data, the solution is to make it public and tell everyone that it isn't good enough. Then figure out how to improve it."

California's public-health director quit last fall after a glitch in the state's tracking system caused thousands of records to be lost. California spent $15 million to hire a tech company to build a new system capable of keeping up. Within a matter of months, California's COVID-19 data began to flow back into the CDC database.

The CDC provided $200 million in COVID-19 assistance funds to help states modernize their data systems this summer. Yoon, the CDC surveillance director, said that although this money has enabled thousands of hospitals to add electronic reporting, there is still much work to be done. To keep the momentum going, it will take continued funding, skilled workers and cooperation from counties and states.

"We cannot do it alone," Yoon says. Modernizing the existing system, she says, is not something that can be done once and then walked away from.

Texas alone has more than 3 million cases, while Missouri, Louisiana and Mississippi have hundreds of thousands of cases.

Their to-do lists will only grow as the delta variant continues to surge.


The CDC's COVID-19 case surveillance dataset updates twice monthly and includes all cases with a 14-day lag. For each update since June 2020, NPR used each state's cumulative case count, adjusted to account for the reporting delay, to calculate the number of COVID-19 patients missing from the data.
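The comparison described above can be illustrated with a short Python sketch. This is not NPR's code: the function names are hypothetical, the way the 14-day lag is applied (comparing each update with the cumulative count from 14 days earlier) is an assumption, and the Texas figures are the rounded numbers quoted earlier in this story.

```python
from datetime import date, timedelta

# The 14-day lag comes from the methodology note; applying it by looking up
# the cumulative count from 14 days before the update is an assumption.
REPORTING_LAG_DAYS = 14


def lagged_date(update_date: date, lag_days: int = REPORTING_LAG_DAYS) -> date:
    """Date whose cumulative case count is compared with a dataset update."""
    return update_date - timedelta(days=lag_days)


def missing_cases(cumulative_cases: int, detailed_records: int) -> int:
    """Known cases that have no record in the detailed dataset."""
    return max(cumulative_cases - detailed_records, 0)


def missing_share(cumulative_cases: int, detailed_records: int) -> float:
    """Fraction of known cases absent from the detailed dataset."""
    if cumulative_cases <= 0:
        return 0.0
    return missing_cases(cumulative_cases, detailed_records) / cumulative_cases


# Rounded Texas figures from the story ("more than 3 million", "81,000").
texas_cumulative = 3_000_000
texas_detailed = 81_000

# The story gives the update date as Aug. 17; the year 2021 is assumed here.
comparison_date = lagged_date(date(2021, 8, 17))
texas_included_share = texas_detailed / texas_cumulative
texas_missing_share = missing_share(texas_cumulative, texas_detailed)

print(f"Compare against cumulative count on {comparison_date}")
print(f"Included: {texas_included_share:.1%}, missing: {texas_missing_share:.1%}")
```

With these rounded inputs, the included share works out to 2.7%, which matches the "not even 3%" figure for Texas.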

This data set includes 29 variables derived from standard COVID-19 case report forms. NPR used those forms to determine how many records were missing or marked unknown for each variable.
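A per-variable tally of this kind could look something like the following Python sketch. The column names, the set of values treated as unusable and the sample rows are all made up for illustration; they are not taken from the CDC dataset or from NPR's analysis.

```python
from collections import Counter

# Values treated as unusable. This list is an assumption for illustration;
# the real dataset may code blanks and unknowns differently.
UNUSABLE_VALUES = {"", "unknown", "missing", "na"}


def is_unusable(value) -> bool:
    """True when a field is blank or marked as unknown."""
    return value is None or str(value).strip().lower() in UNUSABLE_VALUES


def unusable_share_by_variable(records):
    """Return, for each variable, the fraction of records that are blank or unknown."""
    totals = Counter()
    unusable = Counter()
    for record in records:
        for variable, value in record.items():
            totals[variable] += 1
            if is_unusable(value):
                unusable[variable] += 1
    return {variable: unusable[variable] / totals[variable] for variable in totals}


# Made-up sample rows with hypothetical column names.
sample_records = [
    {"age_group": "30-39", "sex": "Female", "race_ethnicity": "Unknown"},
    {"age_group": "60-69", "sex": "Male", "race_ethnicity": ""},
    {"age_group": "20-29", "sex": "Unknown", "race_ethnicity": "Hispanic/Latino"},
    {"age_group": "50-59", "sex": "Female", "race_ethnicity": "Black"},
]

shares = unusable_share_by_variable(sample_records)
for variable, share in shares.items():
    print(f"{variable}: {share:.0%} blank or unknown")
```

Counting blanks and "unknown" entries together mirrors how the story reports them, as a single share of records that can't be used for a given field.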

This report uses data from the most recent update on Aug. 17.