Data Literacy

Looking beyond daily updates on the number of COVID-19 cases

April 11, 2020

This blog post is about data. Specifically, it is about what we do with data, and what we should think about to ensure that we make sense of it and interpret it in the context of the bigger picture.

Because COVID-19 data is topical at the moment, we use a specific COVID-19 dataset to illustrate key points about interpreting data.

The most commonly heard data-based story on COVID-19

Most people are following updates on COVID-19 cases closely. We often find ourselves discussing the increase in the number of cases and deaths when the statistics are updated. These discussions usually involve comparisons between countries.

For example, based on the first few columns in the table below, the story could be as follows:

“America now has the highest number of COVID-19 cases, with a total of 435,160. Other countries with more cases than China are Spain (148,220 cases), Italy (139,422 cases), Germany (113,296 cases) and France (112,950 cases).

China is the country with the sixth-highest number of cases (81,865).”

And the story could continue as follows:

“The highest number of deaths is recorded in Italy (17,669), followed by the USA (14,797) and Spain (14,792).”

We can also compare death rates with infection rates, and so on.

Numbers give us a sense of security because they create the impression that we can pinpoint something: that we can know exactly what the situation is. However, if we do not contextualise data, it may not give an accurate account of a situation at all.

With COVID-19 data, we also need to look at other factors. Key questions include:

  • What is the proportion of cases in relation to the population of the country?
  • What is the number of people actually tested?
  • What proportion of the population has been tested?

There may be other stories too

Once we start asking these questions, the story changes, and we gain new insights. For example:

“Germany and France have almost the same number of COVID-19 cases (113,296 and 112,950 respectively), but the number of cases per 1 million population is higher for France (1,730) than for Germany (1,352), and the number of deaths per 1 million population is markedly higher for France (167) than for Germany (28).”

Similarly, we would be able to say the following:

“While the actual number of COVID-19 deaths in the USA (14,797) is almost the same as in Spain (14,792), the number of deaths per 1 million population is much higher in Spain (316) than in the USA (45).”

What is particularly important is China. In the first story it ranked sixth by total number of cases, yet its cases and deaths per 1 million population are very low, at 57 and 2 respectively.

Surely this would significantly change the way we interpret the data?

[Table: COVID-19 statistics by country, from Worldometer]

(Accessed on 9 April 2020 online at https://www.worldometers.info/coronavirus/#news)

Seeing the numbers in the context of the bigger picture

While the actual numbers are important, and while every life matters, we get a more accurate picture of the scale of the epidemic in each country if we sort the table by the columns that show the number of cases per 1 million population.
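A per-million figure is, in essence, a raw count divided by the population and multiplied by 1,000,000. The short Python sketch below shows how ranking by that rate, rather than by the raw count, can reverse the order of a list. The three countries and all of their figures are hypothetical, invented purely for illustration; they are not taken from the Worldometer tables.

```python
def per_million(count: int, population: int) -> float:
    """Express a raw count (cases or deaths) as a rate per 1 million population."""
    return count * 1_000_000 / population


# Hypothetical countries: names, case counts and populations are invented for illustration.
countries = {
    "Large country": {"cases": 50_000, "population": 200_000_000},
    "Medium country": {"cases": 20_000, "population": 10_000_000},
    "Microstate": {"cases": 300, "population": 40_000},
}

# The "headline" story: rank by absolute number of cases.
by_cases = sorted(countries, key=lambda c: countries[c]["cases"], reverse=True)

# The contextualised story: rank by cases per 1 million population.
by_rate = sorted(
    countries,
    key=lambda c: per_million(countries[c]["cases"], countries[c]["population"]),
    reverse=True,
)

print("By total cases:", by_cases)
print("By cases per million:", by_rate)
for name in by_rate:
    rate = per_million(countries[name]["cases"], countries[name]["population"])
    print(f"  {name}: {countries[name]['cases']:,} cases, {rate:,.0f} per million")
```

In this made-up example the two rankings are exact opposites: the microstate, with only 300 cases, has the highest rate (7,500 per million), while the large country's 50,000 cases amount to just 250 per million.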

The table below shows that countries that do not even feature in headline reports on the epidemic actually have the highest number of cases per 1 million population, at rates much higher than those of the countries regarded as hotspots merely because of their sheer number of cases.

[Table: COVID-19 statistics by country, from Worldometer, sorted by cases per 1 million population]

(Accessed on 9 April 2020 online at https://www.worldometers.info/coronavirus/#news)

This table tells a different story:

[Image: Europe's microstates]

Credit: https://en.wikipedia.org/wiki/European_microstates

“The number of COVID-19 cases per 1 million population in three of Europe’s six microstates is staggeringly high: Vatican City has 9,988 cases per 1 million population, San Marino 9,077 and Andorra 7,300.”

By this measure, Spain and Italy have the 8th and 10th highest numbers of cases per 1 million population respectively, and China does not come close to featuring on the list of countries with the highest number of cases.

Other contextual factors

An analysis of deaths per 1 million population will show similar results. Another factor to take into account when interpreting the numbers is the trajectory of the disease. For example, we know that the peak has been reached in China, and that Italy is slightly further along the curve than Spain and the USA.

This will influence how we interpret the currently reported incidence and death rates. We could also inform our analysis by keeping an eye on the number of new cases reported daily (see the first table): if we can tell whether this number is increasing steadily or exponentially, we will be able to anticipate in which direction the numbers will move over the next few days.
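One rough way to tell a steady increase from an exponential one is to compare consecutive days. If the daily number of new cases grows by roughly the same amount each day, the increase is steady (linear); if it grows by roughly the same factor each day, the increase is exponential. The Python sketch below applies this heuristic to two hypothetical series invented for illustration. It is not an epidemiological model, and real daily figures are noisy, so in practice one would look at the pattern over several days rather than a single comparison.

```python
from typing import List, Optional


def daily_differences(new_cases: List[int]) -> List[int]:
    """Day-to-day change in the number of new cases."""
    return [today - yesterday for yesterday, today in zip(new_cases, new_cases[1:])]


def daily_ratios(new_cases: List[int]) -> List[Optional[float]]:
    """Day-to-day growth factor; None where the previous day had zero new cases."""
    return [
        today / yesterday if yesterday else None
        for yesterday, today in zip(new_cases, new_cases[1:])
    ]


# Hypothetical series of daily new cases, invented for illustration.
steady = [100, 150, 200, 250, 300]      # grows by the same amount each day
doubling = [100, 200, 400, 800, 1600]   # grows by the same factor each day

print("steady differences:", daily_differences(steady))
print("steady ratios:     ", daily_ratios(steady))
print("doubling differences:", daily_differences(doubling))
print("doubling ratios:     ", daily_ratios(doubling))
```

For the steady series the differences stay constant (50 a day) while the ratios shrink towards 1; for the doubling series the ratios stay constant at 2.0 while the differences keep growing.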

Another factor that could influence the numbers is testing. We should consider the following questions: “Do countries conduct the same tests?” and “What is the reliability level of the tests?”

An issue that is coming up in the media is whether all COVID-19 deaths are actually reported as such. This may vary from country to country, and may also be linked to the extent of testing. If more people are tested, reporting of COVID-related deaths is likely to be more accurate.

We should also consider whether social distancing measures are or have been implemented in a country and, if so, what those measures are or were.

A shifting picture

It is clear that the COVID-19 statistics on any given day are merely a snapshot in time, which should be considered in the context of many factors. They tell only part of a story that is still unfolding.

It is also important to note that different countries are at different stages of the epidemic. This means that while numbers may give us a sense of “knowing exactly”, and even a sense of security, we have to engage with them critically to understand them better.

South African numbers

Compared to other countries, South Africa still has relatively low numbers (see the table below).

[Table: COVID-19 statistics for South Africa compared with other countries]

(Source: DWC. Derived from data from https://www.worldometers.info/coronavirus/#news)

[Map: South Africa]

When interpreting provincial statistics in South Africa, it is necessary to at least consider the number of cases per province in relation to the size of the population in that province.

[Table: Confirmed COVID-19 cases and population by province, South Africa]

(Source: DWC. Derived from data from https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_South_Africa and https://en.wikipedia.org/wiki/List_of_South_African_provinces_by_population)

As of 9 April 2020, the share of confirmed cases in Gauteng and the Western Cape is higher than the share of the national population resident in those provinces. In KwaZulu-Natal (KZN) and the Free State, the share of confirmed cases is on par with the share of the population resident there.

In the other provinces, which typically have large rural populations, the share of confirmed cases is significantly lower than the share of the population resident there. This picture will probably change as testing in communities is rolled out.
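The provincial comparison above can be expressed as a single number per region: its share of national confirmed cases divided by its share of the national population. A value of 1.0 means the two shares match; above 1.0 the region is over-represented among confirmed cases, below 1.0 under-represented. The Python sketch below is a minimal illustration of that calculation. The regions and figures are hypothetical, not the provincial data from the table, and, as the post notes, the result depends on how much testing has been done in each region.

```python
from typing import Dict


def representation_ratio(cases: Dict[str, int], population: Dict[str, int]) -> Dict[str, float]:
    """Share of national confirmed cases divided by share of national population.

    1.0 means a region's share of cases matches its share of the population;
    above 1.0 it is over-represented among confirmed cases, below 1.0 under-represented.
    """
    total_cases = sum(cases.values())
    total_population = sum(population.values())
    # (cases_r / total_cases) / (population_r / total_population),
    # rearranged so the integer products are formed before the single division.
    return {
        region: cases[region] * total_population / (total_cases * population[region])
        for region in cases
    }


# Hypothetical regions and figures, invented for illustration only.
cases = {"Region A": 600, "Region B": 300, "Region C": 100}
population = {"Region A": 2_000_000, "Region B": 3_000_000, "Region C": 5_000_000}

for region, ratio in representation_ratio(cases, population).items():
    print(f"{region}: {ratio:.2f}")
```

Here Region A (60% of cases, 20% of the population) scores 3.00, resembling the pattern described for Gauteng and the Western Cape; Region B scores 1.00, like KZN and the Free State; and Region C scores 0.20, like the mainly rural provinces.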

Improving data literacy

The examples we have used here relate to the current situation with COVID-19, but the principles apply to other datasets and contexts as well. We encourage our readers to engage critically with data and to discover the multiple stories that a single dataset can tell.

Even though numbers often seem to tell a very precise story, in reality the truth is often more nuanced than it appears at face value.

Postscript

We have used COVID-19 data to illustrate some points about data literacy. We do want to emphasise that we are acutely aware that these numbers are not just cold facts: they relate to real people, real hardship and the untold sadness of those who have lost loved ones at this time.

To express our sympathy with the many families who have suffered loss at this time, and to acknowledge that numbers on their own cannot capture the depth of people’s lived experiences, we would like to share this poem with you.

By Fia van Rensburg

[Image: untitled poem]