The main goal of this analysis is to provide a visual, up-to-date overview of the worldwide Covid-19 pandemic, including:
We import the latest reports from the CSSE (Center for Systems Science and Engineering) at Johns Hopkins University. The datasets we will use are:
The date of the latest data to be gathered is set to "yesterday": the files are updated daily at midnight, so yesterday's report should already be available.
We use the date variable "yesterday" to build the URLs dynamically.
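A minimal sketch of building the URL from "yesterday", assuming the daily reports are read from the CSSE GitHub repository, where each file is named after its report date as MM-DD-YYYY.csv. The base URL and the helper name `daily_report_url` are assumptions for illustration, not necessarily what this notebook uses.

```python
from datetime import date, timedelta

# Assumed location of the CSSE daily reports; each file is named MM-DD-YYYY.csv
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)


def daily_report_url(report_date: date) -> str:
    """Return the (assumed) URL of the daily report for report_date."""
    return f"{BASE_URL}{report_date.strftime('%m-%d-%Y')}.csv"


# Use yesterday's date, because today's file may not be published yet
yesterday = date.today() - timedelta(days=1)
url = daily_report_url(yesterday)
# The download itself would then be e.g. pandas.read_csv(url) (not run here)
```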
We take a first look at the overall data.
For our analysis, we will use the following columns:
Let's create a dataframe with these columns.
We rename the columns to the names originally used in this notebook, as the column names in the source files have changed over time.
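Selecting and renaming the columns can be sketched with pandas as below. The mapping assumes the newer CSSE headers (`Province_State`, `Country_Region`, `Last_Update`, `Lat`, `Long_`) and the older names used in this notebook; both `RENAME_MAP` and `select_columns` are hypothetical names.

```python
import pandas as pd

# Assumed mapping from newer CSSE headers to the names used earlier in this notebook
RENAME_MAP = {
    "Province_State": "Province/State",
    "Country_Region": "Country/Region",
    "Last_Update": "Last Update",
    "Lat": "Latitude",
    "Long_": "Longitude",
}

# Columns kept for the analysis (after renaming)
COLUMNS = [
    "Province/State", "Country/Region", "Last Update",
    "Confirmed", "Deaths", "Recovered", "Latitude", "Longitude",
]


def select_columns(raw: pd.DataFrame) -> pd.DataFrame:
    # rename() ignores keys that are absent, so files that already use
    # the old names pass through unchanged
    renamed = raw.rename(columns=RENAME_MAP)
    return renamed[COLUMNS].copy()
```

Because the mapping is applied before selection, the rest of the notebook can keep using one set of column names regardless of which header version a given report uses.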
Let's have a look at the NaN values and fix them.
Inspecting those 73 rows shows that all NaN values are in the Longitude and Latitude columns.
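One way to locate the NaN values, sketched with pandas: count them per column and pull out the affected rows. The helper names are hypothetical.

```python
import pandas as pd


def nan_summary(df: pd.DataFrame) -> pd.Series:
    """Number of NaN values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def rows_with_nan(df: pd.DataFrame) -> pd.DataFrame:
    """All rows containing at least one NaN value."""
    return df[df.isna().any(axis=1)]
```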
First, we create a list of the countries without geographical coordinates, and then we define a function to add these values manually.
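A sketch of these two steps, assuming pandas and a hand-written lookup of capital-city coordinates. The lookup below is hypothetical: it contains only Canada (Ottawa) as an illustration, and the real list depends on which rows actually lack coordinates.

```python
import pandas as pd

# Hypothetical manual lookup: country -> (latitude, longitude) of its capital.
# Canada/Ottawa is an illustrative entry only.
MANUAL_COORDINATES = {
    "Canada": (45.42, -75.70),
}


def missing_coordinate_countries(df: pd.DataFrame) -> list:
    """Sorted list of countries with at least one row lacking coordinates."""
    mask = df["Latitude"].isna() | df["Longitude"].isna()
    return sorted(df.loc[mask, "Country/Region"].unique())


def fill_missing_coordinates(df: pd.DataFrame, lookup=MANUAL_COORDINATES) -> pd.DataFrame:
    """Fill NaN coordinates from the lookup; existing values are left untouched."""
    out = df.copy()
    for country, (lat, lon) in lookup.items():
        is_country = out["Country/Region"] == country
        out.loc[is_country & out["Latitude"].isna(), "Latitude"] = lat
        out.loc[is_country & out["Longitude"].isna(), "Longitude"] = lon
    return out
```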
We clean up the data and use the function format_country to consolidate the names of countries that have several variants or contain a comma.
From our first look at the datasets, we found that South Korea appears as "Korea, South" in the original dataset, and that "Congo" is used for both the "Republic of the Congo" and the "Democratic Republic of the Congo".
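A possible shape for format_country, offered as a hypothetical sketch rather than the notebook's actual implementation. It assumes the two Congos can be told apart by a suffix such as "(Kinshasa)" or "(Brazzaville)"; if the source really uses plain "Congo" for both, the split would need another column, such as the province or the coordinates.

```python
# Hypothetical mapping from source spellings to one display name per country
COUNTRY_NAME_MAP = {
    "Korea, South": "South Korea",
    "Congo (Kinshasa)": "Democratic Republic of the Congo",
    "Congo (Brazzaville)": "Republic of the Congo",
}


def format_country(name: str) -> str:
    name = name.strip()
    if name in COUNTRY_NAME_MAP:
        return COUNTRY_NAME_MAP[name]
    # Generic fallback for other "X, Y" names, e.g. "Bahamas, The" -> "The Bahamas"
    if "," in name:
        first, rest = [part.strip() for part in name.split(",", 1)]
        return f"{rest} {first}"
    return name
```

It would be applied column-wise, e.g. `df["Country/Region"] = df["Country/Region"].map(format_country)`. Explicit entries come first so that special cases are never touched by the generic comma rule.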
We add a column for the active cases.
Our main interest is to see how the number of active cases is changing.
Note: This column wasn't available in the original dataset when this notebook started to take shape.
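A minimal sketch of the derived column, assuming the usual definition active = confirmed - deaths - recovered. Missing counts are treated as 0 here, which is an assumption; `add_active_column` is a hypothetical name.

```python
import pandas as pd


def add_active_column(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Assumption: a missing count is treated as 0 rather than propagating NaN
    counts = out[["Confirmed", "Deaths", "Recovered"]].fillna(0)
    # Active cases = confirmed - deaths - recovered
    out["Active"] = counts["Confirmed"] - counts["Deaths"] - counts["Recovered"]
    return out
```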
We group the dataset by country to obtain a total per nation and list the five countries with the most active cases.
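The grouping step can be sketched with pandas `groupby` and `nlargest`, assuming the column names used above. The helper names are hypothetical.

```python
import pandas as pd

NUMERIC_COLUMNS = ["Confirmed", "Deaths", "Recovered", "Active"]


def totals_per_country(df: pd.DataFrame) -> pd.DataFrame:
    # Sum provinces/states so that each country has one row
    return df.groupby("Country/Region")[NUMERIC_COLUMNS].sum()


def top_active(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    # The n countries with the most active cases, in descending order
    return totals_per_country(df).nlargest(n, "Active")
```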
Looking at the assigned coordinates, countries with several entries in the dataset (e.g. autonomous territories within a country, or countries with one entry per state) may have been assigned a point far from the capital city; for Denmark, for example, the assigned coordinates were those of Greenland. We adjust these points with the function add_coordinates.
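A hypothetical sketch of how the per-country point might be picked and then corrected. It assumes the point is taken from the first row of each country (which is how a territory such as Greenland could end up representing Denmark) and that add_coordinates overwrites it from a manual table. The Copenhagen entry is illustrative, and this is not necessarily the notebook's own implementation.

```python
import pandas as pd

# Hypothetical overrides: country -> (latitude, longitude) for its map marker
COORDINATE_OVERRIDES = {
    "Denmark": (55.68, 12.57),  # Copenhagen
}


def country_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    # One point per country: the first coordinates listed for it,
    # which may belong to a territory rather than the mainland
    return df.groupby("Country/Region")[["Latitude", "Longitude"]].first()


def add_coordinates(coords: pd.DataFrame, overrides=COORDINATE_OVERRIDES) -> pd.DataFrame:
    out = coords.copy()
    for country, (lat, lon) in overrides.items():
        if country in out.index:
            out.loc[country, ["Latitude", "Longitude"]] = [lat, lon]
    return out
```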
We create a map where each country with active cases is labeled as follows:
Blue circle: less than 1000 reported active cases.
Orange circle: more than 1000 and less than 10000 reported active cases.
Red circle: more than 10000 reported active cases.
Summary data per country is shown if you click on each country's circle.
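The colour rule and the click summary can be sketched as plain functions. The list above leaves exactly 1000 and 10000 unassigned; the sketch assumes 1000 counts as orange and 10000 as red. Drawing the circles with folium is also an assumption, shown only as a comment.

```python
def circle_color(active: float) -> str:
    # Assumed boundaries: exactly 1000 -> orange, exactly 10000 -> red
    if active < 1000:
        return "blue"
    if active < 10000:
        return "orange"
    return "red"


def popup_text(country: str, confirmed: int, deaths: int, recovered: int, active: int) -> str:
    # HTML summary shown when a circle is clicked
    return (
        f"<b>{country}</b><br>Confirmed: {confirmed:,}<br>Deaths: {deaths:,}"
        f"<br>Recovered: {recovered:,}<br>Active: {active:,}"
    )

# With folium (assumed map library), one circle per country could look like:
# folium.CircleMarker(location=[lat, lon], radius=8, color=circle_color(active),
#                     fill=True, popup=popup_text(...)).add_to(world_map)
```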
We list the fifteen countries with the most active cases.
We also calculate the death rate as the number of deaths divided by the number of confirmed cases, expressed as a percentage, and include it in the column "Death rate [%]".
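The death-rate formula and the top-15 list, sketched with pandas on the per-country totals. Returning NaN for countries with zero confirmed cases is a design choice of this sketch, and the helper names are hypothetical.

```python
import pandas as pd


def add_death_rate(totals: pd.DataFrame) -> pd.DataFrame:
    out = totals.copy()
    # Countries with 0 confirmed cases get NaN instead of a division by zero
    confirmed = out["Confirmed"].where(out["Confirmed"] > 0)
    # Death rate [%] = deaths / confirmed * 100, rounded to two decimals
    out["Death rate [%]"] = (out["Deaths"] / confirmed * 100).round(2)
    return out


def top_active_with_death_rate(totals: pd.DataFrame, n: int = 15) -> pd.DataFrame:
    return add_death_rate(totals).nlargest(n, "Active")
```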
The death rate can be interpreted from several perspectives and is a controversial figure, because each country counts Covid-19 deaths differently. For example, Italy performs post-mortem Covid-19 tests, whereas Germany counts as Covid-19 deaths only those of people who tested positive while alive.
As of 28.04.20, Belgium has the highest death rate. Note that Belgium, unlike the UK, Spain or Italy, includes in its Covid-19 death toll fatalities outside hospitals as well as people suspected of having died of Covid-19.
As of 13.03.20, most of the Covid-19-positive victims in Italy were 70 years old or older [1].
According to a study by the Leverhulme Centre for Demographic Science at the University of Oxford [1] and an article in the FAZ [2], some of the main reasons for Italy's high death rate are:
In contrast, Germany has a low death rate compared with the other countries in the top-10 list of active cases, even though it ranked fourth in active cases as of 05.04. Possible factors that may influence this figure are:
Summary data per country is shown when you hover over each country's block.
We check the length of the worldwide datasets of confirmed cases, recoveries and fatalities, and group them by country to avoid multiple entries per country.
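A sketch of the per-country grouping for one of these datasets. It assumes the global time-series layout of four identifier columns ("Province/State", "Country/Region", "Lat", "Long") followed by one column per date written as M/D/YY. The same function would be applied to the confirmed, recovered and deaths tables, whose lengths can be compared with `len()` beforehand.

```python
import pandas as pd

# Assumed identifier columns of the global time-series files
ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]


def series_by_country(wide: pd.DataFrame) -> pd.DataFrame:
    date_columns = [c for c in wide.columns if c not in ID_COLUMNS]
    # Sum all provinces/states of a country for every date
    per_country = wide.groupby("Country/Region")[date_columns].sum()
    # Transpose so dates become the index and countries become columns
    out = per_country.T
    out.index = pd.to_datetime(out.index, format="%m/%d/%y")
    return out
```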
China and South Korea are added to the list for comparison, as they were the first two countries to slow down and reduce the number of active cases after an outbreak.
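Building the list of curves to plot can be sketched as follows: the active-case curve is confirmed minus deaths minus recovered, computed table against table, and the selection takes the top countries on the latest date plus China and South Korea. The value of `n` and the country spellings (after format_country) are assumptions.

```python
import pandas as pd


def active_series(confirmed: pd.DataFrame, deaths: pd.DataFrame, recovered: pd.DataFrame) -> pd.DataFrame:
    # pandas aligns on dates and countries; pairs missing in any table become NaN
    return confirmed - deaths - recovered


def countries_to_plot(active: pd.DataFrame, n: int = 10, extra=("China", "South Korea")) -> list:
    # Top n countries by active cases on the latest date
    latest = active.iloc[-1].dropna()
    top = latest.nlargest(n).index.tolist()
    # Append the comparison countries if they are not already in the list
    return top + [c for c in extra if c not in top and c in active.columns]
```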
On 26.03, the US surpassed Italy in the number of active cases and China in the total number of positive cases.
The US curve of active cases has grown dramatically since mid-March.
Italy's cases started to grow strongly around the carnival festivities in the third week of February, whereas the active-case curves of Germany and Spain started to rise at a fast pace about two weeks later. In Germany, many of the initial cases were linked to people who had spent a winter holiday in northern Italy in the third week of February.
As of 05.04, the growth of the active-case curves in the top European countries is showing signs of slowing down.
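One way to make "showing signs of slowing down" measurable, offered as a sketch rather than the notebook's method: compute the day-over-day growth rate of active cases, smooth it with a rolling mean, and compare the latest value with the one a window earlier. The window length is an assumption.

```python
import pandas as pd


def smoothed_growth_rate(active: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    # Day-over-day relative change in percent: (today / yesterday - 1) * 100
    daily = (active / active.shift(1) - 1) * 100
    # Rolling mean over `window` days to damp reporting noise
    return daily.rolling(window).mean()


def is_slowing_down(active: pd.DataFrame, window: int = 7) -> pd.Series:
    # Needs at least 2 * window + 1 rows so that both compared values exist
    growth = smoothed_growth_rate(active, window)
    # True where the latest smoothed growth rate is below the one `window` days earlier
    return growth.iloc[-1] < growth.iloc[-1 - window]
```

A curve can still be rising while this flag is True; it only says the relative growth is decreasing.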
Question: Why did the US curve gain momentum in such a short time compared with Italy, Germany and Spain? Might it be related to the lack of early contingency measures and of initial testing?
As in many of the countries with the most active cases, the densest and most international cities are the worst hit by the outbreak. In the US, the biggest hub of cases is NYC.
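To find the hubs within a country from the daily report, one could group that country's rows by a sub-national column and rank them by active cases. This is a hypothetical helper; grouping by "Province/State" gives the state level, and a county-level column (e.g. "Admin2" in newer reports) would be needed to single out a city, which is an assumption about the source columns.

```python
import pandas as pd


def biggest_hubs(df: pd.DataFrame, country: str = "US", level: str = "Province/State", n: int = 5) -> pd.Series:
    # Keep only the rows of the chosen country
    subset = df[df["Country/Region"] == country]
    # Total active cases per sub-national unit, largest first
    return subset.groupby(level)["Active"].sum().nlargest(n)
```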
We compare the curves above with the one from China, the country with the first reported outbreak. It is worth mentioning that the official data provided by China has been questioned by the international community.
We also added the graph for South Korea, where the number of active cases has been dropping since mid-March.
In both countries, the graphs show the peak of active cases about a month after the curve started to increase at a fast pace.
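The "about a month" observation can be checked numerically, under an assumed definition of when fast growth starts: here, the first day on which active cases grow by more than 10 % over the previous day. The threshold and the helper name are hypothetical choices for illustration.

```python
import pandas as pd


def days_to_peak(active: pd.Series, growth_threshold: float = 0.1) -> int:
    """Days from the first day with growth above growth_threshold to the peak."""
    # Daily relative growth; a jump from 0 counts as infinite growth
    growth = active / active.shift(1) - 1
    fast = growth[growth > growth_threshold]
    if fast.empty:
        raise ValueError("no day exceeds the growth threshold")
    onset = fast.index[0]
    peak = active.idxmax()
    return (peak - onset).days
```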
Both countries took strict measures to contain the spread of the virus, including lockdowns of hotspots, social distancing, self-isolation, and the closure of public places and schools. In addition, South Korea ran a massive testing campaign to identify and trace cases at an early stage and limit the spread.
As of 15.06, the first wave of Covid-19 cases has peaked in many Asian and European countries, and restrictions in these countries have started to be eased. However, the risk of a second wave remains, as does the risk of outbreaks getting out of control in African countries, as is currently happening in Brazil.
[1] Jennifer Beam Dowd, Valentina Rotondi, Liliana Andriano, David M. Brazel, Per Block, Xuejie Ding, Yan Liu, Melinda C. Mills. Demographic science aids in understanding the spread and fatality rates of COVID-19. March 15, 2020.
[2] Andreas Rossmann. Warum ist es in Italien so schlimm? [Why is it so bad in Italy?] Frankfurter Allgemeine Zeitung. March 23, 2020.
[3] Frauke Suhr. So oft wird auf COVID-19 getestet [How often COVID-19 tests are carried out]. May 4, 2020.
[4] Corona-Infektionen (COVID-19) in Deutschland nach Altersgruppen [Corona infections (COVID-19) in Germany by age group]. May 17, 2020.