Making Choropleth Maps for COVID-19 in Stata
In a previous post I covered how to create a choropleth map in Stata using GIS mapping ESRI shapefiles. If you are interested in visually tracking the current SARS-Cov2 pandemic, you can use publicly available GIS mapping shapefiles and virus infection data to map the spread of the pandemic. In this post I go through how to map the spread in New Zealand and Australia. To make these maps I need ESRI shapefiles, which I have obtained for Australia through the Australian Bureau of Statistics, and for New Zealand through Stats NZ. You will need to create an account with Stats NZ to access their geographic boundary files.
COVID19 Infections in New Zealand
First up is New Zealand. The New Zealand government is recording all infections by District Health Board. To effectively map the infections across New Zealand, I am going to need GIS mapping data of the District Health Boards. I have downloaded these ESRI shapefiles from Stats NZ, and I have saved them in the folder "NZ", which is located in a folder called "Maps" in my documents folder. The database file (.dbf) and shapefile (.shp) are both called "district-health-board-2015". I am going to change my current working directory to the NZ folder in Maps, as this makes my analysis easier. If you are using and saving files outside your current working directory you will need to specify the full file path where I use only filenames below. To set up the map data in Stata I use the spshape2dta command as follows:
This gives the following output, as expected:
Note: this will save the two files in your current working directory, not the directory the database and shapefiles are saved in. You can move them, however make sure both files are always together in the same folder.
I will test the map data to make sure it works. In the command pane:
The base map has been correctly imported. Now I need to get the COVID19 data for New Zealand. This data is updated daily from the New Zealand Health Ministry. I am using data that was last updated on 15 April at 9am (GMT +12). For this analysis, the easiest way to get the data into Stata is to simply copy-paste it. I highlight the relevant table, right-click, and select copy. I then open Stata and clear the previous dataset, then open the Data Editor in edit mode, right-click the top-left cell and click paste. It will ask if the top row is variable names or data, I select "Variable Names" and the data is correctly pasted into Stata. I then save this data as covidata_nz_200415.dta.
Now I need to merge this data with my District Health Board map data, however I first need to alter the names of two boards in my COVID19 dataset. They have been pasted with the correct spelling, however the map data does not have the accents on some of the names. These need to be removed so I can merge the datasets. I manually edit these and then re-save my COVID19 dataset.
I am now going to merge the two datasets. In the command pane:
Our map has been drawn correctly, however the legend is currently overlapping the bottom part of New Zealand. Let's move it. In the command pane:
This is much better. If you find that you need to move the legend, a ring value of greater than zero will place the legend outside the confines of the map.
This map is showing infection count data. It does not take into account the different population sizes of the different districts. If we assume that districts with a higher population would also have a higher number of returning travelers, then districts with a bigger population would naturally have a higher number of cases. To account for this, I am going create a new variable that calculates the percent of the population that has contracted the virus per District Health Board. To do this I will need the population data, which is easily obtainable through wikipedia. To get the population data into Stata I use the online tool https://wikitable2csv.ggor.de/. This creates a download link with the population table in a .csv file. I am now going to import that data and attach it to my COVID19 data to generate a new map. In the command pane:
There are a small number of changes from the raw count data. The two larger districts at the bottom of the South Island (DHB name Southern) and top of the North Island (DHB name Waikato) are coloured the same. This means even proportionally these two districts have the highest number of cases compared to all other districts. The colours of two districts at the top of South Island have flipped, with the top district (DHB name Nelson Marlborough) moving up to the top bracket and the top right district (DHB name Canterbury) moving down to the second-top bracket. There are a few other changes, but no District Health Board moves up or down more than one bracket.
COVID19 Infections in Australia
The Australian Government is tracking infections by state. This data is available and updated daily through the Australian Department of Health, however in this case the infection numbers are more easily imported into Stata by using the raw data from the Johns Hopkins University COVID19 Dashboard. The GIS mapping files for Australia with state boundaries can be found through the Australian Bureau of Statistics. I previously downloaded mapping data for Australia with state boundaries and imported it into Stata for this Choropleth Maps blog post. This data is saved in the AUS folder in Maps in my documents folder. I will now create the COVID19 maps for Australia. In the command pane:
As previously discussed in the Making Choropleth Maps blog, the ACT needs to be drawn twice otherwise it is coloured over by NSW and you don't get an accurate representation in the map. However, to get this map to work you need to know which colour-bracket the ACT fits into. Fortunately there are 4 colour brackets by default and 8 state/territory regions, so to find out which bracket the ACT fits into simply sort by your colour variable (confirmed in this case). You can then list that variable in the Results pane and the observation number for the ACT will tell you which of the 4 brackets the ACT fits into (e.g. observation 1 or 2 is bracket 1, observation 3 or 4 is bracket 2, etc.). In this case the ACT is in the lowest bracket, and so I specify it's colour as navy*0.2 which is the lowest of the four brackets as specified in the fcolor() option. This generates the following map:
Here we see that New South Wales and Victoria have the two highest number of confirmed cases. However, as with New Zealand, this is only the raw count data. Let's break it down by population. I already have Australian population data by state, so all I have to do is merge that data together with my map and COVID19 data. In the command pane:
This shows much greater differences compared to what we saw in New Zealand. Victoria was in the top bracket for total number of cases, however when you look at the proportion of the population that is infected they drop all the way to the lowest bracket. Conversely, Tasmania was in the second-lowest bracket for total number of cases, but is now in the highest bracket for their proportion of population infected. Both Western Australia and Queensland move down a bracket when you look at their number proportionally, and South Australia moves up a bracket. The Australian Capital Territory moves from the lowest bracket for total number of cases, to the second-highest bracket for proportion of population infected. New South Wales and the Northern Territory stay the same, in the highest and lowest brackets respectively.