Geospatial analysis in Stata: Mapping multiple variables

Updated: May 14



Above is a map showing the population density of suburbs in the ACT, overlaid with the location and size (FTE students) of ACT public schools; larger points represent larger schools. Today I'm going to show you how to create this graph in Stata.


To draw the map, we need several datasets. All of them are downloadable from the ACT government open data portal. The list below shows which datasets are required and where you can download them:

1. ACT map shape file: https://data.gov.au/dataset/ds-dga-0257a9da-b558-4d86-a987-535c775cf8d8/details. Please choose ACT LOCALITY POLYGON shp GDA2020.zip(SHP) to download.

2. ACT population data: https://www.data.act.gov.au/Health/Estimated-resident-population-ACT/f9ny-mif2. You will need to modify the .csv file so that it contains the two variables, suburbs and population, that we will need to merge with the ACT map file later. I have prepared an ACT population file for your convenience; please feel free to use it if needed.

Download: ACT population.xlsx (24KB)

3. ACT public school location data:

1) Primary school: https://actmapi-actgov.opendata.arcgis.com/datasets/schools-government-primary?geometry=148.348%2C-35.504%2C149.823%2C-35.112

2) High school: https://actmapi-actgov.opendata.arcgis.com/datasets/schools-government-secondary

3) College: https://actmapi-actgov.opendata.arcgis.com/datasets/schools-government-college?geometry=148.357%2C-35.496%2C149.832%2C-35.103

NOTE

a) We need to download both the Spreadsheet and the Shapefile for the school location data.

b) The downloaded spreadsheet contains X and Y columns, which represent the location of each school. However, Stata won't recognise them as they are. We therefore need to convert the location data to coordinates (latitude and longitude) that can be used to draw a map in Stata. The coordinate converter website I used is:

https://www.gps-coordinates.net/gps-coordinates-converter

4. ACT public school size (FTE students): https://www.education.act.gov.au/about-us/policies-and-publications/publications_a-z/census Choose Census - Public Schools, then download the Word version of Census of ACT Public Schools August 2020. Copy the data from Table 10 and paste it into Excel; this is the data we will need later to create the .dta file in Stata. I have already prepared an ACT schools size file for you to use. Please feel free to download it:

Download: ACT schools size.xlsx (23KB)

So now we have all the data ready to draw the map! Let's start the journey.


First, I will change my current working directory and set up the map data in Stata. The file I use is the first map shape file I downloaded:

cd "C:\Users\KuaiKuaiWang\Documents\actmap"
spshape2dta "C:\Users\KuaiKuaiWang\Downloads\actmap\ACT_LOCALITY_POLYGON_SHP"

Here, I used the spshape2dta command to create the _shp.dta file in Stata. The original map data I downloaded is called ACT_LOCALITY_POLYGON. Please note that two Stata datasets are created: ACT_LOCALITY_POLYGON_SHP.dta and ACT_LOCALITY_POLYGON_SHP_shp.dta. Both are saved in my current working directory (C:/Users/KuaiKuaiWang/Documents/actmap). Please make sure that both files are always kept in the same folder. The file that we are going to use to create the map is ACT_LOCALITY_POLYGON_SHP.dta, which is the regular dataset.


Now I need to convert the population .csv file to a .dta file with the variables that we need for this analysis: suburbs and population. To do this, I select the columns SUBURBS and POPULATION in the .csv file that I previously saved, right-click, and select copy. I then open Stata, clear the previous dataset, open the Data Editor in Edit mode, right-click the top-left cell and click paste. Stata asks whether the top row is variable names or data; I select "Variable Names" and the data is correctly pasted into Stata. I then save this as ACT population.dta:

save "C:\Users\KuaiKuaiWang\Documents\actmap\ACT population"
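The copy-and-paste step can also be scripted. The lines below are a minimal sketch, assuming the prepared file is saved as ACT population.xlsx in the working directory and that its first row holds the column names SUBURBS and POPULATION; adjust the file name and location to match your own copy.

```stata
* Sketch only: the file name and its location are assumptions.
cd "C:\Users\KuaiKuaiWang\Documents\actmap"

* firstrow tells Stata to read the first spreadsheet row as variable names
import excel using "ACT population.xlsx", firstrow clear

* check that SUBURBS and POPULATION arrived with the expected names and types
describe

save "ACT population", replace
```

If you are working from a .csv file instead, import delimited with the varnames(1) and case(preserve) options does the same job and keeps the upper-case variable names.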

Now I can merge the map and the population file in Stata. I first open the map file I created in the first step, then rename the variable that holds the suburb names to SUBURBS. This makes the variable's name the same as the one in the ACT population.dta file, since we will use SUBURBS to merge the two datasets later on. You could instead rename the variable in the population file to match the one in the map file. Either way, please make sure that the variable used for the merge has the same name in both files. The code for the above is:

clear
use "C:\Users\KuaiKuaiWang\Documents\actmap\ACT_LOCALITY_POLYGON_SHP.dta", clear

rename NAME SUBURBS
label variable SUBURBS "SUBURBS"

Now that everything is ready, I will merge the two datasets using the variable SUBURBS:

merge 1:1 SUBURBS using "C:\Users\KuaiKuaiWang\Documents\actmap\ACT population"
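Before drawing anything, it is worth checking how well the two files matched. The sketch below assumes it is run immediately after the merge above, while the merged data are still in memory; _merge is the variable that merge creates automatically.

```stata
* Run straight after the merge: summarise the match results.
* _merge == 3 means the suburb was found in both files.
tabulate _merge

* List the suburbs found in only one of the two files
* (1 = map file only, 2 = population file only).
list SUBURBS _merge if _merge != 3
```

Suburbs that appear only in the population file have no polygon in the map file, so they cannot be drawn. A frequent cause of mismatches is a difference in spelling or letter case between the two sources.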

We can then draw a map of ACT with population density in different suburbs:

grmap POPULATION, fcolor(Greens)

Here, you can use fcolor() to specify any colour you prefer. Stata gives me the graph below:


I then saved this new merged dataset as ACTmapwithPopulation.dta:

save "C:\Users\KuaiKuaiWang\Documents\actmap\ACTmapwithPopulation.dta", replace

We are now one step closer to our final map. Next, I will get the school location file ready. Before that, I converted the location data in the original spreadsheet to coordinates (longitude and latitude) that can be read in Stata. As I did for the ACT map file, we first need to set up the school location map file in Stata:

clear
cd "C:\Users\KuaiKuaiWang\Documents\primary school"
spshape2dta "C:\Users\KuaiKuaiWang\Documents\primary school\Schools_-_Government%2C_Primary"

Please note that the file I downloaded from the website is called Schools_-_Government%2C_Primary.cpg; I kept the original name. This creates two .dta datasets, and the one I used was Schools_-_Government%2C_Primary.dta.


Now I will combine the high school and college datasets with the primary school dataset. Before doing so, please make sure that you have converted the location columns in the other two spreadsheets.


To combine the three datasets, simply copy the columns you need (suburbs and the coordinates you created) from each spreadsheet and paste them into Stata below the primary school data. You will need to copy and paste the data from one spreadsheet at a time. Please note that you might need to drop some observations and rename some columns. After this, I saved the new combined dataset as schools.dta.
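Stacking the three school files can also be done with append instead of pasting by hand. This is a rough sketch under several assumptions: the three files have already been saved as primary.dta, high.dta and college.dta (hypothetical names) in the working directory, they share the same variable names, and none of them already contains a variable called type. The numeric codes 1 to 3 are also my own choice.

```stata
* Sketch only: file names and the coding of type are assumptions.
use "primary.dta", clear
generate byte type = 3                 // 3 = Primary

* rows added by append have type missing, so fill them in after each step
append using "high.dta"
replace type = 2 if missing(type)      // 2 = High

append using "college.dta"
replace type = 1 if missing(type)      // 1 = College

* attach readable labels to the numeric codes
label define schooltype 1 "College" 2 "High" 3 "Primary"
label values type schooltype

save "schools.dta", replace
```

A numeric type variable with value labels is convenient later on, because it can be used to group the points on the map by school type.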


Now with the ACT map with population file and the school location file, I can draw a map of population density with points on the map representing school locations:

clear
use "C:\Users\KuaiKuaiWang\Documents\actmap\ACTmapwithPopulation.dta", clear

grmap POPULATION, fcolor(Greens) legend(position(4)) point(data("C:\Users\KuaiKuaiWang\Documents\actmap\schools.dta") xcoord(_CX) ycoord(_CY) size(small small small) by(type) fcolor(cyan blue red) legenda(on))


In the above map, I have created 3 different sets of points representing 3 types of public schools - College, High and Primary. You can do this with the point() option. Notice that the file named inside the parentheses is the school location file schools.dta, so I am telling Stata to draw points showing the locations of public schools in the ACT. You can control the size of the points with the size() option, and again you can use fcolor() to specify their colours.


Our final step is to scale the points according to the size of the schools. I created the ACT schools size file from Table 10 in the Census of ACT Public Schools August 2020 Word file I downloaded earlier. I now copy the relevant columns (School Name and Student FTE) from this spreadsheet and paste them into Stata's Data Editor in Edit mode, then save this as schoolsize.dta. I can now merge this file with schools.dta, which gives me a new file that combines school size with location. Please note that in order to merge these two files, the variable used for the merge must have the same name in both. Here, I have renamed the School Name variable in schoolsize.dta to SCHOOL_NAM to be consistent with the variable's name in schools.dta. In your own work you can choose any name, as long as it is the same in both files. The code for the above is:


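* Stata variable names cannot contain spaces; the pasted "School Name" column
* is assumed here to be stored as SchoolName (run describe to check)
rename SchoolName SCHOOL_NAM
label variable SCHOOL_NAM "SCHOOL_NAM"
save "C:\Users\KuaiKuaiWang\Documents\actmap\schoolsize.dta"
clear
use "C:\Users\KuaiKuaiWang\Documents\actmap\schools.dta"
merge 1:1 SCHOOL_NAM using "C:\Users\KuaiKuaiWang\Documents\actmap\schoolsize.dta"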

NOTE: If you want to check whether there are any unmatched observations in the merged dataset (the most likely cause is school names that differ between the two datasets), you can run the command below after merging:

edit SCHOOL_NAM _merge if _merge != 3

This lets me see whether there are any unmatched school names: Stata opens the Data Editor showing only the unmatched observations. To fix them, I changed the names in the ACT schools size spreadsheet and repeated the step above (i.e. copied and pasted the relevant columns into Stata and saved) to replace the saved schoolsize.dta. When saving, Stata asks whether I would like to replace the previous schoolsize.dta; click yes. I then reopened schools.dta in Stata and repeated the merging step above. Finally, I saved the merged dataset as schools2.dta:

save "C:\Users\KuaiKuaiWang\Documents\actmap\schools2.dta", replace
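As an alternative to editing the spreadsheet and pasting it in again, mismatched names can be corrected directly in schoolsize.dta before repeating the merge. The sketch below is illustrative only: the two school names in the second replace are placeholders, not real entries, and should be swapped for the actual pair of spellings you find.

```stata
* Sketch only: both school names below are placeholders.
use "C:\Users\KuaiKuaiWang\Documents\actmap\schoolsize.dta", clear

* remove leading and trailing blanks, a common cause of failed matches
replace SCHOOL_NAM = strtrim(SCHOOL_NAM)

* change one mismatched name to the spelling used in schools.dta
replace SCHOOL_NAM = "Name used in schools.dta" if SCHOOL_NAM == "Name used in census table"

* overwrite schoolsize.dta with the corrected names
save, replace
```

After this, reopen schools.dta and run the merge again; repeat until no unmatched names remain.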

Now, we have all the files ready to draw the final map:

clear
cd "C:\Users\KuaiKuaiWang\Documents\actmap"
use "C:\Users\KuaiKuaiWang\Documents\actmap\ACTmapwithPopulation.dta", clear

grmap POPULATION, fcolor(Greens) legend(position(4)) point(data("C:\Users\KuaiKuaiWang\Documents\actmap\schools2.dta") xcoord(_CX) ycoord(_CY) size(vsmall vsmall vsmall) by(type) fcolor(cyan blue red) legenda(on) proportional(schoolsize) psize(absolute)) title(ACT Public Schools Size and Location with Population Density, size(small))

Stata draws the below map for me:


Here, I used proportional() to make the size of each point proportional to the size of the school. I also used the Graph Editor to change the opacity of the points.


Drawing the final map requires merging six different datasets step by step, and along the way you will often need to rename variables so that they are consistent with the dataset you are merging with. It is a complicated process, but once the original datasets are ready, merging and saving each file correctly will get you to the final map.


If you are interested in creating graphs in Stata, we would recommend the book Speaking Stata Graphics. If you have any questions, please feel free to contact us at sales@surveydesign.com.au. We would be happy to answer any of your questions.
