Create 1,500 maps, in 20 lines of R code, in less than 5 minutes
Creating beautiful maps quickly in R has become increasingly easy with the advent of R’s tmap package. In this article, I will show you how to read in Community Development Block Grant (CDBG) data with an API, use R’s tigris package to determine spatial boundaries, and write a 20-line function and loop pattern in R that will automate the process of generating clean maps for tract-level data in cities across the US.
I spent my first seven years of life in St. Clair Shores, MI so let’s start there with an example.
We begin by loading the libraries that we need for this project. Tidyverse is simply a go-to package since I always seem to be managing data. All of the other libraries are useful for working with spatial data.
My data has already been prepped for map creation. In subsequent posts, I will demonstrate how to do this. For now, one of my datasets contains the Census tract-level geometry layers (see the tigris package to download the geometry layers) with the state, place, and Low-to-Moderate Income (LMI) share.
The second dataset contains the parcel-level CDBG investments, downloaded from the API set up by the U.S. Department of Housing and Urban Development (https://www.hudexchange.info/programs/idis/). Because the original HUD data includes latitude and longitude (Lat/Lon) coordinates, one can use the sf package in R to map each investment onto its place.
The tigris package allows one to extract the tract- and place-level boundaries for the United States. The sf function st_set_crs sets the coordinate reference system of a spatial object (https://r-spatial.github.io/sf/reference/st_crs.html).
Thus, my second dataset contains the state, place, CDBG amount (“ACTV_FUNDING_AMT”), and the geometric point data corresponding to the original Lat/Lon from the API call in HUD.
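A minimal sketch of how Lat/Lon columns can become sf points and be matched to places. The column names LON and LAT, the helper names, and the CRS codes are assumptions for illustration: I assume the coordinates are WGS84 (EPSG:4326) and that the tigris boundaries are in NAD83 (EPSG:4269), so check both against your own data.

```r
# Sketch only: LON/LAT are hypothetical column names, not the real HUD field names.
cdbg_to_points <- function(df, lon = "LON", lat = "LAT",
                           source_crs = 4326, target_crs = 4269) {
  stopifnot(requireNamespace("sf", quietly = TRUE))
  pts <- sf::st_as_sf(df, coords = c(lon, lat))  # build a point geometry column
  pts <- sf::st_set_crs(pts, source_crs)         # declare the CRS of the raw coordinates
  sf::st_transform(pts, target_crs)              # reproject to match the boundary layer
}

# Attach the attributes of the enclosing place (e.g. PLACEFP) to each point.
# `places` is assumed to be an sf polygon layer in the same CRS as `pts`.
points_in_places <- function(pts, places) {
  sf::st_join(pts, places, join = sf::st_within)
}
```

The `places` layer could come from tigris, for example `tigris::places(state = "MI", cb = TRUE)`, with tract polygons from `tigris::tracts()`.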
One last prep step to get to the function. Because Census tracts are nested within places (cities), I want a unique listing of all the cities in my sample. So, I concatenate my tract-level data and my CDBG (point-level) data by PLACEFP (city FIPS) and by STATEFP (state FIPS). Then, I make them distinct and put them into a list that I will use in a loop later.
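A minimal sketch of this step in base R, with two toy data frames standing in for my real data. The object names are hypothetical, and I assume the city code is the place FIPS followed by the state FIPS. Storing the codes as character strings is also an assumption here; it keeps any leading zeros intact.

```r
# Toy stand-ins for the tract-level and point-level data (values are illustrative)
tracts_toy <- data.frame(PLACEFP = c("36056", "36056", "70760"),
                         STATEFP = c("06", "06", "26"))
cdbg_toy   <- data.frame(PLACEFP = c("36056", "70760", "70760"),
                         STATEFP = c("06", "26", "26"))

# Stack the identifying columns from both datasets
ids <- rbind(tracts_toy[, c("PLACEFP", "STATEFP")],
             cdbg_toy[, c("PLACEFP", "STATEFP")])

# Build one code per city and keep each code once
ids$PlaceUnique <- paste0(ids$PLACEFP, ids$STATEFP)
place_list <- unique(ids$PlaceUnique)
```

With real data, the same idea works with dplyr's `bind_rows()` and `distinct()`.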
# Now we turn to the mapping function.
First, one needs to lay out the structure of the function carefully. I start by naming my function (mapFunction) and recognizing that I want it to have three inputs. These inputs correspond to the tract-level data, the CDBG data, and the unique city code (PlaceUnique). I prefill it with the datasets that I will use (e.g. tractsCDBG, CDBG) and one of the PlaceUnique codes (c=3605606). The first five digits are the place FIPS and the last two are the state FIPS. When I call this function, if I don’t change anything in the call then it will produce a map for that particular city.
The next step is that I want it to pull in my datasets and call the unique places of interest, because later I will loop through these.
Next, I use the tmap package to declare the layer I want to draw (tm_shape). For the first layer I want the shapes to be polygons (the Census tracts), so I use tm_polygons. I color (col) them based on the level of need in the community (a variable I call LMI_LQ), choose a color palette, ask for five color classes to represent the continuous variable (LMI_LQ), and title the legend LMI LQ. This is the share of families that are Low-to-Moderate Income (LMI) in the Census tract relative to all families that are LMI in the city. Finally, I make sure that the borders of the Census tracts are apparent with tm_borders.
Next, I want the point-level data to overlay these polygons. The points represent the CDBG expenditures, and I want each point to grow with the amount of CDBG investment, with five different sizes for different levels of investment. Thus, I use tm_dots and map the CDBG amount to the size argument. At this point I simply add the layers together into one visual (literally as easy as adding them with + in R).
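The layering described above can be sketched as follows. This is an illustration rather than my exact code: the function name build_city_map and the "Blues" palette are assumptions, the argument names follow the tmap version 3 style (newer tmap releases rename some of them), and I leave the number of size classes for the dots at tmap's default.

```r
# Sketch only: `tracts` and `points` are assumed to be sf objects for one city,
# with columns LMI_LQ (tracts) and ACTV_FUNDING_AMT (points).
build_city_map <- function(tracts, points) {
  stopifnot(requireNamespace("tmap", quietly = TRUE))
  tmap::tm_shape(tracts) +
    # Tract polygons shaded by need, binned into five color classes
    tmap::tm_polygons(col = "LMI_LQ", palette = "Blues", n = 5,
                      title = "LMI LQ") +
    # Draw tract borders explicitly so they stay visible
    tmap::tm_borders() +
    # Second layer: CDBG investments as dots sized by funding amount
    tmap::tm_shape(points) +
    tmap::tm_dots(size = "ACTV_FUNDING_AMT")
}
```

Each `+` adds a layer on top of the previous one, so the dots are drawn over the tract polygons.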
Because I am going to automate this for 1,700 cities in my sample, I need to save each map to my directory rather than view it individually. But consider that some of the cities have the same name, and that I want to use the name of the city when saving the file for easy identification. The as.name function allows me to pull the city's name (UJURIS in my dataset) and the state name.
Finally, I use tmap_animation to save the map (z), and name the file with the filename argument. The file name is built from the city name, a "2" marking the second iteration of my save (so I am not overwriting earlier output I need), the state, and the extension of the image (this is a .gif file). I ensure that the data is returned and I close the function.
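A small sketch of how such a file name might be assembled. The helper map_filename is hypothetical, and replacing spaces and punctuation with underscores is an extra step I add here so the names are safe on any file system.

```r
# Hypothetical helper: build "<city>_<iteration>_<state>.gif"
map_filename <- function(city, state, iteration = 2, ext = ".gif") {
  # Replace runs of non-alphanumeric characters with a single underscore
  city_clean <- gsub("[^A-Za-z0-9]+", "_", trimws(city))
  paste0(city_clean, "_", iteration, "_", state, ext)
}

# Cities that share a name stay distinct because the state is part of the name
springfield_il <- map_filename("Springfield", "IL")
springfield_mo <- map_filename("Springfield", "MO")
```

The resulting string can then be passed to the filename argument of tmap_animation.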
Putting this all together, our function looks like this. While I wrapped the text for visual purposes, this is really about 10 lines of code.
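A sketch of what the assembled function might look like under the description in this article. Treat the details as assumptions: both datasets are assumed to carry a PlaceUnique column, STATE_NAME is a hypothetical column holding the state name, the palette is illustrative, as.character stands in for as.name, and the tmap arguments follow the version 3 style.

```r
# Sketch only, not a verbatim listing. tractsCDBG and CDBG are the
# datasets described in the text and must exist when the function is called.
mapFunction <- function(tracts = tractsCDBG, points = CDBG, c = 3605606) {
  stopifnot(requireNamespace("sf", quietly = TRUE),
            requireNamespace("tmap", quietly = TRUE))

  # Keep only the tracts and investments for the requested city
  city_tracts <- tracts[tracts$PlaceUnique == c, ]
  city_points <- points[points$PlaceUnique == c, ]

  # Polygons shaded by need, with investments overlaid as sized dots
  z <- tmap::tm_shape(city_tracts) +
    tmap::tm_polygons(col = "LMI_LQ", palette = "Blues", n = 5,
                      title = "LMI LQ") +
    tmap::tm_borders() +
    tmap::tm_shape(city_points) +
    tmap::tm_dots(size = "ACTV_FUNDING_AMT")

  # File name: city, save iteration, state, extension
  city  <- as.character(city_tracts$UJURIS[1])
  state <- as.character(city_tracts$STATE_NAME[1])  # hypothetical column
  tmap::tmap_animation(z, filename = paste0(city, "_2_", state, ".gif"))

  invisible(z)  # return the map object without printing it
}
```

Because the default arguments are only evaluated when the function runs, defining it does not require the datasets to be loaded yet. Writing a .gif with tmap_animation also needs a GIF backend installed (the gifski package in recent tmap releases).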
Right now, the function can be called as is.
# Runs the function on the input data
At this stage though, I want to automate the process for all of my map creation. I did start by inputting individual values and realized I did not want to do that 1,700 times. I could use one of the apply (sapply, vapply, etc.) functions in R, but I want to avoid potential errors if a map hits a snag in the process. As a result, I use a for loop to automate the process for all cities from the unique cities list described above and I add a try statement to ensure that if there are errors the process keeps running.
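The loop-with-try pattern can be sketched in base R. Here make_map is a hypothetical stand-in for mapFunction that fails on one code on purpose, so the error handling is visible without any spatial data.

```r
# Hypothetical stand-in for mapFunction(): errors for one city code
make_map <- function(c) {
  if (c == "0000000") stop("no tracts found for place ", c)
  paste0("map_", c, ".gif")
}

place_list <- c("3605606", "0000000", "7076026")
results <- list()

for (place_code in place_list) {
  # try() captures the error instead of stopping the whole loop
  out <- try(make_map(place_code), silent = TRUE)
  if (inherits(out, "try-error")) {
    message("Skipping ", place_code, ": ",
            conditionMessage(attr(out, "condition")))
    next
  }
  results[[place_code]] <- out
}
```

The failing code is reported and skipped, and the loop carries on with the remaining cities. Collecting the messages also gives a list of cities to revisit afterwards.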
After running this loop, the files immediately populated my directory as anticipated. You can see that each was created in a matter of seconds.
When I switch to image view within that directory, I see that all of the maps were produced as expected.
Because I am interested in understanding whether CDBG investments reach those places of greatest need, let me select two maps to display. The first map is from the City of Roswell, GA which has done a nice job investing in the places with the highest need.
I contrast this with another map from the city of Lynwood, CA whose investment decisions seem to avoid those areas with the greatest need. Of course, there are other reasons for these investments; however, now I have the maps to be able to understand the spatial distribution of these decisions across all of my cities.
Going forward, I will show how you can pull the data from APIs and spatialize your data so that you can automate an entire project of this nature in a single R script without downloading any data.
For now, in fewer than 20 lines of code, we wrote a function that automated the creation of more than 1,500 maps. Total computer processing time was about 3 minutes.