Mapping Census Data in R

Felipe Valdez

2024-08-15

Getting Started

Workshop aims & objectives

  • Some R experience (in case you don’t have any)
  • Access Census Data
  • Map Census Data

Pre-requisites

To follow this demo you will need to:

  1. Install R from https://cran.rstudio.com/
  2. Install RStudio from https://posit.co/download/rstudio-desktop/

Census Data

  • Sources:
    • Decennial Census
    • American Community Survey (ACS)
  • Access:

Census Hierarchies

Census Hierarchies

Census Hierarchies 2

Census Hierarchies example

Demo

Question

  1. Where is the latino population in the State of Pennsylvania?
  2. How this population has changed in the las decade?

Following options

  1. Download the R notebook and execute the code
  2. Copy & paste the code in a new R notebook
  3. Follow the workshop website

Installing the packages

We are going to start installing the following packages

install.packages(c("tidycensus", "tidyverse", "mapview", "ggspatial", "leafsync"))

tidycensus: allows users to interface with a select number of the US Census Bureau’s data APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.

tidyverse: is a collection of open-source R packages that help with data science tasks like importing, tidying, manipulating, and visualizing data

mapview: is a tool for quickly creating interactive maps of spatial data.

ggspatial: is a framework for interacting with spatial data using ggplot2 to create maps.

leafsync: is a plugin for leaflet to produce potentially synchronised small multiples of leaflet web maps wrapping Leaflet.

tidycensus

Allows users to interface with a select number of the US Census Bureau’s data APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.

Essential functions

get_decennial(), which requests data from the US Decennial Census APIs for 2000, 2010, and 2020.

get_acs(), which requests data from the 1-year and 5-year American Community Survey samples. Data are available from the 1-year ACS back to 2005 and the 5-year ACS back to 2005-2009.

get_estimates(), an interface to the Population Estimates APIs. These datasets include yearly estimates of population characteristics by state, county, and metropolitan area, along with components of change demographic estimates like births, deaths, and migration rates.

get_pums(), which accesses data from the ACS Public Use Microdata Sample APIs. These samples include anonymized individual-level records from the ACS organized by household and are highly useful for many different social science analyses. get_pums() is covered in more depth in Chapters 9 and 10.

get_flows(), an interface to the ACS Migration Flows APIs. Includes information on in- and out-flows from various geographies for the 5-year ACS samples, enabling origin-destination analyses.

The American Community Survey ACS

Nationwide survey that collects and publishes information about the US population’s social, economic, housing, and demographic characteristics.

Income, jobs and occupations, educational attainment, veterans, and housing tenure.

  • 1-year estimates: Available for geographic areas with populations of 65,000 or more. These estimates are more current.
  • 5-year estimates: Available for all geographic areas, including census tracts and block groups. These estimates are more statistically reliable, especially for smaller population groups

Hands On

Getting a single variable

library(tidycensus)


Get the number of latino population B03001_003 at the county level for the state of Pennsylvania PA

latino_pop_PA <- get_acs(
  geography = "county",
  state = "PA",
  variables = "B03001_003",
  year = 2022
)
latino_pop_PA
# A tibble: 67 × 5
   GEOID NAME                           variable   estimate   moe
   <chr> <chr>                          <chr>         <dbl> <dbl>
 1 42001 Adams County, Pennsylvania     B03001_003     7688    NA
 2 42003 Allegheny County, Pennsylvania B03001_003    29272    NA
 3 42005 Armstrong County, Pennsylvania B03001_003      560    NA
 4 42007 Beaver County, Pennsylvania    B03001_003     3227    NA
 5 42009 Bedford County, Pennsylvania   B03001_003      598    NA
 6 42011 Berks County, Pennsylvania     B03001_003    99460    NA
 7 42013 Blair County, Pennsylvania     B03001_003     1721    NA
 8 42015 Bradford County, Pennsylvania  B03001_003      946    NA
 9 42017 Bucks County, Pennsylvania     B03001_003    38195    NA
10 42019 Butler County, Pennsylvania    B03001_003     3391    NA
# ℹ 57 more rows

Exploring source variables

With load_variables() you can see a list of all variables with their corresponding code. Simply set the year and source.

vars <- load_variables(2022, "acs5")

head(vars)
# A tibble: 6 × 4
  name        label                                   concept          geography
  <chr>       <chr>                                   <chr>            <chr>    
1 B01001A_001 Estimate!!Total:                        Sex by Age (Whi… tract    
2 B01001A_002 Estimate!!Total:!!Male:                 Sex by Age (Whi… tract    
3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  Sex by Age (Whi… tract    
4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   Sex by Age (Whi… tract    
5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract    
6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract    

Getting a table

latino_table_PA <- get_acs(
  geography = "county",
  state = "PA",
  table = "B03001",
  year = 2022,
)
latino_table_PA
latino_table_PA <- get_acs(
  geography = "county",
  state = "PA",
  table = "B03001",
  year = 2022,
)
latino_table_PA
# A tibble: 2,077 × 5
   GEOID NAME                       variable   estimate   moe
   <chr> <chr>                      <chr>         <dbl> <dbl>
 1 42001 Adams County, Pennsylvania B03001_001   104604    NA
 2 42001 Adams County, Pennsylvania B03001_002    96916    NA
 3 42001 Adams County, Pennsylvania B03001_003     7688    NA
 4 42001 Adams County, Pennsylvania B03001_004     4419   430
 5 42001 Adams County, Pennsylvania B03001_005     1607   336
 6 42001 Adams County, Pennsylvania B03001_006      243   149
 7 42001 Adams County, Pennsylvania B03001_007      405   263
 8 42001 Adams County, Pennsylvania B03001_008      385   244
 9 42001 Adams County, Pennsylvania B03001_009        4     6
10 42001 Adams County, Pennsylvania B03001_010       91    70
# ℹ 2,067 more rows
latino_table_PA <- get_acs(
  geography = "county",
  state = "PA",
  table = "B03001",
  year = 2022,
  output = "wide"
)
latino_table_PA
# A tibble: 67 × 64
   GEOID NAME        B03001_001E B03001_001M B03001_002E B03001_002M B03001_003E
   <chr> <chr>             <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
 1 42001 Adams Coun…      104604          NA       96916          NA        7688
 2 42003 Allegheny …     1245310          NA     1216038          NA       29272
 3 42005 Armstrong …       65538          NA       64978          NA         560
 4 42007 Beaver Cou…      167629          NA      164402          NA        3227
 5 42009 Bedford Co…       47613          NA       47015          NA         598
 6 42011 Berks Coun…      428483          NA      329023          NA       99460
 7 42013 Blair Coun…      122640          NA      120919          NA        1721
 8 42015 Bradford C…       60159          NA       59213          NA         946
 9 42017 Bucks Coun…      645163          NA      606968          NA       38195
10 42019 Butler Cou…      194562          NA      191171          NA        3391
# ℹ 57 more rows
# ℹ 57 more variables: B03001_003M <dbl>, B03001_004E <dbl>, B03001_004M <dbl>,
#   B03001_005E <dbl>, B03001_005M <dbl>, B03001_006E <dbl>, B03001_006M <dbl>,
#   B03001_007E <dbl>, B03001_007M <dbl>, B03001_008E <dbl>, B03001_008M <dbl>,
#   B03001_009E <dbl>, B03001_009M <dbl>, B03001_010E <dbl>, B03001_010M <dbl>,
#   B03001_011E <dbl>, B03001_011M <dbl>, B03001_012E <dbl>, B03001_012M <dbl>,
#   B03001_013E <dbl>, B03001_013M <dbl>, B03001_014E <dbl>, …
B03001 Table
Code Variable
B03001_004 Mexican
B03001_005 Puerto Rican
B03001_006 Cuban
B03001_007 Dominican
B03001_009 Costa Rican
B03001_010 Guatemalan
B03001_011 Honduran
B03001_012 Nicaraguan
B03001_013 Panamanian
B03001_014 Salvadoran
B03001_015 Other Central American
B03001_017 Argentinian
B03001_018 Bolivian
B03001_019 Chilean
B03001_020 Colombian
B03001_021 Ecuadorian
B03001_022 Paraguayan
B03001_023 Peruvian
B03001_024 Uruguayan
B03001_025 Venezuelan
B03001_026 Other South American
B03001_027 Other Hispanic or Latino

Exploring the percentage of latino population

# Simplify county names

latino_table_PA$NAME <- gsub(" County|, Pennsylvania", "", latino_table_PA$NAME)

# Calculate the percentage of latino popualtion by county

latino_table_PA$latino_percentage <- (latino_table_PA$B03001_003E / latino_table_PA$B03001_001E) * 100
Code
library("ggplot2")
ggplot(latino_table_PA, aes(x = reorder(NAME, -latino_percentage), y = latino_percentage)) +
  geom_bar(stat = "identity", fill = "darkred") +
  labs(
    x = "County",
    y = "Latino Population Percentage",
    title = "Percentage of Latino Population by County in Pennsylvania"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

What is the spatial distribution of the latino population?

Getting the census geometry

latino_table_PA <- get_acs(
  geography = "county",
  state = "PA",
  table = "B03001",
  year = 2022,
  output = "wide",
  geometry = TRUE
)
plot(latino_table_PA$geometry)

Mapping the percentage

latino_table_PA$latino_percentage <- (latino_table_PA$B03001_003E / latino_table_PA$B03001_001E) * 100

ggplot(data = latino_table_PA, aes(fill = latino_percentage)) + 
  geom_sf()

Customizing maps in ggplot

ggplot(data = latino_table_PA, aes(fill = latino_percentage)) + 
  geom_sf() + 
  scale_fill_distiller(palette = "YlOrRd", 
                       direction = 1) + 
  labs(title = "  Percentage of latino population by county in Pennsylvania, 2022",
       caption = "Data source: 2022 5-year ACS, US Census Bureau",
       fill = "Percenatge") + 
  theme_void()

Customizing maps in ggplot

Exploring other geometries, census tracts

latino_table_tracts_PA <- get_acs(
  geography = "tract",
  state = "PA",
  table = "B03001",
  year = 2022,
  output = "wide",
  geometry = TRUE
)
ggplot(data = latino_table_tracts_PA, aes(fill = latino_percentage)) + 
  geom_sf() + 
  scale_fill_distiller(palette = "YlOrRd",
                       direction = 1) + 
  labs(title = "  Percentage of latino population by tract in Pennsylvania, 2022",
       caption = "Data source: 2022 5-year ACS, US Census Bureau",
       fill = "Percentage") + 
  theme_void()

Improving the map layout

library("maps")

ggplot(data = latino_table_tracts_PA, aes(fill = latino_percentage)) + 
  geom_sf(linewidth=0.01) + 
  scale_fill_distiller(palette = "YlOrRd",
                       direction = 1) + 
  labs(title = "  Percentage of latino population by tract in Pennsylvania, 2022",
       caption = "Data source: 2022 5-year ACS, US Census Bureau",
       fill = "Percentage") + 
  borders("county","pennsylvania") +
  theme_void()

Improving the map layout

Focusing on a specific county: Lehigh County

Mapping the result

lehigh_latino$latino_percentage <- ifelse(lehigh_latino$B03001_003E>0, 100* (lehigh_latino$B03001_003E / lehigh_latino$B03001_001E),0)

ggplot(data = lehigh_latino, aes(fill = latino_percentage)) + 
  geom_sf(linewidth=0.01) + 
  scale_fill_distiller(palette = "YlOrRd",
                       direction = 1) + 
  labs(title = "  Percentage of latino population by tract in Lehigh County, 2022",
       caption = "Data source: 2022 5-year ACS, US Census Bureau",
       fill = "Percentage") + 
  theme_void()

Mapping the result

Customized interactive maps with mapgl()

mapgl

The mapgl R package allows users to create interactive maps in R using the Mapbox GL JS and MapLibre GL JS libraries:

Features Create globe visualizations, layer objects to make filled maps, circle maps, heatmaps, and 3D graphics, and customize map styles and views.

Ease of use Designed to be intuitive for R users while still offering the capabilities of the Mapbox GL JS and MapLibre GL JS libraries

Flexibility Allows for more code to be written when making maps, but also gives users more flexibility in how they design their maps

Shiny web applications Includes utilities to use Mapbox and MapLibre maps in Shiny web applications

Getting started

install.packages("mapgl")
library(mapgl)
maplibre()

Specifying the initial view

Adding census data to maplibre

pa_map <- maplibre(bounds = latino_table_PA) 

pa_map

Adding the layer

pa_map |> 
  add_fill_layer(
  id = "pa_latino",
  source = latino_table_PA,
  fill_color = interpolate(
    column = "latino_percentage",
    values = c(1, 40),
    stops = c("lightyellow", "darkorange"),
    na_color = "lightgrey"
  ),
  fill_opacity = 0.7
 ) |> 
  add_legend(
    "Percentage of Latino Population, 2022",
    values = c(1, 40),
    colors = c("lightyellow", "darkorange")
  )

Adding the layer

Adding interactivity to your map

latino_table_PA$popup <- glue::glue(
  "<strong>County: </strong>{latino_table_PA$NAME}<br><strong>Percentage: </strong>{sprintf('%.2f', latino_table_PA$latino_percentage)}%")

brewer_pal <- RColorBrewer::brewer.pal(6, "YlGnBu")

pa_map |> 
  add_fill_layer(
    id = "pa_latino",
    source = latino_table_PA,
    fill_color = step_expr(
      column = "latino_percentage",
      base = brewer_pal[1],
      stops = brewer_pal[1:6],
      values = seq(0.8, 27.2, length.out = 6),
      na_color = "white"
    ),
    fill_opacity = 0.5,
    popup = "popup",
    tooltip = "latino_percentage",
    hover_options = list(
      fill_color = "yellow",
      fill_opacity = 1
    )
  ) |> 
  add_legend(
    "Latino Population, 2022",
    values = c(
      "Less than 5%",
      "5%-10%",
      "10%-15%",
      "15%-20%",
      "20%-25%",
      "More than 25%"
    ),
    colors = brewer_pal,
    type = "categorical"
  ) |>
  add_navigation_control()

Adding interactivity to your map

Latino Population in Philadelphia, 2022