Week 4: CPLN 5920-MUSA 5080
2026-02-10
Part 1. Why Spatial Data Analysis Matters
Part 2. Spatial Data Fundamentals
Part 3. Spatial Operations
Part 4. Geometry Operations
Part 5. Coordinate Systems
Part 6. Putting It All Together
You’ve been working with:
Missing piece: WHERE are these patterns occurring?
This week: Add the spatial dimension to understand:
“Which communities have the lowest income?”
→ Are they clustered? Isolated? Near resources?
“Where should we locate a new health clinic?”
→ Optimize access for underserved populations
“How do school districts compare?”
→ Account for geographic boundaries and spillovers
“Is there environmental justice concern?”
→ Do pollution sources cluster near vulnerable communities?
Geographic bias in algorithms:
Examples:
Your role: Understand spatial patterns to design fairer systems
Real world → Simplified geometric representations
Three basic types:
Each feature has:
Why sf?
Key principle: Spatial data is just data frame (or tibble) + geometry column
Common spatial data formats:
Examples in corresponding code
Extract features based on spatial relationships
examples in code
Key functions: st_filter(), st_intersects(), st_touches(), st_within()
The .predicate tells st_filter() what spatial relationship to look for:
Basic structure:
st_filter(data_to_filter, reference_geometry, .predicate = relationship)
Different questions need different relationships:
st_touchesst_withinst_intersectsDefault: If no .predicate specified, uses st_intersects
For county neighbors, why st_touches not st_intersects?
example in code
st_intersects includes the reference feature itself!
| Predicate | Definition | Policy Use Case |
|---|---|---|
st_intersects() |
Any overlap at all | “Counties affected by flooding” |
st_touches() |
Share boundary, no interior overlap | “Neighboring counties” |
st_within() |
Completely inside | “Schools within district boundaries” |
st_contains() |
Completely contains | “Districts containing hospitals” |
st_overlaps() |
Partial overlap | “Overlapping service areas” |
st_disjoint() |
No spatial relationship | “Counties separate from urban areas” |
Most common: st_intersects() (any overlap) and st_touches() (neighbors)
Combine datasets based on spatial relationships
examples in codes
examples in codes
examples in codes
Important: Units depend on coordinate reference system!
The dot (.) is a placeholder that represents the data being passed through the pipe (%>%).
pa_counties <- pa_counties %>% mutate( area_sqkm = as.numeric(st_area(.)) / 1000000 )
The . refers to pa_counties - the data frame being passed through the pipe. So this is equivalent to:
pa_counties <- pa_counties %>% mutate( area_sqkm = as.numeric(st_area(pa_counties)) / 1000000 )
Create zones around features
`{r buffers} # 10km buffer around all hospitals hospital_buffers <- hospitals %>% st_buffer(dist = 10000) # 10,000 meters
hospital_buffers <- hospitals %>% mutate( buffer_size = case_when( type == “Major Medical Center” ~ 15000, type == “Community Hospital” ~ 10000, type == “Clinic” ~ 5000 ) ) %>% st_buffer(dist = .$buffer_size) `
Policy application: Service accessibility zones, environmental impact areas
Find overlapping areas
The Key Difference st_filter() with predicates: Selects complete features (keeps or removes entire rows)
st_intersection() and st_union(): Modifies geometries (creates new shapes)
Use st_filter() when:
Use st_intersection() when:
Combine multiple features into one
Summarize data across spatial boundaries
The Earth is round, maps are flat
Problems:
Example: Measuring areas in latitude/longitude gives wrong answers
Geographic Coordinate Systems (GCS):
Projected Coordinate Systems (PCS):
WGS84 (EPSG:4326)
Web Mercator (EPSG:3857)
State Plane / UTM zones
Albers Equal Area
To simply check current CRS st_crs(pa_counties)
To set CRS (ONLY if missing) pa_counties <- st_set_crs(pa_counties, 4326)
Transform to different CRS
Pennsylvania South State Plane (good for PA analysis)
pa_counties_projected <- pa_counties %>% st_transform(crs = 3365)
Transform to Albers Equal Area (good for area calculations) pa_counties_albers <- pa_counties %>% st_transform(crs = 5070) `
Transform when:
Typical spatial analysis steps:
Research question: Which communities have poor access to healthcare?
Step 1: Load data
Step 2: Transform to local projection
counties <- st_transform(counties, 3365) hospitals <- st_transform(hospitals, 3365)
Step 3: Calculate access
hospital_buffers <- hospitals %>% st_buffer(dist = 25000) # 25km access zone
county_access <- counties %>% mutate( has_hospital = st_intersects(., st_union(hospital_buffers), sparse = FALSE), distance_to_nearest = as.numeric(st_distance(st_centroid(.), hospitals)) )
Step 4: Join with demographic data and analyze