Version 1.0 | First Created Sep 15, 2023 | Updated Sep 20, 2023

Keywords: Multiple Ring Buffer, Indicators, Transit Oriented Development, Gentrification, Boston

Introduction

In city planning, transit-oriented development (TOD) is a type of urban development practice that aims to maximize the amount of residential, business, and leisure space within walking distance of public transport. TOD offers a range of benefits that all contribute to creating a more vibrant, sustainable, and inclusive communities. For example, a community with a strong and dependable transit system and streetscaping elements can discourage vehicle dependence and congestion. This further leads to reduction in greenhouse gas emissions per capita and a smaller carbon footprint. In addition, as cities become more walkable, TOD can stimulate economic growth by attracting business and creating jobs. This includes bringing in mixed-use developments, such as retail, dining, and entertainment options within easy reach of resident. Moreover, TOD increases property values in surrounding areas, which is particularly important for real-estate developers and governments.

Transit oriented development has been a large part of Boston’s growth. The Boston Redevelopment Authority has been undertaking a TOD initiative to build on and reinforce Boston’s existing and emerging neighborhood centers. Their goal is to connect all neighborhood centers into a comprehensive transit network and to support that network with appropriate development patterns near these neighborhood centers. To contribute to theri work, this report conducts a comprehensive analysis of existing TOD patterns in Boston. Through visualizing temporal changes in demographic, social, and economic characteristics of neighborhoods across Boston, examining the spatial distribution of transit stops, and comparing the differences of development patterns between TOD and Non-TOD tracts, we aim to address the following questions:

  1. Who is living around transit stations in Boston?

  2. How does the social, economic, and demographic characteristics of households living near transit stations in Boston change over time?

  3. Are Bostonians willing to pay a premium to live in transit-rich areas? If residents value these locations, officials should consider changing the zoning code to allow increased density around transit.

  4. How can planners in Boston be more strategic about developments in transit-rich neighborhoods in order to satisfy the needs of those who favor transit proximity?

The GitHub repository for this report is available here.

Data Manipulation

Before we begin our analysis and visualization, there are a couple of data cleaning and manipulation steps that we need to perform as well as some analytical decisions to specify.

ACS Data

We are using the America Community Survey (5-year estimate) for year 2010 and 2019 in this report. We chose the ACS data because it collects detailed demographic, social, economic, and housing data on a wide range of topics, providing a more comprehensive view of a community’s characteristics in Boston. We chose 2019 as the end year because COVID-19 since 2020 has drastically changed people’s work and travel behaviors: the widespread adoption of remote work during the pandemic would certainly led to decline in the demand for office space and the need for transit-oriented office developments. In order to better understand if there’s a difference between individuals living in TOD and Non-TOD neighborhood over time, we chose the following ACS variables, categorized into demographic, social, and economic characteristics, for our analysis.

Variable Name ACS Name
Demographic
Total Population B02001_001
White Population B02001_002
Black Population B02001_003
Asian Population B02001_005
Latinx Population B03002_012
Male 70-74 B01001_022
Male 75-79 B01001_023
Male 80-84 B01001_024
Male over 85 B01001_025
Female 70-74 B01001_046
Female 75-79 B01001_047
Female 80-84 B01001_048
Female over 85 B01001_049
Non-US Citizen B05001_006
Economic
Median Household Income B19013_001
Median Rent B25058_001
Population under Poverty B06012_002
Social
Bachelor Degree B06009_005
Graduate Degree B06009_006
Renter Occupied Household B25026_009
Population without Vehicle B08014_002

Based on these ACS variables, we further computed the following variables:

  • PctMinority : percentage of minority population, where minority here refers to anyone who is Asian, Latinx, or Black.

  • PctAging: percentage of aging population, where aging here refers to anyone who is over 70 years old.

  • PctForeign: percentage of population who are non-US citizens.

  • PctBachelors: percentage of population with at least a Bachelor’s degree.

  • PctRenter: percentage of renter occupied households.

  • PctNoCar:percentage of population without a vehicle.

  • PctPoverty: poverty rate.

The following code gives an example of the data cleaning steps we performed using 2010 ACS data, where we downloaded the ACS data, reprojected the data, removed areas outside of Boston, renamed the columns, performed additional data manipulations, and discarded unnecessary variables.

boston10 <- 
  get_acs(geography = "tract", variables = acs_var, 
          year=2010, state='MA', county=025, geometry=T, output="wide") %>% 
  st_transform('EPSG:26986') %>% 
  filter(!(GEOID %in% remove)) %>% 
  rename(TotalPop = B02001_001E, 
         Whites = B02001_002E,
         African_Americans = B02001_003E,
         Asians = B02001_005E,
         Latinx = B03002_012E,
         M70_74 = B01001_022E,
         M75_79 = B01001_023E,
         M80_85 = B01001_024E,
         M85 = B01001_025E,
         FM70_74 = B01001_046E,
         FM75_79 = B01001_047E,
         FM80_85 = B01001_048E,
         FM85 = B01001_049E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E,
         College = B06009_005E,
         Graduate = B06009_006E,
         NoCar = B08014_002E,
         Renter = B25026_009E,
         Foreigner = B05001_006E
         ) %>% 
  dplyr::select(-starts_with("B"))%>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop, 0),
         pctMinority = ifelse(TotalPop > 0, (African_Americans + Asians + Latinx ) / TotalPop, 0),
         pctAging = ifelse(TotalPop > 0, (M70_74 + M75_79 + M80_85 + M85 + FM70_74 + FM75_79 + FM80_85 + FM85 ) / TotalPop, 0),
         pctBachelors = ifelse(TotalPop > 0, ((College + Graduate) / TotalPop), 0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         pctNoCar = ifelse(TotalPop > 0, NoCar / TotalPop, 0),
         pctRenter = ifelse(TotalPop > 0, Renter / TotalPop, 0),
         pctForeigner = ifelse(TotalPop > 0, Foreigner / TotalPop, 0),
         year = "2010") %>% 
  dplyr::select(-Whites,-African_Americans, -Asians, -Latinx, -M70_74, -M75_79, -M80_85, -M85, -FM70_74, -FM75_79, -FM80_85, -FM85, -College, -Graduate, -TotalPoverty, -NoCar, -Renter, -Foreigner)

Note: Boston city is part of the larger Suffolk county in Massachusetts, which also includes the city of Chelsea, Revere, and Winthrop. To limit our analysis within Boston, we removed all areas outside Boston. The map below shows Boston relative to other cities in Suffolk county.

ggplot() + 
  geom_sf(data=st_union(Suffolk), fill= "#D3D3D3", color = "white", size = 2) +
  geom_sf(data=st_union(boston10), fill= "#bd7ebe", color = "white", size = 2) +
  labs(title = "Boston City in Suffolk County",
       caption = "Data: American Community Survey 2010 and 2019",
       color = "Boston") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold")) +
  annotate(geom = "text",x = -71.10,y = 42.3, size = 4, label = "Boston",color = "white") +
  annotate(geom = "text",x = -71.04,y = 42.40, size = 3, label = "Chelsea",color = "black") +
  annotate(geom = "text",x = -71.01,y = 42.42, size = 3, label = "Revere",color = "black") +
  annotate(geom = "text",x = -70.97,y = 42.38, size = 3, label = "Winthrop",color = "black") 

Now, we repeat the same process to obtain the 2019 ACS data and combine it with the 2010 data.

boston19 <- 
  get_acs(geography = "tract", variables = acs_var, 
          year=2019, state='MA', county=025, geometry=T, output="wide") %>% 
  st_transform('EPSG:26986') %>% 
  filter(!(GEOID %in% remove)) %>% 
  rename(TotalPop = B02001_001E, 
         Whites = B02001_002E,
         African_Americans = B02001_003E,
         Asians = B02001_005E,
         Latinx = B03002_012E,
         M70_74 = B01001_022E,
         M75_79 = B01001_023E,
         M80_85 = B01001_024E,
         M85 = B01001_025E,
         FM70_74 = B01001_046E,
         FM75_79 = B01001_047E,
         FM80_85 = B01001_048E,
         FM85 = B01001_049E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E,
         College = B06009_005E,
         Graduate = B06009_006E,
         NoCar = B08014_002E,
         Renter = B25026_009E,
         Foreigner = B05001_006E
         ) %>% 
  dplyr::select(-starts_with("B")) %>% 
    mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop, 0),
         pctMinority = ifelse(TotalPop > 0, (African_Americans + Asians + Latinx ) / TotalPop, 0),
         pctAging = ifelse(TotalPop > 0, (M70_74 + M75_79 + M80_85 + M85 + FM70_74 + FM75_79 + FM80_85 + FM85 ) / TotalPop, 0),
         pctBachelors = ifelse(TotalPop > 0, ((College + Graduate) / TotalPop), 0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         pctNoCar = ifelse(TotalPop > 0, NoCar / TotalPop, 0),
         pctRenter = ifelse(TotalPop > 0, Renter / TotalPop, 0),
         pctForeigner = ifelse(TotalPop > 0, Foreigner / TotalPop, 0),
         year = "2019") %>% 
  dplyr::select(-Whites,-African_Americans, -Asians, -Latinx, -M70_74, -M75_79, -M80_85, -M85, -FM70_74, -FM75_79, -FM80_85, -FM85, -College, -Graduate, -TotalPoverty, -NoCar, -Renter, -Foreigner)

allTracts <- rbind(boston10,boston19)

Transit Data

The Boston transit data is provided by MassGIS, through which we downloaded the raw file and uploaded it to our GitHub repository. To ensure data consistency, we have repojected this layer for it to match with our census data, and have re-categorized transit lines.

boston_stops <- st_read("https://raw.githubusercontent.com/emilyzhou112/MUSA5080-Boston-TOD/main/data/boston_stps.geojson") %>%
  st_transform(st_crs(boston10)) %>% 
  mutate(stops = ifelse(grepl("/", LINE), "OTHER", LINE)) 

The MBTA Rapid Transit layers represent the core subway system lines and stations within the Greater Boston area. There are five subways lines in Boston: Red, Green, Orange, Blue, and Silver, which services many Boston neighborhoods plus parts of Cambridge, Somerville, Revere, Malden, Brookline, Newton, Milton, and Quincy. To limit our scope of analysis within Boston, we have removed subway stations outside of Boston. The map below shows transit stops used in our study as well as those that we have excluded.

MBTA <- st_read("https://raw.githubusercontent.com/emilyzhou112/MUSA5080-Boston-TOD/main/data/MBTA_node.geojson") %>%
  st_transform(st_crs(boston10))

ggplot() + 
  geom_sf(data=st_union(boston10), fill= "#D3D3D3", color = "white", size = 2) +
  geom_sf(data=MBTA, color = "black", size= 1) + 
  geom_sf(data=boston_stops, 
          aes(colour = stops %>% str_to_title()),
          show.legend = "point", size= 1.5) + 
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe")) +
  labs(title = "Boston MBTA Rapid Transit Stops",
       caption = "Data: MassGIS 2022",
       colour = "Lines") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold"))

To delineate census tracts that are in close proximity to transit stations, we delineated a half mile area for each transit stations, and compared them to all the centroids (center ponits) of census tracts. If the centroid is located within the half mile area, we categorize that tract as “TOD”; otherwise, this tract is considered “Non-TOD”. An additional data manipulation here is to adjust median rent and median household income for inflation.

stops_buffer <- st_union(st_buffer(boston_stops, 804)) %>% st_sf() 

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[stops_buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[stops_buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2010", MedRent * 1.17, MedRent)) %>% 
  mutate(MedHHInc.inf = ifelse(year == "2010", MedHHInc * 1.17, MedHHInc)) 

The map below shows the distribution of TOD and Non-TOD tracts in Boston, using the above method.

allTracts.group %>% 
  filter(year == 2010 ) %>% 
  ggplot() + 
  geom_sf(aes(fill = TOD), color = "transparent") +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) + 
  geom_sf(data=boston_stops, 
          color = "#fd7f6f", size= 0.8)+
  labs(title = "Transient Oriented Development Tracts",
       subtitle = "Tracts within 0.5 mile of transit stops",
       caption = "Data: American Community Survey 2010 and 2019",
       fill = "Tracts Types") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))

Data Analysis

Demographic Characteristics

We first looked at the temporal changes in aging population distribution. The map below reveals that the number of populations that are 70 and above increases in Boston overall. Yet, most aging population live in Non-TOD tracts. This pattern is consistent through the years as we can see that TOD tracts are relatively “younger” than Non-TOD tracts. This loosely implies that in Boston, transit-rich neighborhoods are favored by the younger cohort, probably because those people have a higher demand for transit to get to school and work.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctAging), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Aging", breaks = c(0, 0.1, 0.2, 0.3))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
  theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Aging Population in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

We then looked at the distribution of Non-US citizens in Boston. We selected this demographic variable because we expected that non-US citizens are more likely to live in transit-rich neighborhoods because these places often provide easy access to major job centers, business districts, and employment opportunities, which is especially attractive to foreigners seeking employment in urban areas. Yet, the map below reveals only a slight difference between number of non-US citizens between TOD and Non-TOD tracts, where TOD tracts attract slightly more foreigners than Non-TOD tracts, and this pattern is consistent throughout years.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctForeigner), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Foreigner", breaks = c(0, 0.2, 0.4, 0.6))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Non-US Citizens in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

We also looked at the distribution of racial minority population in Boston. The map below shows that minority population are concentrated in southern Boston, mainly in Roxbury and Dorchester. More importantly, these neighborhoods are mostly made up of Non-TOD tracts. This pattern is consistent over time and in 2019, we are seeing an increasing number of minority population living in the same neighborhoods as in 2010.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctMinority), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Minority", breaks = c(0, 0.3, 0.6, 0.9))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Minorities in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

Economic Characteristics

For economic variables, we first looked at median rent in Boston. The map below reveals that rent is significantly higher in TOD tracts than Non-TOD tracts in both 2010 and 2019. This implies that transit-rich neighborhoods are also more expensive. In addition, after adjusting for inflation, we also see that median rent has increased significantly in all census tracts in Boston, and this increase in rent is most salient in city center, where multiple transit lines meet.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=MedRent.inf), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Rent")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Median Rent in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops \n2019 data adjusted for inflation",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

Looking at the percentage of people under poverty in Boston, we can see that in both 2010 and 2019, neighborhoods in southern Boston (Roxbury and Dorchester) have higher poverty rates than other neighborhoods. These neighborhoods are mostly located in Non-TOD tracts. Yet, we are also able to see poverty rate is decreasing over the years. In 2010, a significant number of TOD tracts in western Boston have higher poverty rate than others. In 2019, however, we see that the condition improved, with most census tract with higher poverty rate concentrated in south Boston.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctPoverty), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Percentage Poverty")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage Poverty in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022")

The median household income data is not ideal as it contains a lot of missing data entries. However, it still informs us that in both 2010 and 2019, households living in TOD tracts have higher median household income. This is consistent with previous maps, where TOD tracts also have higher rent and lower poverty rates. It is also worth pointing out that median household income in TOD tracts is the highest in the city center. As the distance from city center increases, the difference in median household income between TOD and Non-TOD tracts become less prominent.

allTracts.group %>% 
  mutate(inc = MedHHInc.inf/1000) %>% 
  ggplot()+
  geom_sf(aes(fill=inc), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Income (thousands)")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
  theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Median Household Income in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops\n2019 data adjusted for inflation",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022")

Social Characteristics

From the map below, we can see that in both 2010 and 2019, the majority of houses in Boston are renter occupied, with houses in TOD tracts, especially those tracts in city center, appear slightly more preferable to renters. Considering that there are a lot of universities in Boston and that some of these universities have their own transit station on campus, we suspect that a large number of renters of students, who chose to live around.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctRenter), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage Renters")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage Renters in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

We see a clear difference in spatial pattern here when considering whether people living in Boston own a car or not. Compared to Non-TOD tracts, a significantly higher number of people in TOD tracts do not have a car. This could imply that people chose to live in transit-rich neighborhood because of the “transit” itself.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctNoCar), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage No Vehicle", breaks = c(0, 0.2, 0.4, 0.6))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Population without Car in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

Lastly, we looked at education attainment for Bostonians. From 2010 to 2019, the number of people with a least a bachelor’s degree increased in Boston, and this increase is the most salient in TOD tracts. Again, census tracts with low education attainment are Non-TOD tracts, and are concentrated in south Boston.

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctBachelors), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage Bachelors", breaks = c(0, 0.3, 0.6, 0.9))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Population with at Least a Bacehlor's Degree in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

Quick Summary

To support our observations above, we further calculated the mean for all of our variables, which allows us to better examine changes in population characteristics across space and time as a whole.

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>%
  group_by(year, TOD) %>%
  summarize(Rent = mean(MedRent.inf, na.rm = T),
            Income = mean(MedHHInc.inf, na.rm = T),
            Pop = mean(TotalPop, na.rm = T),
            PctMinority = mean(pctMinority, na.rm = T),
            PctAging = mean(pctAging, na.rm = T),
            PctBach = mean(pctBachelors, na.rm = T),
            PctPoverty = mean(pctPoverty, na.rm = T),
            PctNoCar = mean(pctNoCar, na.rm = T),
            PctRenter = mean(pctRenter, na.rm = T),
            PctForeign = mean(pctForeigner, na.rm = T))

The result is shown in the table as well as the chart below.

kable(allTracts.Summary, format = "html") %>%
 kable_styling(full_width = T, 
               bootstrap_options = c("striped", "hover"),
               font_size = 12) %>% 
  footnote(general_title = "\n",
           general = "Table 1: Comparaing Population Characteristics Across Space and Time ") %>% 
  row_spec(0, font_size = 14) %>% 
  row_spec(1, color = "white", background = "#7eb0d5") %>% 
  row_spec(2, color = "white", background = "#b2e061") %>% 
  row_spec(3, color = "white", background = "#7eb0d5") %>% 
  row_spec(4, color = "white", background = "#b2e061") %>% 
  column_spec(1:2, bold = T)
year TOD Rent Income Pop PctMinority PctAging PctBach PctPoverty PctNoCar PctRenter PctForeign
2010 Non-TOD 1152 66652 3272 0.551 0.074 0.210 0.168 0.056 0.471 0.118
2010 TOD 1310 61120 3402 0.423 0.063 0.330 0.214 0.170 0.629 0.154
2019 Non-TOD 1367 74354 3856 0.570 0.077 0.262 0.166 0.058 0.476 0.111
2019 TOD 1567 74538 3761 0.456 0.072 0.412 0.175 0.170 0.624 0.146

Table 1: Comparaing Population Characteristics Across Space and Time
# 

Who is living around transit stations in Boston, what are their social, economic, and demographic characteristics, and do they vary across time? Our analysis have found the following to answer our first and second questions: 1) Populations living in transit-rich neighborhoods in Boston are of relatively younger age, are mostly renters, have higher education attainment, have higher median household income, and do not own any vehicles. 2) On the other hand, Non-TOD neighborhoods in Boston tend to have higher concentration of aging population and minority population as well as lower education attainment. 3) Gentrification remains prevalent in the city as we see citywide increase in median household rent and median household income. In addition, the city is becoming more and more diverse; its poverty rate has been reducing while overall and individuals’ education attainment has been increasing.

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol=5) +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) +  
  labs(title = "Indicator Differences Across Time and Space",
       caption = "Data: American Community Survey 2010 and 2019") +
  theme(legend.position = "bottom",
        axis.ticks.x=element_blank(),
        axis.ticks.y=element_blank(),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 

Further Analysis

We did some further analysis to rearrange our tables and calculate percentage point change. This includes pivoting the table twice to make year 2010 and 2019 two separate columns

allTracts.Long <- allTracts.Summary %>%
  pivot_longer(cols = c(Rent, Income, Pop, PctMinority, PctAging, PctBach, PctPoverty, PctNoCar, PctRenter, PctForeign), names_to = "Variable", values_to = "Value") %>%
  pivot_wider(names_from = year, values_from = Value, names_prefix = "Year") %>% 
  mutate(pct_change = (Year2019-Year2010) / Year2010)

Looking at the percentage change of variables between Non-TOD and TOD tracts, we may see that from 2010 to 2019, 1) median house rent and median household income increases significantly in TOD tracts compared to Non-TOD, 2) poverty rate was significantly reduced in TOD tracts compared to Non-TOD, 3) population increase is more salient in Non-TOD tracts than TOD.

allTracts.Long %>% 
  ggplot() +
  geom_bar(aes(x = Variable, y = pct_change, fill = TOD),
           stat = "identity") +
  facet_wrap(~TOD) +
  coord_flip() +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) +  
  labs(title = "Percentage Change 2010-2019 for TOD and Non-TOD Tracts",
       caption = "Data: American Community Survey 2010 and 2019") +
  ylab ("Percentage Change") +
  theme(legend.position = "bottom",
        axis.ticks.x=element_blank(),
        axis.ticks.y=element_blank(),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 

We also looked into see if there’s intra-regional variations of our variables among TOD tracts, using population and median house rent as an example. To achieve that, we joined the centroids of census tracts into the half mile buffer of transit stations, which we then summarized the total population and median rent within half mile of each transit station.

boston_stops$ID <- seq_along(boston_stops$STATION) # unique id for each station

stop_attributes <- st_buffer(boston_stops, 804) %>% 
  st_sf() %>% 
  st_intersection(st_centroid(boston10)) %>% 
  st_drop_geometry() %>% 
  group_by(ID) %>% 
  summarize(
    total_population = sum(TotalPop),
    median_rent = mean(MedRent, na.rm = T))

stops_summary <- boston_stops %>% 
  left_join(stop_attributes, by = "ID")

Not surprisingly, the map below shows that transit station in city center attracts more people to live nearby while as distance from the city center gets greater, the dot size becomes smaller, meaning there’s less people living nearby.

ggplot() +
  geom_sf(data=st_union(boston10_4326), fill= "#D3D3D3", color = "white", size = 2) +
  geom_point(data = stops_summary_df, aes(x = X, y = Y, size = total_population, color = str_to_title(stops))) +
  scale_size(range = c(0.5, 4.5), name = "Population") +    
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe"), name = "Stops") +
  labs(title = "Population within 0.5 Mile of Tansit Stops",
       subtitle = "Dot size proportional to population",
       caption = "Data: American Community Survey 2019") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))

Similarly, median rent is the highest around transit stations in the city center.This shows that the influence of transit station on nearby population and settlements characteristics are not the same throughout the city.

ggplot() +
  geom_sf(data=st_union(boston10_4326), fill= "#D3D3D3", color = "white", size = 2) +
  geom_point(data = stops_summary_df, aes(x = X, y = Y, size = rent, color = str_to_title(stops))) +
  scale_size(range = c(0.5, 4), name = "Rent") +    
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe"), name = "Stops") +
  labs(title = "Median Rent within 0.5 Mile of Tansit Stops",
       subtitle = "Dot size proportional to rent",
       caption = "Data: American Community Survey 2019") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))

Moreover, we were curious if there is a linear relationship between distance from transit stops and rent? We therefore created a multiple ring buffer, where each buffer is of half a mile distance to the previous one, to answer this question.

boston_MRB <- multipleRingBuffer(st_union(boston_stops), 14484, 804)
ggplot() +
    geom_sf(data=boston_MRB, aes(fill = distance), color = "#708090") +
    scale_fill_continuous(low = "#FAF9F6", high = "#beb9db", name= "Distance")+
    geom_sf(data=st_union(boston10), fill= "#D3D3D3", color = "white", size = 2) +
    geom_sf(data=boston_stops, 
          color = "#fd7f6f", size= 0.8) +    
    labs(title="Half Mile Buffers",
         subtitle = "Distance in-between each buffer is 0.5 mile",
         caption = "Data: American Community Survey 2010 and 2019 \nMassGIS 2022") +
    theme_light() +   
    theme(plot.subtitle = element_text(size = 9,face = "italic"),
          plot.title = element_text(size = 12, face = "bold"))

The figure below shows that overall, rent decreases as distance from transit stops gets greater. However, the relationship is not linear as we see that rent is increasing from 2500 to 4000 meters. This implies that transit stations is not the only factor defining rent. In other words, it could be households are willing to pay more for transit amenities, or that they pay for other types of amenities in neighborhoods that happen to also be transit-rich: proximity to transit station only partially influence their decisions to live in nearby locations.

allTracts.rings <-
  st_join(st_centroid(dplyr::select(allTracts, GEOID, year)),
          boston_MRB) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() 


allTracts.rings.summary <- st_drop_geometry(allTracts.rings) %>%
  group_by(distance, year) %>%
  summarize(Mean_Rent = mean(MedRent, na.rm=T))


ggplot(allTracts.rings.summary, aes(x = distance, y = Mean_Rent)) + 
  geom_line(aes(color = year), size = 2) + 
  scale_color_manual(values = c("#7eb0d5", "#b2e061"), name = "Year") +
  labs(title = "Mean Rent as a Function of Distance to Subway Stations",
       caption = "Data: American Community Survey 2010 and 2019") +
  xlab ("Distance from Transit Stops") +
  ylab("Mean Rent") + 
  theme(legend.position = "bottom",
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 

Future Planning

To quickly summarize, Boston is gentrifying between 2010 and 2019, with increasing population, diversity, median household income, median rent, and education attainment as well as decreasing poverty rate throughout the city. There’s difference in terms of population characteristics between TOD and Non-TOD regions, where TOD attracts mostly renters who are younger, wealthier, and well-educated, whereas Non-TOD tracts’ households consist larger concentration of minority and aging population with lower education attainment.

In addition, there’s intra-regional difference within TOD tracts, where median rent for houses next to transit stations that are in city center being the most expensive. Further, rent did not only decrease as we move away from transit stations. This leads us to conclude that not all transit stations are as appealing as each other. Proximity to transit station only affects people’s decision to live in nearby locations to a certain extent. TOD neighborhoods in Boston is able to attract younger and wealthier individual not only because they offer convenience in public transportation, but also could because Bostonians value the cultural amenities, entertainment & nightlife, as well as employment and academic opportunities that are happened to be within half a mile of these transit stops.

While Boston is a highly walkable city, planning official should still work towards making public transportation more accessible to neighborhoods that are currently. Non-TOD and developing parcels near transit stations, which could alleviate the housing and amenity shortages at city center and reduce intra-regional disparities. Despite that Non-TOD neighborhood have higher car ownecrship rate, this does not mean that public transportation planning is not important in these neighborhoods. For places like the city center where it is hard to get to through driving and where business and intellectual opportunities accumulate, public transportation is the best way to connect different neighborhoods together. Indeed, this aligns with the city’s goal to “connect all neighborhood centers into a comprehensive transit network”.

With the current data we have, how do we improve public transit accessibility in Non-TOD neighborhoods in Boston? We attempted to answer this question with a bivariate map, considering the possibility of extending existing transit lines and adding a transit station in Non-TOD neighborhoods where there’s both a high concentration of aging and minority population (two population subgroups that are often excluded in TOD).The map below highlights an area in southern Boston with high number of both population subgroups. We encouraged transportation planners to further investigate the possibility of future transit developments in these locations.

bi_data <- bi_class(boston19, x = pctMinority, y = pctAging, style = "quantile", dim = 3) #  a 3*3 color-schemed bi-variate choropleth 


bi_map <- ggplot() +
  geom_sf(data = bi_data, mapping = aes(fill = bi_class), color = "transparent", show.legend = FALSE) +
  bi_scale_fill(pal = "DkCyan", dim = 3) + 
  labs(title = "Bivariate Choropleth of Pct Minority over Pct Aging Population") + 
  geom_sf(data = boston_stops, colour = "yellow", alpha = 0.8, size = 0.5) +
  labs(caption = "Data: American Community Survey 2019") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold"))

legend <- bi_legend(pal = "DkCyan",
                    dim = 3,
                    xlab = "PctMinority ",
                    ylab = "PctAging ",
                    size = 5.8)

ggdraw() +
  draw_plot(bi_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.1, 0.2, 1.2, 0.23)

Finally, there is one important limitation of this study that has to be addressed. We mentioned before that the MBTA transit lines go beyond the boundary of Boston cities. In this case, it is likely that some census tracts at the edge of Boston that are categorized as Non-TOD are actually TOD.Future study at a regional scale should consider examine TOD development in Boston considering all transit stops in the Greater Boston Areas as well as all types of transit stops.

---
title: "Transit Oriented Development in Boston"
author: "Emily Zhou"
date: "`r Sys.Date()`"
output:
  html_document:
    theme: simplex
    toc: true
    toc_float: true
    code_folding: hide
    code_download: true
editor_options:
  markdown:
    wrap: sentence
---

```{r setup, include=FALSE, cache=TRUE}
knitr::opts_chunk$set(echo = TRUE)
```

Version 1.0 \| First Created Sep 15, 2023 \| Updated Sep 20, 2023

Keywords: Multiple Ring Buffer, Indicators, Transit Oriented Development, Gentrification, Boston

# Introduction

In city planning, transit-oriented development (TOD) is a type of urban development practice that aims to maximize the amount of residential, business, and leisure space within walking distance of public transport. TOD offers a range of benefits that all contribute to creating a more vibrant, sustainable, and inclusive communities. For example, a community with a strong and dependable transit system and streetscaping elements can discourage vehicle dependence and congestion. This further leads to reduction in greenhouse gas emissions per capita and a smaller carbon footprint. In addition, as cities become more walkable, TOD can stimulate economic growth by attracting business and creating jobs. This includes bringing in mixed-use developments, such as retail, dining, and entertainment options within easy reach of resident. Moreover, TOD increases property values in surrounding areas, which is particularly important for real-estate developers and governments. 

Transit oriented development has been a large part of Boston’s growth. The [Boston Redevelopment Authority](https://www.bostonplans.org/planning/planning-initiatives/fostering-transit) has been undertaking a TOD initiative to build on and reinforce Boston's existing and emerging neighborhood centers. Their goal is to **connect all neighborhood centers into a comprehensive transit network** and to **support that network with appropriate development patterns near these neighborhood centers**. To contribute to theri work, this report conducts a comprehensive analysis of existing TOD patterns in Boston. Through visualizing temporal changes in demographic, social, and economic characteristics of neighborhoods across Boston, examining the spatial distribution of transit stops, and comparing the differences of development patterns between TOD and Non-TOD tracts, we aim to address the following questions: 

1. Who is living around transit stations in Boston? 

2. How does the social, economic, and demographic characteristics of households living near transit stations in Boston change over time? 

3. Are Bostonians willing to pay a premium to live in transit-rich areas? If residents value these locations, officials should consider changing the zoning code to allow increased density around transit.

4. How can planners in Boston be more strategic about developments in transit-rich neighborhoods in order to satisfy the needs of those who favor transit proximity? 

The GitHub repository for this report is available [here](https://github.com/emilyzhou112/MUSA5080-Boston-TOD). 

# Data Manipulation

Before we begin our analysis and visualization, there are a couple of data cleaning and manipulation steps that we need to perform as well as some analytical decisions to specify.  

```{r setup_packages, warning = FALSE, include=FALSE}

# list of required packages
packages <- c( "tidyverse", "ggplot2", "svDialogs", "tidycensus", "sf", "knitr", "rmarkdown", "kableExtra", "stringr", "biscale", "cowplot")

# install and load required packages
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE, quietly=TRUE)
      library(x, character.only = TRUE)
    }
  }
)

# global options 
options(scipen=999)
options(tigris_class = "sf")
options(digits = 3)

# load in multiple ring buffer function
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

```


```{r load_key, warning = FALSE, message=FALSE, include=FALSE}

census_api_key(dlgInput(
  "Enter a Census API Key", # ask for an api key
  Sys.getenv("CENSUS_API_KEY")
)$res,
overwrite = TRUE)

```

## ACS Data

We are using the America Community Survey (5-year estimate) for year 2010 and 2019 in this report. We chose the ACS data because it collects detailed demographic, social, economic, and housing data on a wide range of topics, providing a more comprehensive view of a community's characteristics in Boston. We chose 2019 as the end year because COVID-19 since 2020 has drastically changed people's work and travel behaviors: the widespread adoption of remote work during the pandemic would certainly led to decline in the demand for office space and the need for transit-oriented office developments. In order to better understand if there's a difference between individuals living in TOD and Non-TOD neighborhood over time, we chose the following ACS variables, categorized into demographic, social, and economic characteristics, for our analysis. 

| Variable Name              | ACS Name    | 
| :----------: | :----------:|  
| **Demographic**            |             |
| Total Population           | B02001_001  |
| White Population           | B02001_002  |
| Black Population           | B02001_003  |
| Asian Population           | B02001_005  |
| Latinx Population          | B03002_012  |
| Male 70-74                 | B01001_022  |
| Male 75-79                 | B01001_023  |
| Male 80-84                 | B01001_024  |
| Male over 85               | B01001_025  |
| Female 70-74               | B01001_046  |
| Female 75-79               | B01001_047  |
| Female 80-84               | B01001_048  |
| Female over 85             | B01001_049  |
| Non-US Citizen             | B05001_006  |
| **Economic**               |             |
| Median Household Income    | B19013_001  |
| Median Rent                | B25058_001  |
| Population under Poverty   | B06012_002  |
| **Social**                 |             |
| Bachelor Degree            | B06009_005  |
| Graduate Degree            | B06009_006  |
| Renter Occupied Household  | B25026_009  |
| Population without Vehicle | B08014_002  |

Based on these ACS variables, we further computed the following variables:


- **PctMinority** : percentage of minority population, where minority here refers to anyone who is Asian, Latinx, or Black.    
- **PctAging**: percentage of aging population, where aging here refers to anyone who is over 70 years old.   

- **PctForeign**: percentage of population who are non-US citizens.   

- **PctBachelors**: percentage of population with at least a Bachelor's degree.   

- **PctRenter**: percentage of renter occupied households.   

- **PctNoCar**:percentage of population without a vehicle. 

- **PctPoverty**: poverty rate.  

The following code gives an example of the data cleaning steps we performed using 2010 ACS data, where we downloaded the ACS data, reprojected the data, removed areas outside of Boston, renamed the columns, performed additional data manipulations, and discarded unnecessary variables. 

```{r define_acs_var, include=FALSE}

acs_var = c("B02001_001E", # total population
            "B02001_002E", # white population
            "B02001_003E", # black population
            "B02001_005E", # asian population
            "B03002_012E", # latinx population
            "B01001_022E", # male 70-74
            "B01001_023E", # male 75-79
            "B01001_024E", # male 80-85
            "B01001_025E", # male over 85
            "B01001_046E", # female 70-74
            "B01001_047E", # female 75-79
            "B01001_048E", # female 80-85
            "B01001_049E", # female over 85
            "B19013_001E", # median household income
            "B25058_001E", # median rent
            "B06012_002E", # total poverty
            "B06009_005E", # bachelor
            "B06009_006E", # graduate
            "B08014_002E", # no car
            "B25026_009E", # renter
            "B05001_006E" # non-us citizen
            )

```


```{r non_boston_county, warning = FALSE, message=FALSE, include=FALSE}

remove <- c("25025180400", "25025990101", "25025180301", "25025160400", "25025180500", "25025160601", "25025180200", "25025160300", "25025160200", "25025180101", "25025170502", "25025160502", "25025170701", "25025170501", "25025170800", "25025170200", "25025170400", "25025170100", "25025170100", "25025170601", "25025170300", "25025170702","25025160602", "25025160501", "25025160101")

```


```{r boston_2010_data, message=FALSE, warning=FALSE, cache=TRUE, class.source = 'fold-show', results="hide"}

boston10 <- 
  get_acs(geography = "tract", variables = acs_var, 
          year=2010, state='MA', county=025, geometry=T, output="wide") %>% 
  st_transform('EPSG:26986') %>% 
  filter(!(GEOID %in% remove)) %>% 
  rename(TotalPop = B02001_001E, 
         Whites = B02001_002E,
         African_Americans = B02001_003E,
         Asians = B02001_005E,
         Latinx = B03002_012E,
         M70_74 = B01001_022E,
         M75_79 = B01001_023E,
         M80_85 = B01001_024E,
         M85 = B01001_025E,
         FM70_74 = B01001_046E,
         FM75_79 = B01001_047E,
         FM80_85 = B01001_048E,
         FM85 = B01001_049E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E,
         College = B06009_005E,
         Graduate = B06009_006E,
         NoCar = B08014_002E,
         Renter = B25026_009E,
         Foreigner = B05001_006E
         ) %>% 
  dplyr::select(-starts_with("B"))%>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop, 0),
         pctMinority = ifelse(TotalPop > 0, (African_Americans + Asians + Latinx ) / TotalPop, 0),
         pctAging = ifelse(TotalPop > 0, (M70_74 + M75_79 + M80_85 + M85 + FM70_74 + FM75_79 + FM80_85 + FM85 ) / TotalPop, 0),
         pctBachelors = ifelse(TotalPop > 0, ((College + Graduate) / TotalPop), 0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         pctNoCar = ifelse(TotalPop > 0, NoCar / TotalPop, 0),
         pctRenter = ifelse(TotalPop > 0, Renter / TotalPop, 0),
         pctForeigner = ifelse(TotalPop > 0, Foreigner / TotalPop, 0),
         year = "2010") %>% 
  dplyr::select(-Whites,-African_Americans, -Asians, -Latinx, -M70_74, -M75_79, -M80_85, -M85, -FM70_74, -FM75_79, -FM80_85, -FM85, -College, -Graduate, -TotalPoverty, -NoCar, -Renter, -Foreigner)


```

```{r get suffolk, warning = FALSE, cache=TRUE, message=FALSE, include=FALSE}
Suffolk <- 
  get_acs(geography = "tract", variables = "B02001_001E", 
          year=2010, state='MA', county=025, geometry=T, output="wide")
```


Note: Boston city is part of the larger Suffolk county in Massachusetts, which also includes the city of Chelsea, Revere, and Winthrop. To limit our analysis within Boston, we removed all areas outside Boston. The map below shows Boston relative to other cities in Suffolk county.  


```{r suffolk_county, warning = FALSE, cache=TRUE}

ggplot() + 
  geom_sf(data=st_union(Suffolk), fill= "#D3D3D3", color = "white", size = 2) +
  geom_sf(data=st_union(boston10), fill= "#bd7ebe", color = "white", size = 2) +
  labs(title = "Boston City in Suffolk County",
       caption = "Data: American Community Survey 2010 and 2019",
       color = "Boston") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold")) +
  annotate(geom = "text",x = -71.10,y = 42.3, size = 4, label = "Boston",color = "white") +
  annotate(geom = "text",x = -71.04,y = 42.40, size = 3, label = "Chelsea",color = "black") +
  annotate(geom = "text",x = -71.01,y = 42.42, size = 3, label = "Revere",color = "black") +
  annotate(geom = "text",x = -70.97,y = 42.38, size = 3, label = "Winthrop",color = "black") 

```

Now, we repeat the same process to obtain the 2019 ACS data and combine it with the 2010 data. 

```{r boston_2019_data, warning = FALSE, message=FALSE, results="hide"}

boston19 <- 
  get_acs(geography = "tract", variables = acs_var, 
          year=2019, state='MA', county=025, geometry=T, output="wide") %>% 
  st_transform('EPSG:26986') %>% 
  filter(!(GEOID %in% remove)) %>% 
  rename(TotalPop = B02001_001E, 
         Whites = B02001_002E,
         African_Americans = B02001_003E,
         Asians = B02001_005E,
         Latinx = B03002_012E,
         M70_74 = B01001_022E,
         M75_79 = B01001_023E,
         M80_85 = B01001_024E,
         M85 = B01001_025E,
         FM70_74 = B01001_046E,
         FM75_79 = B01001_047E,
         FM80_85 = B01001_048E,
         FM85 = B01001_049E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E,
         College = B06009_005E,
         Graduate = B06009_006E,
         NoCar = B08014_002E,
         Renter = B25026_009E,
         Foreigner = B05001_006E
         ) %>% 
  dplyr::select(-starts_with("B")) %>% 
    mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop, 0),
         pctMinority = ifelse(TotalPop > 0, (African_Americans + Asians + Latinx ) / TotalPop, 0),
         pctAging = ifelse(TotalPop > 0, (M70_74 + M75_79 + M80_85 + M85 + FM70_74 + FM75_79 + FM80_85 + FM85 ) / TotalPop, 0),
         pctBachelors = ifelse(TotalPop > 0, ((College + Graduate) / TotalPop), 0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         pctNoCar = ifelse(TotalPop > 0, NoCar / TotalPop, 0),
         pctRenter = ifelse(TotalPop > 0, Renter / TotalPop, 0),
         pctForeigner = ifelse(TotalPop > 0, Foreigner / TotalPop, 0),
         year = "2019") %>% 
  dplyr::select(-Whites,-African_Americans, -Asians, -Latinx, -M70_74, -M75_79, -M80_85, -M85, -FM70_74, -FM75_79, -FM80_85, -FM85, -College, -Graduate, -TotalPoverty, -NoCar, -Renter, -Foreigner)

allTracts <- rbind(boston10,boston19)
```


## Transit Data

The Boston transit data is provided by [MassGIS](https://www.mass.gov/info-details/massgis-data-mbta-rapid-transit#overview-), through which we downloaded the raw file and uploaded it to our GitHub repository. To ensure data consistency, we have repojected this layer for it to match with our census data, and have re-categorized transit lines. 


```{r transit_data, warning = FALSE, message=FALSE, class.source = 'fold-show', results="hide"}

boston_stops <- st_read("https://raw.githubusercontent.com/emilyzhou112/MUSA5080-Boston-TOD/main/data/boston_stps.geojson") %>%
  st_transform(st_crs(boston10)) %>% 
  mutate(stops = ifelse(grepl("/", LINE), "OTHER", LINE)) 


```

The MBTA Rapid Transit layers represent the core subway system lines and stations within the Greater Boston area. There are five subways lines in Boston: Red, Green, Orange, Blue, and Silver, which services many Boston neighborhoods plus parts of Cambridge, Somerville, Revere, Malden, Brookline, Newton, Milton, and Quincy. To limit our scope of analysis within Boston, we have removed subway stations outside of Boston. The map below shows transit stops used in our study as well as those that we have excluded. 

```{r visualize_stops, warning = FALSE, results="hide"}

MBTA <- st_read("https://raw.githubusercontent.com/emilyzhou112/MUSA5080-Boston-TOD/main/data/MBTA_node.geojson") %>%
  st_transform(st_crs(boston10))

ggplot() + 
  geom_sf(data=st_union(boston10), fill= "#D3D3D3", color = "white", size = 2) +
  geom_sf(data=MBTA, color = "black", size= 1) + 
  geom_sf(data=boston_stops, 
          aes(colour = stops %>% str_to_title()),
          show.legend = "point", size= 1.5) + 
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe")) +
  labs(title = "Boston MBTA Rapid Transit Stops",
       caption = "Data: MassGIS 2022",
       colour = "Lines") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold"))

```

To delineate census tracts that are in close proximity to transit stations, we delineated a half mile area for each transit stations, and compared them to all the centroids (center ponits) of census tracts. If the centroid is located within the half mile area, we categorize that tract as "TOD"; otherwise, this tract is considered "Non-TOD". An additional data manipulation here is to adjust median rent and median household income for inflation.  

```{r delineate_TOD, warning = FALSE, message=FALSE, class.source = 'fold-show'}

stops_buffer <- st_union(st_buffer(boston_stops, 804)) %>% st_sf() 

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[stops_buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[stops_buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2010", MedRent * 1.17, MedRent)) %>% 
  mutate(MedHHInc.inf = ifelse(year == "2010", MedHHInc * 1.17, MedHHInc)) 

```

The map below shows the distribution of TOD and Non-TOD tracts in Boston, using the above method. 

```{r visualize_tod,  warning = FALSE}

allTracts.group %>% 
  filter(year == 2010 ) %>% 
  ggplot() + 
  geom_sf(aes(fill = TOD), color = "transparent") +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) + 
  geom_sf(data=boston_stops, 
          color = "#fd7f6f", size= 0.8)+
  labs(title = "Transient Oriented Development Tracts",
       subtitle = "Tracts within 0.5 mile of transit stops",
       caption = "Data: American Community Survey 2010 and 2019",
       fill = "Tracts Types") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))


```

# Data Analysis

## Demographic Characteristics 

We first looked at the temporal changes in **aging population** distribution. The map below reveals that the number of populations that are 70 and above increases in Boston overall. Yet, most aging population live in Non-TOD tracts. This pattern is consistent through the years as we can see that TOD tracts are relatively "younger" than Non-TOD tracts. This loosely implies that in Boston, transit-rich neighborhoods are favored by the younger cohort, probably because those people have a higher demand for transit to get to school and work. 

```{r percentage_aging, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctAging), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Aging", breaks = c(0, 0.1, 0.2, 0.3))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
  theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Aging Population in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 
 

```

We then looked at the distribution of **Non-US citizens** in Boston. We selected this demographic variable because we expected that non-US citizens are more likely to live in transit-rich neighborhoods because these places often provide easy access to major job centers, business districts, and employment opportunities, which is especially attractive to foreigners seeking employment in urban areas. Yet, the map below reveals only a slight difference between number of non-US citizens between TOD and Non-TOD tracts, where TOD tracts attract slightly more foreigners than Non-TOD tracts, and this pattern is consistent throughout years. 

```{r percent_foreigner, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctForeigner), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Foreigner", breaks = c(0, 0.2, 0.4, 0.6))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Non-US Citizens in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 
 
```

We also looked at the distribution of **racial minority population** in Boston. The map below shows that minority population are concentrated in southern Boston, mainly in Roxbury and Dorchester. More importantly, these neighborhoods are mostly made up of Non-TOD tracts. This pattern is consistent over time and in 2019, we are seeing an increasing number of minority population living in the same neighborhoods as in 2010. 

```{r percent_minorities, warning = FALSE}
allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctMinority), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "blue4", name= "Percentage Minority", breaks = c(0, 0.3, 0.6, 0.9))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Minorities in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 
```


## Economic Characteristics

For economic variables, we first looked at **median rent** in Boston. The map below reveals that rent is significantly higher in TOD tracts than Non-TOD tracts in both 2010 and 2019. This implies that transit-rich neighborhoods are also more expensive. In addition, after adjusting for inflation, we also see that median rent has increased significantly in all census tracts in Boston, and this increase in rent is most salient in city center, where multiple transit lines meet. 

```{r median_rent, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=MedRent.inf), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Rent")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Median Rent in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops \n2019 data adjusted for inflation",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

```

Looking at the percentage of **people under poverty** in Boston, we can see that in both 2010 and 2019, neighborhoods in southern Boston (Roxbury and Dorchester) have higher poverty rates than other neighborhoods. These neighborhoods are mostly located in Non-TOD tracts. Yet, we are also able to see poverty rate is decreasing over the years. In 2010, a significant number of TOD tracts in western Boston have higher poverty rate than others. In 2019, however, we see that the condition improved, with most census tract with higher poverty rate concentrated in south Boston.  

```{r poverty, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctPoverty), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Percentage Poverty")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
 theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage Poverty in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022")

```

The **median household income** data is not ideal as it contains a lot of missing data entries. However, it still informs us that in both 2010 and 2019, households living in TOD tracts have higher median household income. This is consistent with previous maps, where TOD tracts also have higher rent and lower poverty rates. It is also worth pointing out that median household income in TOD tracts is the highest in the city center. As the distance from city center increases, the difference in median household income between TOD and Non-TOD tracts become less prominent. 

```{r median_income, warning = FALSE}

allTracts.group %>% 
  mutate(inc = MedHHInc.inf/1000) %>% 
  ggplot()+
  geom_sf(aes(fill=inc), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "seagreen4", name= "Income (thousands)")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
  theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Median Household Income in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops\n2019 data adjusted for inflation",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022")

```


## Social Characteristics 

From the map below, we can see that in both 2010 and 2019, the majority of houses in Boston are **renter occupied**, with houses in TOD tracts, especially those tracts in city center, appear slightly more preferable to renters. Considering that there are a lot of universities in Boston and that some of these universities have their own transit station on campus, we suspect that a large number of renters of students, who chose to live around.  

```{r renters, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctRenter), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage Renters")+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage Renters in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

```

We see a clear difference in spatial pattern here when considering whether people living in Boston **own a car** or not. Compared to Non-TOD tracts, a significantly higher number of people in TOD tracts do not have a car. This could imply that people chose to live in transit-rich neighborhood because of the "transit" itself.  

```{r no_car, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctNoCar), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage No Vehicle", breaks = c(0, 0.2, 0.4, 0.6))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Population without Car in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 


```

Lastly, we looked at **education attainment** for Bostonians. From 2010 to 2019, the number of people with a least a bachelor’s degree increased in Boston, and this increase is the most salient in TOD tracts. Again, census tracts with low education attainment are Non-TOD tracts, and are concentrated in south Boston. 


```{r percentage_bachelor, warning = FALSE}

allTracts.group %>% 
  ggplot()+
  geom_sf(aes(fill=pctBachelors), color="transparent") +
  scale_fill_continuous(low = "#FAF9F6", high = "#0047AB", name= "Percentage Bachelors", breaks = c(0, 0.3, 0.6, 0.9))+
  facet_wrap(~year) +
  geom_sf(data=stops_buffer, color = "red", fill = "transparent", linewidth = 0.4) +
   theme(legend.position = "bottom",
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) + 
  labs(title = "Percentage of Population with at Least a Bacehlor's Degree in Boston",
       subtitle = "Red circles indicate areas within 0.5 mile of a transit stops",
       caption = "Data: American Community Survey 2010 and 2019 \n MassGIS 2022") 

```

## Quick Summary 

To support our observations above, we further calculated the mean for all of our variables, which allows us to better examine changes in population characteristics across space and time as a whole. 

```{r all_variables_summarize, warning = FALSE, class.source = 'fold-show', results='hide', message=FALSE}

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>%
  group_by(year, TOD) %>%
  summarize(Rent = mean(MedRent.inf, na.rm = T),
            Income = mean(MedHHInc.inf, na.rm = T),
            Pop = mean(TotalPop, na.rm = T),
            PctMinority = mean(pctMinority, na.rm = T),
            PctAging = mean(pctAging, na.rm = T),
            PctBach = mean(pctBachelors, na.rm = T),
            PctPoverty = mean(pctPoverty, na.rm = T),
            PctNoCar = mean(pctNoCar, na.rm = T),
            PctRenter = mean(pctRenter, na.rm = T),
            PctForeign = mean(pctForeigner, na.rm = T))

```

The result is shown in the table as well as the chart below. 

```{r all_variables_table, warning = FALSE}

kable(allTracts.Summary, format = "html") %>%
 kable_styling(full_width = T, 
               bootstrap_options = c("striped", "hover"),
               font_size = 12) %>% 
  footnote(general_title = "\n",
           general = "Table 1: Comparaing Population Characteristics Across Space and Time ") %>% 
  row_spec(0, font_size = 14) %>% 
  row_spec(1, color = "white", background = "#7eb0d5") %>% 
  row_spec(2, color = "white", background = "#b2e061") %>% 
  row_spec(3, color = "white", background = "#7eb0d5") %>% 
  row_spec(4, color = "white", background = "#b2e061") %>% 
  column_spec(1:2, bold = T)

# 

```

Who is living around transit stations in Boston, what are their social, economic, and demographic characteristics, and do they vary across time? Our analysis have found the following to answer our first and second questions: 1)	Populations living in transit-rich neighborhoods in Boston are of relatively younger age, are mostly renters, have higher education attainment, have higher median household income, and do not own any vehicles. 2) On the other hand, Non-TOD neighborhoods in Boston tend to have higher concentration of aging population and minority population as well as lower education attainment. 3) Gentrification remains prevalent in the city as we see citywide increase in median household rent and median household income. In addition, the city is becoming more and more diverse; its poverty rate has been reducing while overall and individuals’ education attainment has been increasing. 


```{r all_variables_plot, warning = FALSE}

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol=5) +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) +  
  labs(title = "Indicator Differences Across Time and Space",
       caption = "Data: American Community Survey 2010 and 2019") +
  theme(legend.position = "bottom",
        axis.ticks.x=element_blank(),
        axis.ticks.y=element_blank(),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 
```

## Further Analysis 

We did some further analysis to rearrange our tables and calculate percentage point change. This includes pivoting the table twice to make year 2010 and 2019 two separate columns 

```{r percentage_change, warning = FALSE, class.source = 'fold-show'}

allTracts.Long <- allTracts.Summary %>%
  pivot_longer(cols = c(Rent, Income, Pop, PctMinority, PctAging, PctBach, PctPoverty, PctNoCar, PctRenter, PctForeign), names_to = "Variable", values_to = "Value") %>%
  pivot_wider(names_from = year, values_from = Value, names_prefix = "Year") %>% 
  mutate(pct_change = (Year2019-Year2010) / Year2010)

```

Looking at the percentage change of variables between Non-TOD and TOD tracts, we may see that from 2010 to 2019, 1) median house rent and median household income increases significantly in TOD tracts compared to Non-TOD, 2) poverty rate was significantly reduced in TOD tracts compared to Non-TOD, 3) population increase is more salient in Non-TOD tracts than TOD. 

```{r all_variables_bar, warning = FALSE}

allTracts.Long %>% 
  ggplot() +
  geom_bar(aes(x = Variable, y = pct_change, fill = TOD),
           stat = "identity") +
  facet_wrap(~TOD) +
  coord_flip() +
  scale_fill_manual(values = c("#7eb0d5", "#b2e061")) +  
  labs(title = "Percentage Change 2010-2019 for TOD and Non-TOD Tracts",
       caption = "Data: American Community Survey 2010 and 2019") +
  ylab ("Percentage Change") +
  theme(legend.position = "bottom",
        axis.ticks.x=element_blank(),
        axis.ticks.y=element_blank(),
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 
  


```

We also looked into see if there's intra-regional variations of our variables among TOD tracts, using population and median house rent as an example. To achieve that, we joined the centroids of census tracts into the half mile buffer of transit stations, which we then summarized the total population and median rent within half mile of each transit station. 


```{r graduated_prep, warning = FALSE, message=FALSE, class.source = 'fold-show'}

boston_stops$ID <- seq_along(boston_stops$STATION) # unique id for each station

stop_attributes <- st_buffer(boston_stops, 804) %>% 
  st_sf() %>% 
  st_intersection(st_centroid(boston10)) %>% 
  st_drop_geometry() %>% 
  group_by(ID) %>% 
  summarize(
    total_population = sum(TotalPop),
    median_rent = mean(MedRent, na.rm = T))

stops_summary <- boston_stops %>% 
  left_join(stop_attributes, by = "ID")

```

Not surprisingly, the map below shows that transit station in city center attracts more people to live nearby while as distance from the city center gets greater, the dot size becomes smaller, meaning there's less people living nearby. 

```{r format_dots, warning=FALSE, include=FALSE}
boston10_4326 <- st_transform(boston10, crs = 4326)
stops_summary_4326 <- st_transform(stops_summary, crs = 4326)

stops_summary_coords <- st_coordinates(stops_summary_4326)
stops_summary_df <- data.frame(
  X = stops_summary_coords[, 1],
  Y = stops_summary_coords[, 2],
  total_population = stops_summary$total_population,  # Include other attributes here
  rent = stops_summary$median_rent, 
  stops = stops_summary$stops# Include additional attributes as needed
)

```



```{r graduated_pop, warning=FALSE}

ggplot() +
  geom_sf(data=st_union(boston10_4326), fill= "#D3D3D3", color = "white", size = 2) +
  geom_point(data = stops_summary_df, aes(x = X, y = Y, size = total_population, color = str_to_title(stops))) +
  scale_size(range = c(0.5, 4.5), name = "Population") +    
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe"), name = "Stops") +
  labs(title = "Population within 0.5 Mile of Tansit Stops",
       subtitle = "Dot size proportional to population",
       caption = "Data: American Community Survey 2019") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))
```


Similarly, median rent is the highest around transit stations in the city center.This shows that the influence of transit station on nearby population and settlements characteristics are not the same throughout the city.  

```{r graduated_rent, warning=FALSE, warning=FALSE}


ggplot() +
  geom_sf(data=st_union(boston10_4326), fill= "#D3D3D3", color = "white", size = 2) +
  geom_point(data = stops_summary_df, aes(x = X, y = Y, size = rent, color = str_to_title(stops))) +
  scale_size(range = c(0.5, 4), name = "Rent") +    
  scale_colour_manual(values = c("#7eb0d5","#b2e061", "#ffb55a", "#ffee65", "#fd7f6f", "#bd7ebe"), name = "Stops") +
  labs(title = "Median Rent within 0.5 Mile of Tansit Stops",
       subtitle = "Dot size proportional to rent",
       caption = "Data: American Community Survey 2019") +
  theme_light() +   
  theme(plot.subtitle = element_text(size = 9,face = "italic"),
        plot.title = element_text(size = 12, face = "bold"))
```

Moreover, we were curious if there is a linear relationship between distance from transit stops and rent? We therefore created a multiple ring buffer, where each buffer is of half a mile distance to the previous one, to answer this question.  

```{r boston_MRB, warning = FALSE, results='hide'}

boston_MRB <- multipleRingBuffer(st_union(boston_stops), 14484, 804)
ggplot() +
    geom_sf(data=boston_MRB, aes(fill = distance), color = "#708090") +
    scale_fill_continuous(low = "#FAF9F6", high = "#beb9db", name= "Distance")+
    geom_sf(data=st_union(boston10), fill= "#D3D3D3", color = "white", size = 2) +
    geom_sf(data=boston_stops, 
          color = "#fd7f6f", size= 0.8) +    
    labs(title="Half Mile Buffers",
         subtitle = "Distance in-between each buffer is 0.5 mile",
         caption = "Data: American Community Survey 2010 and 2019 \nMassGIS 2022") +
    theme_light() +   
    theme(plot.subtitle = element_text(size = 9,face = "italic"),
          plot.title = element_text(size = 12, face = "bold"))

```

The figure below shows that overall, rent decreases as distance from transit stops gets greater. However, the relationship is not linear as we see that rent is increasing from 2500 to 4000 meters. This implies that transit stations is not the only factor defining rent. In other words, it could be households are willing to pay more for transit amenities, or that they pay for other types of amenities in neighborhoods that happen to also be transit-rich: proximity to transit station only partially influence their decisions to live in nearby locations. 

```{r line_plot, warning = FALSE, message = FALSE, results='hide'}

allTracts.rings <-
  st_join(st_centroid(dplyr::select(allTracts, GEOID, year)),
          boston_MRB) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() 


allTracts.rings.summary <- st_drop_geometry(allTracts.rings) %>%
  group_by(distance, year) %>%
  summarize(Mean_Rent = mean(MedRent, na.rm=T))


ggplot(allTracts.rings.summary, aes(x = distance, y = Mean_Rent)) + 
  geom_line(aes(color = year), size = 2) + 
  scale_color_manual(values = c("#7eb0d5", "#b2e061"), name = "Year") +
  labs(title = "Mean Rent as a Function of Distance to Subway Stations",
       caption = "Data: American Community Survey 2010 and 2019") +
  xlab ("Distance from Transit Stops") +
  ylab("Mean Rent") + 
  theme(legend.position = "bottom",
        plot.title = element_text(size = 12, face = "bold"),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "grey", fill=NA, size=0.8)
        ) 

```

# Future Planning 

To quickly summarize, Boston is gentrifying between 2010 and 2019, with increasing population, diversity, median household income, median rent, and education attainment as well as decreasing poverty rate throughout the city. There's difference in terms of population characteristics between TOD and Non-TOD regions, where TOD attracts mostly renters who are younger, wealthier, and well-educated, whereas Non-TOD tracts' households consist larger concentration of minority and aging population with lower education attainment. 

In addition, there's intra-regional difference within TOD tracts, where median rent for houses next to transit stations that are in city center being the most expensive. Further, rent did not only decrease as we move away from transit stations. This leads us to conclude that not all transit stations are as appealing as each other. Proximity to transit station only affects people's decision to live in nearby locations to a certain extent. **TOD neighborhoods in Boston is able to attract younger and wealthier individual not only because they offer convenience in public transportation, but also could because Bostonians value the cultural amenities, entertainment & nightlife, as well as employment and academic opportunities that are happened to be within half a mile of these transit stops.** 

While Boston is a highly walkable city, planning official should still work towards making public transportation more accessible to neighborhoods that are currently. Non-TOD and developing parcels near transit stations, which could alleviate the housing and amenity shortages at city center and reduce intra-regional disparities. Despite that Non-TOD neighborhood have higher car ownecrship rate, this does not mean that public transportation planning is not important in these neighborhoods. For places like the city center where it is hard to get to through driving and where business and intellectual opportunities accumulate, public transportation is the best way to connect different neighborhoods together.  Indeed, this aligns with the city's goal to "connect all neighborhood centers into a comprehensive transit network". 

With the current data we have, how do we improve public transit accessibility in Non-TOD neighborhoods in Boston? We attempted to answer this question with a bivariate map, considering the possibility of extending existing transit lines and adding a transit station in Non-TOD neighborhoods where there's both a high concentration of aging and minority population (two population subgroups that are often excluded in TOD).The map below highlights an area in southern Boston with high number of both population subgroups. We encouraged transportation planners to further investigate the possibility of future transit developments in these locations. 

```{r bivariate_map, warning = FALSE}


bi_data <- bi_class(boston19, x = pctMinority, y = pctAging, style = "quantile", dim = 3) #  a 3*3 color-schemed bi-variate choropleth 


bi_map <- ggplot() +
  geom_sf(data = bi_data, mapping = aes(fill = bi_class), color = "transparent", show.legend = FALSE) +
  bi_scale_fill(pal = "DkCyan", dim = 3) + 
  labs(title = "Bivariate Choropleth of Pct Minority over Pct Aging Population") + 
  geom_sf(data = boston_stops, colour = "yellow", alpha = 0.8, size = 0.5) +
  labs(caption = "Data: American Community Survey 2019") +
  theme_light() +
  theme(plot.title = element_text(size = 12, face = "bold"))

legend <- bi_legend(pal = "DkCyan",
                    dim = 3,
                    xlab = "PctMinority ",
                    ylab = "PctAging ",
                    size = 5.8)

ggdraw() +
  draw_plot(bi_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.1, 0.2, 1.2, 0.23)
```

Finally, there is one important limitation of this study that has to be addressed. We mentioned before that the MBTA transit lines go beyond the boundary of Boston cities. In this case, it is likely that some census tracts at the edge of Boston that are categorized as Non-TOD are actually TOD.Future study at a regional scale should consider examine TOD development in Boston considering all transit stops in the Greater Boston Areas as well as all types of transit stops. 

