Raster Data and Google Earth Engine
To get remote sensing data, we used the following packages. The ee package (Earth Engine Python API) facilitates interaction with Google Earth Engine for large-scale geospatial processing. geemap supports visualization and analysis of Earth Engine data in Python. os is used for interacting with the file system, and requests handles HTTP requests to retrieve data from web sources. The datetime module manages date and time operations, while geopandas and pandas are essential for working with spatial and tabular data, respectively, to load and further process the downloaded data.
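For reference, a minimal import block matching these descriptions might look like the following sketch (the aliases gpd and pd are the usual conventions and are what the later snippets assume):

import os
import requests
from datetime import datetime

import ee       # Earth Engine Python API
import geemap   # visualization and conversion helpers for Earth Engine
import geopandas as gpd
import pandas as pd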
Note: Since the raster data used in this project are relatively large, all operations were conducted on Google Colab, which provides a more seamless environment for interacting with Google Earth Engine and is better suited to processing large datasets. The code here is for demonstration purposes only; all file paths have been removed.
If you wish to reproduce this study, this project requires authentication with Google Earth Engine (GEE) to access its geospatial data and computational capabilities. To begin, users must authenticate using ee.Authenticate() and initialize the connection to their GEE project with ee.Initialize(project="your_project_name"). Ensure you have a valid Google Earth Engine account and have created a project within your GEE console. These steps enable the seamless integration of Earth Engine operations within the Python environment.
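A minimal sketch of this setup (your_project_name is a placeholder for your own GEE project ID):

import ee

ee.Authenticate()                            # opens a browser prompt to grant access
ee.Initialize(project="your_project_name")   # connect to your GEE cloud project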
Additionally, users need to establish a connection with their Google Drive to store and manage data files. This is done using the following commands:

from google.colab import drive
drive.mount('/content/drive')
Study Area Preparation
To limit our study area to the state of Pennsylvania, we retrieved the GeoJSON file of all Pennsylvania census tracts from Open Data Pennsylvania. The code below dissolves the geometries of all features in the GeoDataFrame pa_tracts into a single geometry representing the boundary of Pennsylvania. Then, it extracts the exterior coordinates of the dissolved boundary geometry and converts them into a list format suitable for Google Earth Engine. This list of coordinates is used to create a polygon geometry pa_geom using the ee.Geometry.Polygon() function from the Earth Engine Python API. Finally, the polygon geometry representing the boundary of Pennsylvania is converted into an Earth Engine FeatureCollection (aoi). This FeatureCollection serves as the study area extent for subsequent analysis within Google Earth Engine and is used later in the study to clip the satellite images.
pa_tracts = gpd.read_file('######')

# dissolve geometry to get the boundary
pa_bound = pa_tracts.dissolve()

# convert the geometry into a format suitable for gee
pa_geom = ee.Geometry.Polygon(list(pa_bound['geometry'].iloc[0].exterior.coords))
aoi = ee.FeatureCollection(pa_geom)
We also need to simplify the geometries in pa_tracts by reducing the number of vertices in each geometry using the specified tolerance of 0.01. The simplify method ensures that the topological integrity of the geometries is preserved. After simplification, the modified GeoDataFrame is converted to a Google Earth Engine (GEE) object using the geemap.geopandas_to_ee function, allowing for further geospatial analysis in the GEE platform, specifically when performing zonal statistics.
tolerance = 0.01
pa_tracts['geometry'] = pa_tracts['geometry'].simplify(tolerance, preserve_topology=True)
pa_tracts_ee = geemap.geopandas_to_ee(pa_tracts)
Loading Landsat Data
We acquired all Landsat 8 Level 2 satellite images from Spring 2022 to Spring 2023 over the study area. These images were obtained via the Google Earth Engine API and are coded in the Earth Engine Data Catalog as USGS Landsat 8 Level 2, Collection 2, Tier 1. We acquired those images by season and took the median composite before further processing.

This specific version and level of processing of Landsat 8 satellite imagery, provided by the United States Geological Survey, uses the Land Surface Reflectance Code to generate products with geometric, radiometric, and atmospheric corrections. These products have a spatial resolution of 30 m. The products used in this study as predictors are the surface reflectance OLI bands, brightness temperature (BT), and several derived indices, such as the normalized difference vegetation index (NDVI), the soil-adjusted vegetation index (SAVI), and the enhanced vegetation index (EVI). Moreover, the images were processed to scale all the bands and remove cloud coverage.

Since we are taking into account the impact of seasonal variations of vegetation indices on COPD, let's first set up the constants that define the temporal extent of our study and transform them into the format that Earth Engine expects.
## Define Time Period
startSpring = datetime(2022, 3, 1)    # spring
endSpring = datetime(2022, 5, 31)
startSummer = datetime(2022, 6, 1)    # summer
endSummer = datetime(2022, 8, 31)
startFall = datetime(2022, 9, 1)      # fall
endFall = datetime(2022, 11, 30)
startWinter = datetime(2022, 12, 1)   # winter
endWinter = datetime(2023, 2, 28)

# Format dates into strings that Earth Engine expects ("YYYY-MM-DD")
startSpring = startSpring.strftime('%Y-%m-%d')
endSpring = endSpring.strftime('%Y-%m-%d')
startSummer = startSummer.strftime('%Y-%m-%d')
endSummer = endSummer.strftime('%Y-%m-%d')
startFall = startFall.strftime('%Y-%m-%d')
endFall = endFall.strftime('%Y-%m-%d')
startWinter = startWinter.strftime('%Y-%m-%d')
endWinter = endWinter.strftime('%Y-%m-%d')
The function apply_scale_factors applies scale and offset adjustments to Landsat satellite imagery bands, specifically the optical and thermal bands. In the Landsat data, pixel values are stored as digital numbers (DNs), which require conversion to physical units such as reflectance or temperature.
## Helper Function - Scale Bands
def apply_scale_factors(image):
    # Scale and offset values for optical bands
    optical_bands = image.select('SR_B.').multiply(0.0000275).add(-0.2)

    # Scale and offset values for thermal bands
    thermal_bands = image.select('ST_B.*').multiply(0.00341802).add(149.0)

    # Add scaled bands to the original image
    return image.addBands(optical_bands, None, True) \
                .addBands(thermal_bands, None, True)
The cloud_mask function is designed to create a binary mask for identifying and masking out pixels affected by clouds and cloud shadows in Landsat satellite imagery. It plays a crucial role in pre-processing Landsat imagery by removing cloud and cloud shadow effects to enhance data quality and reliability for downstream analysis.
## Helper Function - Mask Clouds
def cloud_mask(image):
    # Define cloud shadow and cloud bitmasks (Bits 3 and 5)
    cloud_shadow_bit_mask = 1 << 3
    cloud_bit_mask = 1 << 5

    # Select the Quality Assessment (QA) band for pixel quality information
    qa = image.select('QA_PIXEL')

    # Create a binary mask identifying clear conditions (both cloud and cloud shadow bits set to 0)
    mask = qa.bitwiseAnd(cloud_shadow_bit_mask).eq(0) \
             .And(qa.bitwiseAnd(cloud_bit_mask).eq(0))

    # Update the original image, masking out cloud and cloud shadow-affected pixels
    return image.updateMask(mask)
As we briefly mentioned in the introduction, we derived four different indices from the bands of our satellite images: NDVI, SAVI, EVI, and LST. The function below calculates all the indices we need at once. Among those, NDVI is a common vegetation index used to assess the presence and health of vegetation based on the difference in reflectance between near-infrared (NIR) and red light wavelengths.

The formula is NDVI = (Band 5 – Band 4) / (Band 5 + Band 4)

SAVI is a vegetation index similar to NDVI but incorporates a soil brightness correction factor to account for variations in soil reflectance.

The formula is SAVI = ((Band 5 – Band 4) / (Band 5 + Band 4 + 0.5)) * 1.5

EVI is a vegetation index designed to minimize the influence of atmospheric conditions and background noise on vegetation assessments.

The formula is EVI = 2.5 * ((Band 5 – Band 4) / (Band 5 + 6 * Band 4 – 7.5 * Band 2 + 1))
Finally, the calculation of land surface temperature (LST) needs to be broken down into the following steps:
1. Minimum and Maximum NDVI Calculation: It calculates the minimum and maximum NDVI values within the AOI using the reduceRegion() method. The reducer parameter specifies the type of aggregation (in this case, min() and max()).

2. Fraction of Vegetation (FV) Calculation: It computes the Fraction of Vegetation (FV) using the NDVI, NDVI_min, and NDVI_max values obtained in the previous step. The formula is the square of the normalized difference between NDVI and NDVI_min divided by the difference between NDVI_max and NDVI_min: FV = ((NDVI – NDVI_min) / (NDVI_max – NDVI_min))^2.

3. Emissivity (EM) Calculation: It calculates the emissivity from the FV values obtained in the previous step, as EM = 0.004 * FV + 0.986.

4. Land Surface Temperature (LST) Calculation: It computes the Land Surface Temperature using the thermal band (Band 10) from the Landsat imagery and the emissivity values calculated earlier. The formula, based on Planck's law, treats the thermal band as brightness temperature (TB) in Kelvin: LST = TB / (1 + (0.00115 * TB / 1.438) * ln(EM)) – 273.15, which yields temperature in degrees Celsius.
def calculate_seasonal_indices(image_collection, aoi, season_name):
    # Calculate NDVI
    ndvi = image_collection.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')

    # Calculate SAVI
    savi = image_collection.expression(
        '1.5 * (NIR - RED) / (NIR + RED + 0.5)', {
            'NIR': image_collection.select('SR_B5'),
            'RED': image_collection.select('SR_B4')
        }
    ).rename('SAVI')

    # Calculate EVI
    evi = image_collection.expression(
        '2.5 * ((NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1))', {
            'NIR': image_collection.select('SR_B5'),
            'RED': image_collection.select('SR_B4'),
            'BLUE': image_collection.select('SR_B2')
        }
    ).rename('EVI')

    # NDVI min and max for Fraction of Vegetation (FV) calculation
    ndvi_min = ndvi.reduceRegion(
        reducer=ee.Reducer.min(),
        geometry=aoi,
        scale=30,
        maxPixels=1e9
    ).get('NDVI')

    ndvi_max = ndvi.reduceRegion(
        reducer=ee.Reducer.max(),
        geometry=aoi,
        scale=30,
        maxPixels=1e9
    ).get('NDVI')

    # Convert NDVI_min and NDVI_max to ee.Number
    ndvi_min = ee.Number(ndvi_min)
    ndvi_max = ee.Number(ndvi_max)

    # Fraction of Vegetation (FV)
    fv = ndvi.subtract(ndvi_min).divide(ndvi_max.subtract(ndvi_min)).pow(2).rename('FV')

    # Emissivity (EM)
    em = fv.multiply(0.004).add(0.986).rename('EM')

    # Thermal band (Band 10)
    thermal = image_collection.select('ST_B10').rename('thermal')

    # Land Surface Temperature (LST)
    lst = thermal.expression(
        '(TB / (1 + (0.00115 * (TB / 1.438)) * log(em))) - 273.15',
        {
            'TB': thermal.select('thermal'),  # Thermal band temperature in Kelvin
            'em': em                          # Emissivity
        }
    ).rename('LST')

    seasonal_image = ndvi.addBands([savi, evi, fv, em, lst])
    return seasonal_image
We define a dictionary that maps each season to its corresponding start and end dates. The code then iterates through each season, filters the Landsat 8 image collection by the region of interest (aoi) and the defined seasonal date range, applies the scale factors and the cloud mask, computes the median composite of the images, and clips the result to the aoi. For each season, the processed composite is passed to the calculate_seasonal_indices function, which calculates the season-specific indices (e.g., NDVI, SAVI).
seasons = {
    'spring': (startSpring, endSpring),
    'summer': (startSummer, endSummer),
    'fall': (startFall, endFall),
    'winter': (startWinter, endWinter)
}

seasonal_results = {}
for season, (start_date, end_date) in seasons.items():
    image_collection = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2") \
        .filterBounds(aoi) \
        .filterDate(start_date, end_date) \
        .map(apply_scale_factors) \
        .map(cloud_mask) \
        .median() \
        .clip(aoi)

    seasonal_results[season] = calculate_seasonal_indices(image_collection, aoi, season)
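As a quick optional sanity check (our own addition, not part of the original pipeline), you can confirm that each seasonal composite carries the six computed bands before exporting anything:

# Should print ['NDVI', 'SAVI', 'EVI', 'FV', 'EM', 'LST'] for each season
for season, image in seasonal_results.items():
    print(season, image.bandNames().getInfo())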
Next, the function below helps summarize our calculated indices by census tract using zonal statistics. This is one of the simplest methods of bringing raster information into vector geometries. In Earth Engine, zonal statistics are calculated by aggregating pixel values within geometries defined by feature collections (in our case, the census tracts). Common summary statistics computed for each zone include the mean, sum, minimum, maximum, standard deviation, and percentile values (in our case, we calculate the mean). We wrote the export_zonal_stats function so that, by default, it writes the output directly to Google Drive.
# Function to export zonal stats to Google Drive
def export_zonal_stats(image, reducer, file_name, folder_name="######"):
    zonal_stats = image.reduceRegions(
        collection=pa_tracts_ee,
        reducer=reducer,  # e.g., ee.Reducer.mean()
        scale=30          # Resolution of the analysis
    )

    task = ee.batch.Export.table.toDrive(
        collection=zonal_stats,
        fileFormat='CSV',
        fileNamePrefix=file_name.replace('.csv', ''),
        folder=folder_name
    )
    task.start()
    print(f"Export started for {file_name}. Check Google Drive for the results.")
The code below will give us 16 separate CSV files (one per band and season), each with four columns: system:index is a unique identifier for each census tract, mean is the mean pixel value within the tract, .geo is the geometry of each census tract, and GEOID is the identifier for each census tract.
seasonal_results = {
    "spring": seasonal_results['spring'],
    "summer": seasonal_results['summer'],
    "fall": seasonal_results['fall'],
    "winter": seasonal_results['winter']
}

bands = ['NDVI', 'EVI', 'SAVI', 'LST']

for season, image in seasonal_results.items():
    for band in bands:
        band_image = image.select(band)             # Extract the specific band
        file_name = f"{band.lower()}_{season}.csv"  # File name, e.g., ndvi_spring.csv
        export_zonal_stats(image=band_image, reducer=ee.Reducer.mean(), file_name=file_name)
Loading Land Cover Data
We also acquired USGS National Land Cover Database (NLCD) data for our study area. These images were also obtained via the Google Earth Engine API and are coded in the Earth Engine Data Catalog. We use the 2021 data because it is the latest release available on Earth Engine, and we assume that the land cover of 2021 closely matches that of 2022. After loading the data, a filter is applied to extract the land cover data for the year 2021. The select method isolates the landcover band, which contains the land cover classification. Then, the data is clipped to the specified area of interest (aoi), creating a subset focused on Pennsylvania.
dataset = ee.ImageCollection('USGS/NLCD_RELEASES/2021_REL/NLCD')
nlcd2021 = dataset.filter(ee.Filter.eq('system:index', '2021')).first()
landcover = nlcd2021.select('landcover')
pa_landcover = landcover.clip(aoi)
This land cover dataset is publicly available and consists of 8 main land cover classes and 20 sub-classes. These classes include Water (open water and perennial ice/snow), Developed (high, medium, low and open space), Barren (rock/sand/clay), Forest (deciduous, evergreen and mixed), Shrubland (dwarf shrub and shrub), Herbaceous (grassland, sedge, lichens and moss), Planted/Cultivated (pasture/hay and cultivated crops) and Wetlands (woody and emergent herbaceous). For the detailed classification, refer to the NLCD land cover legend.
In this study, various land cover types are grouped into broader categories based on their respective NLCD codes for simplicity of analysis. For example, high-density areas are defined as land cover types 23 and 24, which typically represent urban environments. Similarly, low-density areas (codes 21 and 22) correspond to developed open space and low-intensity residential areas. Forested areas are grouped under codes 41, 42, and 43, while grasses are categorized with codes 52, 71, 81, and 82. Wetlands and open water are also identified using their specific codes (90, 95, and 11). This categorization simplifies the analysis by grouping similar land cover types together based on their characteristics, enabling focused analysis of specific land use patterns.
high_density = pa_landcover.eq(23).Or(pa_landcover.eq(24))
low_density = pa_landcover.eq(21).Or(pa_landcover.eq(22))
forest = pa_landcover.eq(41).Or(pa_landcover.eq(42)).Or(pa_landcover.eq(43))
grasses = pa_landcover.eq(52).Or(pa_landcover.eq(71)).Or(pa_landcover.eq(81)).Or(pa_landcover.eq(82))
wetlands = pa_landcover.eq(90).Or(pa_landcover.eq(95))
open_water = pa_landcover.eq(11)
As briefly mentioned in the workflow, these proportions can be similar across multiple census tracts even when the underlying distribution of land cover differs (e.g., two census tracts can have the same percentage of highly developed land, but that land can be geographically concentrated in one tract and dispersed in the other). We therefore introduced additional features to represent the spatial distribution of land cover. For each census tract we calculate features representing i) the proportion of highly developed land pixels neighboring another highly developed land pixel and ii) the proportion of forest pixels neighboring another forest pixel. These features were intended to capture the notion that developed land corresponds to higher levels of pollution while forests and trees can be a source of pollen, both of which have an adverse impact on patients suffering from asthma and COPD.
The code below achieves this goal. It first creates a kernel for identifying neighboring pixels, then calculates the total number of land cover pixels within each region. It also computes the number of neighboring land cover pixels using a convolution operation to identify adjacent pixels. These calculations are performed using the reduceRegions method to aggregate the data for each region of interest (in our case, the census tracts) from the input land cover mask. The results are then merged, associating the total land cover and neighboring land cover values with their corresponding region identifiers (GEOID). Finally, the function exports the aggregated data as a CSV file to Google Drive.
def neighboring_landcover_metrics(landcover_mask, file_name):
    # Define the kernel for neighboring pixels
    kernel = ee.Kernel.square(radius=1, units='pixels')   # 3x3 neighborhood
    neighbors = landcover_mask.convolve(kernel).gte(1)    # At least one neighbor

    # Calculate total landcover pixels
    total_landcover = landcover_mask.reduceRegions(
        collection=pa_tracts_ee,
        reducer=ee.Reducer.sum(),
        scale=30
    ).select(['sum'], ['total_landcover'])

    # Calculate neighboring landcover pixels
    neighbor_landcover = neighbors.reduceRegions(
        collection=pa_tracts_ee,
        reducer=ee.Reducer.sum(),
        scale=30
    ).select(['sum'], ['neighbor_landcover'])

    # Merge FeatureCollections and retain the GEOID of each tract
    merged_fc = total_landcover.map(lambda feature:
        feature.set(
            'neighbor_landcover',
            neighbor_landcover
                .filter(ee.Filter.eq('system:index', feature.get('system:index')))
                .first()
                .get('neighbor_landcover')
        ).set(
            'geoid',
            pa_tracts_ee
                .filter(ee.Filter.eq('system:index', feature.get('system:index')))
                .first()
                .get('GEOID')
        )
    )

    # Export the merged FeatureCollection
    export_task = ee.batch.Export.table.toDrive(
        collection=merged_fc.select(['geoid', 'total_landcover', 'neighbor_landcover']),
        folder='#######',
        fileNamePrefix=file_name,
        fileFormat='CSV'
    )
    export_task.start()
    print(f"Export task started: {file_name}")
For forest and high-density land cover types, the neighboring_landcover_metrics function is applied to calculate the number of neighboring land cover pixels within a 3x3 pixel neighborhood.
neighboring_landcover_metrics(
    landcover_mask=forest,
    file_name='forest_landcover_metrics'
)

neighboring_landcover_metrics(
    landcover_mask=high_density,
    file_name='high_density_landcover_metrics'
)
For the other land cover types, such as grasses, low-density development, wetlands, and open water, the summarize_landcover_pixels function is used instead. This function simply calculates the total number of land cover pixels within each region, without considering neighboring pixels. The results for both types of metrics are then exported as CSV files to Google Drive for further analysis.
def summarize_landcover_pixels(landcover_mask, file_name):
    # Calculate total landcover pixels and retain the GEOID of each tract
    total_landcover = landcover_mask.reduceRegions(
        collection=pa_tracts_ee,
        reducer=ee.Reducer.sum(),
        scale=30
    ).map(lambda feature: feature.set(
        'geoid', feature.get('GEOID')
    ))

    # Export the results to Drive
    export_task = ee.batch.Export.table.toDrive(
        collection=total_landcover.select(['geoid', 'sum']),
        folder='#######',
        fileNamePrefix=file_name,
        fileFormat='CSV'
    )
    export_task.start()
    print(f"Export task started: {file_name}")
landcover_list = [
    {'mask': grasses, 'file_name': 'grasses_landcover'},
    {'mask': low_density, 'file_name': 'low_density_landcover'},
    {'mask': wetlands, 'file_name': 'wetlands_landcover'},
    {'mask': open_water, 'file_name': 'open_water_landcover'}
]

for landcover in landcover_list:
    summarize_landcover_pixels(landcover['mask'], landcover['file_name'])
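After the exports complete, the neighbor counts can be turned into the distribution features described above by dividing the two exported sums. A minimal sketch, assuming the forest metrics CSV has been downloaded locally with the column names used in the export:

import pandas as pd

# Proportion of forest pixels with at least one forest neighbor, per tract
forest_df = pd.read_csv('forest_landcover_metrics.csv', dtype={'geoid': str})
forest_df['forest_neighbor_share'] = (
    forest_df['neighbor_landcover']
    / forest_df['total_landcover'].replace(0, float('nan'))  # avoid division by zero
)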