Emily Zhou

Masters of City Planning Candidate at University of Pennsylvania


Learning Spatial Analysis with CyberGIS

What are the fundamental differences between conducting spatial analytical research with CyberGIS and with desktop GIS software? For Wang (2019), the answer lies partly in the scope that each mode of research entails and the possibilities each has opened up for the scientific community. Cyberinfrastructure, he argues, is analogous to the large-scale physical infrastructure of roads, bridges, and water systems that supports modern society. It serves as a digital ecosystem that hosts high-performance computers, data-intensive knowledge systems, and information technologies, which together drive innovation in research, education, and scholarship: essentially every kind of academic activity.

From a learning perspective, if we compare practicing GIS to building houses, working in desktop GIS is like decorating a house that is already built, whereas working in CyberGIS is like collaborating with others to draw up a blueprint and build from bricks and stones. The former gives one a good understanding of the structure and style of one’s own house. The latter equips one with the fundamental skills of house construction in general: how to come up with a design plan, pick a location, choose the building materials, lay the foundation, and do the decoration. Those skills allow one to build houses of various kinds and sizes in the future, and allow others to learn from the work.

In GIS terms, CyberGIS enables us to collaboratively address problems of larger scale, greater complexity, and higher data intensity than desktop GIS can handle. It also enhances the traditional mode of research by integrating advanced computational facilities, data storage, and visualization tools. In this process, one no longer deals only with simple spatial problems but is exposed to multi-scale geospatial problems that are often interdisciplinary and require collaboration. This mode of work is expected to improve research productivity and enable scientific advances.

Through experimenting with the computational notebook from Kang et al. (2020), based on their research mapping the accessibility of medical resources in Illinois during COVID-19, I have found several ways in which a research compendium in CyberGIS excels over traditional GIS software in facilitating the learning of spatial analysis. First, it broadens one’s understanding of the scope and capabilities of GIS, both as a tool and as a science. As a tool, GIS is not limited to desktop software, such as QGIS, that deals with problems of a particular scale. As a science, GIS is applied not only in geography but also in other disciplines, such as health care, and is capable of addressing problems of various kinds and complexity. (You may refer to a previous blog post here for more discussion of GIS as a tool and a science.)

Second, the rise of cyberinfrastructure parallels the open science movement and the age of big data. In the research paper, Kang et al. (2020) provide pseudo-code to help readers understand their workflow for calculating the catchment area of each individual hospital. In the Python notebook, spatial analysis code is treated as “text” and becomes a means for the co-production of algorithmic knowledge. Studying the paper alongside the code made us aware of the specific implementation details of a spatial analysis that are often hidden or ignored when using desktop GIS. The code illustrates how many considerations need to be taken into account during the analysis. Questions such as how to clean and transform the raw data sets to suit the analysis, which data structure is most appropriate for storing the outputs, and which algorithm handles the volume of input data most efficiently can all be answered by reading the code. In our case, we learned from the code that the street network is not ready for analysis until all nodes with an out-degree of 0 are removed, since a hospital assigned to such a node could not reach any other part of the network. We learned that the hospital data is stored in a GeoDataFrame, which behaves much like a “list” of records with an added geometry: fetching, changing, and adding information is done by indexing into and appending to it. We also got a better sense of how a parallel implementation greatly reduces the computation time of the analysis. (See here for discussion of the reproducibility and replicability of GIScience.)
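The network-cleaning step described above can be sketched in a few lines. This is a minimal illustration that uses a plain Python dictionary rather than the graph object in the authors’ notebook; the function name `drop_dead_end_nodes` and the toy `streets` network are hypothetical, not taken from their code.

```python
def drop_dead_end_nodes(graph):
    """Return a copy of `graph` without nodes that have no outgoing edges.

    `graph` maps each node to the set of nodes it has a directed edge to.
    """
    # Nodes with out-degree 0: you can arrive, but you can never leave.
    dead_ends = {node for node, targets in graph.items() if not targets}
    # Drop those nodes and every edge that pointed at them.
    return {
        node: {t for t in targets if t not in dead_ends}
        for node, targets in graph.items()
        if node not in dead_ends
    }


# Hypothetical toy street network: D can be entered from C but never left.
streets = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": set(),
}

cleaned = drop_dead_end_nodes(streets)
print(sorted(cleaned))  # ['A', 'B', 'C']
```

This sketch makes a single pass. If removing a node left another node with no outgoing edges, the function would have to be applied again until nothing changes.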

Most importantly, the research compendium published along with the paper enables reproducibility for credibility and replicability for generalization, both of which facilitate knowledge production and validation. Being able to execute the code led us to critically evaluate several of the approaches the original researchers took and to identify potential errors and uncertainties. For example, we noticed that the ego-graph is calculated from the network node nearest to each hospital rather than from the hospital’s exact location. Because this generalizes the location of each hospital, it could introduce some uncertainty into the results. We also noticed that although some hospitals are located in Chicago, they also serve populations living in the surrounding counties. The study region of the original research, however, is restricted to the boundary of Chicago, which could lead to an underestimate of the potential of each hospital.
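To make the nearest-node issue concrete, here is a small sketch using only the Python standard library. It is not the authors’ implementation: the coordinates, travel times, and the helper names `nearest_node` and `ego_nodes` are all assumptions made for illustration. The hospital is snapped to the closest node, and a Dijkstra search with a cutoff then collects the nodes reachable within a travel-time budget, which is the idea behind an ego-graph.

```python
import heapq
import math


def nearest_node(coords, point):
    """Return the node whose coordinates are closest to `point` (straight line)."""
    return min(coords, key=lambda n: math.dist(coords[n], point))


def ego_nodes(edges, source, radius):
    """Return {node: cost} for nodes reachable from `source` within `radius`.

    `edges` maps node -> {neighbor: travel cost}. This is Dijkstra's
    algorithm with a cutoff at `radius`.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, math.inf):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, weight in edges.get(node, {}).items():
            new_cost = cost + weight
            if new_cost <= radius and new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


# Hypothetical node coordinates (arbitrary units) and travel times (minutes).
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
edges = {
    "A": {"B": 4},
    "B": {"A": 4, "C": 5},
    "C": {"B": 5, "D": 6},
    "D": {"C": 6},
}

hospital = (1.2, 0.3)  # exact hospital location, which is not on any node
start = nearest_node(coords, hospital)            # the hospital snaps to "B"
snap_offset = math.dist(coords[start], hospital)  # distance ignored by snapping
catchment = ego_nodes(edges, start, radius=10)    # nodes within 10 minutes of "B"
```

Here `snap_offset` is the part of the trip that the analysis never sees: every travel time in `catchment` is measured from node B, not from the hospital itself.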

Yet, even with all the advantages of spatial analysis in CyberGIS, there is no need to discount the importance of traditional GIS software in facilitating one’s learning. For many who are complete novices in the field of GIS, the code that implements the spatial analysis presents a steep learning curve. Many lines of code in Kang et al.’s study are not well commented, and the explanations of complicated functions that wrap several smaller functions are insufficient. How calculating the catchment area serves to estimate the potential of each hospital, for example, would be easier to understand if one were already familiar with the gravity model of spatial interaction and had visualized such a model for a smaller study area in a desktop GIS.
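For readers who have not met the gravity model, a toy calculation may help. The sketch below is a generic distance-decay formulation, not the method used in Kang et al.’s study: the zone names, populations, distances, the exponent `beta`, and the function `hospital_potential` are all hypothetical. Each zone contributes its population divided by distance raised to `beta`, so nearby people count more than distant ones.

```python
def hospital_potential(populations, distances, beta=2.0):
    """Sum each zone's population weighted by 1 / distance ** beta.

    `populations` and `distances` are keyed by zone name; distances must be > 0.
    """
    return sum(
        populations[zone] / distances[zone] ** beta
        for zone in populations
    )


# Hypothetical zones around one hospital (population, distance in km).
populations = {"inside_city": 1000, "outside_city": 4000}
distances = {"inside_city": 2.0, "outside_city": 4.0}

# Potential when every zone is counted: 1000/4 + 4000/16 = 500.
full = hospital_potential(populations, distances)

# Potential when the study area stops at the city boundary: 1000/4 = 250.
inside_only = hospital_potential(
    {"inside_city": populations["inside_city"]}, distances
)
```

In this made-up case, leaving out the zone beyond the boundary halves the hospital’s estimated potential, which is the kind of effect a restricted study region can have.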

NOTE: If you are interested in Kang et al.’s study, you may find helpful the report here, which examines their methodology and conducts a re-analysis, as well as the repository with all the Python code.

===

Wang, S. (2019). Cyberinfrastructure. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2019 Edition), John P. Wilson (Ed.). DOI: 10.22224/gistbok/2019.2.4

Kang, J. Y., Michels, A., Lyu, F., Wang, Shaohua, Agbodo, N., Freeman, V. L., & Wang, Shaowen. (2020). Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics, 19(1), 1–17. DOI: 10.1186/s12942-020-00229-x

Kedron, P., & Holler, J. (2021, August 23). Geospatial Fellows Webinar Series: Working with students to reproduce COVID-19 research to establish the credibility of findings and accelerate policymaker adoption.

===

panda Chengdu, Sichuan, China

If CyberGIS is more powerful than desktop GIS,
then is GeoPandas more powerful than the panda ^ ?
