Page 189 - GIS for Science, Volume 3 Preview

TEACHING SPATIAL DATA SCIENCE TO DATA SCIENCE STUDENTS
As demand for data science specialists grows and graduates with these skills command higher average salaries, universities have rushed to create data science classes and programs. Enrollments in such programs often eclipse enrollments in geography, environmental science, urban planning, and other more traditional homes for geographic information science classes. Spatial location is integral to the increasingly large datasets that scientists use to train machine learning models. As such, spatial analysis is a key skill needed in such disciplines as environmental science, public health, urban planning, chemistry, business analytics, and neuroscience.
In a more traditional GIS curriculum, undergraduate and graduate students normally take GIS programming, data science, and machine learning courses near the end of their programs. These courses build on previous coursework, which typically includes foundations of cartography, spatial analysis, and experience with a desktop or web-based GIS. In new data science programs, GIS and spatial analysis are completely new topics, as lower-division training largely focuses on the theory and practice of data science, data structures and algorithms, statistics, and Python coding. For this large new cohort of undergraduate students, a different path to spatial data science is necessary. Most importantly, this path must be practical and project-based, letting students work with real data and develop solutions to real challenges posed by local governments and communities or identified in research projects.
Students encounter these key principles in Spatial Data Science and Applications, an upper-level elective course that the new UCSD Data Science Program has offered in collaboration with Esri since spring 2019. While several remarkable spatial data science programs have been created at the graduate level, this class was among the first of its kind at the undergraduate level, combining machine learning and Python coding with spatial analysis and GIS.
When UCSD students take this class, they are proficient with Python and several key data science libraries, such as Pandas, Matplotlib, Seaborn, scikit-learn, and Keras/TensorFlow. They have learned key machine learning and statistical analysis techniques, and many of them have taken image processing and similar classes. With a few exceptions (typically, graduate students from other departments), the students have had no previous classes with elements of mapping or spatial data management and analysis.
The program structured the course as a sequence of modules, gradually introducing spatial analysis concepts, techniques, and platforms, building on what students know at the start of each module:
1. Foundations of spatial data, online mapping, and analysis, using open-source Python packages (GeoPandas, Shapely, etc.). Getting started with this environment was simple for students trained in Python and Pandas and proficient with Jupyter notebooks they developed on the UCSD Data Science and Machine Learning Platform. This familiarity helped us avoid technical distractions as we explored foundational concepts: layered organization of spatial information, key topological and distance-based operations, spatial join, and techniques for online mapping, projections, spatial data structures, spatial data quality, and metadata.
2. Advanced feature management and analysis, with ArcGIS API for Python and ArcGIS Online. In this module, students learn components of the ArcGIS ecosystem for spatial data management, and experiment with more advanced concepts that are especially useful for data science projects: geoenrichment to generate variables for classification and regression models, operations for joining and aggregating spatial information from various data layers, geocoding, geoprocessing, and network analysis.
3. Raster analysis and modeling, with ArcGIS API for Python on the ArcGIS Enterprise platform. This module introduces raster data structures and operations over grids of various types, common indexes computed over Landsat and Sentinel imagery, map algebra operations, and rules of map combination for suitability modeling.
4. Spatial statistics, primarily with ArcGIS API for Python and PySAL. Topics include spatial weights, geographically weighted regression, detection of hot and cold spots in spatial distributions, and point pattern analysis.
5. Machine learning and deep learning with spatial data. While machine learning examples appear in all modules of the course, this module summarizes techniques and tools for feature engineering based on spatial relationships to improve model accuracy, and lets students experiment with deep-learning models.
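The spatial join from module 1, and the kind of spatial feature engineering summarized in module 5, can be illustrated without any GIS library. The following is a minimal sketch under stated assumptions, not the course's own code: the zone polygons, point coordinates, and function names are all hypothetical, and in the course this operation would be done with a package such as GeoPandas rather than written by hand. It assigns each point to the polygon that contains it, then counts points per polygon to produce a variable that a classification or regression model could use.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points lying exactly on an
    edge are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, zones):
    """Pair each point id with the first zone that contains it, or None."""
    joined = []
    for point_id, (x, y) in points.items():
        zone = next(
            (name for name, poly in zones.items()
             if point_in_polygon(x, y, poly)),
            None,
        )
        joined.append((point_id, zone))
    return joined


def count_per_zone(joined, zones):
    """Aggregate the join result into a per-zone point count."""
    counts = {name: 0 for name in zones}
    for _, zone in joined:
        if zone is not None:
            counts[zone] += 1
    return counts


# Hypothetical data: two adjacent square zones and four points.
zones = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "east": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = {"a": (0.5, 0.5), "b": (1.5, 1.0), "c": (3.0, 1.5), "d": (5.0, 1.0)}

joined = spatial_join(points, zones)
counts = count_per_zone(joined, zones)
print(joined)
print(counts)
```

Point "d" falls outside both zones and is joined to None, which mirrors the unmatched rows that students need to account for when joining real layers.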
Each lecture in the 10-week course includes Jupyter notebooks showing spatial analysis concepts and coding recipes. The idea is to demonstrate how these new concepts help address practical challenges that researchers encounter as they work with spatial information. For example, projections are introduced once students try to plot a map with data from various sources in GeoPandas, and the results are not what they expect. Results of binary spatial operations often appear incorrect because of geometric imperfections or different spatial resolutions of input layers. These results lead to a discussion of spatial uncertainties and formal data quality descriptions in spatial metadata, while the need to ensure topological integrity triggers a discussion of spatial data structures.
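The projection surprise described above can be reproduced in a few lines. This is a minimal sketch, assuming two hypothetical layers that cover roughly the same area near San Diego: one stored as longitude and latitude in degrees, the other stored in Web Mercator meters. The coordinates and helper names are illustrative only, and the spherical Web Mercator formula is used here as one common example of a projected coordinate system. Until the first layer is reprojected, the bounding boxes of the two layers do not even overlap.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical layer A: coordinates in degrees (longitude, latitude).
layer_a_degrees = [(-117.25, 32.85), (-117.20, 32.90)]

# Hypothetical layer B: the source delivered it in Web Mercator meters.
# It is built here by projecting nearby locations, to simulate that source.
layer_b_meters = [
    lonlat_to_web_mercator(-117.23, 32.87),
    lonlat_to_web_mercator(-117.18, 32.92),
]

# Naive overlay: degrees compared against meters, so nothing lines up.
naive_overlap = bboxes_overlap(bbox(layer_a_degrees), bbox(layer_b_meters))

# Reproject layer A into the same coordinate system, then compare again.
layer_a_meters = [lonlat_to_web_mercator(lon, lat) for lon, lat in layer_a_degrees]
fixed_overlap = bboxes_overlap(bbox(layer_a_meters), bbox(layer_b_meters))

print(naive_overlap, fixed_overlap)
```

In GeoPandas the same mismatch typically shows up as one layer plotted as a speck far from the other, which is the moment the course uses to introduce coordinate reference systems.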
Class projects
Learning new concepts and understanding the caveats of using spatial data and spatial analysis techniques by resolving practical roadblocks is the most important part of the course. Students learn these skills through a series of mini-projects culminating in a final project at the end of the quarter. Unlike typical assignments in previous classes, these projects are open ended. Students are given a general prompt, for which they are expected to find appropriate datasets, apply spatial operations and machine learning models, and critically discuss their findings using Jupyter notebooks.
Teams of two students work on each of the mini-projects (except the first one, which is intended to establish individual baseline proficiency with spatial operations) so that they can discuss design choices and limitations of their approach. For final projects, some student teams choose ArcGIS Pro to develop and edit data layers or ArcGIS StoryMaps to present their results. Students receive no instruction on the use of these Esri products but often find them easy to use after experiencing GIS through the lens of Python packages. The next section presents a selection of example student projects.
Teaching Spatial Data Science and Deep Learning 177