Page 46 - GIS for Science, Volume 3 Preview

GOOD MODELS REQUIRE GOOD TRAINING DATA
The increased availability of remotely sensed data has advanced our understanding of the physical and biological processes that govern life on our planet. But making sense of that influx of data requires accurate and vetted field observations. The success of predictive models depends on the availability and quality of model training data in the form of verified species occurrences.
The primary source of training data for the models is NatureServe’s biodiversity location database. Starting with circles drawn on topographic quad maps in the 1970s and progressing to today’s sophisticated GPS-enabled field data collection systems, NatureServe network biologists have compiled millions of records of rare and imperiled species. These records are managed centrally in a web-enabled biodiversity information management system that uses a unified taxonomy and applies shared data standards and methodology consistently. As a result, range-wide point and polygon data are easy to extract for modeling.
NatureServe’s biodiversity location data provide a foundation for building habitat models in the MoBI project. For species with insufficient data for modeling, we supplemented these records with data from specimen and citizen science portals and from academic researchers. (See GISforScience.com for the full list of data-gathering organizations employed.) Data from citizen science sources such as iNaturalist and eBird require careful screening before use as model training data, because they can be biased toward the places where people tend to record observations, can carry large locational uncertainty, or can include misidentifications.
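The screening described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MoBI project’s actual procedure: it assumes each record carries a quality label modeled on iNaturalist’s "research grade" and a reported positional uncertainty in meters, drops records that fail either check, and then applies grid-based spatial thinning, a common way to damp the clustering of observations near roads and towns. The `Observation` class, the 1 km uncertainty threshold, and the 0.01-degree cell size are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """A hypothetical citizen-science occurrence record (field names are assumptions)."""
    species: str
    lon: float
    lat: float
    quality_grade: str              # e.g. "research" for community-verified IDs (assumed label)
    uncertainty_m: Optional[float]  # reported positional uncertainty in meters, or None


def screen(observations, max_uncertainty_m=1000.0):
    """Drop records with unverified IDs or vague/missing locations.

    The 1 km default threshold is an illustrative assumption, not a standard.
    """
    kept = []
    for obs in observations:
        if obs.quality_grade != "research":
            continue  # identification not confirmed by other observers
        if obs.uncertainty_m is None or obs.uncertainty_m > max_uncertainty_m:
            continue  # location too uncertain to link to habitat conditions
        kept.append(obs)
    return kept


def thin_by_grid(observations, cell_deg=0.01):
    """Keep at most one record per grid cell (cell size in degrees).

    Spatial thinning reduces the over-representation of places
    where observers are most active.
    """
    occupied = set()
    kept = []
    for obs in observations:
        cell = (math.floor(obs.lon / cell_deg), math.floor(obs.lat / cell_deg))
        if cell not in occupied:
            occupied.add(cell)
            kept.append(obs)
    return kept


if __name__ == "__main__":
    sp = "Setophaga chrysoparia"  # golden-cheeked warbler
    records = [
        Observation(sp, -97.951, 30.451, "research", 25.0),
        Observation(sp, -97.952, 30.452, "research", 10.0),    # same 0.01-degree cell as above
        Observation(sp, -97.900, 30.400, "needs_id", 10.0),    # unverified identification
        Observation(sp, -97.850, 30.500, "research", 5000.0),  # too uncertain
        Observation(sp, -97.800, 30.420, "research", None),    # no uncertainty reported
    ]
    usable = thin_by_grid(screen(records))
    print(len(usable))  # 1
```

Screening runs before thinning so that a poor record never "claims" a grid cell ahead of a better one in the same cell.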
The quantity and quality of species occurrence records are among the most important inputs to the modeling process. Models can be refined by obtaining species locality data from a wider variety of sources, developing better methods for filtering citizen science data, and using the models themselves to guide additional field data collection. The last of these is a particularly powerful way to improve model outcomes: model-guided field inventory validates results and generates new presence and absence data that can be fed into an iterative model refinement process.
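One step of that model-guided loop can be illustrated with a simple, hypothetical selection rule: visit the unsurveyed sites that the current model rates as most suitable, then fold the resulting presence and absence records back into the training set before the model is refit. The functions `select_survey_sites` and `merge_survey_results`, the site IDs, and the scores below are invented for illustration, and refitting is omitted. A practical survey design would likely also weigh access, cost, and spatial coverage.

```python
def select_survey_sites(predictions, surveyed, k):
    """Return the k unsurveyed sites with the highest predicted suitability.

    predictions: dict mapping a site ID to a model score in [0, 1].
    surveyed: set of site IDs that already have field data.
    Ties are broken by site ID so the result is deterministic.
    """
    candidates = [(site, score) for site, score in predictions.items() if site not in surveyed]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [site for site, _ in candidates[:k]]


def merge_survey_results(training, results):
    """Add presence (1) / absence (0) labels from new field visits to the training set.

    New field results overwrite older labels for the same site.
    """
    updated = dict(training)
    updated.update(results)
    return updated


if __name__ == "__main__":
    # Hypothetical model scores for five candidate sites.
    predictions = {"A": 0.92, "B": 0.15, "C": 0.88, "D": 0.67, "E": 0.88}
    training = {"A": 1, "B": 0}  # sites that already have field data
    to_visit = select_survey_sites(predictions, set(training), k=2)
    print(to_visit)  # ['C', 'E']
    # Hypothetical field outcome: species found at C, absent from E.
    training = merge_survey_results(training, {"C": 1, "E": 0})
    print(training)
```

An absence recorded at a high-scoring site is just as informative as a presence: it flags where the model over-predicts suitable habitat.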
Element occurrences for Critically Imperiled (G1), Imperiled (G2), and Endangered Species listed under the US Endangered Species Act
NatureServe element occurrence data (light blue dots) for critically imperiled and imperiled species, and other species listed as threatened or endangered under the US Endangered Species Act.
Application data case study: Golden-cheeked warbler
The accuracy and relevance of the species locality data used to train habitat suitability models are an important determinant of modeling success. This map shows research-grade observations of the golden-cheeked warbler collected by citizen scientists with the iNaturalist application (yellow with black dots) in the area around Lake Travis, northwest of Austin, Texas. These are likely to be legitimate observations of this easy-to-identify species, but many records fall in suburban environments or over water, where individuals may simply have flown by; such locations may not represent areas important to the persistence of the species. Using these records to train a model could produce erroneous predictions. Element occurrence records collected by the Texas Natural Diversity Database are shown in blue; they represent areas of significance for the persistence of warbler populations. Drawing model training samples from within these areas increases the likelihood of successful modeling outcomes. Citizen science data can be valuable for modeling, but for the reasons illustrated here, they require careful screening before they are used as inputs to species habitat models.
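Restricting training data to element occurrence areas comes down to a point-in-polygon test. The sketch below assumes EO areas are available as simple polygons (no holes) in a projected coordinate system. It uses the standard ray-casting test to keep only the citizen science points that fall inside an EO polygon, and it draws training samples uniformly within the polygons by rejection sampling from their combined bounding box. The function names and polygon coordinates are hypothetical; a production workflow would more likely rely on a GIS library’s spatial predicates.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: (x, y) is inside if a ray cast to the right crosses
    the boundary an odd number of times. polygon is a list of (x, y) vertices;
    the closing edge back to the first vertex is implied."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def inside_any(x, y, polygons):
    """True if the point lies inside at least one polygon."""
    return any(point_in_polygon(x, y, poly) for poly in polygons)


def sample_training_points(polygons, n, rng, max_tries=100_000):
    """Draw n points uniformly within the union of the polygons by rejection
    sampling from their combined bounding box."""
    xs = [vx for poly in polygons for vx, _ in poly]
    ys = [vy for poly in polygons for _, vy in poly]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if inside_any(x, y, polygons):
            points.append((x, y))
    if len(points) < n:
        raise RuntimeError("too few accepted points; polygons are small relative to their bounding box")
    return points


if __name__ == "__main__":
    # Hypothetical EO polygons in projected meters (not real data).
    eo_polygons = [
        [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
        [(3000, 0), (4000, 0), (3500, 800)],
    ]
    samples = sample_training_points(eo_polygons, n=5, rng=random.Random(42))
    # Screen citizen science points: keep only those inside an EO polygon.
    cs_points = [(500, 500), (2000, 500), (3500, 200)]
    kept = [p for p in cs_points if inside_any(p[0], p[1], eo_polygons)]
    print(kept)  # [(500, 500), (3500, 200)]
```

Rejection sampling is simple but wastes draws when the polygons cover little of their combined bounding box; for widely separated EO areas, sampling each polygon separately in proportion to its area would be more efficient.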