Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project.

Published in The IEEE International Conference on eScience Workshop, 2011

Research projects that use the efforts of volunteers (“citizen scientists”) to collect data on organism occurrence must address issues of observer variability and species misidentification. While citizen science projects can engage a very large number of volunteers to collect volumes of data, they are prone to contain reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers and flag them in the database. But the increasing volume of data being collected by eBird places a huge burden on these volunteer experts. In order to minimize this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this through a two-step process. First a data-based method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide if the unusual observation should be flagged or not. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies them as either an error or represents a potentially valuable observation.

Download paper here