Modeling Misidentification of Bird Species by Citizen Scientists.
Published in the Twenty-seventh Conference on Neural Information Processing Systems (NIPS), 2013
Data quality is a common source of concern with large-scale citizen science projects like eBird. In the case of eBird, poor-quality data is often due to misidentification of bird species by inexperienced contributors. One approach to improving data quality is to identify commonly misidentified bird species and to teach inexperienced birders the differences between them. In this paper, we develop a latent variable model, based on a multi-species extension of the classic occupancy-detection model from the ecology literature, that we can apply to eBird data to discover pairs of bird species that observers often confuse with each other.
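For readers unfamiliar with the classic occupancy-detection model that the abstract builds on, the sketch below computes the likelihood of one site's detection history under the standard single-species version. A latent indicator Z says whether the site is occupied (probability psi), and each repeated visit detects the species with probability p only if Z = 1. This is the textbook baseline, not the multi-species misidentification model developed in the paper. The function name and parameters are illustrative assumptions.

```python
import math
from typing import Sequence


def site_log_likelihood(detections: Sequence[int], psi: float, p: float) -> float:
    """Log-likelihood of one site's detection history under the classic
    single-species occupancy-detection model (illustrative sketch).

    detections: 0/1 outcomes from repeated visits to the same site.
    psi: probability that the site is occupied (latent Z = 1).
    p: per-visit probability of detecting the species given occupancy.
    """
    k = sum(detections)   # number of visits with a detection
    n = len(detections)   # total number of visits
    # Z = 1: species present; each visit detects it independently with prob p.
    lik_occupied = psi * (p ** k) * ((1 - p) ** (n - k))
    # Z = 0: species absent; only an all-zero history is possible,
    # because the classic model assumes no false-positive detections.
    lik_empty = (1 - psi) if k == 0 else 0.0
    return math.log(lik_occupied + lik_empty)


def total_log_likelihood(histories: Sequence[Sequence[int]], psi: float, p: float) -> float:
    """Sum of per-site log-likelihoods, treating sites as independent."""
    return sum(site_log_likelihood(h, psi, p) for h in histories)


if __name__ == "__main__":
    # Two hypothetical sites, three visits each.
    print(total_log_likelihood([[1, 0, 1], [0, 0, 0]], psi=0.6, p=0.5))
```

The no-false-positive assumption in the `Z = 0` branch is exactly what species misidentification violates: a confused observer can report a species at a site where it is absent. That is why an extension that lets one species' presence explain reports of another is needed for data like eBird's.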