Clustering Species Accumulation Curves to Identify Skill Levels of Citizen Scientists Participating in the eBird Project.
Published in the Twenty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), 2014
Although citizen science projects such as eBird can compile large volumes of data over broad spatial and temporal extents, the quality of this data is a concern due to differences in the skills of volunteers at identifying bird species. Species accumulation curves, which plot the number of unique species observed over time, are an effective way to quantify the skill level of an eBird participant. Intuitively, more skilled observers can identify a greater number of species per unit time than inexperienced birders, resulting in a steeper curve. We propose a mixture model for clustering species accumulation curves. These clusters enable the identification of distinct skill levels of eBird participants, which can then be used to build more accurate species distribution models and to develop automated data quality filters.
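To make the idea of a species accumulation curve concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the function name, the toy checklists, and the choice to measure "time" as the number of checklists submitted are all assumptions made for this example. The mixture model itself is not shown.

```python
from typing import Iterable, List, Set


def species_accumulation_curve(checklists: Iterable[Iterable[str]]) -> List[int]:
    """Return the cumulative count of unique species after each checklist.

    Assumption: checklists are given in chronological order, and each
    checklist counts as one unit of observer effort.
    """
    seen: Set[str] = set()
    curve: List[int] = []
    for checklist in checklists:
        seen.update(checklist)   # add any species not yet recorded
        curve.append(len(seen))  # unique species observed so far
    return curve


# Hypothetical toy data, not taken from eBird.
novice = [
    ["American Robin"],
    ["American Robin", "Mallard"],
    ["Mallard"],
]
expert = [
    ["American Robin", "Mallard", "Song Sparrow"],
    ["Mallard", "Downy Woodpecker"],
    ["Song Sparrow", "Cedar Waxwing"],
]

novice_curve = species_accumulation_curve(novice)
expert_curve = species_accumulation_curve(expert)
```

In this toy data the expert's curve rises faster than the novice's, which is the intuition the abstract describes. Curves of this kind, one per observer, would be the input to a clustering step.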