0000000000612181
AUTHOR
Johanna Ärje
Statistical classification and proportion estimation - an application to a macroinvertebrate image database
We apply and compare a random Bayes forest classifier and three traditional classification methods to a dataset of complex benthic macroinvertebrate images of known taxonomical identity. Since in biomonitoring changes in benthic macroinvertebrate taxa proportions correspond to changes in water quality, their correct estimation is pivotal. As classification errors are passed on to the allocated proportions, we explore a correction method known as a confusion matrix correction. Classification methods were compared using the misclassification error and the χ² distance measures of the true proportions to the allocated and to the corrected proportions. Using low misclassification error and small…
Tilastollisia luokittelumenetelmiä koneelliseen tunnistamiseen : sovellus pohjaeläinaineistoon (Statistical classification methods for automated identification: an application to benthic macroinvertebrate data)
Benthic macroinvertebrates are used in biological monitoring, which studies the effects of human activity on the environmental state of water bodies. Traditionally, benthic macroinvertebrates are identified manually. This work examines how benthic macroinvertebrates can be identified automatically using classification methods that have produced good results for plankton. For benthic macroinvertebrates, it is important to obtain estimates of the relative proportions of species that are as accurate as possible. To this end, a method known as confusion matrix correction is examined for the estimates of species proportions. Benthic macroinvertebrates are invertebrate animals living on the bottom of water bodies that respond quickly to changes in the environment. Their relative abundances…
Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa identification of freshwater macroinvertebrates
Macroinvertebrate samples are commonly used in biomonitoring to study changes in aquatic ecosystems. Traditionally, specimens are identified manually to taxa by human experts, which is time-consuming and cost-intensive. Using the image data of 35 taxa and 64 features, we propose a novel variant of the quadratic discriminant analysis for breaking the curse of dimensionality in quadratic discriminant analysis models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to random forest. We explore several variations of RBA. We consider three classification (i.e., taxa identification) decisions: majority vote, averaged posterior probabilities, and a novel…
Opetusteknologia koulun arjessa (Educational technology in everyday school life)
Educational technology is part of everyday life in many Finnish schools, for teachers and pupils alike, in different parts of Finland. The use of educational technology has opened classroom doors to the surrounding world and, at the same time, brought parties interested in the worlds of learning closer to everyday school life. However, challenges remain in bringing all Finnish children and teachers into contact with innovative, inspiring learning environments and experiences that foster creativity. This book presents the first results of the national research project Opetusteknologia koulun arjessa. The Tekes-funded project has brought together research groups from eight universities and 13 research institutes, representatives of the business world…
Benchmark database for fine-grained image classification of benthic macroinvertebrates
Managing the water quality of freshwaters is a crucial task worldwide. One of the most used methods to biomonitor water quality is to sample benthic macroinvertebrate communities, in particular to examine the presence and proportion of certain species. This paper presents a benchmark database for automatic visual classification methods to evaluate their ability for distinguishing visually similar categories of aquatic macroinvertebrate taxa. We make publicly available a new database, containing 64 types of freshwater macroinvertebrates, ranging in number of images per category from 7 to 577. The database is divided into three datasets, varying in number of categories (64, 29, and 9 categori…
Empirical Bayes improves assessments of diversity and similarity when overdispersion prevails in taxonomic counts with no covariates
The assessment of diversity and similarity is relevant in monitoring the status of ecosystems. The respective indicators are based on the taxonomic composition of biological communities of interest, currently estimated through the proportions computed from sampling multivariate counts. In this work we present a novel method to estimate the taxonomic composition able to work even with a single sample and no covariates, when data are affected by overdispersion. The presence of overdispersion in taxonomic counts may be the result of significant environmental factors which are often unobservable but influence communities. Following the empirical Bayes approach, we combine a Bayesian mo…
Automatic image‐based identification and biomass estimation of invertebrates
Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring but the time-consuming sorting and expert-based identification of taxa pose strong limitations on how many insect samples can be processed. In turn, this affects the scale of efforts to map and monitor invertebrate diversity altogether. Given recent advances in computer vision, we propose to enhance the standard human expert-based identification approach involving manual sorting and identification with an automatic image-based technology. We describe a robot-enabled image-based identificat…
Improving statistical classification methods and ecological status assessment for river macroinvertebrates
Aquatic ecosystems are facing a growing number of human-induced stressors and the need to implement more biomonitoring to assess the ecological status of water bodies is eminent. This dissertation aims at providing tools to reduce the costs and improve the accuracy of freshwater benthic macroinvertebrate biomonitoring. To improve the cost-efficiency, we consider automated classification and develop a novel classifier suitable for complex macroinvertebrate image data. To enhance the accuracy of macroinvertebrate biomonitoring, we study the statistical properties of the Percent Model Affinity index crucial to current Finnish biomonitoring and the factors affecting these statistics. Finally, we perfo…
Automatic image-based identification and biomass estimation of invertebrates
1. Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring but the time-consuming sorting and expert-based identification of taxa pose strong limitations on how many insect samples can be processed. In turn, this affects the scale of efforts to map and monitor invertebrate diversity altogether. Given recent advances in computer vision, we propose to enhance the standard human expert-based identification approach involving manual sorting and identification with an automatic image-based technology. 2. We describe a robot-enabled image-based ident…
Evaluating the performance of artificial neural networks for the classification of freshwater benthic macroinvertebrates
Macroinvertebrates form an important functional component of aquatic ecosystems. Their ability to indicate various types of anthropogenic stressors is widely recognized, which has made them an integral component of freshwater biomonitoring. The use of macroinvertebrates in biomonitoring is dependent on manual taxa identification, which is currently a time-consuming and cost-intensive process conducted by highly trained taxonomical experts. Automated taxa identification of macroinvertebrates is a relatively recent research development. Previous studies have displayed great potential for solutions to this demanding data mining application. In this research we have a collection of 1350 …
Human experts vs. machines in taxa recognition
The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hier…