Discrimination of fish populations using parasites: Random Forests on a ‘predictable’ host-parasite system

Mercedes Fernández, John Barrett, J. A. Raga, Francisco E. Montero, Aneta Kostadinova, Ana Pérez-del-Olmo

subject

0106 biological sciences, Mediterranean climate, Population Dynamics, Population, 01 natural sciences, Host-Parasite Interactions, 030308 mycology & parasitology, Fish Diseases, 03 medical and health sciences, Mediterranean Sea, Animals, Parasite hosting, Parasites, 14. Life underwater, education, Atlantic Ocean, Ecosystem, 0303 health sciences, biology, Ecology, 010604 marine biology & hydrobiology, Boops boops, Perciformes, Random forest, Infectious Diseases, Population model, Spain, Sample size determination, Spatial ecology, Animal Science and Zoology, Parasitology, Algorithms

description

SUMMARY

We address the effect of spatial scale and temporal variation on model generality when building predictive models for fish assignment by applying a new data-mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. To evaluate the importance of variation in parasite infracommunities for assigning individual fish to their populations of origin, we considered 2 basic classification problems: a multiclass task (2–5 population models, using 2 seasonal replicates from each population) and a 2-class task (using 4 seasonal replicates from each of 2 populations, 1 Atlantic and 1 Mediterranean). The main results are that (i) RF are well suited to multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means of cross-validating models on the baseline data, which allows sample-size limitations in parasite tag studies to be tackled effectively; (iii) RF performance depends on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change, which stresses the importance of both temporal replication and model validation in parasite tagging studies.
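To make the assignment idea concrete, the following is a minimal, self-contained sketch of a Random Forest classifier in pure Python, written for illustration only; it is not the authors' implementation or software. Each fish is a row of parasite abundances (its infracommunity), and the label is its population of origin. The out-of-bag (OOB) estimate shown here is one common way RF supply internal cross-validation on baseline data: each tree is grown on a bootstrap sample, and the fish left out of that sample are used to score it. The function `simulate_fish`, the four parasite taxa, their count ranges, and the two population labels are hypothetical synthetic inputs, not data from the study.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between consecutive values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, mtry, rng):
    """Grow an unpruned tree; leaves hold the majority class label."""
    if len(set(y)) == 1:
        return y[0]
    # Random feature subset at each node: the "random" part of the forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng))


def predict_tree(node, row):
    """Follow splits until a leaf (a label string) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=50, seed=0):
    """Fit trees on bootstrap samples; return (trees, OOB accuracy)."""
    rng = random.Random(seed)
    n = len(X)
    mtry = max(1, int(math.sqrt(len(X[0]))))
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng)
        trees.append(tree)
        in_bag = set(idx)
        for i in range(n):  # score only fish this tree never saw
            if i not in in_bag:
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(oob_votes) if v]
    oob_acc = sum(pred == true for pred, true in scored) / len(scored)
    return trees, oob_acc


def predict(trees, row):
    """Assign a fish to the population with the most tree votes."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]


def simulate_fish(population, rng):
    """Hypothetical counts for 4 parasite taxa; only taxa 0 and 1 differ."""
    if population == "Atlantic":
        return [rng.randint(10, 20), rng.randint(0, 3), rng.randint(0, 10), rng.randint(0, 10)]
    return [rng.randint(0, 3), rng.randint(10, 20), rng.randint(0, 10), rng.randint(0, 10)]


rng = random.Random(42)
labels = ["Atlantic", "Mediterranean"] * 20
X = [simulate_fish(lab, rng) for lab in labels]
trees, oob_acc = random_forest(X, labels, n_trees=50, seed=1)
print(f"OOB accuracy: {oob_acc:.2f}")
print("New fish assigned to:", predict(trees, simulate_fish("Mediterranean", rng)))
```

The same code handles the multiclass case unchanged, since leaves and votes are just labels. Note that the OOB estimate uses only the baseline sample; it does not substitute for the validation on independent (for example, other-season) datasets that the summary emphasises.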

https://doi.org/10.1017/s0031182010000739