Search results for "Random forest"

showing 10 items of 121 documents

Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples

2021

Predictive models based on radiomics and machine learning (ML) need large annotated datasets for training, which are often difficult to collect. We designed an operative pipeline for model training to exploit data already available to the scientific community. The aim of this work was to explore the capability of radiomic features in predicting tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-valida…

Lung Neoplasms; Computer science; Biophysics; General Physics and Astronomy; Sample (statistics); Cross validation; Machine learning; computer.software_genre; Non-small cell lung cancer; Radiomics; Humans; Lung; Neoplasm Staging; Carcinoma Non-Small-Cell Lung; Set (abstract data type); medicine; Radiology Nuclear Medicine and imaging; Stage (cooking); Lung cancer; Non-Small-Cell Lung; Small data; business.industry; Carcinoma; General Medicine; medicine.disease; Random forest; Support vector machine; Artificial intelligence; business; computer

researchProduct

Dynamic integration with random forests

2006

Random Forests (RF) are a successful ensemble prediction technique that uses majority voting or averaging as a combination function. However, it is clear that each tree in a random forest may have a different contribution in processing a certain instance. In this paper, we demonstrate that the prediction performance of RF may still be improved in some domains by replacing the combination function with dynamic integration, which is based on local performance estimates. Our experiments also demonstrate that the RF Intrinsic Similarity is better than the commonly used Heterogeneous Euclidean/Overlap Metric in finding a neighbourhood for local estimates in the context of dynamic integration of …

researchProduct

Smart load prediction analysis for distributed power network of Holiday Cabins in Norwegian rural area

2020

The Norwegian rural distributed power network is mainly designed for Holiday Cabins with limited electrical loading capacity. Load prediction analysis within this type of network is necessary for effective operation and for managing the increasing demand from new appliances (e.g. electric vehicles and heat pumps). In this paper, load prediction of a distributed power network (i.e. a typical Norwegian rural area power network of 125 cottages with 478 kW peak demand) is carried out using regression analysis techniques for establishing autocorrelations and correlations among weather parameters and occurrence time in the period of 2014–2018. In this study, the regression analysis for loa…

Mathematical optimization; Renewable Energy Sustainability and the Environment; Computer science; 020209 energy; Strategy and Management; 05 social sciences; Autocorrelation; Distributed power; Regression analysis; 02 engineering and technology; Load profile; Industrial and Manufacturing Engineering; Random forest; Autoregressive model; Peak demand; 050501 criminology; 0202 electrical engineering electronic engineering information engineering; Symmetric mean absolute percentage error; 0505 law; General Environmental Science; Journal of Cleaner Production

researchProduct

Use of Guided Regularized Random Forest for Biophysical Parameter Retrieval

2018

This paper introduces a feature selection method based on random forest, the Guided Regularized Random Forest (GRRF), which can be used in classification and regression tasks. The method is based on the regularization of the information gain in the random forest nodes to obtain a subset of relevant and non-redundant features. The proposed method is used as a preliminary step in the process of retrieving biophysical parameters from a hyperspectral image. Preliminary experiments show that we can reduce the RMSE of the retrievals by around 7% for the Leaf Area Index and around 8% for the fraction of vegetation cover when compared to the results using random forest features.

Mean squared error; 22/3 OA procedure; business.industry; Computer science; Feature extraction; Hyperspectral images; 0211 other engineering and technologies; Hyperspectral imaging; Pattern recognition; Feature selection; 02 engineering and technology; Biophysical parameter retrieval; Regularization (mathematics); Regression; Random forest; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Leaf area index; business; 021101 geological & geomatics engineering; IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium

researchProduct

Spatial Distribution and Abundance of Mesopelagic Fish Biomass in the Mediterranean Sea

2020

Mesopelagic fish, being in the middle of the trophic web, are important key species for the marine environment; yet limited knowledge exists about their biology and abundance. This is particularly true in the Mediterranean Sea where no regional assessment is currently undertaken regarding their biomass and/or distribution. This study evaluates spatial and temporal patterns of mesopelagic fish biomass in the 1994–2011 period. We do that for the whole Mediterranean Sea using two well-established statistical models, the Generalized Additive Model (GAM) and Random Forest (RF). Results indicate that the bathymetry played an important role in the estimation of mesopelagic fish biomass and in its …

Mediterranean climate; Global and Planetary Change; Biomass (ecology); Random Forest; lcsh:QH1-199.5; Mesopelagic zone; mesopelagic fish; Generalized additive model; Ocean Engineering; lcsh:General. Including nature conservation geographical distribution; Aquatic Science; Oceanography; Spatial distribution; spatial model; Mediterranean sea; Abundance (ecology); biomass distribution; lcsh:Q; lcsh:Science; Water Science and Technology; Trophic level

researchProduct

Identification and Handling of Critical Irradiance Forecast Errors Using a Random Forest Scheme – A Case Study for Southern Brazil

2015

Large forecast errors in solar power prediction pose challenges for the management of electric grids. Here, the classification technique Random Forests is applied to analyze the possible linkage of hourly or daily forecast errors to the actual situation given by a set of meteorological variables. This forms a prediction of the forecast error and can thus be used to update the forecast. The performance of this scheme is assessed for the example of irradiance forecasts in Brazil. While little to no improvement is obtained for next-hour forecasts, significant improvements are obtained for next-day forecasts.

Meteorology; business.industry; Computer science; post-processing; Irradiance; Linkage (mechanical); Forecast verification; Random forest; law.invention; Set (abstract data type); Identification (information); Energy(all); law; solar irradiance forecasts; business; Consensus forecast; Random Forest classification; Solar power; Energy Procedia

researchProduct

Contextual factors predicting compliance behavior during the COVID-19 pandemic: A machine learning analysis on survey data from 16 countries.

2022

Voluntary isolation is one of the most effective methods for individuals to help prevent the transmission of diseases such as COVID-19. Understanding why people leave their homes when advised not to do so and identifying what contextual factors predict this non-compliant behavior is essential for policymakers and public health officials. To provide insight on these factors, we collected data from 42,169 individuals across 16 countries. Participants responded to items inquiring about their socio-cultural environment, such as the adherence of fellow citizens, as well as their mental states, such as their level of loneliness and boredom. We trained random forest models to predict whether someo…

Multidisciplinary; Physical Distancing; social distancing; COVID-19; :Ciências Sociais::Psicologia [Domínio/Área Científica]; lockdown; Machine Learning; voluntary isolation; Communicable Disease Control; Humans; multi-national study; Settore M-PSI/05 - Psicologia Sociale; Pandemics; random forest; PloS one

researchProduct

Assessing and mapping multi-hazard risk susceptibility using a machine learning technique

2020

The aim of the current study was to suggest a multi-hazard probability assessment in Fars Province, Shiraz City, and its four strategic watersheds. First, we constructed maps depicting the most effective factors on floods (12 factors), forest fires (10 factors), and landslides (10 factors), and used the Boruta algorithm to prioritize the impact of each respective factor on the occurrence of each hazard. Subsequently, flood, landslide, and forest fire susceptibility maps were prepared using a Random Forest (RF) model in the R statistical software. Results indicate that 42.83% of the study area is not susceptible to any hazards, while 2.67% of the area is at risk of all three hazards. T…

Multidisciplinary; Watershed; 010504 meteorology & atmospheric sciences; Flood myth; Gini coefficient; Science; Flooding (psychology); Q; Natural hazards; R; Landslide; 010501 environmental sciences; 01 natural sciences; Hazard; Article; Random forest; Multi hazard; 13. Climate action; Environmental science; Medicine; Hydrology; Cartography; 0105 earth and related environmental sciences; Scientific Reports

researchProduct

Assessment of the statistical significance of classifications in infrared spectroscopy based diagnostic models.

2014

Fourier transform infrared (IR) spectroscopy in combination with multivariate data analysis is a versatile tool that can be applied to disease diagnosis. However, a rigorous validation of the obtained models is necessary in order to obtain robust results. This work evaluates the advantages of the use of permutation testing for determining the statistical significance of the misclassification errors obtained from IR based diagnostic models through cross validation (CV). The model performance, estimated by CV, is compared to a distribution of CV-performance values obtained using randomly permuted class labels. The distribution of ‘random CV-values’ is considered as a null distribution and use…

Multivariate analysis; Feature selection; Clinical Chemistry Tests; 02 engineering and technology; 01 natural sciences; Biochemistry; Cross-validation; Analytical Chemistry; Resampling; Statistics; Diagnosis; Spectroscopy Fourier Transform Infrared; Electrochemistry; Null distribution; Environmental Chemistry; Humans; Spectroscopy; Mathematics; Models Statistical; 010401 analytical chemistry; Estimator; Contrast (statistics); Discriminant Analysis; Reproducibility of Results; 021001 nanoscience & nanotechnology; 0104 chemical sciences; Random forest; 0210 nano-technology; The Analyst

researchProduct
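
The permutation-testing idea summarized in the abstract above (compare the cross-validated error against a null distribution obtained from randomly permuted class labels) can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are synthetic, the one-dimensional nearest-centroid classifier is a stand-in rather than the models used in the paper, and every function name is hypothetical.

```python
import random


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (an assumption): assign each test value to the
    class whose training mean is closest."""
    centroids = {}
    for label in set(train_y):
        values = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = sum(values) / len(values)
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


def cv_error(xs, ys, k=5):
    """Misclassification rate estimated by k-fold cross-validation.
    Fold f holds out the samples at indices f, f + k, f + 2k, ..."""
    n = len(xs)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        held_out = sorted(test_idx)
        preds = nearest_centroid_predict(train_x, train_y, [xs[i] for i in held_out])
        errors += sum(p != ys[i] for p, i in zip(preds, held_out))
    return errors / n


def permutation_p_value(xs, ys, n_permutations=200, k=5, seed=0):
    """Compare the observed CV error with CV errors obtained after
    shuffling the class labels (the null distribution)."""
    rng = random.Random(seed)
    observed = cv_error(xs, ys, k)
    null_errors = []
    for _ in range(n_permutations):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        null_errors.append(cv_error(xs, shuffled, k))
    # Count permutations that perform at least as well as the real labels;
    # the +1 terms keep the estimated p-value strictly above zero.
    at_least_as_good = sum(e <= observed for e in null_errors)
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, null_errors, p_value


# Synthetic two-class data (an assumption, for illustration only).
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(20)] + [data_rng.gauss(4.0, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20

observed, null_errors, p_value = permutation_p_value(xs, ys)
print(f"observed CV error: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value here means that few label permutations reach a CV error as low as the one obtained with the true labels, which is the sense in which the classification is called statistically significant.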

GIS-based groundwater potential mapping in Shahroud plain, Iran. A comparison among statistical (bivariate and multivariate), data mining and MCDM ap…

2019

In arid and semi-arid areas, groundwater is one of the most important water sources for humankind. Knowledge of groundwater distribution over space, associated flow and basic exploitation measures can play a significant role in planning sustainable development, especially in arid and semi-arid areas. Groundwater potential mapping (GWPM) fits in this context as the tool used to predict the spatial distribution of groundwater. In this research we tested four GIS-based models for GWPM, consisting of: i) random forest (RF); ii) weight of evidence (WoE); iii) binary logistic regression (BLR); and iv) technique for order preference by similarity to ideal solution (TOPSIS) mul…

Multivariate statistics; Environmental Engineering; Geographic information system; 010504 meteorology & atmospheric sciences; Context (language use); Land cover; Binary logistic regression; 010501 environmental sciences; 01 natural sciences; Statistics; Environmental Chemistry; Semi-arid region; Waste Management and Disposal; 0105 earth and related environmental sciences; business.industry; TOPSIS; Weight of evidence; Pollution; 22/4 OA procedure; Water resources; Thematic map; ITC-ISI-JOURNAL-ARTICLE; Environmental science; business; Decision making; Groundwater; Random forest

researchProduct