Search results for "Random forest"
showing 10 items of 121 documents
Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples
2021
Predictive models based on radiomics and machine learning (ML) need large, annotated datasets for training, which are often difficult to collect. We designed an operative pipeline for model training to exploit data already available to the scientific community. The aim of this work was to explore the capability of radiomic features in predicting tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-valida…
Dynamic integration with random forests
2006
Random Forests (RF) are a successful ensemble prediction technique that uses majority voting or averaging as a combination function. However, it is clear that each tree in a random forest may have a different contribution in processing a certain instance. In this paper, we demonstrate that the prediction performance of RF may still be improved in some domains by replacing the combination function with dynamic integration, which is based on local performance estimates. Our experiments also demonstrate that the RF Intrinsic Similarity is better than the commonly used Heterogeneous Euclidean/Overlap Metric in finding a neighbourhood for local estimates in the context of dynamic integration of …
Smart load prediction analysis for distributed power network of Holiday Cabins in Norwegian rural area
2020
The Norwegian rural distributed power network is mainly designed for Holiday Cabins with limited electrical loading capacity. Load prediction analysis within this type of network is necessary for effective operation and to manage the increasing demand from new appliances (e.g., electric vehicles and heat pumps). In this paper, load prediction of a distributed power network (i.e., a typical Norwegian rural area power network of 125 cottages with 478 kW peak demand) is carried out using regression analysis techniques for establishing autocorrelations and correlations among weather parameters and occurrence time in the period of 2014–2018. In this study, the regression analysis for loa…
Use of Guided Regularized Random Forest for Biophysical Parameter Retrieval
2018
This paper introduces a feature selection method based on random forest, the Guided Regularized Random Forest (GRRF), which can be used in classification and regression tasks. The method is based on the regularization of the information gain in the random forest nodes to obtain a subset of relevant and non-redundant features. The proposed method is used as a preliminary step in the process of retrieving biophysical parameters from a hyperspectral image. Preliminary experiments show that we can reduce the RMSE of the retrievals by around 7% for the Leaf Area Index and around 8% for the fraction of vegetation cover when compared to the results using random forest features.
Spatial Distribution and Abundance of Mesopelagic Fish Biomass in the Mediterranean Sea
2020
Mesopelagic fish, being in the middle of the trophic web, are important key species for the marine environment; yet limited knowledge exists about their biology and abundance. This is particularly true in the Mediterranean Sea where no regional assessment is currently undertaken regarding their biomass and/or distribution. This study evaluates spatial and temporal patterns of mesopelagic fish biomass in the 1994–2011 period. We do that for the whole Mediterranean Sea using two well-established statistical models, the Generalized Additive Model (GAM) and Random Forest (RF). Results indicate that the bathymetry played an important role in the estimation of mesopelagic fish biomass and in its …
Identification and Handling of Critical Irradiance Forecast Errors Using a Random Forest Scheme – A Case Study for Southern Brazil
2015
Large forecast errors in solar power prediction cause challenges for the management of electric grids. Here, the classification technique Random Forests is applied to analyze the possible linkage of hourly or daily forecast errors to the actual situation given by a set of meteorological variables. This forms a prediction of the forecast error and is thus usable to update the forecast. The performance of this scheme is assessed for the example of irradiance forecasts in Brazil. While little to no improvement is obtained for next-hour forecasts, significant improvements are obtained for next-day forecasts.
Contextual factors predicting compliance behavior during the COVID-19 pandemic: A machine learning analysis on survey data from 16 countries.
2022
Voluntary isolation is one of the most effective methods for individuals to help prevent the transmission of diseases such as COVID-19. Understanding why people leave their homes when advised not to do so and identifying what contextual factors predict this non-compliant behavior is essential for policymakers and public health officials. To provide insight on these factors, we collected data from 42,169 individuals across 16 countries. Participants responded to items inquiring about their socio-cultural environment, such as the adherence of fellow citizens, as well as their mental states, such as their level of loneliness and boredom. We trained random forest models to predict whether someo…
Assessing and mapping multi-hazard risk susceptibility using a machine learning technique
2020
The aim of the current study was to suggest a multi-hazard probability assessment in Fars Province, Shiraz City, and its four strategic watersheds. First, we constructed maps depicting the most influential factors for floods (12 factors), forest fires (10 factors), and landslides (10 factors), and used the Boruta algorithm to prioritize the impact of each respective factor on the occurrence of each hazard. Subsequently, flood, landslide, and forest fire susceptibility maps were prepared using a Random Forest (RF) model in the R statistical software. Results indicate that 42.83% of the study area is not susceptible to any hazards, while 2.67% of the area is at risk of all three hazards. T…
Assessment of the statistical significance of classifications in infrared spectroscopy based diagnostic models.
2014
Fourier transform infrared (IR) spectroscopy in combination with multivariate data analysis is a versatile tool that can be applied to disease diagnosis. However, a rigorous validation of the obtained models is necessary in order to obtain robust results. This work evaluates the advantages of the use of permutation testing for determining the statistical significance of the misclassification errors obtained from IR based diagnostic models through cross validation (CV). The model performance, estimated by CV, is compared to a distribution of CV-performance values obtained using randomly permuted class labels. The distribution of ‘random CV-values’ is considered as a null distribution and use…
GIS-based groundwater potential mapping in Shahroud plain, Iran. A comparison among statistical (bivariate and multivariate), data mining and MCDM ap…
2019
In arid and semi-arid areas, groundwater is one of the most important water sources for humankind. Knowledge of groundwater distribution over space, associated flow and basic exploitation measures can play a significant role in planning sustainable development, especially in arid and semi-arid areas. Groundwater potential mapping (GWPM) fits in this context as the tool used to predict the spatial distribution of groundwater. In this research we tested four GIS-based models for GWPM, consisting of: i) random forest (RF); ii) weight of evidence (WoE); iii) binary logistic regression (BLR); and iv) technique for order preference by similarity to ideal solution (TOPSIS) mul…