0000000000582497
AUTHOR
Chady Abou Jaoude
Efficient anomaly detection on sampled data streams with contaminated phase I data
Control chart algorithms monitor a process over time in two phases. Phase I, also called the learning phase, estimates the parameters of the normal process; Phase II then detects anomalies against those parameters. However, the learning phase itself can contain contaminated data such as outliers. If left undetected, these distort the computed parameters and jeopardize the accuracy of the whole chart, leading to faulty classifications and defective data analysis results. The problem becomes more severe when the analysis is done on a sample of the data rather than on the whole data set. To avoid such a situation, the quality of Phase I must be guaranteed. The purpose…
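The two-phase scheme in the abstract above can be illustrated with a minimal sketch. It assumes a plain Shewhart-style chart with limits at the center plus or minus three standard deviations, which is a textbook choice and not necessarily the chart used in the paper. The function names, the toy data, and the median/MAD variant are all hypothetical and introduced only for illustration; the robust variant is a generic remedy, not the authors' method.

```python
import statistics


def phase_one_limits(learning_data, k=3.0):
    """Phase I: estimate control limits (center +/- k * sigma) from learning data."""
    center = statistics.mean(learning_data)
    sigma = statistics.stdev(learning_data)
    return center - k * sigma, center + k * sigma


def robust_phase_one_limits(learning_data, k=3.0):
    """Phase I with median and MAD, which are far less sensitive to outliers.

    The factor 1.4826 scales the MAD to match the standard deviation
    under a normal distribution.
    """
    center = statistics.median(learning_data)
    mad = statistics.median([abs(x - center) for x in learning_data])
    sigma = 1.4826 * mad
    return center - k * sigma, center + k * sigma


def phase_two_anomalies(stream, limits):
    """Phase II: return the observations that fall outside the control limits."""
    lower, upper = limits
    return [x for x in stream if x < lower or x > upper]


# Toy data (assumed): a clean learning set, and the same set with one outlier.
clean = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
contaminated = clean + [25.0]
stream = [10.0, 10.3, 13.0, 9.7]

found_clean = phase_two_anomalies(stream, phase_one_limits(clean))
found_contaminated = phase_two_anomalies(stream, phase_one_limits(contaminated))
found_robust = phase_two_anomalies(stream, robust_phase_one_limits(contaminated))

print(found_clean)         # [13.0]: the anomaly is detected
print(found_contaminated)  # []: the outlier widened the limits, anomaly missed
print(found_robust)        # [13.0]: robust estimates recover the detection
```

In this toy case a single outlier in the learning data inflates the estimated standard deviation enough that the Phase II anomaly goes unnoticed, which is the failure mode the abstract describes.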
SCCF Parameter and Similarity Measure Optimization and Evaluation
Neighborhood-based Collaborative Filtering (CF) is one of the most successful and widely used recommendation approaches; however, it suffers from major flaws, especially in sparse environments. The traditional similarity measures that neighborhood-based CF uses to find similar users or items are not suitable for sparse datasets. Sparse Subspace Clustering and common liking rate in CF (SCCF), a recently published approach, proposes a tunable similarity measure oriented towards sparse datasets; however, its performance has not yet been maximized and requires further analysis and investigation. In this paper, we propose and evaluate the performance of a new tuning mechanism, using the Mean Absolute Error (MA…