RESEARCH PRODUCT
Efficient anomaly detection on sampled data streams with contaminated phase I data
Rayane El Sibai, Abdallah Makhoul, Chady Abou Jaoude, Joseph Assaker, Jacques Demerjian, Jacques Bou Abdo
subject
Computer science; Sample (material); 0211 other engineering and technologies; 02 engineering and technology; [INFO.INFO-SE] Computer Science [cs]/Software Engineering [cs.SE]; 01 natural sciences; [INFO.INFO-IU] Computer Science [cs]/Ubiquitous Computing; 010104 statistics & probability; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; Chart; Control chart; EWMA chart; 0101 mathematics; 021103 operations research; Data stream mining; Pattern recognition; [INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation; [INFO.INFO-MA] Computer Science [cs]/Multiagent Systems [cs.MA]; Outlier; Anomaly detection; [INFO.INFO-ET] Computer Science [cs]/Emerging Technologies [cs.ET]; Artificial intelligence; [INFO.INFO-DC] Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Gibbs sampling
description
International audience; Control chart algorithms monitor a process over time in two phases. Phase I, also called the learning phase, estimates the parameters of the normal process; Phase II then uses these parameters to detect anomalies. However, the Phase I data can itself be contaminated, for example by outliers. If left undetected, such contamination distorts the estimated parameters and jeopardizes the accuracy of the whole chart, leading to faulty classifications and defective data analysis results. The problem becomes more severe when the analysis is performed on a sample of the data rather than on the whole data. To avoid this situation, the quality of Phase I must be guaranteed. This paper introduces a new approach for applying the EWMA chart that yields accurate anomaly detection results over sampled data even when contamination exists in Phase I. The new chart is applied to a real dataset, and its performance is evaluated on both sampled and unsampled data according to several criteria.
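To make the two-phase scheme concrete, the following is a minimal sketch of a textbook EWMA control chart, not the approach proposed in the paper. The function name `ewma_chart`, the smoothing weight `lam = 0.2`, the limit multiplier `L = 3.0`, and the toy data are all illustrative assumptions. Phase I estimates the mean and standard deviation; Phase II flags points whose EWMA statistic leaves the time-varying control limits.

```python
from math import sqrt
from statistics import mean, stdev


def ewma_chart(phase1, phase2, lam=0.2, L=3.0):
    """Return indices of phase2 points whose EWMA statistic is out of control.

    Textbook EWMA chart; lam and L are assumed defaults, not values
    taken from the paper.
    """
    # Phase I: estimate the in-control parameters from the learning data.
    mu = mean(phase1)
    sigma = stdev(phase1)

    # Phase II: update the EWMA statistic and compare it with the limits.
    z = mu  # the EWMA statistic starts at the estimated mean
    alarms = []
    for t, x in enumerate(phase2, start=1):
        z = lam * x + (1 - lam) * z
        # Time-varying half-width of the control limits at step t.
        width = L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu) > width:
            alarms.append(t - 1)  # report a 0-based index into phase2
    return alarms


# Toy data (hypothetical): a clean learning phase and a later shift to 12.0.
clean_phase1 = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
phase2 = [10.0, 10.1, 9.9, 12.0, 12.0]

clean_alarms = ewma_chart(clean_phase1, phase2)
# Same learning data with a single outlier added to Phase I.
contaminated_alarms = ewma_chart(clean_phase1 + [20.0], phase2)
```

With this toy data, the clean Phase I flags the two shifted points, while the single outlier in Phase I inflates the estimated standard deviation enough that no alarm is raised. This is the kind of failure that motivates guaranteeing Phase I quality.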
year | journal | country | edition | language |
---|---|---|---|---|
2020-08-04 | | | | |