Efficient anomaly detection on sampled data streams with contaminated phase I data
International audience; Control chart algorithms aim to monitor a process over time. This process consists of two phases. Phase I, also called the learning phase, estimates the normal process parameters, then in Phase II, anomalies are detected. However, the learning phase itself can contain contaminated data such as outliers. If left undetected, they can jeopardize the accuracy of the whole chart by affecting the computed parameters, which leads to faulty classifications and defective data analysis results. This problem becomes more severe when the analysis is done on a sample of the data rather than the whole data. To avoid such a situation, Phase I quality must be guaranteed. The purpose…