Healthcare trajectory mining by combining multidimensional component and itemsets
Chedy Raïssi, Pascal Poncelet, Elias Egho, Catherine Quantin, Amedeo Napoli, Nicolas Jay, Dino Ienco, Maguelonne Teisseire
subject
Sequential Patterns; Computer science; DONNEE MEDICALE; 02 engineering and technology; Reuse; computer.software_genre; Synthetic data; Domain (software engineering); DATA MINING; Set (abstract data type); Multi-dimensional Sequential Patterns; 020204 information systems; Component (UML); SANTE; 0202 electrical engineering electronic engineering information engineering; Point (geometry); SEQUENTIAL PATTERN; MULTI DIMENSIONAL SEQUENTIAL PATTERN; ANALYSE DE DONNEES; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; BASE DE DONNEES; Temporal database; INFORMATIQUE; Scalability; TRAJECTOIRE; [SDE]Environmental Sciences; 020201 artificial intelligence & image processing; Data mining; FOUILLE; computer
description
Sequential pattern mining aims to extract correlations from temporal data. Many methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences of multidimensional items. In real-world scenarios, however, data sequences are described as events that combine multidimensional items with set-valued information, and traditional approaches cannot exploit these rich heterogeneous descriptions. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g., Hospital or Diagnosis) associated with two sets: a set of medical procedures (e.g., $\lbrace$Radiography, Appendectomy$\rbrace$) and a set of medical drugs (e.g., $\lbrace$Aspirin, Paracetamol$\rbrace$). In this paper we propose a new approach called MMISP (Mining Multidimensional Itemset Sequential Patterns) to extract patterns from complex sequences that include both dimensional items and itemsets. The novelties of the proposal lie in: (i) the way in which the data can be efficiently compressed; (ii) the ability to reuse and adapt existing sequential pattern mining algorithms; and (iii) the extraction of a new kind of pattern. We present a case study on real data aggregated from a regional healthcare system and point out the usefulness of the extracted patterns. Additional experiments on synthetic data highlight the efficiency and scalability of our approach.
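To make the data model in the description concrete, the following minimal Python sketch represents a patient trajectory as a sequence of hospitalizations, each carrying dimensional attributes (Hospital, Diagnosis) and two itemsets (procedures, drugs), and counts how many trajectories contain a given pattern. It is an illustration only and not the MMISP algorithm: the class and function names, the wildcard convention (`None`), and the sample values such as `"H1"`, `"Follow-up"` and `"Fracture"` are assumptions introduced here, and the compression and algorithm-reuse steps mentioned in the description are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Event:
    """One hospitalization: dimensional attributes plus two itemsets."""
    hospital: str
    diagnosis: str
    procedures: FrozenSet[str]
    drugs: FrozenSet[str]


@dataclass(frozen=True)
class PatternEvent:
    """One pattern element (hypothetical form); None acts as a wildcard."""
    hospital: Optional[str] = None
    diagnosis: Optional[str] = None
    procedures: FrozenSet[str] = frozenset()
    drugs: FrozenSet[str] = frozenset()


def matches(p: PatternEvent, e: Event) -> bool:
    """Dimensions must be equal (or wildcard); itemsets must be subsets."""
    return (
        (p.hospital is None or p.hospital == e.hospital)
        and (p.diagnosis is None or p.diagnosis == e.diagnosis)
        and p.procedures <= e.procedures
        and p.drugs <= e.drugs
    )


def contains(trajectory: Sequence[Event], pattern: Sequence[PatternEvent]) -> bool:
    """Greedy subsequence test: pattern elements must match events in order."""
    i = 0
    for event in trajectory:
        if i < len(pattern) and matches(pattern[i], event):
            i += 1
    return i == len(pattern)


def support(database: Sequence[Sequence[Event]], pattern: Sequence[PatternEvent]) -> int:
    """Number of trajectories in the database that contain the pattern."""
    return sum(contains(t, pattern) for t in database)


# Illustrative toy data (invented for this sketch, not from the paper).
DATABASE = [
    [
        Event("H1", "Appendicitis",
              frozenset({"Radiography", "Appendectomy"}),
              frozenset({"Aspirin", "Paracetamol"})),
        Event("H2", "Follow-up", frozenset({"Radiography"}), frozenset({"Paracetamol"})),
    ],
    [
        Event("H1", "Appendicitis", frozenset({"Appendectomy"}), frozenset({"Paracetamol"})),
        Event("H1", "Follow-up", frozenset({"Radiography"}), frozenset({"Aspirin"})),
    ],
    [
        Event("H2", "Fracture", frozenset({"Radiography"}), frozenset({"Aspirin"})),
    ],
]

# Pattern: an appendicitis stay with an appendectomy, later followed by
# any stay that includes a radiography.
PATTERN = [
    PatternEvent(diagnosis="Appendicitis", procedures=frozenset({"Appendectomy"})),
    PatternEvent(procedures=frozenset({"Radiography"})),
]

if __name__ == "__main__":
    print(support(DATABASE, PATTERN))
```

In this toy database the first two trajectories contain the pattern and the third does not, so the support is 2. A pattern of this shape mixes a constrained dimension (Diagnosis), an unconstrained one (Hospital) and itemset constraints, which is the kind of heterogeneous description that itemset-only or dimension-only miners cannot express.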
year | journal | country | edition | language |
---|---|---|---|---|
2012-09-24 | | | | |