RESEARCH PRODUCT

A text based indexing system for mammographic image retrieval and classification

Alfonso Farruggia, R. Magro, Salvatore Vitabile

subject

Information retrieval; Computer Networks and Communications; Computer science; Search engine indexing; Big data; DICOM; Search engine; Medical images indexing and classification; Hardware and Architecture; Medical documents indexing and classification; Data mining; Medical diagnosis; Classifier (UML); Software

description

Abstract: In modern medical systems, huge amounts of text, images and videos are produced and stored in ad hoc databases. The medical community needs to extract precise information from this large amount of data. Current ICT approaches do not provide a methodology for content-based medical image retrieval and classification. On the other hand, from the Internet of Things (IoT) perspective, ICT medical data can be produced by several devices, and the data produced exhibits all Big Data features and constraints. IoT guidelines place new smart software at the center of the system to manage Big Data and transform it into an understandable form. This paper describes a text-based indexing system for mammographic image retrieval and classification. The system deals with the mining and classification of text (structured reports) and images (mammograms) in a typical Department of Radiology. DICOM structured reports, containing free text for medical diagnosis, have been analyzed and labeled in order to classify the corresponding mammographic images. The Information Retrieval process is based on text manipulation techniques such as light semantic analysis, stop-word removal, and light medical natural language processing. The system also includes a Search Engine module based on a Naive Bayes classifier. The experimental results show interesting performance in terms of specificity and sensitivity. Two more indexes have been computed in order to assess the robustness of the system: the A_z index (area under the ROC curve) and the σ_{A_z} index (standard error of A_z). The dataset is composed of healthy and pathological DICOM structured reports. Two use case scenarios are presented and described to demonstrate the effectiveness of the proposed approach.
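The abstract names the main steps (stop-word removal, a Naive Bayes classifier over report text, evaluation by sensitivity and specificity) without giving details. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. The stop-word list, the toy reports, the label names, and the choice of a multinomial model with Laplace smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given.
# Negations such as "no" are deliberately kept, since they carry diagnostic meaning.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "with", "to"}

# Invented toy reports for illustration only; not data from the paper.
TRAIN = [
    ("Irregular spiculated mass with microcalcifications in the left breast", "pathological"),
    ("Suspicious lesion with architectural distortion, biopsy recommended", "pathological"),
    ("Normal fibroglandular tissue, no suspicious findings", "healthy"),
    ("Breasts are symmetric and normal, routine screening advised", "healthy"),
]


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def train(reports):
    """Count documents per label and words per label (multinomial Naive Bayes)."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in reports:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior, using Laplace smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def sensitivity_specificity(y_true, y_pred, positive="pathological"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


model = train(TRAIN)
print(predict(model, "Spiculated mass, biopsy recommended"))
```

In a setting like the one described, the predicted label of a structured report would then be attached to the corresponding mammogram, so that images can be retrieved by querying the text index. A real system would need a far larger labeled corpus and explicit handling of negation.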

https://doi.org/10.1016/j.future.2014.02.008