A text based indexing system for mammographic image retrieval and classification

RESEARCH PRODUCT

Alfonso Farruggia, R. Magro, Salvatore Vitabile

subject

Information retrieval; Computer Networks and Communications; Computer science; Search engine indexing; Big data; DICOM; Search engine; Medical images indexing and classification; Hardware and Architecture; Medical documents indexing and classification; Data mining; Medical diagnosis; Classifier (UML); Software

description

Abstract Modern medical systems produce huge amounts of text, images, and videos and store them in ad hoc databases. The medical community needs to extract precise information from this large volume of data. Current ICT approaches do not provide a methodology for content-based retrieval and classification of medical images. Moreover, from the Internet of Things (IoT) perspective, medical data can be produced by many different devices, and the data produced exhibit all the features and constraints of Big Data. IoT guidelines place at the center of the system a new kind of smart software that manages Big Data and transforms it into a more understandable form. This paper describes a text-based indexing system for mammographic image retrieval and classification. The system handles the mining and classification of text (structured reports) and images (mammograms) in a typical Department of Radiology. DICOM structured reports, which contain free-text medical diagnoses, were analyzed and labeled in order to classify the corresponding mammographic images. The information retrieval process relies on text manipulation techniques such as light semantic analysis, stop-word removal, and light medical natural language processing. The system also includes a Search Engine module based on a Naive Bayes classifier. The experimental results show interesting performance in terms of specificity and sensitivity. Two further indexes were computed to assess the robustness of the system: A_z (the area under the ROC curve) and σ_Az (the standard error of A_z). The dataset is composed of healthy and pathological DICOM structured reports. Two use-case scenarios are presented and described to demonstrate the effectiveness of the proposed approach.
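The text-processing chain named in the abstract (stop-word removal followed by a Naive Bayes classifier, evaluated by sensitivity and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the toy reports, the label names "pathological" and "healthy", and all function names are assumptions made for illustration, and the multinomial model with add-one smoothing is one common Naive Bayes variant chosen here because the abstract does not say which one the paper uses.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "with", "to"}


def tokenize(report):
    """Lowercase the report, keep alphabetic words, drop stop words."""
    words = re.findall(r"[a-z]+", report.lower())
    return [w for w in words if w not in STOP_WORDS]


def train(labeled_reports):
    """Collect per-class word counts, per-class report counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_reports:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(report, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(report):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def sensitivity_specificity(y_true, y_pred, positive="pathological"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy reports, not taken from the paper's dataset.
TRAIN = [
    ("Spiculated mass in the left breast, biopsy recommended", "pathological"),
    ("Clustered microcalcifications with irregular margins", "pathological"),
    ("Normal breast tissue, routine screening advised", "healthy"),
    ("Benign appearance, normal density, routine follow up", "healthy"),
]

model = train(TRAIN)
test_reports = ["Irregular mass, biopsy advised", "Normal tissue, routine screening"]
test_labels = ["pathological", "healthy"]
predictions = [classify(r, model) for r in test_reports]
print(predictions, sensitivity_specificity(test_labels, predictions))
```

In a retrieval setting, the label predicted for a report would then be attached to the mammograms it describes, so that images can be searched by diagnostic class. A real system would also need the negation handling and medical vocabulary that the abstract groups under light semantic analysis and light medical natural language processing, which this sketch omits.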

year	journal	country	edition	language
2014-07-01	Future Generation Computer Systems

https://doi.org/10.1016/j.future.2014.02.008