RESEARCH PRODUCT

Feature Dimensionality Reduction for Mammographic Report Classification

Luca Agnello, Albert Comelli, Salvatore Vitabile

subject

Computer science; Latent semantic analysis; business.industry; Dimensionality reduction; Data management; Cosine similarity; Pattern recognition; Latent Semantic Analysis (LSA); 02 engineering and technology; Singular Value Decomposition (SVD); Medical Application; 03 medical and health sciences; Matrix (mathematics); 0302 clinical medicine; Feature Dimensionality Reduction; Feature (computer vision); Singular value decomposition; Principal component analysis; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; 030212 general & internal medicine; Artificial intelligence; business; Principal Component Analysis (PCA)

description

The amount and variety of available medical data coming from multiple, heterogeneous sources can inhibit analysis, manual interpretation, and the use of simple data management applications. In this paper, an in-depth overview of the principal algorithms for dimensionality reduction is carried out; moreover, the most effective techniques are applied to a dataset of 4461 mammographic reports. The most useful medical terms are converted and represented using a TF-IDF matrix, in order to enable data mining and retrieval tasks. A series of queries has been performed on the raw matrix and on the same matrix after dimensionality reduction with the most useful techniques, such as LSI, PCA, and SVD. The query results obtained are comparable to those achieved with the raw, unprocessed matrix, even though the processed matrix contains less than 13% of the raw TF-IDF data with the PCA-LSI techniques and less than 6% of the raw TF-IDF data with the SVD technique.
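The pipeline the abstract describes (TF-IDF matrix, SVD-based reduction, cosine-similarity queries on raw and reduced data) can be illustrated with a minimal sketch. This is not the authors' implementation: the four toy reports, the smoothed IDF variant, the rank `k = 2`, and the helper names `tfidf_matrix`, `embed_query`, and `cosine_rank` are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Toy reports, invented for illustration (not taken from the paper's dataset).
reports = [
    "dense breast tissue no suspicious mass",
    "suspicious mass with microcalcifications in left breast",
    "no mass no microcalcifications benign findings",
    "benign calcifications stable since prior exam",
]

def tfidf_matrix(docs):
    """Build a documents x terms TF-IDF matrix (raw counts times a smoothed IDF)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for w in doc.split():
            tf[row, index[w]] += 1
    df = (tf > 0).sum(axis=0)             # document frequency of each term
    idf = np.log(len(docs) / df) + 1.0    # assumed IDF variant, one of several in use
    return tf * idf, index, idf

def embed_query(text, index, idf):
    """Map a query to the same TF-IDF term space, ignoring unknown words."""
    q = np.zeros(len(index))
    for w in text.split():
        if w in index:
            q[index[w]] += 1
    return q * idf

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing cosine similarity, plus the scores."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

X, index, idf = tfidf_matrix(reports)

# Truncated SVD: keep the k largest singular triplets, X ~ U_k S_k V_k^T.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T        # terms x k projection basis
docs_k = X @ Vk      # documents in the reduced space (equals U_k S_k)

q = embed_query("suspicious mass", index, idf)
raw_order, raw_sims = cosine_rank(q, X)
red_order, red_sims = cosine_rank(q @ Vk, docs_k)

print("raw ranking:    ", raw_order.tolist())
print("reduced ranking:", red_order.tolist())
print("stored fraction of raw document matrix:", docs_k.size / X.size)
```

Queries are projected with the same basis `Vk` as the documents, so both rankings compare like with like. On a corpus this small the two rankings need not agree; the sketch only shows the mechanics, not the retrieval quality reported in the paper.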

https://doi.org/10.1007/978-3-319-44881-7_15