Search results for "Similarity"
showing 10 items of 474 documents
The Whole Is Greater than the Sum of the Parts: A Multilayer Approach on Criminal Networks
2022
Traditional social network analysis can be generalized to model some networked systems by multilayer structures where the individual nodes develop relationships in multiple layers. A multilayer network is called multiplex if each layer shares at least one node with some other layer. In this paper, we built a unique criminal multiplex network from the pre-trial detention order by the Preliminary Investigation Judge of the Court of Messina (Sicily) issued at the end of the Montagna anti-mafia operation in 2007. Montagna focused on two families who infiltrated several economic activities through a cartel of entrepreneurs close to the Sicilian Mafia. Our network possesses three layers which sha…
Analysis of Users Behaviour from a Movie Preferences Perspective
2018
Despite their tremendous popularity, Online Social Networks (OSNs) have several issues related to the privacy of social users. These issues have motivated researchers to develop OSN services that take advantage of decentralized platforms (such as P2P systems or opportunistic networks). Decentralized Online Social Networks (DOSNs) need specific approaches to manage the decentralization of social data. In particular, data availability is one of the main issues and current proposals exploit properties of the social relationships to manage it. To the best of our knowledge, there are no proposals which exploit similarity between users, expressed with the term homophily. Homophily has been we…
Dissimilarity Measures for the Identification of Earthquake Focal Mechanisms
2013
This work presents a study of dissimilarity measures for seismic signals, and their relation to clustering in the particular problem of the identification of earthquake focal mechanisms, i.e. the physical phenomena which have generated an earthquake. Starting from the assumption that waveform similarity implies similarity in the focal parameters, important details about them can be determined by studying waveforms related to the wave field produced by earthquakes and recorded by a seismic network. Focal mechanism identification is currently investigated by clustering seismic events, mainly using cross-correlation dissimilarity in conjunction with a hierarchical clustering algorithm. By…
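The abstract above names cross-correlation dissimilarity without defining it. As a rough illustration only, the sketch below uses one common convention (one minus the maximum normalized cross-correlation over a window of lags); the function name `xcorr_dissimilarity` and the `max_lag` parameter are hypothetical and are not taken from the paper.

```python
import math


def xcorr_dissimilarity(x, y, max_lag):
    """Return 1 - max normalized cross-correlation over lags in [-max_lag, max_lag].

    Assumption: this is one common convention, not necessarily the paper's definition.
    """
    def centered(signal):
        mean = sum(signal) / len(signal)
        return [v - mean for v in signal]

    x, y = centered(x), centered(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        # A constant signal carries no waveform shape; treat it as maximally dissimilar.
        return 1.0

    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x[i] with y[i + lag] wherever both indices are valid.
        c = sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))
        best = max(best, c / norm)
    return 1.0 - best
```

A matrix of such pairwise values could then be passed to any hierarchical clustering routine; that step is omitted here.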
Graph Comparison and Artificial Models for Simulating Real Criminal Networks
2021
Network Science is an active research field, with numerous applications in areas like computer science, economics, or sociology. Criminal networks, in particular, possess specific topologies which allow them to exhibit strong resilience to disruption. Starting from a dataset related to meetings between members of a Mafia organization which operated in Sicily during the 2000s, we here aim to create artificial models with similar properties. To this end, we use specific tools of Social Network Analysis, including network models (Barabási-Albert identified to be the most promising) and metrics which allow us to quantify the similarity between two networks. To the best of our knowledge, the DeltaCo…
A new feature selection strategy for K-mers sequence representation
2014
Decomposing a DNA sequence into k-mers (substrings of length k) and counting their frequencies defines a mapping of the sequence into a numerical space by a numerical feature vector of fixed length. This simple process makes it possible to compare sequences in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition uses all the substrings of length k, making the codomain of exponential dimension. This obviously can affect the time complexity of the similarity computation, and in general of the machine learning algorithm used for the purpose of sequence classification. Moreover, the presence of possible n…
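The k-mer mapping described in this abstract can be sketched as follows. This is a minimal illustration, assuming a four-letter DNA alphabet and relative frequencies; the helper names `kmer_vector` and `euclidean` are hypothetical, and the paper's feature selection strategy is not reproduced.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of relative k-mer frequencies."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # One coordinate per possible k-mer, so the vector has len(alphabet) ** k entries.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Because the vector length is `len(alphabet) ** k`, the dimension grows exponentially with k, which is the cost the abstract points to.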
Alignment free Dissimilarities for sequence classification
2015
One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation defines a mapping of a sequence into a numerical space by a numerical feature vector of fixed length, which makes it possible to measure sequence similarity in an alignment-free way simply by using dissimilarity functions between vectors. This work presents a benchmark study of 4 alignment-free dissimilarity functions between sequences, computed on their L-tuples representation, for the purpose of sequence classification. In our experiments, we have tested the classes of geometric-based, correlation-based and information-based …
A Novel Time Series Kernel for Sequences Generated by LTI Systems
2017
The recent introduction of Hankelets to describe time series relies on the assumption that the time series has been generated by a vector autoregressive model (VAR) of order p. The success of Hankelet-based time series representations prevalently in nearest neighbor classifiers poses questions about if and how this representation can be used in kernel machines without the usual adoption of mid-level representations (such as codebook-based representations). It is also of interest to investigate how this representation relates to probabilistic approaches for time series modeling, and which characteristics of the VAR model a Hankelet can capture. This paper aims at filling these gaps by: deriv…
A Semantic Similarity Measure for the SIMS Framework
2008
The amount of currently available digital information grows rapidly. Relevant information is often spread over different information sources. An efficient and flexible framework to allow users to satisfy effectively their information needs is required. The work presented in this paper describes SIMS (Semantic Information Management System), a reference architecture for a framework performing semantic annotation, search and retrieval of information from multiple sources. The work presented in this paper focuses on a specific SIMS module, the SIMS Semantic Content Navigator, proposing an algorithm and the related implementation to calculate a semantic similarity measure inside an OWL …
An A* Based Semantic Tokenizer for Increasing the Performance of Semantic Applications
2013
Semantic Applications (SAs) make use of ontologies, and their performance can depend on the syntactic labels of the modeled entities; even if several approaches have been devised to formalize ontologies, no formal approaches have been devised for naming their constituents, which look like long word concatenations without any particular separation. We present a novel semantic tokenizer that finds the sub-words through an application of the A* based search algorithm; the A* functions rely on a set of linguistic criteria and on the meta-cognitive perspective of the activity of reading.
An ontology-based retrieval system for mammographic reports
2015
In the healthcare domain it can be useful to compare unstructured free-text clinical reports in order to enable the search for similar and/or relevant clinical cases. In data mining and text analysis tasks, the cosine similarity is usually used for text comparison purposes. It is usually performed by computing the standard document vector cosine similarity between the two vectors representing the report pair under analysis. In this paper a novel system based on text pre-processing techniques and a modelled medical knowledge, using an improved radiological ontology, is proposed. Medical terms organized in a hierarchical tree can assess semantic similarity relationships between unstructured repo…
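The standard document-vector cosine similarity that this abstract refers to as the usual baseline can be sketched as below. This is a minimal illustration, assuming whitespace tokenization and raw term counts; the function name `cosine_similarity` is hypothetical, and the ontology-based system proposed in the paper is not reproduced.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity("benign mass left breast", "benign mass right breast")` shares three of four terms and evaluates to 0.75, even though "left" and "right" may matter clinically; a purely lexical measure of this kind has no notion of how terms relate to one another.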