Search results for "Clustering"
showing 10 items of 446 documents
Identifying legitimate Web users and bots with different traffic profiles — an Information Bottleneck approach
2020
Abstract: Recent studies have reported that about half of today's Web users are intelligent agents (Web bots). Many bots are impersonators operating at a very high level of sophistication, trying to emulate the navigational behavior of legitimate users (humans). Moreover, bot technology continues to evolve, which makes bot detection even harder. To deal with this problem, many advanced methods for differentiating bots from humans have been proposed, a large part of which rely on supervised machine learning techniques. In this paper, we propose a novel approach to identify various profiles of bots and humans which combines feature selection and unsupervised learning of HTTP-level traffic patterns to d…
Foto2Events: From Photos to Event Discovery and Linking in Online Social Networks
2014
International audience. Online social networking has become the predominant activity in the digital world, thanks to the sharing of multimedia data, mainly photos (e.g., photos now represent 93% of the top posts on Facebook). Discovering the events users are involved in, using both their own posts and those shared by their friends, would be of great importance. In this paper, we address this issue with an original approach able to detect, enrich, and link a user's events using photos shared within the user's online social networks. Using metadata, our approach groups similar photos along multiple dimensions: their temporal, geographical, and social facets. To validate our approach, we i…
Semantic-based Merging of RSS Items
2009
Merging XML documents can be of key importance in several applications. For instance, merging RSS news items from the same or different sources and providers can benefit end users in various scenarios. In this paper, we address this issue and explore a relatedness measure between RSS elements. We show how to define and compute exclusive relations between any two elements, and we provide several predefined merging operators that can be extended and adapted to users' needs. We also report a set of experiments conducted to validate our approach. © Springer Science+Business Media, LLC 2009.
The Hierarchical Agglomerative Clustering with Gower index: a methodology for automatic design of OLAP cube in ecological data processing context
2015
In Press, Corrected Proof. International audience. OLAP systems can benefit ecological studies. Ecology follows and analyzes phenomena across space and time and according to several parameters, and OLAP systems can let ecologists browse large datasets. One focus of current research on OLAP systems is the automatic design of OLAP cubes and data warehouse schemas. This kind of work makes OLAP technology accessible to users who are not information technology experts. To be efficient, however, automatic OLAP building must take various cases into account. Moreover, OLAP technology is based on the concept of hierarchy. Therefore, the hierarchical clustering m…
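The abstract pairs hierarchical agglomerative clustering with the Gower index. As a hedged illustration only (not the authors' method), the sketch below computes Gower's dissimilarity for mixed numeric and categorical records, d(i, j) = (1/p) * sum_k d_ijk, where numeric attributes use |x_ik - x_jk| / R_k and categorical ones use a 0/1 mismatch, then merges clusters naively. Average linkage, the `"num"`/`"cat"` type encoding, the function names, and the toy records are all assumptions.

```python
from itertools import combinations


def gower_distance(a, b, kinds, ranges):
    """Gower distance between two records with mixed attribute types.

    kinds[k] is "num" or "cat"; ranges[k] is the observed range R_k of a
    numeric attribute (ignored for categorical ones).
    """
    total = 0.0
    for k, kind in enumerate(kinds):
        if kind == "num":
            # Range-normalised absolute difference in [0, 1]; a constant
            # attribute (R_k == 0) contributes nothing.
            total += abs(a[k] - b[k]) / ranges[k] if ranges[k] else 0.0
        else:
            # Simple mismatch for categorical attributes: 0 if equal, else 1.
            total += 0.0 if a[k] == b[k] else 1.0
    return total / len(kinds)


def average_linkage(records, kinds):
    """Naive agglomerative clustering with average linkage (assumed).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of record indices.
    """
    ranges = [
        max(r[k] for r in records) - min(r[k] for r in records) if kind == "num" else 0.0
        for k, kind in enumerate(kinds)
    ]
    dist = {
        (i, j): gower_distance(records[i], records[j], kinds, ranges)
        for i, j in combinations(range(len(records)), 2)
    }

    def d(i, j):
        return dist[(i, j) if i < j else (j, i)]

    clusters = [frozenset([i]) for i in range(len(records))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest mean pairwise distance.
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            cx, cy = clusters[x], clusters[y]
            avg = sum(d(i, j) for i in cx for j in cy) / (len(cx) * len(cy))
            if best is None or avg < best[0]:
                best = (avg, x, y)
        avg, x, y = best
        merges.append((clusters[x], clusters[y], avg))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


# Hypothetical records: (numeric measurement, habitat type).
records = [(12.0, "forest"), (14.0, "forest"), (30.0, "wetland"), (29.0, "wetland")]
kinds = ["num", "cat"]
merges = average_linkage(records, kinds)
for left, right, dist in merges:
    print(sorted(left), sorted(right), round(dist, 3))
```

The merge sequence is the dendrogram that a hierarchy-building step could read levels from; a production version would more likely hand a precomputed Gower matrix to an existing hierarchical clustering library.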
Toward Artificial Intuition
2019
Mixed Driven Refinement Design of Multidimensional Models based on Agglomerative Hierarchical Clustering
2015
20 pages. International audience. Data warehouses (DW) and OLAP systems are business intelligence technologies that allow the online analysis of huge volumes of data according to users' needs. The success of DW projects depends essentially on the design phase, where functional requirements meet data sources (mixed design methodology) (Phipps and Davis, 2002). However, for complex applications, existing design methodologies seem inefficient, since decision-makers define functional requirements that cannot be deduced from data sources (data-driven approach) and/or lack sufficient application domain knowledge (user-driven approach) (Sautot et al., 2014b). Therefore, in this p…
Sensory attributes of Rioja red wines and their relationship with quality perception of consumers
2014
Poster presented at the Third Edition of the International Conference Series on Wine Active Compounds (WAC2014), held in Burgundy (France) from 26 to 28 March 2014.
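Towards systems for discovering and filtering documentary information: what strategy should be adopted? (original French title: Vers des systèmes de découverte et de filtrage d'information documentaire : quelle stratégie faut-il mettre en place ?)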
2000
The problem with exploiting large stores of information (data warehouse-type databases, whose construction is becoming widespread, computerized library catalogs (OPACs), specialized databases, and the Internet, in particular the Web) is that users cannot view the full set of answers that search systems make available to them. For example, recent empirical studies of Web OPACs and search engines (e.g., Spink 1999, Ihadjadene 1999) show that users' queries are poorly formulated (fewer than two terms and no Boolean operators) and that users do not view more than two pa…
Structural analyses in the study of behavior : From rodents to non-human primates
2022
Funding: J-BL's research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC, Discovery Grant #2015-06034 to J-BL). MC, SA, and GC's research was funded by a grant from the University of Palermo, Italy. The term "structure" indicates a set of components that, in relation to each other, shape an organic complex. Such a complex takes on the essential connotations of a functionally unitary entity resulting from the mutual relationships of its constituent elements. In a broader sense, we can use the word "structure" to define the set of relationships among the elements of an emergent system that is not determined by the mere algebraic sum of these elements, but by the…
Analysis of the Structure and Dynamics of European Flight Networks
2022
We analyze the structure and dynamics of the flight networks of 50 airlines active in European airspace in 2017. Our analysis shows that the concentration of node degree in the flight networks is markedly heterogeneous across airlines, reflecting the heterogeneity of airline business models. We obtain an unsupervised classification of airlines by performing a hierarchical clustering that uses, as its similarity measure, a correlation coefficient computed between the average occurrence profiles of 4-motifs of the airline networks. The hierarchical tree is highly informative with respect to properties of the different airlines (for example, the number of main hubs, airline participa…
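The abstract describes hierarchical clustering in which the similarity between two airlines is a correlation coefficient between their average 4-motif occurrence profiles. A minimal sketch, assuming the Pearson coefficient, the common distance transform d = 1 - r, and average linkage (the abstract specifies none of these), with made-up four-component profiles standing in for real motif statistics; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles):
    """Average-linkage hierarchical clustering with distance d = 1 - r.

    profiles maps a label (e.g. an airline) to its motif-occurrence profile.
    Returns the merges in order as (left, right, distance) tuples of labels.
    """
    labels = list(profiles)
    dist = {}
    for a, b in combinations(labels, 2):
        # Turn the correlation similarity into a distance in [0, 2].
        dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(profiles[a], profiles[b])

    def linkage(cx, cy):
        # Mean distance over all cross-cluster pairs (average linkage).
        return sum(dist[(i, j)] for i in cx for j in cy) / (len(cx) * len(cy))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        cx, cy = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((cx, cy, linkage(cx, cy)))
        clusters = [c for c in clusters if c not in (cx, cy)] + [cx + cy]
    return merges


# Hypothetical profiles for four unnamed airlines.
profiles = {
    "A": [0.40, 0.30, 0.20, 0.10],
    "B": [0.38, 0.32, 0.20, 0.10],
    "C": [0.10, 0.20, 0.30, 0.40],
    "D": [0.14, 0.16, 0.30, 0.40],
}
merges = cluster_profiles(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

Because correlation ignores the overall scale of a profile, airlines with similar motif shapes group together even if their networks differ in size; that is one plausible reason to prefer it over a Euclidean distance here.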