AUTHOR
Marco La Cascia
NURBS Modelling in Virtual Reality
Combining Top-down and Bottom-up Visual Saliency for Firearms Localization
Object detection is one of the most challenging issues for computer vision researchers. The analysis of human visual attention mechanisms can help automatic inspection systems discard useless information and improve performance and efficiency. In this paper we propose an attention-based method to estimate the position of firearms in images of people holding them. Both top-down and bottom-up mechanisms are involved in our system. The bottom-up analysis is based on a state-of-the-art approach. The top-down analysis is based on the construction of a probabilistic model of the firearm position with respect to the position of people’s faces. This model has been created by analyzi…
Object Matching in Distributed Video Surveillance Systems by LDA-Based Appearance Descriptors
Establishing correspondences among object instances is still challenging in multi-camera surveillance systems, especially when the cameras’ fields of view are non-overlapping. Spatiotemporal constraints can help in solving the correspondence problem but still leave a wide margin of uncertainty. One way to reduce this uncertainty is to use appearance information about the moving objects in the site. In this paper we present the preliminary results of a new method that can capture salient appearance characteristics at each camera node in the network. A Latent Dirichlet Allocation (LDA) model is created and maintained at each node in the camera network. Each object is encoded in terms of the…
3D skeleton-based human action classification: A survey
In recent years, there has been a proliferation of works on human action classification from depth sequences. These works generally present methods and/or feature representations for the classification of actions from sequences of 3D locations of human body joints and/or other sources of data, such as depth maps and RGB videos. This survey highlights the motivations and challenges of this very recent research area by presenting technologies and approaches for 3D skeleton-based action classification. The work focuses on aspects such as data pre-processing, publicly available benchmarks and commonly used accuracy measurements. Furthermore, this survey introduces a categorization of the most recent…
Towards a fully integrated CAD system in virtual reality environment
Multi-modal Medical Image Registration by Local Affine Transformations
Image registration is the process of finding the geometric transformation that, applied to the floating image, gives the registered image with the highest similarity to the reference image. Registering a pair of images involves the definition of a similarity function in terms of the parameters of the geometric transformation that allows the registration. This paper proposes to register a pair of images by iteratively maximizing the empirical mutual information through coordinate gradient descent. Hence, the registered image is obtained by applying a sequence of local affine transformations. Rather than adopting a uniformly spaced grid to select image blocks to locally register, as done by s…
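The similarity function in this abstract is the empirical mutual information between the reference and the floating image. A minimal Python sketch of how it can be estimated from a joint grey-level histogram follows; the bin count (32), the natural logarithm and the function name `mutual_information` are our own assumptions, and the coordinate gradient descent over local affine parameters is not reproduced.

```python
import numpy as np

def mutual_information(reference, floating, bins=32):
    """Empirical mutual information (in nats) between two equally sized images,
    estimated from their joint grey-level histogram."""
    joint, _, _ = np.histogram2d(np.ravel(reference), np.ravel(floating), bins=bins)
    p_xy = joint / joint.sum()                # joint probability estimate
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal of the reference image
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal of the floating image
    outer = p_x * p_y                         # product of marginals (bins x bins)
    nz = p_xy > 0                             # skip empty bins to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / outer[nz])))
```

During registration, a function like this would be evaluated on each image block after warping the floating block with candidate affine parameters, keeping the parameters that increase it.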
Face Processing on Low-Power Devices
Research on embedded vision-based techniques is nowadays considered one of the most interesting topics in computer vision. In this work we address the scenario in which a real-time face processing system is needed to monitor people walking through some locations. Some face detection (e.g., the Viola-Jones face detector) and face recognition (e.g., eigenfaces) approaches have reached a certain level of maturity, so we focused on implementing such techniques on embedded systems, taking into account both hardware and software constraints. Our goal is to detect the presence of some known individuals inside sensitive areas, producing a compact description of the observed people. Cap…
Depth-Aware Multi-object Tracking in Spherical Videos
This paper deals with the multi-object tracking (MOT) problem in videos acquired by 360-degree cameras. Targets are tracked by a frame-by-frame association strategy. At each frame, candidate targets are detected by a pre-trained state-of-the-art deep model. Associations with the targets known up to the previous frame are found by solving a data association problem that considers the locations of the targets in the scene. In case of a missing detection, a Kalman filter is used to track the target. Unlike state-of-the-art works, the proposed tracker considers the depth of the targets in the scene. The distance of the targets from the camera can be estimated by geometrical facts pec…
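The abstract relies on a Kalman filter to keep tracking a target when its detection is missing. The following is a minimal sketch, assuming a 2D constant-velocity motion model in image coordinates with arbitrary noise levels (`q`, `r`); the class name is hypothetical, and the state actually used in the paper, which also involves target depth, is not reproduced here.

```python
import numpy as np

class ConstantVelocityKalman:
    """2D constant-velocity Kalman filter with state [x, y, vx, vy]."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0], dtype=float)  # state estimate
        self.P = np.eye(4)                                   # state covariance
        self.F = np.eye(4)                                   # transition model
        self.F[0, 2] = self.F[1, 3] = dt                     # position += velocity * dt
        self.H = np.eye(2, 4)                                # only position is observed
        self.Q = q * np.eye(4)                               # process noise covariance
        self.R = r * np.eye(2)                               # measurement noise covariance

    def predict(self):
        """Propagate the state one step; used alone when the detection is missing."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, z):
        """Correct the prediction with an associated detection z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2].copy()
```

At each frame, `predict()` gives the expected position; when the data association step assigns a detection to the target, `update()` corrects the estimate, otherwise the prediction alone is used.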
An on-line learning method for face association in personal photo collection
Due to the widespread use of cameras, it is very common to collect thousands of personal photos. A proper organization is needed to make the collection usable and to enable easy photo retrieval. In this paper, we present a method to organize personal photo collections based on "who" is in the picture. Our method consists of detecting the faces in the photo sequence and arranging them in groups corresponding to the probable identities. This problem can be conveniently modeled as multi-target visual tracking, where a set of on-line trained classifiers is used to represent the identity models. In contrast to other works where clustering methods are used, our method relies on a probabilis…
Hop: Histogram of patterns for human action representation
This paper presents a novel method for representing actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. This paper proposes to learn a codebook of frequent sequential patterns by means of an apriori-like algorithm, and to represent an action with a Bag-of-Frequent-Sequential-Patterns approach. Preliminary experiments with the proposed method have been conducted for action classification on skeletal data. The method achieves state-of-the-art accuracy in cross-subject validation.
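The two steps described above, apriori-like codebook learning and bag-of-patterns representation, can be sketched as follows. This assumes actions are already quantized into sequences of discrete symbols (e.g., codewords of skeletal descriptors), that patterns are contiguous, and that support counts every occurrence; the function names and thresholds are hypothetical. The Apriori idea is that a pattern can only be frequent if its shorter sub-patterns are frequent too.

```python
from collections import Counter

def mine_frequent_patterns(sequences, min_support, max_len):
    """Apriori-like mining of frequent contiguous patterns.

    sequences: list of sequences of discrete symbols.
    Support = total number of occurrences across all sequences (an assumption).
    Returns the codebook as a list of tuples.
    """
    frequent = {}
    candidates = {(s,) for seq in sequences for s in seq}
    for k in range(1, max_len + 1):
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                pattern = tuple(seq[i:i + k])
                if pattern in candidates:
                    counts[pattern] += 1
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        # Apriori step: a (k+1)-pattern is a candidate only if both its k-prefix
        # and its k-suffix are frequent.
        candidates = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
    return list(frequent)


def bag_of_patterns(sequence, codebook):
    """Normalized histogram of codebook-pattern occurrences in one sequence."""
    counts = []
    for p in codebook:
        n = len(p)
        counts.append(sum(1 for i in range(len(sequence) - n + 1)
                          if tuple(sequence[i:i + n]) == p))
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

The resulting histogram is a multinomial-style descriptor of one action that any standard classifier could consume.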
Automatic image representation for content-based access to personal photo album
The proposed work exploits methods and techniques for the automatic characterization of images for content-based access to personal photo libraries. Several techniques, even if not reliable enough to address the general problem of content-based image retrieval, have proven quite robust in a limited domain such as that of the personal photo album. In particular, starting from the observation that most personal photos depict a usually small number of people in a relatively small number of different contexts (e.g., beach, public garden, indoor, nature, snow, city), we propose the use of automatic techniques borrowed from the fields of computer vision and pattern recognition to index imag…
Gesture Modeling by Hanklet-Based Hidden Markov Model
In this paper we propose a novel approach for gesture modeling. We aim at decomposing a gesture into sub-trajectories that are the output of a sequence of atomic linear time invariant (LTI) systems, and we use a Hidden Markov Model to model the transitions from one LTI system to another. For this purpose, we represent the human body motion in a temporal window as a set of body joint trajectories that we assume to be the output of an LTI system. We describe the set of trajectories in a temporal window by the corresponding Hankel matrix (Hanklet), which embeds the observability matrix of the LTI system that produced it. We train a set of HMMs (one for each gesture class) with a discriminative a…
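To make the Hanklet concrete: for a window of T frames, each a d-dimensional vector of stacked joint coordinates, a block Hankel matrix places frame i + j in block row i, column j. The sketch below assumes this layout and omits any normalization applied before comparing Hanklets; the function name is hypothetical. For an LTI system, the rank of a sufficiently large Hankel matrix is bounded by the system order, which is why a sinusoid (a second-order system) yields rank 2.

```python
import numpy as np

def hankel_matrix(traj, num_block_rows):
    """Block Hankel matrix of a multivariate time series.

    traj has shape (T, d): T frames, d stacked joint coordinates per frame.
    Block row i holds frames i .. i + num_cols - 1 as columns, so the block at
    (i, j) is frame i + j.
    """
    traj = np.asarray(traj, dtype=float)
    T, d = traj.shape
    num_cols = T - num_block_rows + 1
    blocks = [traj[i:i + num_cols].T for i in range(num_block_rows)]  # each (d, num_cols)
    return np.vstack(blocks)                                          # (num_block_rows * d, num_cols)
```
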
Object Recognition and Modeling Using SIFT Features
In this paper we present a technique for object recognition and modelling based on local image feature matching. Given a complete set of views of an object, the goal of our technique is to recognize the same object in an image of a cluttered environment containing it and to estimate its pose. The method is based on visual modeling of objects from a multi-view representation of the object to recognize. The first step consists of creating the object model, selecting a subset of the available views using SIFT descriptors to evaluate image similarity and relevance. The selected views are then taken as the model of the object and we show that they can effectively be used to visual…
A decision support system to assure high-performance maintenance service
Purpose: This study aims to propose a decision support system (DSS) for maintenance management of a service system, namely, a street cleaning service vehicle. Regarding information flow management, blockchain technology is integrated in the proposed DSS to ensure data transparency and security. Design/methodology/approach: The DSS is designed to efficiently handle the data acquired by the network of sensors installed on selected system components and to support maintenance management. The DSS supports decision makers in selecting a subset of key performance indicators (KPIs) by means of the DEcision-MAking Trial and Evaluation Laboratory (DEMATEL) method and in monitoring the efficiency of performed prevent…
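The DEMATEL step mentioned above can be illustrated with its classic computation: normalize the direct-influence matrix A into N, form the total-relation matrix T = N(I - N)^-1, and read prominence (D + R) and relation (D - R) from its row and column sums. The sketch assumes normalization by the largest row or column sum, one common variant; how the paper maps these indices to a KPI subset is not stated in the abstract, so any selection threshold would be an assumption. The inverse exists only when the spectral radius of N is below 1, which fails, for example, if every row of A has the same maximal sum.

```python
import numpy as np

def dematel(direct):
    """Classic DEMATEL computation on a direct-influence matrix.

    direct[i][j] scores how strongly factor i influences factor j.
    Returns the total-relation matrix T, prominence (D + R) and relation (D - R).
    """
    A = np.asarray(direct, dtype=float)
    n = A.shape[0]
    s = max(A.sum(axis=1).max(), A.sum(axis=0).max())   # normalization factor
    N = A / s                                            # normalized direct-relation matrix
    T = N @ np.linalg.inv(np.eye(n) - N)                 # T = N (I - N)^-1 = N + N^2 + ...
    D = T.sum(axis=1)                                    # total influence given
    R = T.sum(axis=0)                                    # total influence received
    return T, D + R, D - R
```

Factors with large D + R are the most central; factors with D - R > 0 act as net causes, those with D - R < 0 as net effects.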
Multifeature image and video content-based storage and retrieval
In this paper we present the most recent evolution of JACOB, a system we developed for image and video content-based storage and retrieval. The system is based on two separate archives: a 'features DB' and a 'raw-data DB'. When a user submits a query, a search is done in the 'features DB'; the selected items are taken from the 'raw-data DB' and shown to the user. Two kinds of sessions are allowed: 'database population' and 'database querying'. During a 'database population' session the user inserts new data into the archive. The input data can consist of digital images or videos. Videos are split into shots and for each shot one or more representative frames are automatically extracted. Shots and …
Fake News Spreaders Detection: Sometimes Attention Is Not All You Need
Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, for the task of detecting Fake News Spreaders (i.e., users who share Fake News). First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques such as the chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN,…
Activity Monitoring Made Easier by Smart 360-degree Cameras
This paper proposes the use of smart 360-degree cameras for activity monitoring. By exploiting the geometric properties of these cameras and adopting off-the-shelf tracking algorithms adapted to equirectangular images, this paper shows how simple it becomes to deploy a camera network and to detect the presence of pedestrians in predefined regions of interest with minimal information about the camera, namely its height. The paper further shows that smart 360-degree cameras can enhance motion understanding in the environment and proposes a simple method to estimate the heatmap of the scene to highlight regions where pedestrians are more often present. Quantitative and qualitative results demons…
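Why camera height alone can suffice: in an equirectangular image each row corresponds to a latitude, so the feet of a pedestrian seen at angle phi below the horizon lie at horizontal distance h / tan(phi) from a level camera at height h. The sketch below illustrates only this geometry; the pixel conventions (longitude spanning the image width, latitude running from +90 degrees at the top row to -90 degrees at the bottom), the level-camera assumption and the function name are ours, not details confirmed by the abstract.

```python
import math

def ground_position(u, v, width, height, cam_height):
    """Map the equirectangular pixel (u, v) of a pedestrian's feet to ground-plane
    coordinates (same unit as cam_height), assuming a level camera.

    Assumed conventions: column u spans longitude [-pi, pi); row v spans latitude
    from +pi/2 (top row) to -pi/2 (bottom row).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    if lat >= 0:
        raise ValueError("point is not below the horizon, so it cannot lie on the ground")
    dist = cam_height / math.tan(-lat)          # horizontal distance from the camera
    return dist * math.cos(lon), dist * math.sin(lon)
```

Checking such ground positions against a polygon gives region-of-interest occupancy, and accumulating them over time into a 2D histogram (e.g., with `numpy.histogram2d`) gives one possible occupancy heatmap.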
Unifying Textual and Visual Cues for Content-Based Image Retrieval on the World Wide Web
A system is proposed that combines textual and visual statistics in a single index vector for content-based search of a WWW image database. Textual statistics are captured in vector form using latent semantic indexing based on text in the containing HTML document. Visual statistics are captured in vector form using color and orientation histograms. By using an integrated approach, it becomes possible to take advantage of possible statistical couplings between the content of the document (latent semantic content) and the contents of images (visual statistics). The combined approach allows improved performance in conducting content-based search. Search performance experiments are reported for…
A multi-agent decision support system for dynamic supply chain organization
In this work, a multi-agent system (MAS) for dynamic supply chain configuration is proposed. The brain of each agent is a Bayesian Decision Network (BDN); this choice allows the agent to take the best decisions by estimating the benefits and potential risks of different strategies, and by analyzing and managing uncertain information about the collaborating companies. Each agent collects information about customers' orders and current market prices, and analyzes previous experiences of collaboration with trading partners. The agent therefore performs probabilistic inferential reasoning to filter the information modeled in its knowledge base in order to achieve the best performance in t…
Automatic Video Database Indexing and Retrieval
The increasing development of advanced multimedia applications requires new technologies for organizing databases of still digital images or digital video sequences and retrieving them by content. To this aim, image and image sequence contents must be described and adequately coded. In this paper we describe a system allowing content-based annotation and querying in video databases. No user action is required during the database population step. The system automatically splits a video into a sequence of shots, extracts a few representative frames (called r-frames) from each shot and computes r-frame descriptors based on color, texture and motion. Queries based on one or more features are possible. …
Saliency Based Aesthetic Cut of Digital Images
Aesthetic cut of photos is a process well known to professional photographers. It consists of cropping the original photo to remove less relevant parts close to the borders, thereby leaving the interesting subjects in a position that the observer perceives as more pleasant. In this paper we propose a saliency-based technique to automatically perform the aesthetic cut of images. We use a standard method to estimate the saliency map and propose some post-processing of the map to make it more suitable for our purpose. We then apply a greedy algorithm to determine the cut (i.e., the most important part of the original image) in both the free and the fixed aspect ratio cases. Experimental result…
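The abstract does not detail its greedy algorithm, so the following is only a hypothetical greedy crop for the free aspect ratio case, shown to illustrate the idea: repeatedly peel off the least salient border row or column for as long as the crop retains at least a chosen fraction (`keep_ratio`, our assumption) of the total saliency. The fixed aspect ratio case and the saliency post-processing are not covered.

```python
import numpy as np

def greedy_crop(saliency, keep_ratio=0.9):
    """Hypothetical greedy free-aspect-ratio crop on a saliency map.

    Returns (top, bottom, left, right) with bottom and right exclusive.
    """
    S = np.asarray(saliency, dtype=float)
    total = S.sum()
    kept = total
    top, bottom, left, right = 0, S.shape[0], 0, S.shape[1]
    while bottom - top > 1 and right - left > 1:
        borders = {
            "top": S[top, left:right].sum(),
            "bottom": S[bottom - 1, left:right].sum(),
            "left": S[top:bottom, left].sum(),
            "right": S[top:bottom, right - 1].sum(),
        }
        side = min(borders, key=borders.get)        # least salient border line
        if kept - borders[side] < keep_ratio * total:
            break                                    # removing it would lose too much saliency
        kept -= borders[side]
        if side == "top":
            top += 1
        elif side == "bottom":
            bottom -= 1
        elif side == "left":
            left += 1
        else:
            right -= 1
    return top, bottom, left, right
```
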
Automatic Generation of Custom Tourist Routes
In this paper we present a new tool for the automatic generation of custom tourist routes. Starting from the user preferences (place to visit, starting and ending points, available time, favourite types of attractions), our system is able to extract information from the Web and to suggest a custom route to the user. Our system is based only on online information, which is dynamically extracted at query time, so that it can work for every location in the world, with no restrictions. Furthermore, our method does not require any user intervention beyond the input parameters. Our system is also able to give users supplementary information about the route stops, as a photo slideshow…
3D Object Modeling by Sharing Visual Attributes across Poses and Scales
Scene parsing aims at understanding a scene and the arrangement of the objects in it. While this is a task human beings are pretty good at [7], a machine needs to: recognize the kind of scene (indoor vs. outdoor, bedroom vs. living room, etc.) [4], detect and recognize 3D objects across multiple poses and scales [8, 5], infer the geometrical arrangement of the objects in the scene [2, 1], etc. In the proposed framework, a 3D object is modeled as a graph. Each node in the graph represents a visual attribute automatically discovered by considering features that are consistently and repeatedly present across different poses and scales. Such visual attributes are different from “parts” [5], whic…
Probabilistic Corner Detection for Facial Feature Extraction
After more than 35 years of research, face processing is nowadays considered one of the most important applications of image analysis. It can be considered as a collection of problems (i.e., face detection, normalization, recognition and so on), each of which can be treated separately. Some face detection and face recognition techniques have reached a certain level of maturity; however, facial feature extraction still represents the bottleneck of the entire process. In this paper we present a novel facial feature extraction approach that could be used for normalizing Viola-Jones detected faces and allowing them to be recognized by an appearance-based face recognition method. For each observed featur…
A data association approach to detect and organize people in personal photo collections
In this paper we present a method to automatically segment a photo sequence into groups containing the same persons. Many methods in the literature accomplish this task by adopting clustering techniques. We model the problem as the search for probable associations between faces detected in subsequent photos, considering the mutual exclusivity constraint: a person cannot appear twice in the same photo, so two faces in the same photo cannot be assigned to the same group. Associations are found considering face and clothing descriptions. In particular, a two-level architecture has been adopted: at the first level, associations are computed within meaningful temporal windows (situations); at the …
Particle Filtering for Tracking in 360 Degrees Videos Using Virtual PTZ Cameras
360-degree cameras are devices able to record spherical images of the environment. Such images can be used to generate views of the scene by projecting the spherical surface onto planes tangent to the sphere. Each of these views can be considered as the output of a virtual PTZ (vPTZ) camera with specific pan, tilt and zoom parameters. This paper proposes to formulate the visual tracking problem as that of selecting, at each time, the vPTZ camera that foveates on the target from the unlimited set of simultaneously generated vPTZ camera views. Assuming that the selected vPTZ camera is a stochastic variable, the paper proposes to model the posterior distribution of the underlying stochastic p…
Enabling Technologies on Hybrid Camera Networks for Behavioral Analysis of Unattended Indoor Environments and Their Surroundings
This paper presents a layered network architecture and the enabling technologies for accomplishing vision-based behavioral analysis of unattended environments. Specifically, the vision network covers both the attended environment and its surroundings by means of multi-modal cameras. The layer overlooking the surroundings is deployed outdoors and tracks people, monitoring entrance/exit points. It recovers the geometry of the site under surveillance and communicates people's positions to a higher-level layer. The layer monitoring the unattended environment undertakes similar goals, with the addition of maintaining a global mosaic of the observed scene for further understanding. Moreover, it merges …
Clustering techniques for personal photo album management
In this work we propose a novel approach for the automatic representation of pictures, aiming at a more effective organization of personal photo albums. Images are analyzed and described in multiple representation spaces, namely faces, background and time of capture. Faces are automatically detected, rectified and represented by projecting the face itself onto a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on RGB histograms and a Gabor filter bank. The face, time and background information of each image in the collection is automatically organized using a mean-shift clustering technique. Given the particular domain of personal photo libraries, wh…
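Mean-shift clustering, used here to organize the face, time and background descriptors, needs no preset number of clusters: every sample climbs to a local density mode, and samples reaching the same mode form a group. A minimal flat-kernel sketch follows; the bandwidth, the mode-merging rule and the function name are assumptions, and real inputs would be the eigenface coefficients, histogram/Gabor features or capture times described above rather than toy 2D points.

```python
import numpy as np

def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean-shift clustering.

    Every point climbs to a density mode by repeatedly moving to the mean of the
    data points within `bandwidth`; modes closer than bandwidth / 2 are merged.
    Returns (labels, centers).
    """
    X = np.asarray(points, dtype=float)
    modes = X.copy()
    for _ in range(iters):
        dists = np.linalg.norm(modes[:, None, :] - X[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)        # flat kernel
        new_modes = weights @ X / weights.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    labels = np.empty(len(X), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```
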
Method for Classifying a Digital Image
Population and Query Interface for a Content-Based Video Database
In this paper we describe the first full implementation of a content-based indexing and retrieval system for MPEG-2 and MPEG-4 videos. We consider a video as a collection of spatiotemporal segments called video objects; each video object is a sequence of video object planes. A set of representative video object planes is used to index each video object. During database population, the operator, using a semi-automatic outlining tool we developed, manually selects video objects and inserts some semantic information. Low-level visual features like color, texture, motion and geometry are automatically computed. The system has been implemented on a commercial relational DBMS and is based on…
Automatic Generation of Subject-Based Image Transitions
This paper presents a novel approach for the automatic generation of image slideshows. Unlike standard cross-fading, the idea is to perform image transitions that keep the subject as the focal point of the intermediate frames, by automatically identifying him/her and preserving the alignment of the face and facial features. This is done by using a novel Active Shape Model and time-series image registration. The final result is an aesthetically appealing slideshow that emphasizes the subject. The results have been evaluated with a user response survey. The outcomes show that final users widely prefer the proposed slideshow concept over standard image transitions.
Automatic Image Annotation Using Random Projection in a Conceptual Space Induced from Data
The main drawback of a detailed representation of visual content, whatever its origin, is that significant features are very high dimensional. To keep the problem tractable while preserving the semantic content, a dimensionality reduction of the data is needed. We propose the Random Projection technique to reduce the dimensionality. Even though this technique is sub-optimal with respect to Singular Value Decomposition, its much lower computational cost makes it more suitable for this problem, in particular when computational resources are limited, such as in mobile terminals. In this paper we present the use of a "conceptual" space, automatically induced from data, to perform automa…
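The cost argument above can be seen in a minimal random projection sketch: reducing n feature vectors from d to k dimensions is a single n x d by d x k matrix product, with no decomposition to compute. The Gaussian matrix scaled by 1/sqrt(k) is one standard choice (sparse random matrices are another); the function name and seed handling are assumptions, not the paper's exact construction.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Reduce the rows of X (n x d) to k dimensions with a Gaussian random matrix.

    Entries are drawn from N(0, 1/k), so squared Euclidean lengths are preserved
    in expectation (Johnson-Lindenstrauss style); no SVD is computed.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R
```

Because the same seed reproduces the same matrix, a device only needs to store the seed to project new queries into the reduced space consistently.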
Distributed Image Databases: Hybrid Similarity Functions
Unsupervised Clustering in Personal Photo Collections
In this paper we propose a probabilistic approach for the automatic organization of collected pictures, aiming at a more effective representation in personal photo albums. Images are analyzed and described in two representation spaces, namely faces and background. Faces are automatically detected, rectified and represented by projecting the face itself onto a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on RGB histograms and Gabor filter energy. The face and background information of each image in the collection is automatically organized by a mean-shift clustering technique. Given the particular domain of personal photo libraries, where most of the …
Biologically Inspired Vision Architectures: a Software/Hardware Perspective
Even though the field of computer vision has seen huge improvements in the last few decades, computer vision systems still lack, in most cases, the efficiency of biological vision systems. In fact, biological vision systems routinely accomplish complex visual tasks such as object recognition, obstacle avoidance, and target tracking, which continue to challenge artificial systems. The study of biological vision systems remains a strong cue for the design of devices exhibiting intelligent behaviour in visually sensed environments, but current artificial systems are vastly different from biological ones for various reasons. First of all, biologically inspired vision architectures, which are continu…
Depth Map Generation by Image Classification
This paper presents a novel and fully automatic technique to estimate depth information from a single input image. The proposed method is based on a new image classification technique able to classify digital images (also in Bayer pattern format) as indoor, outdoor with geometric elements or outdoor without geometric elements. Using the information collected in the classification step, a suitable depth map is estimated. The proposed technique is fully unsupervised and is able to generate a depth map from a single view of the scene, requiring low computational resources.
An Integrated Architecture for Surveillance and Monitoring in an Archaeological Site
This paper describes ongoing work aimed at designing and deploying a system for the surveillance and monitoring of an archaeological site, namely the "Valley of the Temples" in Agrigento, Italy. Given the relevance of the site from an artistic and historical point of view, it is important to protect the monuments from malicious or simply incautious behavior; however, the vastness of the area to be monitored and the vague definition of its boundaries make it impractical to provide extensive coverage through traditional sensors or similar devices. We describe the design of an architecture for the surveillance of the site and for the monitoring of the visitors' behavior consisting of an i…
Deep Motion Model for Pedestrian Tracking in 360 Degrees Videos
This paper proposes a deep convolutional neural network (CNN) for pedestrian tracking in 360° videos based on the target’s motion. The tracking algorithm takes advantage of a virtual Pan-Tilt-Zoom (vPTZ) camera simulated by means of the 360° video. The CNN takes as input a motion image, i.e., the difference of two images taken by the vPTZ camera at different times with the same pan, tilt and zoom parameters. The CNN predicts the vPTZ camera parameter adjustments required to keep the target at the center of the vPTZ camera view. Experiments on a publicly available dataset performed in cross-validation demonstrate that the learned motion model generalizes, and that the proposed tracking algo…
A Decisional Multi-Agent Framework for Automatic Supply Chain Arrangement
In this work, a multi-agent system (MAS) for dynamic supply chain configuration is proposed. The brain of each agent is a Bayesian Decision Network (BDN); this choice allows the agent to take the best decisions by estimating the benefits and potential risks of different strategies, and by analyzing and managing uncertain information about the collaborating companies. Each agent collects information about customers' orders and current market prices, and analyzes previous experiences of collaboration with trading partners. The agent therefore performs probabilistic inferential reasoning to filter the information modeled in its knowledge base in order to achieve the best performance in the sup…
An Automated Visual Inspection System for the Classification of the Phases of Ti-6Al-4V Titanium Alloy
Metallography is the science of studying the physical properties of metal microstructures by means of microscopes. While traditional approaches involve the direct observation of the acquired images by human experts, Computer Vision techniques may help experts in the analysis of the inspected materials. In this paper we present an automated system to classify the phases of a titanium alloy, Ti-6Al-4V. Our system has been tested on the final products of a Friction Stir Welding process, to study the state of the microstructures of the welded material.
JACOB: Just A COntent Based query system for video databases
The increasing development of advanced multimedia applications requires new technologies for organizing databases of still digital images or digital video sequences and retrieving them by content. The authors describe JACOB, a prototype system allowing content-based browsing and querying in video databases. The JACOB system automatically splits a video into a sequence of shots, extracts a few representative frames (called r-frames) from each shot and computes r-frame descriptors based on features like color and texture. No user action is required during the database population step. Queries exploit this image content description and may be direct or by example.
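The abstract does not say how JACOB detects shot boundaries, so the sketch below uses a common stand-in, plainly swapped in for illustration: declare a hard cut when the colour histograms of consecutive frames differ by more than a threshold, and take the middle frame of each shot as its r-frame. The threshold, bin count, r-frame rule and function names are all assumptions.

```python
import numpy as np

def split_into_shots(frames, threshold=0.4, bins=16):
    """Detect hard cuts by comparing colour histograms of consecutive frames.

    frames: sequence of H x W x 3 uint8 arrays. Returns (start, end) index
    pairs, end exclusive, one per shot.
    """
    hists = []
    for frame in frames:
        per_channel = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
                       for c in range(3)]
        h = np.concatenate(per_channel).astype(float)
        hists.append(h / h.sum())
    cuts = [0]
    for i in range(1, len(hists)):
        # Half the L1 distance between normalized histograms lies in [0, 1].
        if 0.5 * np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)
    cuts.append(len(hists))
    return list(zip(cuts[:-1], cuts[1:]))


def middle_rframes(shots):
    """Pick the middle frame of each shot as its representative frame (r-frame)."""
    return [(start + end) // 2 for start, end in shots]
```

Colour and texture descriptors would then be computed only on the selected r-frames, which keeps the features DB small.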
Improved color interpolation using discrete wavelet transform
New approaches to color interpolation based on the Discrete Wavelet Transform are described. The Bayer data are split into the three colour components; for each component the Wavelet Coefficient Interpolation (WCI) algorithm is applied, and the results are combined to obtain the final colour-interpolated image. A further anti-aliasing algorithm can be applied in order to reduce false colours. A first approach consists of interpolating wavelet coefficients starting from a spatial analysis of the input image. An interpolation step based on threshold levels associated with the spatial correlation of the input image pixels was considered. A second approach consists of interpolating wavelet coefficients …
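A sketch of the framework described above, not of WCI itself: the Bayer mosaic is split into colour sub-planes (an RGGB layout is assumed here), and each sub-plane is brought to full resolution by one level of the inverse orthonormal 2D Haar transform, treating it as the approximation band. With zero detail bands this reduces to pixel replication; the point of a coefficient-interpolation scheme such as WCI is to estimate non-zero details instead. The function names and the choice of the Haar wavelet are assumptions.

```python
import numpy as np

def split_bayer_rggb(raw):
    """Split a Bayer mosaic into R, G (two phases) and B sub-planes.

    Assumes an RGGB layout: R at (even, even), B at (odd, odd), G elsewhere.
    """
    raw = np.asarray(raw, dtype=float)
    return raw[0::2, 0::2], raw[0::2, 1::2], raw[1::2, 0::2], raw[1::2, 1::2]


def inverse_haar_2x(ll, hl=None, lh=None, hh=None):
    """One level of the inverse orthonormal 2D Haar transform.

    Doubles the resolution of the approximation band `ll`. Missing detail
    bands default to zero; a wavelet interpolation scheme would estimate them.
    """
    ll = np.asarray(ll, dtype=float)
    hl = np.zeros_like(ll) if hl is None else np.asarray(hl, dtype=float)
    lh = np.zeros_like(ll) if lh is None else np.asarray(lh, dtype=float)
    hh = np.zeros_like(ll) if hh is None else np.asarray(hh, dtype=float)
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + hl + lh + hh) / 2
    out[0::2, 1::2] = (ll - hl + lh - hh) / 2
    out[1::2, 0::2] = (ll + hl - lh - hh) / 2
    out[1::2, 1::2] = (ll - hl - lh + hh) / 2
    return out
```

For example, `inverse_haar_2x(2 * r)` upsamples the red plane `r`; the factor 2 appears because the orthonormal Haar approximation coefficient of a constant 2 x 2 block is twice its pixel value.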
A risk evaluation framework for the best maintenance strategy: the case of a marine salt manufacture firm
Highlights
• This paper proposes an MCDM framework to support risk evaluation for maintenance activities.
• The ANP is proposed to select the best maintenance strategy on the basis of real systems’ features.
• ELECTRE III is used to prioritise the main risks related to the interventions of the selected maintenance policy.
• The proposed framework is applied to a core subsystem of a real-world marine salt manufacture firm.
Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Robust Registration of Texture-Mapped 3D Models
An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture-mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line…
Ensemble of Hankel Matrices for Face Emotion Recognition
In this paper, a face emotion is considered as the result of the composition of multiple concurrent signals, each corresponding to the movements of a specific facial muscle. These concurrent signals are represented by means of a set of multi-scale appearance features that might be correlated with one or more concurrent signals. The extraction of these appearance features from a sequence of face images yields a set of time series. This paper proposes to use the dynamics regulating each appearance feature time series to discriminate among different face emotions. To this purpose, an ensemble of Hankel matrices corresponding to the extracted time series is used for emotion classification withi…
Extension of the Depth of Field using Multifocus Input Images
Using Temporal Texture for Content-Based Video Retrieval
Textures evolving over time are called temporal textures and are very common in everyday life. Examples are flowing smoke or the wavy water of a river. The idea explored in this paper is that image features based on temporal texture could improve the performance of current content-based video retrieval systems, which are mainly based on static characteristics of representative frames, like color and texture. To this aim we analyze the spatio-temporal nature of texture and its application in content-based access to video databases. In particular, we represent temporal texture using the spatio-temporal autoregressive (STAR) model and a variation of self-organizing maps (SOM) where each n…
Mix and Match Features: Relevance Feedback and Combined Similarity Metrics
Entropy-based Localization of Textured Regions
Appearance description is a relevant field in computer vision that enables object recognition in domains such as re-identification, retrieval and classification. Important cues to describe appearance are colors and textures. However, in real cases, texture detection is challenging due to occlusions and to deformations of the clothing as the person's pose changes. Moreover, in some cases the processed images have a low resolution, and state-of-the-art methods for texture analysis are not appropriate. In this paper, we deal with the problem of localizing real textures for clothing description purposes, such as stripes and/or complex patterns. Our method uses the entropy of primitive distribu…
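As a rough illustration of an entropy measure over a distribution of primitives, the sketch below assumes that the primitives are quantized gradient orientations and measures the Shannon entropy of their histogram in a sliding window. Both the choice of primitive and the window size are assumptions; the abstract does not specify them.

```python
import numpy as np


def orientation_labels(gray, num_bins=8):
    """Quantize gradient orientation in [0, pi) into num_bins primitive labels."""
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    return np.minimum((angle / np.pi * num_bins).astype(int), num_bins - 1)


def shannon_entropy(labels, num_bins):
    """Entropy (bits) of the histogram of integer labels in [0, num_bins)."""
    counts = np.bincount(np.ravel(labels), minlength=num_bins).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropy_map(labels, win, num_bins):
    """Entropy of the label distribution in every win x win window (valid positions only)."""
    rows, cols = labels.shape[0] - win + 1, labels.shape[1] - win + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = shannon_entropy(labels[r:r + win, c:c + win], num_bins)
    return out
```

How the resulting map is thresholded or grouped into textured regions is left open here, since the abstract is truncated before that point.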
Pedestrian Tracking in 360 Video by Virtual PTZ Cameras
Since the data acquired by a PTZ camera change while adjusting the pan, tilt and zoom parameters, the results of tracking algorithms are difficult to reproduce; such difficulty limits the development and the comparison of tracking algorithms with PTZ cameras. The recently introduced 360-degree cameras acquire spherical views of the environment, generally stored as equirectangular images. Each pixel of an equirectangular image corresponds to a point on the spherical surface. A gnomonic projection can be used to project the points on the spherical surface onto a plane tangent to the sphere. Such a tangent plane can be interpreted as the image plane of a virtual PTZ camera oriented towards th…
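The gnomonic projection described above can be sketched with the standard inverse-gnomonic formulas. In this sketch a virtual PTZ camera is assumed to have no roll, its pan and tilt are taken as the longitude and latitude of the tangent point, and zoom is expressed as a horizontal field of view. The function names are hypothetical.

```python
import math


def gnomonic_to_sphere(x, y, lon0, lat0):
    """Inverse gnomonic projection of tangent-plane point (x, y) to (lon, lat) in radians.

    The plane touches the unit sphere at (lon0, lat0); x points east, y points north.
    """
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0
    c = math.atan(rho)  # angular distance from the tangent point
    lat = math.asin(math.cos(c) * math.sin(lat0)
                    + y * math.sin(c) * math.cos(lat0) / rho)
    lon = lon0 + math.atan2(x * math.sin(c),
                            rho * math.cos(lat0) * math.cos(c)
                            - y * math.sin(lat0) * math.sin(c))
    return lon, lat


def sphere_to_equirect(lon, lat, width, height):
    """Continuous pixel coordinates (u, v) of (lon, lat) in a width x height equirectangular image."""
    lon = (lon + math.pi) % (2 * math.pi) - math.pi  # wrap longitude to [-pi, pi)
    u = (lon + math.pi) / (2 * math.pi) * width
    v = (math.pi / 2 - lat) / math.pi * height
    return u, v


def virtual_ptz_lookup(col, row, out_w, out_h, pan, tilt, hfov, eq_w, eq_h):
    """Where pixel (col, row) of a virtual PTZ view samples the equirectangular image."""
    f = (out_w / 2) / math.tan(hfov / 2)  # focal length in pixels; zooming narrows hfov
    x = (col - out_w / 2) / f
    y = (out_h / 2 - row) / f             # image rows grow downwards
    lon, lat = gnomonic_to_sphere(x, y, pan, tilt)
    return sphere_to_equirect(lon, lat, eq_w, eq_h)
```

Sampling the equirectangular image at the returned (u, v) for every output pixel, for instance with nearest-neighbour or bilinear interpolation, renders the virtual view.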
Keyword Based Keyframe Extraction in Online Video Collections
Keyframe extraction methods aim to find the most significant frames in a video sequence, according to specific criteria. In this paper we propose a new method to search a video database for frames that are related to a given keyword, and to extract the best ones according to a proposed quality factor. We first exploit a speech-to-text algorithm to extract automatic captions from all the videos in a domain-specific database. Then we select only those sequences (clips) whose captions include the given keyword, thus discarding a lot of information that is useless for our purposes. Each retrieved clip is then divided into shots using a video segmentation method that is based on the SURF d…
A Dataset of Annotated Omnidirectional Videos for Distancing Applications
Omnidirectional (or 360°) cameras are acquisition devices that, in the next few years, could have a big impact on video surveillance applications, research, and industry, as they can record a spherical view of a whole environment from every perspective. This paper presents two new contributions to the research community: the CVIP360 dataset, an annotated dataset of 360° videos for distancing applications, and a new method to estimate the distances of objects in a scene from a single 360° image. The CVIP360 dataset includes 16 videos acquired outdoors and indoors, annotated by adding information about the pedestrians in the scene (bounding boxes) and the distances to the camera of some point…
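The abstract does not detail the distance-estimation method. One common flat-ground model, shown here only as an assumption, relates the row of a ground point (for example the bottom edge of a pedestrian's bounding box) in an equirectangular image to its elevation angle, and converts that angle into a horizontal distance using a known camera height.

```python
import math


def ground_distance(v, image_height, camera_height):
    """Horizontal distance to a ground point seen at row v of an equirectangular image.

    Assumes a level camera at camera_height above flat ground (hypothetical model).
    """
    lat = math.pi / 2 - (v / image_height) * math.pi  # +pi/2 at the top row, -pi/2 at the bottom
    if lat >= 0:
        raise ValueError("the point must lie below the horizon")
    return camera_height / math.tan(-lat)


# A point 45 degrees below the horizon lies as far away as the camera is high.
print(ground_distance(750, 1000, 1.5))  # approximately 1.5
```

The straight-line distance from the camera centre would be the hypotenuse, sqrt(d**2 + camera_height**2).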
Hankelet-based dynamical systems modeling for 3D action recognition
This paper proposes to model an action as the output of a sequence of atomic Linear Time Invariant (LTI) systems. The sequence of LTI systems generating the action is modeled as a Markov chain, where a Hidden Markov Model (HMM) is used to model the transition from one atomic LTI system to another. In turn, the LTI systems are represented in terms of their Hankel matrices. For classification purposes, the parameters of a set of HMMs (one for each action class) are learned via a discriminative approach. This work proposes a novel method to learn the atomic LTI systems from training data, and analyzes in detail the action representation in terms of a sequence of Hankel matrices. Extensive eval…
Path Modeling and Retrieval in Distributed Video Surveillance Databases
We propose a framework for querying a distributed database of video surveillance data in order to retrieve a set of likely paths of a person moving in the area under surveillance. In our framework, each camera of the surveillance system locally processes the data and stores video sequences in a storage unit and the metadata for each detected person in the distributed database. A pedestrian’s path is formulated as a dynamic Bayesian network (DBN) to model the dependencies between subsequent observations of the person as he makes his way through the camera network. We propose a tool by which the analyst can pose queries about where a certain person appeared while moving in the site duri…
Views selection for SIFT based object modeling and recognition
In this paper we focus on automatically learning object models in the framework of keypoint-based object recognition. The proposed method uses a collection of views of the objects to build the model. For each object, the collection is composed of N×M views obtained by rotating the object around its vertical and horizontal axes. As keypoint-based object recognition using a complete set of views is computationally expensive, we focus on the definition of a selection method that creates, for each object, a subset of the initial views that visually summarizes the characteristics of the object and should be suited for recognition. We select the views by determining maxima and minima of a function, …
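The function whose extrema drive the selection is cut off in the abstract. The sketch assumes that some per-view score has already been computed (for example, a hypothetical count of keypoint matches between consecutive views) and shows only the extrema-picking step.

```python
def select_extrema(scores, circular=True):
    """Indices of strict local maxima and minima of a per-view score sequence.

    With circular=True the first and last views are treated as neighbours,
    which suits a full 360-degree turn of the object.
    """
    n = len(scores)
    selected = []
    for i in range(n):
        if circular:
            prev, nxt = scores[i - 1], scores[(i + 1) % n]
        elif 0 < i < n - 1:
            prev, nxt = scores[i - 1], scores[i + 1]
        else:
            continue  # endpoints of an open sequence have only one neighbour
        s = scores[i]
        if (s > prev and s > nxt) or (s < prev and s < nxt):
            selected.append(i)
    return selected


print(select_extrema([1, 3, 2, 0, 2]))  # [0, 1, 3, 4]
```

Strict comparisons ignore plateaus; in practice the scores might be smoothed before picking extrema.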
Image Digestion and Relevance Feedback in the ImageRover WWW Search Engine
Fully automatic saliency-based subjects extraction in digital images
In this paper we present a novel saliency-based technique for the automatic extraction of relevant subjects in digital images. We use enhanced saliency maps to determine the most relevant parts of the images and an image cropping technique on the map itself to extract one or more relevant subjects. The contribution of the paper is two-fold: we propose a technique to enhance the standard GBVS saliency map and a technique to extract the most salient parts of the image. The GBVS saliency map is enhanced by applying three filters specifically designed to optimize performance for the task of relevant subject extraction. The extraction of relevant subjects is demonstrated on a manually ann…
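A simple way to crop a subject from a saliency map is to threshold the map relative to its maximum and take the bounding box of the surviving pixels. The sketch below does this for a single subject; the relative threshold is an assumed parameter, and the paper's three enhancement filters are not reproduced. Extracting several subjects would additionally require splitting the mask into connected components.

```python
import numpy as np


def salient_crop(saliency, rel_threshold=0.5):
    """Bounding box (top, left, bottom, right) of pixels with saliency >= rel_threshold * max.

    bottom and right are exclusive, so img[top:bottom, left:right] is the crop.
    """
    s = np.asarray(saliency, dtype=float)
    mask = s >= rel_threshold * s.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```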
Boosting Hankel matrices for face emotion recognition and pain detection
Highlights: Dynamics of face expression descriptors are modeled for emotion recognition. A set of Hankel matrices is built upon several multi-scale face representations. Boosting and random subspace projection are used for dynamics selection. Dynamics of Haar-like features and Gabor energies are compared. Fine-grained dynamics of subtle expressions can be modeled at small spatial scales.
Studies in psychology have shown that the dynamics of emotional expressions play an important role in face emotion recognition in humans. Motivated by these studies, in this paper the dynamics of face expressions are modeled and used for automatic emotion recognition and pain detection. Given a temporal sequence o…
Video object recognition and modeling by SIFT matching optimization
In this paper we present a novel technique for object modeling and object recognition in video. Given a set of videos containing 360-degree views of objects, we compute a model for each object; we then analyze short videos to determine whether the object depicted in the video is one of the modeled objects. The object model is built from a video spanning a 360-degree view of the object taken against a uniform background. To create the object model, the proposed technique selects a few representative frames from each video and extracts local features from these frames. Object recognition is performed by selecting a few frames from the query video, extracting local features from each frame and looki…
Why you trust in visual saliency
Image understanding is a simple task for a human observer. Visual attention is automatically drawn to interesting regions by a natural objective stimulus in a first step and by prior knowledge in a second step. Saliency maps try to simulate the human response and use actual eye-movement measurements as ground truth. An interesting question is: how much can corruption in a digital image affect saliency detection with respect to the original image? One of the contributions of this work is to compare the performance of standard approaches with respect to different types of image corruption and different threshold values on saliency maps. If the corruption can be estimated and/or the threshold is fi…
Palmprint principal lines extraction
Palmprint recognition has become a focus in the biometric recognition and image processing fields. In this process, feature extraction (with particular attention to palmprint principal line extraction) is especially important. Although a lot of work has been reported, the representation of palmprints is still an open issue. In this paper we propose a simple, efficient, and accurate palmprint principal line extraction method. Our approach consists of six simple steps: normalization, median filtering, average filtering along four predefined directions, grayscale bottom-hat filtering, combination of bottom-hat filtering, binarization and post-processing. The contribution of our work is a new…
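The steps listed above can be sketched in NumPy as follows. Kernel sizes, the threshold and the rule used to combine the four bottom-hat responses (a pixel-wise maximum) are assumptions, and the final post-processing step is omitted. Principal lines are dark, so the bottom-hat (closing minus image) turns them into bright responses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, size):
    """All size x size neighbourhoods of img (edge-padded), shape (H, W, size, size)."""
    pad = size // 2
    return sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))


def _line_kernel(length, direction):
    """Averaging kernel along a line at 0, 45, 90 or 135 degrees (length must be odd)."""
    step = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}[direction]
    k = np.zeros((length, length))
    c = length // 2
    for t in range(-c, c + 1):
        k[c + t * step[0], c + t * step[1]] = 1.0
    return k / k.sum()


def bottom_hat(img, size):
    """Grayscale bottom-hat: morphological closing (max then min filter) minus the image."""
    dilated = _windows(img, size).max(axis=(-2, -1))
    closed = _windows(dilated, size).min(axis=(-2, -1))
    return closed - img


def principal_lines(gray, line_len=9, se_size=7, thresh=0.1):
    img = np.asarray(gray, dtype=float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # 1. normalization
    img = np.median(_windows(img, 3), axis=(-2, -1))            # 2. 3x3 median filtering
    responses = []
    for direction in (0, 45, 90, 135):
        kernel = _line_kernel(line_len, direction)
        smoothed = np.einsum("ijkl,kl->ij", _windows(img, line_len), kernel)  # 3. directional averaging
        responses.append(bottom_hat(smoothed, se_size))         # 4. bottom-hat per direction
    combined = np.max(responses, axis=0)                        # 5. combination (assumed: pixel-wise max)
    return combined > thresh                                    # 6. binarization (post-processing omitted)
```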
Integrating computer vision techniques and wireless sensor networks in video surveillance systems
Nowadays, video-surveillance systems are essential tools to monitor sites and to guarantee the safety of people: automatic detection of moving objects in the scene and recognition of dangerous events are particularly interesting. Our project aims to develop tools and techniques for video surveillance systems in outdoor environments that detect people automatically and in real time, without the direct control of a human operator. The reference framework consists of distributed stationary cameras coordinated with sensor networks. In particular, wireless sensors are used to sense characteristic quantities of the monitored site, such as variations in temperature, humidity, noise, vibrations, and so o…
Using Hankel matrices for dynamics-based facial emotion recognition and pain detection
This paper proposes a new approach to model the temporal dynamics of a sequence of facial expressions. To this end, a sequence of Face Image Descriptors (FID) is regarded as the output of a Linear Time Invariant (LTI) system. The temporal dynamics of such a sequence of descriptors are represented by means of a Hankel matrix. The paper presents different strategies to compute dynamics-based representations of a sequence of FID, and reports classification accuracy values of the proposed representations within different standard classification frameworks. The representations have been validated in two very challenging application domains: emotion recognition and pain detection. Experiments on…
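One reason a Hankel matrix captures LTI dynamics is that, for the output of an LTI system of order n, its rank is at most n. The short NumPy check below illustrates this with a sinusoid, which a second-order autonomous system can generate. The helper `hankel` and the chosen sizes are illustrative.

```python
import numpy as np


def hankel(y, num_rows):
    """Hankel matrix of a scalar sequence y with num_rows rows."""
    y = np.asarray(y, dtype=float)
    num_cols = len(y) - num_rows + 1
    return np.array([y[i:i + num_cols] for i in range(num_rows)])


t = np.arange(30)
# A sampled sinusoid obeys y_t = 2*cos(w)*y_{t-1} - y_{t-2}: an order-2 autonomous LTI system.
sinusoid = np.cos(0.3 * t + 0.5)
# A random sequence has no low-order linear dynamics.
random_seq = np.random.default_rng(0).normal(size=30)

print(np.linalg.matrix_rank(hankel(sinusoid, 6), tol=1e-8))    # 2
print(np.linalg.matrix_rank(hankel(random_seq, 6), tol=1e-8))  # 6
```

With real descriptors the rank is only approximately low, which is why comparisons are usually made on normalized Hankel matrices rather than on exact ranks.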
WhoSNext: Recommending Twitter Users to Follow Using a Spreading Activation Network Based Approach
The huge number of modern social network users has made the web a fertile ground for the growth and development of a plethora of recommender systems. To date, the problem of recommending a new user profile X to a given user U who could be interested in creating a relationship with X has been tackled using techniques based on content analysis, existing friendship relationships and other pieces of information coming from different social networks or websites. In this paper we propose a recommendation architecture, called WhoSNext (WSN), tested on Twitter, whose aim is to promote the creation of new relationships among users. As recent research shows, this is an interesting recommendation problem: for a …
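Spreading activation itself is easy to sketch: energy starts at the target user and flows along follow edges, decaying at each hop, and users who accumulate energy but are not yet followed become candidates. The graph, the decay factor and the even split of energy among neighbours are assumptions for illustration; this is not the WhoSNext architecture.

```python
from collections import defaultdict


def spreading_activation(graph, source, decay=0.5, steps=3, threshold=1e-3):
    """graph: dict node -> list of accounts that node follows. Returns node -> activation."""
    activation = defaultdict(float)
    activation[source] = 1.0
    frontier = {source: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)  # decayed energy split evenly
            if share < threshold:
                continue  # stop propagating negligible energy
            for nb in neighbours:
                next_frontier[nb] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


def recommend(graph, user, k=3, **kwargs):
    """Most activated accounts that the user does not already follow."""
    act = spreading_activation(graph, user, **kwargs)
    already = set(graph.get(user, [])) | {user}
    ranked = sorted((n for n in act if n not in already), key=lambda n: -act[n])
    return ranked[:k]


graph = {"U": ["a", "b"], "a": ["c", "d"], "b": ["c"], "c": ["e"]}  # hypothetical follow graph
print(recommend(graph, "U"))  # ['c', 'e', 'd']
```

Account "c" ranks first because energy reaches it through two different followed accounts.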
360° Tracking Using a Virtual PTZ Camera
Object tracking using still or PTZ cameras is a hard task in large spaces and requires several devices to completely cover the area or to track multiple subjects. The introduction of 360° camera technology offers a complete view of the scene in a single image and can be useful to reduce the number of devices needed in the tracking problem. In this paper we present a framework that uses 360° cameras to simulate an unlimited number of PTZ cameras for tracking. The proposed method to track a single target processes an equirectangular view of the scene and obtains a model of the moving object in the image plane. The target is tracked by analyzing the next frame of th…
Multi-modal non-rigid registration of medical images based on mutual information maximization
In this paper, a new multi-modal non-rigid registration technique for medical images is presented. First, the registration problem is outlined and some of the most common approaches are reviewed; then, the proposed algorithm is presented. The proposed technique is based on mutual information maximization and computes a deformation field through a suitable globally smoothed piecewise affine transformation. The algorithm has been conceived with particular attention to computational load and accuracy of results. Experimental results involving intra-patient, inter-patient and atlas images on brain CT and MR (T1, T2 and PD modalities) are reported.
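The mutual information criterion can be estimated from the joint intensity histogram of the two images. The sketch below computes it and, as a toy stand-in for the non-rigid search, maximizes it over integer horizontal shifts only. The rigid shift, the bin count and the synthetic images are assumptions.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Mutual information (nats) between two images, estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # empty cells contribute 0 (0 * log 0 := 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def best_shift(fixed, moving, max_shift=5):
    """Integer horizontal shift of `moving` that maximizes MI with `fixed` (rigid toy case)."""
    scores = {s: mutual_information(fixed, np.roll(moving, s, axis=1))
              for s in range(-max_shift, max_shift + 1)}
    return max(scores, key=scores.get)


rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
moving = 1.0 - np.roll(fixed, -3, axis=1)  # shifted and intensity-inverted "other modality"
print(best_shift(fixed, moving))           # 3
```

The inverted intensities show why mutual information suits multi-modal data: it rewards a consistent intensity mapping between the images, not equal intensities.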
Multifeature Image and Video Content-Based storage and retrieval
In this paper we present the most recent evolution of JACOB, a system we developed for image and video content-based storage and retrieval. The system is based on two separate archives: a 'features DB' and a 'raw-data DB'. When a user submits a query, a search is done in the 'features DB'; the selected items are taken from the 'raw-data DB' and shown to the user. Two kinds of sessions are allowed: 'database population' and 'database querying'. During a 'database population' session the user inserts new data into the archive. The input data can consist of digital images or videos. Videos are split into shots and for each shot one or more representative frames are automatically extracted. Shots and …
Automatic image representation and clustering on mobile devices.
In this paper a novel approach for the automatic representation of pictures on mobile devices is proposed. With the wide diffusion of mobile digital image acquisition devices, the need to manage a large number of digital images is quickly increasing. In fact, the storage capacity of such devices allows users to store hundreds, or even thousands, of pictures that, without a proper organization, become useless. Users may be interested in using (i.e., browsing, saving, printing and so on) a subset of stored data according to some particular picture properties. A content-based description of each picture is needed to perform on-board image indexing. In our work the images are analyzed and descri…
Iterative Multiple Bounding-Box Refinements for Visual Tracking.
Single-object visual tracking aims at locating a target in each video frame by predicting the bounding box of the object. Recent approaches have adopted iterative procedures to gradually refine the bounding box and locate the target in the image. In such approaches, the deep model takes as input the image patch corresponding to the currently estimated target bounding box, and provides as output the probability associated with each of the possible bounding box refinements, generally defined as a discrete set of linear transformations of the bounding box center and size. At each iteration, only one transformation is applied, and supervised training of the model may introduce an inherent ambig…
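The iterative scheme can be sketched with a small discrete action set over a center-size box representation. Here the deep model is replaced by a hypothetical greedy oracle that picks the action with the largest IoU gain against a known target, which resembles how supervision could be derived. The step size `ALPHA` and the action names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    cx: float  # center x
    cy: float  # center y
    w: float
    h: float


ALPHA = 0.1  # hypothetical step, relative to the current box size

# A small discrete set of linear transformations of the box center and size.
ACTIONS = {
    "left": lambda b: Box(b.cx - ALPHA * b.w, b.cy, b.w, b.h),
    "right": lambda b: Box(b.cx + ALPHA * b.w, b.cy, b.w, b.h),
    "up": lambda b: Box(b.cx, b.cy - ALPHA * b.h, b.w, b.h),
    "down": lambda b: Box(b.cx, b.cy + ALPHA * b.h, b.w, b.h),
    "bigger": lambda b: Box(b.cx, b.cy, b.w * (1 + ALPHA), b.h * (1 + ALPHA)),
    "smaller": lambda b: Box(b.cx, b.cy, b.w * (1 - ALPHA), b.h * (1 - ALPHA)),
}


def iou(a, b):
    """Intersection over union of two center-size boxes."""
    ix = max(0.0, min(a.cx + a.w / 2, b.cx + b.w / 2) - max(a.cx - a.w / 2, b.cx - b.w / 2))
    iy = max(0.0, min(a.cy + a.h / 2, b.cy + b.h / 2) - max(a.cy - a.h / 2, b.cy - b.h / 2))
    inter = ix * iy
    return inter / (a.w * a.h + b.w * b.h - inter)


def refine(box, choose_action, max_iters=50):
    """Apply one transformation per iteration until the policy answers 'stop'."""
    for _ in range(max_iters):
        name = choose_action(box)
        if name == "stop":
            break
        box = ACTIONS[name](box)
    return box


def oracle_policy(target):
    """Stand-in for the deep model: greedily pick the action with the best IoU gain."""
    def choose(box):
        best = max(ACTIONS, key=lambda n: iou(ACTIONS[n](box), target))
        return best if iou(ACTIONS[best](box), target) > iou(box, target) else "stop"
    return choose


print(refine(Box(50, 50, 20, 20), oracle_policy(Box(56, 50, 20, 20))))
```

When two actions give the same IoU gain, `max` simply returns the first one; such ties are a small instance of the label ambiguity that the abstract mentions.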
Multi‐criteria decision‐making approach for modular enterprise resource planning sorting problems
Implementing Enterprise Resource Planning (ERP) systems is currently recognized as a best practice with wide associated possibilities of business improvement for companies. Integrating these kinds of systems with business processes in the most efficient way requires simplifying them as much as possible for end users, which can be pursued by optimizing crucial software characteristics. The present article proposes a novel Multi-Criteria Decision-Making (MCDM) approach to deal with this issue. Specifically, the ELECTRE (ELimination Et Choix Traduisant la REalité) TRI technique is suggested to assign ERP modules into predefined and ordered categories according to maintain…
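The assignment step of ELECTRE TRI can be sketched with the pessimistic rule: an alternative is compared with the category boundary profiles from the best down and assigned just above the first profile it outranks. For brevity the sketch uses only concordance (no discordance or veto thresholds) and a cutoff λ. The criteria, weights and thresholds in the example are invented for illustration, and all criteria are assumed to be maximized.

```python
def partial_concordance(g_a, g_b, q, p):
    """c_j(a, b): degree to which a is at least as good as b on one maximized criterion."""
    diff = g_b - g_a             # how much b beats a on this criterion
    if diff <= q:                # within the indifference threshold
        return 1.0
    if diff >= p:                # beyond the preference threshold
        return 0.0
    return (p - diff) / (p - q)  # linear interpolation in between


def concordance(a, b, weights, q, p):
    """Weighted global concordance C(a, b) in [0, 1]."""
    total = sum(weights)
    return sum(w * partial_concordance(ga, gb, qj, pj)
               for ga, gb, w, qj, pj in zip(a, b, weights, q, p)) / total


def pessimistic_assign(a, profiles, weights, q, p, cutoff=0.7):
    """profiles[h] is the lower boundary of category h + 1, ordered worst to best.

    Returns a category index from 0 (worst) to len(profiles) (best).
    Without discordance, the credibility of 'a outranks b' equals C(a, b).
    """
    for h in range(len(profiles) - 1, -1, -1):
        if concordance(a, profiles[h], weights, q, p) >= cutoff:
            return h + 1
    return 0


profiles = [[5, 5], [8, 8]]  # hypothetical boundaries between three ordered categories
params = dict(weights=[1, 1], q=[0.5, 0.5], p=[1.5, 1.5])
print([pessimistic_assign(a, profiles, **params) for a in ([9, 9], [6, 6], [9, 4], [9, 3])])
# [2, 1, 1, 0]
```

The last alternative shows the non-compensatory character of the method: an excellent score on one criterion does not offset a poor score on the other.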
Restoration of out of focus images based on circle of confusion estimates
In this paper a new method for fast out-of-focus blur estimation and restoration is proposed. It is suitable for CFA (Color Filter Array) images acquired by typical CCD/CMOS sensors. The method is based on the analysis of a single image and consists of two steps: 1) out-of-focus blur estimation via Bayer pattern analysis; 2) image restoration. Blur estimation is based on a block-wise edge detection technique. This edge detection is carried out on the green pixels of the CFA sensor image, also called the Bayer pattern. Once the blur level has been estimated, the image is restored through the application of a new inverse filtering technique. This algorithm gives sharp images, reducing ringing and c…
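The paper's inverse filter is not detailed in the abstract, so the sketch below substitutes standard Wiener deconvolution with a uniform-disk point spread function, a common model of the circle of confusion. It works on a single-channel image rather than on CFA data, and the disk radius and the regularization constant `k` are assumed to be known.

```python
import numpy as np


def disk_psf(radius, shape):
    """Uniform disk PSF (circle of confusion) centered on pixel (0, 0), wrapped for FFT use."""
    h, w = shape
    yy, xx = np.mgrid[:h, :w]
    yy = np.minimum(yy, h - yy)  # wrap-around distance from row 0
    xx = np.minimum(xx, w - xx)  # wrap-around distance from column 0
    psf = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)
    return psf / psf.sum()


def blur(img, psf):
    """Circular convolution of img with psf via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)))


def wiener_deblur(blurred, psf, k=1e-3):
    """Wiener deconvolution with a constant noise-to-signal ratio k."""
    H = np.fft.fft2(psf)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + k) * np.fft.fft2(blurred)
    return np.real(np.fft.ifft2(F_hat))


rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
psf = disk_psf(3, sharp.shape)
blurred = blur(sharp, psf)
restored = wiener_deblur(blurred, psf)
print(np.mean((blurred - sharp) ** 2) > np.mean((restored - sharp) ** 2))  # True
```

A larger `k` suppresses more of the frequencies where the disk's spectrum is near zero, trading sharpness for less ringing and noise amplification.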
Toward an Integrated System for Surveillance and Behaviour Analysis of Groups and People
Security and INTelligence SYStem is an Italian research project that aims to create an integrated system for the analysis of multi-modal data sources (text, images, video, audio) to assist operators in homeland security applications. Within this project, the Scientific Research Unit (SRU) of the University of Palermo is responsible for the image and video analysis activity. The SRU of Palermo developed a web-service-based architecture that provides image and video analysis capabilities to the integrated analysis system. The developed architecture uses both state-of-the-art techniques, adapted to cope with the particular problem at hand, and new algorithms to provide the following services: image …
Concurrent photo sequence organization
Personal photo album organization is a highly demanding domain where advanced tools are required to manage large photo collections. In contrast to many previous works, which try to solve the problem of organizing a single user's photo sequence, we present a new technique that addresses the concurrent photo sequence organization problem, that is, the problem of organizing multiple photo sequences taken during the same event. Given a set of sequences acquired at the same place during the same temporal window by several users with different cameras, our framework is intended to capture the evolution of the event and to group photos based on temporal proximity and visual content. The method automati…
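The temporal-proximity part of the grouping can be sketched by merging all sequences onto one timeline and starting a new group whenever the gap between consecutive photos exceeds a threshold. This assumes the cameras' clocks are synchronized and ignores visual content, which the paper also uses; the ten-minute gap is an arbitrary choice.

```python
from datetime import datetime, timedelta


def group_by_time(sequences, max_gap=timedelta(minutes=10)):
    """sequences: dict mapping a user/camera to a list of (photo_id, timestamp).

    All photos are merged on one timeline (clocks assumed synchronized), and a new
    group starts whenever two consecutive photos are more than max_gap apart.
    """
    photos = sorted((ts, pid) for seq in sequences.values() for pid, ts in seq)
    groups, current, last_ts = [], [], None
    for ts, pid in photos:
        if last_ts is not None and ts - last_ts > max_gap:
            groups.append(current)
            current = []
        current.append(pid)
        last_ts = ts
    if current:
        groups.append(current)
    return groups


t0 = datetime(2024, 5, 1, 10, 0)  # hypothetical event start
sequences = {
    "alice": [("a1", t0), ("a2", t0 + timedelta(minutes=2))],
    "bob": [("b1", t0 + timedelta(minutes=1)), ("b2", t0 + timedelta(minutes=30))],
}
print(group_by_time(sequences))  # [['a1', 'b1', 'a2'], ['b2']]
```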
Hankelet-based action classification for motor intention recognition
Powered lower-limb prostheses require a natural, easy-to-use interface for communicating the amputee’s motor intention in order to select the appropriate motor program in any given context, or simply to switch from active (powered) to passive mode of functioning. To be widely accepted, such an interface should not put additional cognitive load on the end user, and it should be reliable and minimally invasive. In this paper we present one such interface, based on a robust method for detecting and recognizing motor actions from a low-cost wearable sensor network mounted on the sound leg that provides inertial (accelerometer, gyrometer and magnetometer) data in real time. We assume that the sensor…
Temporal segmentation of video data
A Combined Fuzzy and Probabilistic Data Descriptor for Distributed CBIR
With the wide diffusion of digital image acquisition devices, the cost of managing hundreds of digital images is quickly increasing. Currently, the main way to search digital image libraries is by keywords given by the user. However, users usually assign ambiguous keywords to large sets of images. A content-based system intended to automatically find a query image, or similar images, within the whole collection is needed. In our work we address the scenario where medical image collections, which nowadays are rapidly expanding in quantity and heterogeneity, are shared in a distributed system to support diagnostic and preventive medicine. Our goal is to produce an efficient content-based descript…
Mimicking biological mechanisms for sensory information fusion
Current Artificial Intelligence systems are bound to become increasingly interconnected with their surrounding environment in view of the emerging Ambient Intelligence (AmI) perspective. In this paper, we present a comprehensive AmI framework for fusing raw data perceived by sensors of different nature, in order to extract higher-level information according to a model structured to resemble the perceptual signal processing occurring in the human nervous system. Following the guidelines of the greater BICA challenge, we selected the specific task of user presence detection in a locality of the system as a representative application clarifying the potentialities of …
Extracting Touristic Information from Online Image Collections
In this paper, we present a Geographical Information Retrieval system that aims to automatically extract and analyze touristic information from photos in online image collections (in our case study, Flickr). Our system collects all the photos, and the related information, that are associated with a specific city. We then use the Google Maps service to geolocate the retrieved photos, and finally we analyze the geo-referenced data to achieve two goals: 1) determining and locating the most interesting places in the city, i.e. the most visited locations, and 2) reconstructing the touristic routes of the users visiting the city. Information is filtered by using a set of constraints, which we apply to select…
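Once each photo has been geolocated and snapped to a named place (a step assumed to have already happened here, since the abstract does not detail it), route reconstruction and visit counting reduce to simple grouping. The data layout below is hypothetical.

```python
from collections import Counter, defaultdict


def reconstruct_routes(photos):
    """photos: iterable of (user, timestamp, place). Returns user -> ordered list of places.

    Consecutive photos taken at the same place collapse into a single stop.
    """
    by_user = defaultdict(list)
    for user, ts, place in photos:
        by_user[user].append((ts, place))
    routes = {}
    for user, items in by_user.items():
        route = []
        for _, place in sorted(items):
            if not route or route[-1] != place:
                route.append(place)
        routes[user] = route
    return routes


def most_visited(routes, k=3):
    """Places ranked by the number of distinct users who visited them."""
    counts = Counter(place for route in routes.values() for place in set(route))
    return counts.most_common(k)


photos = [  # hypothetical (user, hour, place) records
    ("u1", 9, "Cathedral"), ("u1", 10, "Cathedral"), ("u1", 12, "Market"),
    ("u2", 11, "Market"), ("u2", 15, "Cathedral"), ("u3", 9, "Cathedral"),
]
routes = reconstruct_routes(photos)
print(routes["u1"], routes["u2"])  # ['Cathedral', 'Market'] ['Market', 'Cathedral']
print(most_visited(routes))        # [('Cathedral', 3), ('Market', 2)]
```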
Motion and Color Based Video Indexing and Retrieval
In this paper we present a method for automatic motion- and color-based video indexing and retrieval. Our system automatically splits a video into a sequence of shots and extracts a few representative frames (r-frames) from each shot. For each r-frame we compute the optical flow field; motion features are then derived from the flow field. Color features are related to the three-dimensional RGB color histogram. Queries (direct or by example) are based on these features. The results obtained show that motion- and color-based querying can play a central role in content-based video retrieval.
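A 3D RGB color histogram quantizes each channel and counts pixels per (r, g, b) cell. The sketch uses 8 bins per channel and compares histograms with histogram intersection; both are common choices assumed here, not necessarily those of the paper.

```python
import numpy as np


def rgb_histogram(img, bins_per_channel=8):
    """Normalized 3D color histogram of an (H, W, 3) uint8 RGB image, flattened to bins**3."""
    q = np.asarray(img).reshape(-1, 3).astype(int) * bins_per_channel // 256  # per-channel bin
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return float(np.minimum(h1, h2).sum())
```

In a retrieval setting, the r-frames of the database would be ranked by their intersection with the histogram of the query frame.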
A Novel Time Series Kernel for Sequences Generated by LTI Systems
The recent introduction of Hankelets to describe time series relies on the assumption that the time series has been generated by a vector autoregressive (VAR) model of order p. The success of Hankelet-based time series representations, mostly in nearest-neighbor classifiers, raises the questions of whether and how this representation can be used in kernel machines without the usual adoption of mid-level representations (such as codebook-based representations). It is also of interest to investigate how this representation relates to probabilistic approaches for time series modeling, and which characteristics of the VAR model a Hankelet can capture. This paper aims at filling these gaps by: deriv…
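For reference, the sketch below builds Hankelets in the form often used in earlier Hankelet work, to our understanding: the Gram matrix H Hᵀ normalized by its Frobenius norm, compared with the dissimilarity 2 - ||H̃_p + H̃_q||_F. This is the nearest-neighbour setting the abstract mentions, not the kernel the paper derives; the window size and data are illustrative.

```python
import numpy as np


def hankel(y, num_rows):
    """Block Hankel matrix of a (T,) or (T, d) sequence."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    num_cols = y.shape[0] - num_rows + 1
    return np.vstack([y[i:i + num_cols].T for i in range(num_rows)])


def hankelet(y, num_rows=4):
    """Normalized Gram matrix H H^T / ||H H^T||_F (insensitive to the sequence's scale)."""
    H = hankel(y, num_rows)
    G = H @ H.T
    return G / np.linalg.norm(G)  # Frobenius norm for a 2-D array


def hankelet_dissimilarity(Hp, Hq):
    """2 - ||Hp + Hq||_F: 0 for identical Hankelets, larger as they differ."""
    return 2.0 - np.linalg.norm(Hp + Hq)


y = np.sin(0.2 * np.arange(40))
z = np.random.default_rng(0).normal(size=40)
print(hankelet_dissimilarity(hankelet(y), hankelet(3 * y)))  # approximately 0
print(hankelet_dissimilarity(hankelet(y), hankelet(z)) > 0)  # True
```

Because the normalization removes the overall scale, a sequence and an amplified copy of it map to the same Hankelet.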
Recognition of Human Actions Through Deep Neural Networks for Multimedia Systems Interaction
Nowadays, interactive multimedia systems are part of everyday life. The most common way to interact with and control these devices is through remote controls or some sort of touch panel. In recent years, due to the introduction of reliable low-cost Kinect-like sensing technology, more and more attention has been dedicated to touchless interfaces. A Kinect-like device can be positioned on top of a multimedia system to detect a person in front of the system and process skeletal data, optionally together with RGB-D data, to determine user gestures. The gestures of the person can then be used to control, for example, a media device. Even though there is a lot of interest in this area, currently, no consumer sy…
Semiautomatic Behavioral Change-Point Detection: A Case Study Analyzing Children Interactions With a Social Agent
The study of human behaviors in cognitive sciences provides clues to understand and describe people’s personal and interpersonal functioning. In particular, the temporal analysis of behavioral dynamics can be a powerful tool to reveal events, correlations and causalities but also to discover abnormal behaviors. However, the annotation of these dynamics can be expensive in terms of temporal and human resources. To tackle this challenge, this paper proposes a methodology to semi-automatically annotate behavioral data. Behavioral dynamics can be expressed as sequences of simple dynamical processes: transitions between such processes are generally known as change-points. This paper describes th…
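As a minimal illustration of change-point detection, the sketch below models each segment by its mean only (a much simpler process than the dynamical models the paper considers) and uses binary segmentation: split where the squared error drops the most, and recurse while the reduction exceeds a penalty. The penalty and the minimum segment length are assumed parameters.

```python
import numpy as np


def _sse(x):
    """Squared error of x around its own mean."""
    return float(((x - x.mean()) ** 2).sum())


def best_change_point(x, min_size=2):
    """Split index k minimizing the squared error of separate means on x[:k] and x[k:]."""
    best_k, best_cost = None, np.inf
    for k in range(min_size, len(x) - min_size + 1):
        cost = _sse(x[:k]) + _sse(x[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


def binary_segmentation(x, penalty, min_size=2, offset=0):
    """Recursively split x while a split reduces the squared error by more than penalty."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * min_size:
        return []
    k, cost = best_change_point(x, min_size)
    if _sse(x) - cost <= penalty:
        return []
    return (binary_segmentation(x[:k], penalty, min_size, offset)
            + [offset + k]
            + binary_segmentation(x[k:], penalty, min_size, offset + k))


signal = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(binary_segmentation(signal, penalty=1.0))  # [10, 20]
```

In a semi-automatic workflow, the detected indices would be proposals for a human annotator to confirm or adjust.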
On the use of Deep Reinforcement Learning for Visual Tracking: a Survey
This paper aims at highlighting cutting-edge research results in the field of visual tracking by deep reinforcement learning. Deep reinforcement learning (DRL) is an emerging area combining recent progress in deep and reinforcement learning. It is showing interesting results in the computer vision field and, recently, it has been applied to the visual tracking problem, leading to the rapid development of novel tracking strategies. After providing an introduction to reinforcement learning, this paper compares recent visual tracking approaches based on deep reinforcement learning. Analysis of the state of the art suggests that reinforcement learning allows modeling varying parts of the tracki…
Mobile Interface for Content-Based Image Management
People make more and more use of digital image acquisition devices to capture snapshots of their everyday life. The growing number of personal pictures raises the problem of their classification. Some of the authors proposed an automatic technique for personal photo album management dealing with multiple aspects (i.e., people, time and background) in a homogeneous way. In this paper we discuss a solution that allows mobile users to remotely access such a technique by means of their mobile phones, from almost anywhere, in a pervasive fashion. This allows users to classify pictures they store on their devices. The whole solution is presented, with particular regard to the user interface impl…
Content based indexing of MPEG-4 video on relational DBMS
A Conceptual Probabilistic Model for the Induction of Image Semantics
In this paper we propose a model based on a conceptual space automatically induced from data. The model is inspired by a well-founded robotics cognitive architecture organized in three computational areas: sub-conceptual, linguistic and conceptual. Images are objects in the sub-conceptual area, which become "knoxels" in the conceptual area. The application of the framework allows image semantics to emerge automatically in the linguistic area. The core of the model is a conceptual space, induced automatically from a set of annotated images, that exploits and mixes different information concerning the set of images. Multiple low-level features are extracted to represent images and…
I-MALL An Effective Framework for Personalized Visits. Improving the Customer Experience in Stores
In this paper we present I-MALL, an ICT hardware and software infrastructure that enables the management of services related to places such as shopping malls, showrooms, and conferences held in dedicated facilities. I-MALL offers a network of services that perform customer behavior analysis through computer vision and provide personalized recommendations made available on digital signage terminals. The user can also interact with a social robot. Recommendations are inferred on the basis of the profile of interests computed by the system analysing the history of the customer visit and his/her behavior including information from his/her appearance, the route taken inside the facility, as well…