AUTHOR
M. La Cascia
A Data Association Algorithm for People Re-Identification in Photo Sequences
In this paper, a new system is presented to support the user in the face annotation task. Every time a photo sequence becomes available, the system analyses it to detect faces and cluster them into sets corresponding to the same person. We propose to model the problem of people re-identification in photos as a data association problem. In this way, the system takes advantage of the assumption that each person can appear at most once in each photo. We propose a fully automated method for grouping facial images that requires neither initialization nor a priori knowledge of the number of persons in the photo sequence. We compare the results obtained with our method and with s…
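The "at most once per photo" constraint is what turns clustering into a data association problem. The sketch below illustrates that constraint only. It uses a simple greedy one-to-one matcher, not the paper's algorithm, and the function name, the Euclidean face descriptors, and the distance `threshold` are hypothetical assumptions.

```python
import math

def associate_faces(cluster_centroids, faces, threshold):
    """Greedy one-to-one assignment of the faces found in ONE photo to clusters.

    Hypothetical sketch: each cluster (person) receives at most one face from
    this photo. Faces left unmatched start new clusters, so the number of
    persons is never fixed in advance.
    """
    # All candidate (distance, face index, cluster index) pairs, closest first.
    pairs = sorted(
        (math.dist(f, c), i, j)
        for i, f in enumerate(faces)
        for j, c in enumerate(cluster_centroids)
    )
    assignment = [None] * len(faces)
    used_clusters = set()
    for d, i, j in pairs:
        if d > threshold:
            break  # every remaining pair is even farther apart
        if assignment[i] is None and j not in used_clusters:
            assignment[i] = j
            used_clusters.add(j)
    # Unmatched faces become new persons (centroids are not updated here).
    for i, f in enumerate(faces):
        if assignment[i] is None:
            cluster_centroids.append(list(f))
            assignment[i] = len(cluster_centroids) - 1
    return assignment
```

For example, if two faces in the same photo are both close to one existing cluster, only the closer face joins it. The other face opens a new cluster.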
Combining textual and visual cues for content-based image retrieval on the World Wide Web
A system is proposed that combines textual and visual statistics in a single index vector for content-based search of a WWW image database. Textual statistics are captured in vector form using latent semantic indexing (LSI) based on text in the containing HTML document. Visual statistics are captured in vector form using color and orientation histograms. By using an integrated approach, it becomes possible to take advantage of possible statistical couplings between the content of the document (latent semantic content) and the contents of images (visual statistics). The combined approach allows improved performance in conducting content-based search. Search performance experiments are report…
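The idea of a single index vector can be sketched as follows. A term-count vector is folded into a rank-k LSI space, and the result is concatenated with visual histograms. Normalizing each part to unit length and giving the parts equal weight are assumptions; the abstract does not specify a weighting. The function names are hypothetical.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Rank-k LSI fold-in: maps a term-count vector q to S_k^-1 U_k^T q."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    uk, sk = u[:, :k], s[:k]
    return lambda q: (uk.T @ q) / sk

def unit(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def combined_index(text_vec, color_hist, orient_hist):
    # Assumed equal weighting: normalize each cue, then concatenate.
    return np.concatenate([unit(text_vec), unit(color_hist), unit(orient_hist)])

def similarity(a, b):
    """Cosine similarity between two combined index vectors."""
    return float(unit(a) @ unit(b))
```

Because both cues live in one vector, a single nearest-neighbour search covers text and image content together.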
Mean shift clustering for personal photo album organization
In this paper we propose a probabilistic approach for the automatic organization of pictures in personal photo albums. Images are analyzed in terms of faces and low-level visual features of the background. The description of the background is based on an RGB color histogram and on Gabor filter energy accounting for texture information. The face descriptor is obtained by projecting detected and rectified faces onto a common low-dimensional eigenspace. Vectors representing faces and background are clustered in an unsupervised fashion using a mean shift clustering technique. We observed that, given the peculiarity of the domain of personal photo libraries where most of the pictures contain fa…
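Mean shift does not need the number of clusters in advance, which suits photo albums with an unknown number of people. The code below is a minimal flat-kernel version of the generic technique. The kernel choice, the `bandwidth` value, and the half-bandwidth mode-merging rule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift.

    Each mode repeatedly moves to the mean of the data points within
    `bandwidth` of it. Modes that converge together share a cluster label.
    """
    pts = np.asarray(points, dtype=float)
    modes = pts.copy()
    for _ in range(iters):
        shifted = np.empty_like(modes)
        for i, m in enumerate(modes):
            near = pts[np.linalg.norm(pts - m, axis=1) <= bandwidth]
            shifted[i] = near.mean(axis=0)
        done = np.max(np.linalg.norm(shifted - modes, axis=1)) < tol
        modes = shifted
        if done:
            break
    # Merge modes closer than half the bandwidth into one cluster.
    labels, centers = [], []
    for m in modes:
        for c_idx, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels.append(c_idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)
```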
Restoration of out-of-focus images based on circle of confusion estimate
In this paper a new method for fast out-of-focus blur estimation and restoration is proposed. It is suitable for CFA (Color Filter Array) images acquired by typical CCD/CMOS sensors. The method is based on the analysis of a single image and consists of two steps: 1) out-of-focus blur estimation via Bayer pattern analysis; 2) image restoration. Blur estimation is based on a block-wise edge detection technique. This edge detection is carried out on the green pixels of the CFA sensor image, also called the Bayer pattern. Once the blur level has been estimated, the image is restored through the application of a new inverse filtering technique. This algorithm gives sharp images, reducing ringing and c…
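The sketch below shows how block-wise edge analysis can run directly on the green samples of a raw mosaic. It assumes an RGGB layout and uses only one green sub-lattice. Mean gradient magnitude serves as a stand-in sharpness measure; the paper's actual edge detector and blur-to-radius mapping are not reproduced. Both function names are hypothetical.

```python
import numpy as np

def green_sublattices(raw):
    """Return the two green sub-lattices of an RGGB Bayer mosaic.

    Assumed layout: G at (even row, odd col) and (odd row, even col).
    """
    g1 = raw[0::2, 1::2].astype(float)
    g2 = raw[1::2, 0::2].astype(float)
    return g1, g2

def blockwise_edge_strength(raw, block=8):
    """Mean |horizontal| + |vertical| gradient of green samples per block.

    Lower values suggest stronger defocus blur in that block.
    """
    g, _ = green_sublattices(raw)
    gx = np.abs(np.diff(g, axis=1))[:-1, :]  # crop so gx and gy share a shape
    gy = np.abs(np.diff(g, axis=0))[:, :-1]
    grad = gx + gy
    rows, cols = grad.shape[0] // block, grad.shape[1] // block
    grad = grad[: rows * block, : cols * block]
    return grad.reshape(rows, block, cols, block).mean(axis=(1, 3))
```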
Head Tracking via Robust Registration in Texture Map Images
A novel method for 3D head tracking in the presence of large head rotations and facial expression changes is described. Tracking is formulated in terms of color image registration in the texture map of a 3D surface model. Model appearance is recursively updated via image mosaicking in the texture map as the head orientation varies. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. Parameters are estimated via a robust minimization procedure; this provides robustness to occlusions, wrinkles, shadows and specular highlights. The syst…
Video Indexing Using MPEG Motion Compensation Vectors
In recent years, much work has been done on color, textural, structural and semantic indexing of "content-based" video databases. Motion-based video indexing has been less explored, with approaches generally based on the analysis of optical flows. For compressed videos, these approaches require decompressing the sequences and computing optical flows, two computationally heavy steps. In this paper we propose some methods to index videos by motion features (mainly related to camera motion) and by motion-based spatial segmentation of frames, in a fully automatic way. Our idea is to use MPEG motion vectors as an alternative to optical flows. Their extraction is very simple and fast; it doesn't r…
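MPEG motion vectors can replace optical flow for camera-motion features. The sketch below estimates a global pan as the median block vector and keeps the residuals as a cue for motion-based segmentation. It assumes the vectors have already been decoded from the bitstream into a per-macroblock array; no real MPEG parsing API is shown. The function name and `static_thresh` are hypothetical.

```python
import numpy as np

def camera_pan(motion_vectors, static_thresh=0.5):
    """Estimate a global (camera) translation from one frame's block vectors.

    motion_vectors: array of shape (rows, cols, 2) holding (dx, dy) per
    macroblock, assumed already extracted from the compressed stream.
    The median is robust to blocks covering independently moving objects.
    """
    mv = np.asarray(motion_vectors, dtype=float).reshape(-1, 2)
    pan = np.median(mv, axis=0)
    is_static = bool(np.linalg.norm(pan) < static_thresh)
    # Residual vectors (object motion relative to the camera) per block.
    residual = (mv - pan).reshape(np.shape(motion_vectors))
    return pan, is_static, residual
```

A median is used instead of a mean so that a few blocks on a moving foreground object do not bias the camera-motion estimate.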
Real-Time Object Detection in Embedded Video Surveillance Systems
In this paper we report a new method to detect both moving objects and new stationary objects in video sequences. On the basis of temporal considerations, we classify pixels into three classes (background, midground and foreground) to distinguish between long-term, medium-term and short-term changes. The algorithm has been implemented on a hardware platform with limited resources and could be used in a wider system such as a wireless sensor network. Particular care has been taken in implementing the algorithm so that the limited available resources are used efficiently. Experiments have been conducted on publicly available datasets and performance measures are reported.
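One way to realize the three temporal classes is to count, per pixel, how long the pixel has differed from a reference background. The sketch below is a hypothetical illustration, not the paper's algorithm. The thresholds `diff_thresh`, `mid_after` and `bg_after` are assumed values. Short changes are labelled foreground and medium ones midground (for example, a newly stationary object). Long ones are absorbed into the background.

```python
import numpy as np

BACKGROUND, MIDGROUND, FOREGROUND = 0, 1, 2

class TemporalPixelClassifier:
    """Hypothetical per-pixel change-age classifier for grey-level frames."""

    def __init__(self, first_frame, diff_thresh=25, mid_after=10, bg_after=100):
        self.bg = first_frame.astype(np.int16)
        self.age = np.zeros(first_frame.shape, dtype=np.int32)
        self.diff_thresh = diff_thresh
        self.mid_after = mid_after
        self.bg_after = bg_after

    def update(self, frame):
        f = frame.astype(np.int16)
        changed = np.abs(f - self.bg) > self.diff_thresh
        # Age of the current change; reset where the pixel matches the background.
        self.age = np.where(changed, self.age + 1, 0)
        labels = np.full(f.shape, BACKGROUND, dtype=np.uint8)
        labels[changed] = FOREGROUND
        labels[changed & (self.age >= self.mid_after)] = MIDGROUND
        # Long-term changes are absorbed into the background model.
        absorb = self.age >= self.bg_after
        self.bg[absorb] = f[absorb]
        self.age[absorb] = 0
        labels[absorb] = BACKGROUND
        return labels
```

The per-pixel state is one integer counter plus the background value. This keeps memory use modest, which fits the limited-resource setting described in the abstract.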
Fully automatic, real-time detection of facial gestures from generic video
A technique for the detection of facial gestures from low resolution video sequences is presented. The technique builds upon the automatic 3D head tracker formulation of [M. La Cascia et al., 2000]. The tracker is based on the registration of a texture-mapped cylindrical model. Facial gesture analysis is performed in the texture map by assuming that the residual registration error can be modeled as a linear combination of facial motion templates. Two formulations are proposed and tested. In one formulation, the head and facial motion are estimated in a single, combined linear system. In the other formulation, head motion and then facial motion are estimated in a two-step process. The two-st…
3D Stereoscopic Image Pairs by Depth-map Generation
User detection through multi-sensor fusion in an AmI scenario
Recent advances in technology, with regard to sensing and transmission devices, have made it possible to obtain continuous and precise monitoring of a wide range of qualitatively diverse environments. This has boosted the research on the novel field of Ambient Intelligence, which aims at exploiting the information about the environment state in order to adapt it to the user’s preference. In this paper, we analyze the issue of detecting the user’s presence in a given region of the monitored area, which is crucial in order to trigger subsequent actions. In particular, we present a comprehensive framework that turns data perceived by sensors of different nature, and with possible imprecision, …
Video indexing using optical flow field
The increasing development of advanced multimedia applications requires new technologies for organizing digital video databases and retrieving their content. Several content-based features (color, texture, motion, etc.) are needed to perform reliable content-based retrieval. We present a method for automatic motion-based video indexing and retrieval. A prototype system has been developed to prove the validity of our approach. Our system automatically splits a video into a sequence of shots, extracts a few representative frames (called r-frames) from each shot and computes some motion-based features related to the optical flow field. Motion based queries are then performed either in a quali…
A P2P Architecture for Multimedia Content Retrieval
The retrieval facilities of most Peer-to-Peer (P2P) systems are limited to queries based on unique identifiers or small sets of keywords. This approach can be highly labor-intensive and inconsistent. In this paper we investigate a scenario where a huge amount of multimedia resources is shared in a P2P network and made searchable by means of efficient content-based image and video retrieval functionalities. The challenge in such systems is to limit the number of messages sent while maximizing the usefulness of each peer contacted in the query process. We achieve this goal by adopting a novel algorithm for routing user queries. The proposed approach exploits compact representations of multimedia resources sh…
Real-time estimation of geometrical transformation between views in distributed smart-cameras systems
In this paper, we present a method to automatically estimate the geometric relations among the different views of cameras with partially overlapping fields of view in a wireless video-surveillance system. The method uses the locations of the detected moving objects visible at the same time in two or more views. The correspondences among objects are found by comparing their appearance models based on dominant colour descriptors, while the geometric transformations are computed iteratively and may be used to solve the consistent labelling problem. As a significant part of the processing is performed on the smart cameras, the method has been conceived by taking into account the limited resources…
Method of obtaining a Depth Map from a digital Image
Notice of Violation of IEEE Publication Principles: Enhanced P2P Services Providing Multimedia Content
[This paper has been withdrawn by the publisher] Traditional peer-to-peer (P2P) services provide only basic searching facilities, based on unique identifiers or small sets of keywords. Unfortunately, this approach is very inadequate and inefficient when a huge amount of multimedia resources is shared. In this paper, we present an original image and video sharing system, in which a user is able to interactively search interesting resources by means of content-based image and video retrieval techniques. In order to limit the network traffic cost, maximizing the usefulness of each peer contacted in the query process, we also propose the adoption of an adaptive overlay routing algorithm, exploit…
Detection of Hate Speech Spreaders using Convolutional Neural Networks
In this paper we describe a deep learning model based on a Convolutional Neural Network (CNN). The model was developed for the Profiling Hate Speech Spreaders (HSSs) task proposed by PAN 2021 organizers and hosted at the 2021 CLEF Conference. Our approach to the task of classifying an author as HSS or not (nHSS) takes advantage of a CNN based on a single convolutional layer. In this binary classification task, on the tests performed using a 5-fold cross validation, the proposed model reaches a maximum accuracy of 0.80 on the multilingual (i.e., English and Spanish) training set, and a minimum loss value of 0.51 on the same set. As announced by the task organizers, the trained model presente…
Three-domain image representation for personal photo album management
In this paper we present a novel approach for personal photo album management. Pictures are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected and rectified using a probabilistic feature extraction technique. Face representation is then produced by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features based on RGB histogram and Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable image file format) data. Each image in the collection is then automatically organized using a mean-shift clustering technique. While many system…
A new algorithm for bit rate allocation in JPEG2000 tile encoding
A new algorithm for allocating a given bit rate to different image tiles in the JPEG2000 encoding system is proposed. The algorithm outperforms other approaches commonly used in implementations. The new algorithm is suitable when information content is not equally distributed across the image. It is based on the computation of an index of the information content of each tile. To implement the proposed approach, we modified JasPer, a free software-based JPEG2000 coder implementation (Adams, M.D. and Kossentini, F., Proc. IEEE Int. Conf. on Image Process., vol.2, p.53-6, 2000). The experimentation was carried out on a subset of the JPEG2000 test images. Experimental results are reported, show…
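The abstract describes distributing a global bit budget according to an index of each tile's information content. The sketch below assumes that index is the grey-level Shannon entropy and that bits are split proportionally. Both are assumptions, since the abstract does not define the index. The code also does not use or mimic any JasPer API. The function names are hypothetical.

```python
import numpy as np

def tile_entropy(tile):
    """Shannon entropy (bits/pixel) of an 8-bit tile's grey-level histogram."""
    hist = np.bincount(tile.ravel(), minlength=256).astype(float)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def allocate_bits(tiles, total_bits):
    """Split total_bits among tiles in proportion to their entropy.

    Falls back to a uniform split when every tile is perfectly flat.
    """
    h = np.array([tile_entropy(t) for t in tiles])
    if h.sum() == 0:
        return np.full(len(tiles), total_bits / len(tiles))
    return total_bits * h / h.sum()
```

Under this scheme a flat sky tile receives few bits and a detailed tile receives more. A uniform per-tile split cannot do this when information content is unevenly distributed.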
Fast, Reliable Head Tracking Under Varying Illumination
An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line tracking is then achieved via regularized weighted least squares minimization of the registration error. The regularization term tends to limit potential ambiguities that arise in the warping and illumination te…
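Regularized weighted least squares has a closed-form solution. For the cost $\sum_i w_i (A_i x - b_i)^2 + \lambda \lVert x \rVert^2$, the minimizer is $x = (A^\top W A + \lambda I)^{-1} A^\top W b$ with $W = \mathrm{diag}(w)$. The sketch below solves this generic problem. Here A would stack the warping and illumination templates, and b would be the registration residual. The plain $\lambda I$ regularizer is an assumption; the paper's regularization term may weight parameters differently.

```python
import numpy as np

def regularized_wls(A, b, w, lam):
    """Solve min_x sum_i w_i (A_i x - b_i)^2 + lam * ||x||^2 in closed form."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.asarray(w, dtype=float)
    AtW = A.T * w  # equals A.T @ np.diag(w) without building the diagonal
    lhs = AtW @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(lhs, AtW @ b)
```

Raising `lam` pulls poorly constrained template coefficients toward zero. This damps the ambiguity between warping and illumination templates that the abstract mentions.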
Tracking your detector performance: How to grow an effective training set in tracking-by-detection methods
In many tracking-by-detection approaches, a self-learning strategy is adopted to augment the training set with new positive and negative instances, and to refine the classifier weights. Previous works focus mainly on the learning algorithm and assume the detector is never wrong while classifying samples at the current frame; the most confident sample is chosen as the target, and the training set is augmented with samples selected in its surrounding area. A wrong choice of such samples may degrade the classifier parameters and cause drifting during tracking. In this paper, the focus is on how samples are chosen while retraining the classifier. A particle filtering framework is used to infer …
Texture classification for content-based image retrieval
An original approach to texture-based classification of regions, for image indexing and retrieval, is presented. The system addresses the automatic detection and classification of macro-textured ROIs: we focus our attention on those objects that can be characterized by a texture as a whole, like trees, flowers, walls, clouds, and so on. The proposed architecture is based on the computation of the λ vector from each selected region, and the classification of this feature by means of a pool of suitably trained support vector machines (SVMs). This approach is an extension of the one previously developed by some of the authors to classify image regions on the basis of the geometrical shape of th…
Content Based Indexing of Image and Video Databases by Global and Shape Features
Indexing and retrieval methods based on image content are required to effectively use information from the large repositories of digital images and videos currently available. Both global (colour, texture, motion, etc.) and local (object shape, etc.) features are needed to perform reliable content-based retrieval. We present a method for automatic extraction of global image features, like colour and motion parameters, and their use for data restriction in video database querying. Further retrieval is then accomplished on the restricted set of images by a local search on shape features (skeleton, local symmetry moments, correlation, etc.). The proposed indexing methodology has been deve…
Notice of Violation of IEEE Publication Principles: Distributed Multimedia Digital Libraries on Peer-to-Peer Networks
This paper presents an original approach to image sharing in large, distributed digital libraries, in which a user is able to interactively search interesting resources by means of content-based image retrieval techniques. The approach described here addresses the issues arising when the content is managed through a peer-to-peer architecture. In this case, the retrieval facilities are likely to be limited to queries based on unique identifiers or small sets of keywords, which may be quite inadequate. We therefore propose a novel algorithm for routing user queries that exploits compact representations of multimedia resources shared by each peer in order to dynamically adapt the network topology to …
ImageRover: A Content-Based Image Browser for the World Wide Web
ImageRover is a search-by-image-content navigation tool for the World Wide Web (WWW). To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that se…
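A k-d tree speeds up nearest-neighbour search over image index vectors. The sketch below builds a median-split tree and searches it. With `eps=0` the search is exact; a positive `eps` applies the common (1+ε) pruning rule to obtain an approximate search. That rule is an assumption made for illustration; ImageRover's own "approximate, optimized" variant is not described in the abstract. The function names are hypothetical.

```python
import math

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # median split keeps the tree balanced
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, query, best=None, eps=0.0):
    """Nearest-neighbour search; eps > 0 prunes more (approximate result)."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near_side, far_side = (left, right) if diff < 0 else (right, left)
    best = nearest(near_side, query, best, eps)
    # Visit the far side only if the splitting plane could hide a closer point.
    if abs(diff) * (1 + eps) < math.dist(query, best):
        best = nearest(far_side, query, best, eps)
    return best
```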