AUTHOR
Sabato Marco Siniscalchi
Efficient rapid prototyping of image and video processing algorithms
Real-time image and video processing tasks are often confined to large workstations or expensive custom-designed hardware. The current availability of mature reconfigurable hardware, such as Field Programmable Gate Arrays (FPGAs), coupled with the use of hardware programming languages, offers a good path for porting such applications to portable devices. This paper explores the rapid prototyping of a real-time road sign recognition system on an FPGA, using an algorithmic-like hardware programming language: Handel-C. We investigate the relationship between efficient Handel-C data structures and constructs and the related high-level C data structures and constructs.…
Experimental studies on continuous speech recognition using neural architectures with “adaptive” hidden activation functions
The choice of hidden non-linearity in a feed-forward multi-layer perceptron (MLP) architecture is crucial to obtain good generalization capability and better performance. Nonetheless, little attention has been paid to this aspect in the ASR field. In this work, we present some initial, yet promising, studies toward improving ASR performance by adopting hidden activation functions that can be automatically learned from the data and change shape during training. This adaptive capability is achieved through the use of orthonormal Hermite polynomials. The “adaptive” MLP is used in two neural architectures that generate phone posterior estimates, namely, a standalone configuration and a hierarch…
A multimodal retina-iris biometric system using the Levenshtein distance for spatial feature comparison
The recent developments in information technologies, and the consequent need for access to distributed services and resources, require robust and reliable authentication systems. Biometric systems can guarantee high levels of security, and multimodal techniques, which combine two or more biometric traits, warrant more stringent constraints during the access phases. This work proposes a novel multimodal biometric system based on the combination of iris and retina in the spatial domain. The proposed solution follows the alignment and recognition approach commonly adopted in computational linguistics and bioinformatics; in particular, features are extracted separately for iris and…
A Study of Perceptron Mapping Capability to Design Speech Event Detectors
Event detection is a fundamental yet critical component in automatic speech recognition (ASR) systems that attempt to extract knowledge-based features at the front-end level. In this context, it is common practice to design the detectors inside well-known frameworks based on artificial neural networks (ANNs) or support vector machines (SVMs). In the case of ANNs, speech scientists often design their detector architecture relying on a conventional feed-forward multi-layer perceptron (MLP) with sigmoidal activation functions. The aim of this paper is to introduce other ANN architectures in the context of detection-based ASR. In particular, a bank of feed-forward MLPs using sinusoidal activation f…
Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition
Speech recognition has become common in many application domains. Incorporating acoustic-phonetic knowledge into Automatic Speech Recognition (ASR) system design has proven a viable approach to raising ASR accuracy. Manner of articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner of articulation attributes starting from representations of speech signal frames. In this paper, a set of six detectors for the above-mentioned attributes is designed based on the E-αNet model of neural networks. This model was chosen for its capability to learn hidden acti…
Application of Enets to Feature Recognition of Articulation Manner in Knowledge-based Automatic Speech Recognition
Efficient FPGA Implementation of a Knowledge-Based Automatic Speech Classifier
Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of Automatic Speech Recognition (ASR) systems is comparable to Human Speech Recognition (HSR) only under very strict working conditions, and is in general far lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner of articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as dete…