
AUTHOR

Domenico Amato

On the Suitability of Neural Networks as Building Blocks for the Design of Efficient Learned Indexes

To obtain time/space improvements in classic Data Structures, an emerging trend combines Machine Learning techniques with those proper to Data Structures. This new area goes under the name of Learned Data Structures. The motivation for its study is a perceived change of paradigm in Computer Architectures that would favour the use of Graphics Processing Units and Tensor Processing Units over conventional Central Processing Units. In turn, that would favour the use of Neural Networks as building blocks of Classic Data Structures. Indeed, Learned Bloom Filters, which are one of the main pillars of Learned Data Structures, make extensive use of Neural Networks to improve…

research product
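The Learned Bloom Filter mentioned in the entry above is commonly described as a classifier placed in front of a conventional Bloom filter: the classifier answers "present" for items scoring above a threshold, and a backup filter stores the keys the classifier misses, so the structure has no false negatives. The Python sketch below is a minimal illustration under stated assumptions: a generic `score` callable stands in for the Neural Network, and the names `BloomFilter` and `LearnedBloomFilter`, the threshold, and the SHA-256-based hashing are hypothetical choices, not taken from the work described.

```python
import hashlib
from typing import Callable, Iterable


class BloomFilter:
    """Classic Bloom filter with k hash positions derived from SHA-256."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Salt the hash with the function index i to get k different positions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


class LearnedBloomFilter:
    """Hypothetical sketch: model first, backup Bloom filter for the keys it misses."""

    def __init__(self, keys: Iterable[str], score: Callable[[str], float],
                 threshold: float, m: int = 1024, k: int = 3) -> None:
        self.score, self.threshold = score, threshold
        self.backup = BloomFilter(m, k)
        for key in keys:
            if score(key) < threshold:  # the model would reject a real key
                self.backup.add(key)

    def __contains__(self, item: str) -> bool:
        # Accept if the model is confident; otherwise defer to the backup filter.
        return self.score(item) >= self.threshold or item in self.backup
```

For example, with a hypothetical score `lambda s: 1.0 if s.startswith("http") else 0.0` and threshold `0.5`, a key such as `"ftp://c.example"` is rejected by the model and therefore stored in the backup filter; the size `m` of that filter then governs the overall false-positive rate.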

Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues

DNA sequences are the basic data type processed in the study of biological data. A key component of biological analysis is sequence classification, a methodology widely used to analyze sequential data of different nature. However, its application to DNA sequences requires a proper representation of such sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to the solution of this problem. Among them, Deep Neural Network (DNN) models have recently also shown strongly encouraging results. In this chapter, we deal with specific classification problems related to t…

research product
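As an illustration of the representation problem raised in the entry above, one common way to turn a DNA sequence into a fixed-length vector for a classifier is its normalized k-mer frequency spectrum. This is a sketch of that generic technique, not necessarily the representation used in the chapter; the function name `kmer_vector` and the choice to skip windows containing symbols outside A, C, G, T are assumptions.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Normalized k-mer frequency vector over the alphabet ACGT (4**k entries)."""
    # Fixed lexicographic order of all k-mers: AA..A, AA..C, ..., TT..T.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Every sequence, whatever its length, maps to a vector of the same size 4**k, which is what makes this representation convenient for standard classifiers; the price is that positional information inside the sequence is lost.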

Learned Sorted Table Search and Static Indexes in Small-Space Data Models

Machine-learning techniques, properly combined with data structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Searches at the cost of additional space beyond the table being searched. Such space is devoted to the machine-learning models. Although still in their infancy, these are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question in this area is to assess to what extent one can enjoy the speed-up of Binary Searches achieved by Learned Indexes while using constant or nearly constant-space mod…

research product
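The entry above concerns Learned Indexes built from constant or nearly constant-space models. A minimal sketch of the idea, assuming a single least-squares linear model that maps keys to positions: the model stores only a slope, an intercept, and its maximum error on the table, and the final Binary Search is restricted to the window that this error guarantees. `LinearLearnedIndex` is a hypothetical name, and Python's `bisect_left` stands in for the C++ `lower_bound` routine.

```python
import math
from bisect import bisect_left


class LinearLearnedIndex:
    """Constant-space learned index sketch: one linear model (slope, intercept)
    plus an integer error bound, then a Binary Search inside the predicted window."""

    def __init__(self, table: list[float]) -> None:
        if not table:
            raise ValueError("table must be non-empty")
        self.table = table
        n = len(table)
        # Least-squares fit of position i against key table[i].
        mean_x = sum(table) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in table)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(table))
        # Sorted keys give a non-negative slope; the clamp guards against rounding.
        self.slope = max(cov / var_x, 0.0) if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Largest model error over the table, rounded up to an integer.
        self.err = math.ceil(max(abs(self._predict(x) - i)
                                 for i, x in enumerate(table)))

    def _predict(self, key: float) -> float:
        return self.slope * key + self.intercept

    def lower_bound(self, key: float) -> int:
        """Same result as bisect_left(self.table, key)."""
        n = len(self.table)
        p = self._predict(key)
        lo = min(max(math.floor(p) - self.err, 0), n)
        hi = min(max(math.ceil(p) + self.err + 1, 0), n)
        return bisect_left(self.table, key, lo, hi)
```

Because the prediction is monotone in the key and its error on every table element is at most `err`, the true lower bound of any query, present or absent, falls inside the window `[floor(p) - err, ceil(p) + err + 1]`. The model itself occupies constant space regardless of table size; how much the window shrinks depends entirely on how linear the key distribution is.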

Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform

Learned Indexes are a novel approach to searching in a sorted table. A model is used to predict an interval in which to search, and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, the lower_bound routine of the Standard C++ library is usually used, although this is a natural choice rather than a requirement. However, recent studies that do not use Machine Learning predictions indicate that other implementations of Binary Search, or variants such as k-ary Search, are better suited to take advantage of the features offered by modern computer architectures. With the use of the Searching on Sorted Data (SOSD) Learned Indexing bench…

research product
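The entry above contrasts lower_bound-style Binary Search with variants such as k-ary Search. The sketch below shows only the control flow of a k-ary lower-bound search: each step compares the key against k-1 evenly spaced separators and keeps the single sub-interval that can contain the answer; with k = 2 it reduces to an ordinary Binary Search. On real hardware the benefit comes from those k-1 comparisons being independent (and hence amenable to SIMD), which this plain Python version does not capture. The function name `kary_lower_bound` and the default `k` are assumptions.

```python
def kary_lower_bound(table: list, key, k: int = 4) -> int:
    """First index i with table[i] >= key (len(table) if none), using k-1
    separator comparisons per step."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Invariant: table[i] < key for all i < lo, and table[i] >= key for all i >= hi.
    lo, hi = 0, len(table)
    while lo < hi:
        size = hi - lo
        new_lo, new_hi = lo, hi
        for t in range(1, k):
            s = lo + (t * size) // k  # separator, always in [lo, hi)
            if table[s] < key:
                new_lo = s + 1        # everything up to s is too small
            else:
                new_hi = s            # s is a candidate; answer is in [new_lo, s]
                break
        lo, hi = new_lo, new_hi
    return lo
```

Each iteration shrinks the candidate interval to roughly a 1/k fraction, so the number of iterations is about log base k of n, at the cost of up to k-1 comparisons per iteration.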

Muscle Histopathological Abnormalities in a Patient With a CCT5 Mutation Predicted to Affect the Apical Domain of the Chaperonin Subunit.

Recognition of diseases associated with mutations of the chaperone system genes, i.e., chaperonopathies, is on the rise. Hereditary and clinical aspects are established, but the impact of the mutation on the chaperone molecule and the mechanisms underpinning the tissue abnormalities are not. Here, histological features of skeletal muscle from a patient with a severe, early onset, distal motor neuropathy, carrying a mutation in the CCT5 subunit (MUT), were examined in comparison with normal muscle (CTR). The MUT muscle was considerably modified; atrophy of fibers and disruption of the tissue architecture were prominent, with many fibers in apoptosis. CCT5 was diversely present in the sarcolem…

research product

Learned Sorted Table Search and Static Indexes in Small Model Space

Machine Learning Techniques, properly combined with Data Structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Search at the cost of additional space beyond the table being searched. Such space is devoted to the ML model. Although still in their infancy, they are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor and, in fact, a major open question in this area is to assess to what extent one can enjoy the speed-up of Learned Indexes while using constant or nearly constant space models. We address it here by (a) introducing…

research product

Recurrent Deep Neural Networks for Nucleosome Classification

Nucleosomes are the fundamental repeating unit of chromatin. A nucleosome is a complex of eight histone proteins to which approximately 147–150 base pairs of DNA bind. Several biological studies have clearly stated that the regulation of cell type-specific gene activities is influenced by nucleosome positioning. Bioinformatic studies have improved on those results, showing evidence of sequence specificity in nucleosomal DNA fragments. In this work, we present a recurrent neural network that uses a feature representation of nucleosome sequences for their classification. In particular, we implement an architecture that stacks convolutional and long short-term memory layers, with the main purpose to avoid t…

research product

A Tour of Learned Static Sorted Sets Dictionaries: From Specific to Generic with an Experimental Performance Analysis

In the era of Big Data, studying new methods to improve the performance of well-known procedures, such as searching in a Sorted Set, has become crucial in many fields. A new trend emerging in this scenario combines Machine Learning models with Data Structures, generating the so-called Learned Data Structures. In this thesis, we provide an in-depth experimental study of the use of these models, starting from some evidence, known to experts in the field but not experimentally investigated, concerning the use of very complex models such as Neural Networks. Then, we document a time/space trade-off scenario that is very important for practitioners and designers. Furthermore,…

research product

Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines

Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now also includes Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a quite significant achievement. Here we study to what extent Machine Learning Techniques can contribute to obtaining such a speed-up via a systematic experimental comparison of known efficient implementations of Sorted Table Search procedures, with different Data Layouts, and their Learned counterparts developed here. We characterize the scenarios in which the latter can be profitably used with respect to the former, accounting …

research product
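One Data Layout commonly considered for Sorted Table Search is the Eytzinger (BFS) layout, which stores the table in the order of a level-by-level traversal of an implicit balanced binary search tree, so that the first probes of every search touch a small, cache-friendly prefix of the array. The entry above does not name the layouts it compares, so this is a sketch of one well-known layout only; the class name `EytzingerTable`, the 1-based array, and the `rank` array used to map results back to sorted positions are illustrative choices.

```python
class EytzingerTable:
    """Sorted values stored in Eytzinger (BFS) order, 1-based: the children
    of slot i are slots 2i and 2i+1."""

    def __init__(self, sorted_values: list) -> None:
        self.n = len(sorted_values)
        self.eyt = [None] * (self.n + 1)  # slot 0 is unused
        self.rank = [0] * (self.n + 1)    # rank[i] = sorted position of eyt[i]
        self._next = 0
        self._fill(sorted_values, 1)

    def _fill(self, values: list, i: int) -> None:
        # An in-order traversal of the implicit tree visits slots in sorted order.
        if i <= self.n:
            self._fill(values, 2 * i)
            self.eyt[i] = values[self._next]
            self.rank[i] = self._next
            self._next += 1
            self._fill(values, 2 * i + 1)

    def lower_bound(self, key) -> int:
        """Sorted position of the first value >= key (n if none)."""
        i = 1
        while i <= self.n:
            # Go right (append bit 1) if eyt[i] < key, else left (append bit 0).
            i = 2 * i + (self.eyt[i] < key)
        # Strip the trailing right turns and the last left turn: the node
        # where the search last went left holds the answer (0 if it never did).
        i >>= (~i & (i + 1)).bit_length()
        return self.rank[i] if i else self.n
```

The expression `(~i & (i + 1)).bit_length()` equals the number of trailing one bits of `i` plus one, which plays the role of the find-first-set trick typically used for this layout in C or C++ implementations.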

CORENup: a combination of convolutional and recurrent deep neural networks for nucleosome positioning identification

Background: Nucleosomes wrap the DNA in the nucleus of the eukaryotic cell and regulate its transcription phase. Several studies indicate that nucleosomes are determined by the combined effects of several factors, including DNA sequence organization. Interestingly, the identification of nucleosomes on a genomic scale has been successfully performed by computational methods using DNA sequence as input data. Results: In this work, we propose CORENup, a deep learning model for nucleosome identification. CORENup processes a DNA sequence as input using a one-hot representation and combines, in a parallel fashion, a fully convolutional neural network and a recurrent layer. These two parallel …

research product
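The one-hot representation mentioned in the CORENup entry above maps each nucleotide to a unit vector over the alphabet A, C, G, T, so a sequence of length L becomes an L x 4 binary matrix that can be fed to convolutional and recurrent layers. A minimal sketch follows; the column order and the all-zero encoding of ambiguous symbols such as N are assumptions, not details taken from the paper.

```python
def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix with columns A, C, G, T.
    Symbols outside ACGT (e.g. N) become all-zero rows."""
    columns = {"A": 0, "C": 1, "G": 2, "T": 3}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in columns:
            row[columns[base]] = 1
        matrix.append(row)
    return matrix
```

Unlike a k-mer frequency vector, this encoding preserves the position of every base, which is what lets convolutional filters detect local motifs and recurrent layers model their order along the sequence.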

A Learned Sorted Table Search Library

This library includes a collection of methods for performing element search in ordered tables, ranging from textbook implementations to more complex algorithms.

research product

A Benchmarking Platform for Atomic Learned Indexes

This repository provides a benchmarking platform to evaluate how Feed Forward Neural Networks can be effectively used as index data structures.

research product