Search results for "Multiset"
showing 10 items of 10 documents
An extension of the Burrows-Wheeler Transform and applications to sequence comparison and data compression
2005
We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible, but, differently from BWT, it is also surjective. The E transformation allows to give a definition of distance between two sequences, that we apply here to the problem of the whole mitochondrial genome phylogeny. Moreover we give some consideration about compressing a set of words by using the E transformation as preprocessing.
A new Euler–Mahonian constructive bijection
2011
Using generating functions, MacMahon proved in 1916 the remarkable fact that the major index has the same distribution as the inversion number for multiset permutations, and in 1968 Foata gave a constructive bijection proving MacMahon’s result. Since then, many refinements have been derived, consisting of adding new constraints or new statistics. Here we give a new simple constructive bijection between the set of permutations with a given number of inversions and those with a given major index. We introduce a new statistic, mix, related to the Lehmer code, and using our new bijection we show that the bistatistic (mix,INV) is Euler–Mahonian. Finally, we introduce the McMahon code for …
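The equidistribution attributed to MacMahon in this abstract can be checked by brute force on a small multiset. The following is only an illustrative sketch, not the bijection from the paper: the function names (`inversions`, `major_index`, `distributions`) and the sample multiset are assumptions chosen for the example.

```python
from collections import Counter
from itertools import permutations


def inversions(w):
    """Number of pairs i < j with w[i] > w[j]."""
    return sum(
        1
        for i in range(len(w))
        for j in range(i + 1, len(w))
        if w[i] > w[j]
    )


def major_index(w):
    """Sum of the 1-based positions i where w[i] > w[i+1] (the descents)."""
    return sum(i + 1 for i in range(len(w) - 1) if w[i] > w[i + 1])


def distributions(multiset):
    """Tabulate both statistics over the distinct permutations of a multiset."""
    words = set(permutations(multiset))  # set() removes duplicate arrangements
    inv = Counter(inversions(w) for w in words)
    maj = Counter(major_index(w) for w in words)
    return inv, maj


# Hypothetical sample multiset {1, 1, 2, 3}: it has 12 distinct permutations.
inv_dist, maj_dist = distributions((1, 1, 2, 3))
print(inv_dist == maj_dist)
```

The check is exponential in the size of the multiset, so it is only suitable for small examples.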
A loopless algorithm for generating the permutations of a multiset
2003
Many combinatorial structures can be constructed from simpler components. For example, a permutation can be constructed from cycles, or a Motzkin word from a Dyck word and a combination. In this paper we present a constructor for combinatorial structures, called shuffle on trajectories (defined previously in a non-combinatorial context), and we show how this constructor enables us to obtain a new loopless generating algorithm for multiset permutations from similar results for simpler objects.
An extension of the Burrows-Wheeler Transform
2007
We describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.
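As a rough illustration of the transform this abstract describes, the sketch below collects every conjugate (rotation) of every input word, sorts the conjugates by comparing their infinite periodic repetitions, and reads off the last letters. It is written under several assumptions: the input words are taken to be primitive, the names `conjugates`, `omega_cmp` and `ebwt_sketch` are hypothetical, and the sketch returns only the output string, omitting any extra bookkeeping the full transform needs in order to be inverted.

```python
from functools import cmp_to_key


def conjugates(word):
    """All rotations of a word."""
    return [word[i:] + word[:i] for i in range(len(word))]


def omega_cmp(u, v):
    """Compare the infinite words uuu... and vvv...

    Two periodic words that agree on their first len(u) + len(v) letters
    are equal, so comparing prefixes of that length is enough.
    """
    n = len(u) + len(v)
    pu = (u * (n // len(u) + 1))[:n]
    pv = (v * (n // len(v) + 1))[:n]
    return (pu > pv) - (pu < pv)


def ebwt_sketch(words):
    """Last letters of all conjugates of all words, in omega-sorted order."""
    rows = [c for w in words for c in conjugates(w)]
    rows.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rows)


# Hypothetical input multiset of two primitive words.
print(ebwt_sketch(["ab", "abb"]))  # prints "bbaba"
```

The sort performs direct string comparisons and is meant for small inputs only; it makes no attempt at the efficiency a practical implementation would need.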
SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET
2014
In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…
Multiset Kernel CCA for multitemporal image classification
2013
The analysis of multitemporal remote sensing images is becoming an increasingly important problem because of the upcoming scenario of multispectral satellite constellations monitoring our Planet. Algorithms that can analyze such amount of heterogeneous information are necessary. While linear techniques have been extensively deployed, this work considers a kernel method that finds nonlinear correlations between all image sources and the class labels. We introduce in this context the Kernel Canonical Correlation Analysis (KCCA) to exploit the wealth of temporal image information and to handle nonlinear relations in a natural way via kernels. To achieve this goal, we use the generalization of …
Suffixes, Conjugates and Lyndon Words
2013
In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the ≺ω order between two primitive wo…
A New Combinatorial Approach to Sequence Comparison
2008
In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…
A hybrid approach to semantic web services matchmaking
2008
Deploying the semantics embedded in web services is a mandatory step in the automation of discovery, invocation and composition activities. The semantic annotation is the “add-on” to cope with the actual interoperability limitations and to assure a valid support to the interpretation of services capabilities. Nevertheless many issues have to be reached to support semantics in the web services and to guarantee accurate functionality descriptions. Early efforts address automatic matchmaking tasks, in order to find eligible advertised services which appropriately meet the consumer’s demand. In the most of approaches, this activity is often entrusted to software agents, able to drive re…
Combining PCA and multiset CCA for dimension reduction when group ICA is applied to decompose naturalistic fMRI data
2015
An extension of group independent component analysis (GICA) is introduced, where multi-set canonical correlation analysis (MCCA) is combined with principal component analysis (PCA) for three-stage dimension reduction. The method is applied on naturalistic functional MRI (fMRI) images acquired during task-free continuous music listening experiment, and the results are compared with the outcome of the conventional GICA. The extended GICA resulted slightly faster ICA convergence and, more interestingly, extracted more stimulus-related components than its conventional counterpart. Therefore, we think the extension is beneficial enhancement for GICA, especially when applied to challenging fMRI d…