0000000000222066
AUTHOR
Adam L. Buchsbaum
On the determinization of weighted finite automata
We study determinization of weighted finite-state automata (WFAs), which has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also provide a rigorous analysis of a determinization algorithm of Mohri, with tight bounds for acyclic WFAs. Given that WFAs can expand exponentially when determinized, we explore why those used in ASR tend to shrink. The folklore explanation is that ASR WFAs have an acyclic, multi-partite structure. We show, however, that there exist such WFAs that always incur exponential expansion when determinized. We then in…
On finding common neighborhoods in massive graphs
AbstractWe consider the problem of finding pairs of vertices that share large common neighborhoods in massive graphs. We prove lower bounds on the resources needed to solve this problem on resource-bounded models of computation. In streaming models, in which algorithms can access the input only a constant number of times and only sequentially, we show that, even with randomization, any algorithm that determines if there exists any pair of vertices with a large common neighborhood must essentially store and process the input graph off line. In sampling models, in which algorithms can only query an oracle for the common neighborhoods of specified vertex pairs, we show that any algorithm must …
Algorithmic Aspects of Speech Recognition: A Synopsis
Speech recognition is an area with a sizable literature, but there is little discussion of the topic within the computer science algorithms community. Since many of the problems arising in speech recognition are well suited for algorithmic studies, we present them in terms familiar to algorithm designers. Such cross fertilization can breed fresh insights from new perspectives. This material is abstracted from A. L. Buchsbaum and R. Giancarlo, Algorithmic Aspects of Speech Recognition: An Introduction, ACM Journal of Experimental Algorithmics, Vol. 2, 1997, http://www.jea.acm.org.
Improving table compression with combinatorial optimization
We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, …
An Approximate Determinization Algorithm for Weighted Finite-State Automata
Nondeterministic weighted finite-state automata are a key abstraction in automatic speech recognition systems. The efficiency of automatic speech recognition depends directly on the sizes of these automata and the degree of nondeterminism present, so recent research has studied ways to determinize and minimize them, using analogues of classical automata determinization and minimization. Although, as we describe here, determinization can in the worst case cause poly-exponential blowup in the number of states of a weighted finite-state automaton, in practice it is remarkably successful. In extensive experiments in automatic speech recognition systems, deterministic weighted finite-state autom…