Genomic-scale analysis of DNA words of arbitrary length by parallel computation.
In the post-genomic era, one of the main tasks is deciphering the meaning of the DNA sequences of complex organisms. Doing so requires computational tools that can extract and organize the information contained in long DNA molecules, such as whole chromosomes or even complete genomes. However, most genomic analyses have concentrated on detecting and counting short words between 1 and 10 nucleotides long. In this paper, we describe parallel algorithms of differing complexity that exhaustively determine all words of size k, with k arbitrarily large, in a source DNA sequence. The results show that our algorithms achieve a high degree of scalability, allo…