Abstract:
Suffix trees have been applied to fundamental string problems such as finding the longest repeated substring, finding all squares or repetitions in a string, computing substring statistics, approximate string matching and string comparison. They have also been used to address other types of problems such as text compression, compressing assembly code, inverted indices and analyzing genetic sequences.
Suffix array is a simpler and compact alternative to the suffix tree; suffix sorting is the fundamental building block in suffix array construction process. Suffix array widely used in field of bioinformatics and text processing. The repeat structure of genomic DNA is considered an essential mechanism for evolution and other fundamental biological functions. Any kind of repeats finding problems are always deemed as one of the prerequisites for genome sequencing and analysis, and among these problems exact repeat finding is the first step for most other repeats finding problems.
This work depicts the parallel implementation of suffix array and then their applications in bioinformatics such as exact repeats finding, tandem repeats finding, exact string matching etc. the parallel implementation of suffix array algorithm on a GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation. CUDA is a parallel computing architecture. It is a middle-ware. compute engine which exposes the power of NVIDIA Graphics Processing Units to software developers through industry standard programming language. The thread level parallel code block provides an efficient primitive for building a high performance suffix array construction program and many other applications.
The parallel version runs much faster than any serial implementation on CPU for the large size of input data elements. The parallel algorithm can also be easily adapted for similar type of problems with little modification