procrastAligner is an efficient local multiple alignment heuristic for identifying conserved regions in one or more DNA sequences. More specifically, procrastAligner has been designed for local multiple alignment of interspersed DNA repeats. The algorithm consists of seven main steps:
(1) palindromic spaced seed patterns to match both DNA strands simultaneously,
(2) seed extension (chaining) in order of decreasing multiplicity,
(3) procrastination when low-multiplicity matches are encountered,
(4) gapped extension of seed chains,
(5) detection of unrelated regions using a hidden Markov model,
(6) application of transitive homology relationships, and
(7) removal of any unrelated sequence from the final local multiple alignment.
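Steps (1) and (2) can be illustrated with a minimal Python sketch. It is not procrastAligner's implementation: the seed pattern "1101011", the function names, and the dictionary-based index are all hypothetical choices made for illustration. The sketch relies on one property of a palindromic pattern: the positions it samples are the same when a window is read on either strand, so a single canonical key (the smaller of the sampled bases and their reverse complement) matches forward and reverse-complement copies in one pass. Matches are then returned in order of decreasing multiplicity.

```python
from collections import defaultdict

# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(pattern):
    """A seed pattern is palindromic if it reads the same in both directions."""
    return pattern == pattern[::-1]


def seed_key(window, pattern):
    """Strand-independent key for one window.

    '1' positions in the pattern are sampled, '0' positions are ignored.
    Because the pattern is palindromic, sampling the reverse-complemented
    window gives exactly the reverse complement of the forward sample, so
    the smaller of the two strings identifies the seed on either strand.
    """
    forward = "".join(b for b, p in zip(window, pattern) if p == "1")
    reverse = "".join(
        b for b, p in zip(reverse_complement(window), pattern) if p == "1"
    )
    return min(forward, reverse)


def find_seed_matches(seq, pattern):
    """Group window start positions by seed key (hypothetical helper).

    Returns lists of positions that share a seed, keeping only seeds that
    occur at least twice, sorted by decreasing multiplicity.
    """
    if not is_palindromic(pattern):
        raise ValueError("seed pattern must be palindromic")
    width = len(pattern)
    hits = defaultdict(list)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguity codes
            hits[seed_key(window, pattern)].append(start)
    matches = [positions for positions in hits.values() if len(positions) >= 2]
    matches.sort(key=len, reverse=True)
    return matches
```

For the sequence "GATTACA" + "CCCCC" + reverse_complement("GATTACA") and the pattern "1101011", find_seed_matches returns [[0, 12]]: the forward copy at position 0 and its reverse-complement copy at position 12 fall under the same key.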
Our HMM for detecting unrelated regions in the local multiple alignments assigns an emission probability to each possible pair of aligned nucleotides. These probabilities were extracted from the HOXD substitution matrix presented by Chiaromonte et al. (2002), "Scoring Pairwise Genomic Sequence Alignments". Further details on how we extracted and implemented these values can be found here.
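As a rough illustration of how pair emission probabilities can be recovered from a log-odds substitution matrix, the Python sketch below uses one common approach; it is not necessarily the procedure we followed. It makes three assumptions: the scores are the HOXD70 values as commonly distributed with BLASTZ (they should be checked against the original paper), background nucleotide frequencies are uniform, and each score is proportional to log(p(x,y) / (q(x) q(y))). Under those assumptions the unknown scale factor is the positive value for which the pair probabilities sum to one, found here by bisection. All function names are hypothetical.

```python
import math

BASES = "ACGT"

# Assumed HOXD70 scores (upper triangle; the matrix is symmetric).
HOXD70 = {
    ("A", "A"): 91, ("A", "C"): -114, ("A", "G"): -31, ("A", "T"): -123,
    ("C", "C"): 100, ("C", "G"): -125, ("C", "T"): -31,
    ("G", "G"): 100, ("G", "T"): -114,
    ("T", "T"): 91,
}


def score(x, y):
    """Look up a substitution score regardless of argument order."""
    return HOXD70[(x, y)] if (x, y) in HOXD70 else HOXD70[(y, x)]


def total_probability(scale, background):
    """Sum of q(x) * q(y) * exp(scale * s(x, y)) over all ordered pairs."""
    return sum(
        background[x] * background[y] * math.exp(scale * score(x, y))
        for x in BASES
        for y in BASES
    )


def solve_scale(background, lo=1e-6, hi=1.0, iterations=200):
    """Bisection for the positive scale at which the pair probabilities sum to 1.

    The sum is below 1 for small positive scales (the expected score is
    negative) and above 1 for large ones, so the root lies between lo and hi.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if total_probability(mid, background) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pair_emissions(background=None):
    """Emission probability for every ordered pair of aligned nucleotides."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform composition
    scale = solve_scale(background)
    probs = {
        (x, y): background[x] * background[y] * math.exp(scale * score(x, y))
        for x in BASES
        for y in BASES
    }
    total = sum(probs.values())  # renormalise away residual bisection error
    return {pair: p / total for pair, p in probs.items()}
```

In a two-state HMM of this kind, the resulting table would serve as the emission distribution of a "related" state, while an "unrelated" state could emit each pair with the background probability q(x) q(y). That division of roles is likewise an assumption of this sketch.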
For further details on the procrastAligner algorithm, see:
Gapped Extension for Local Multiple Alignment of Interspersed DNA Repeats. Todd J Treangen, Aaron E Darling, Mark A Ragan, Xavier Messeguer. Lecture Notes in Bioinformatics 4983, pp. 74–86, 2008.
© Springer-Verlag Berlin Heidelberg 2008. [pdf]
Procrastination leads to efficient filtration for local multiple alignment. Aaron E Darling, Todd J Treangen, Louxin Zhang, Carla Kuiken, Xavier Messeguer, Nicole T Perna. Lecture Notes in Bioinformatics 4175, pp. 126–137, 2006.
© Springer-Verlag Berlin Heidelberg 2006. [pdf]