Graphic sequence alignment

This script displays the dynamic programming matrix and the traceback for the alignment of two amino acid sequences (proteins). It makes no sense to use this program to align nucleotide sequences (DNA or RNA), because the BLOSUM substitution matrices it uses score pairs of amino acids.

The alignment may be global (Needleman-Wunsch) or local (Smith-Waterman). The algorithms can use linear gap penalties or affine gap penalties, and one of three different BLOSUM substitution matrices. In the case of affine gap penalties, three scores are shown at each alignment end-point: the best score for an alignment ending with a gap in the top sequence, the best score for an alignment ending with a match between two amino acids, and the best score for an alignment ending with a gap in the left sequence.
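The three scores kept per cell under affine gap penalties can be illustrated with a minimal Python sketch of global alignment. This is not the Perl source of the script. It follows the common three-state formulation, and the names `affine_global`, `M`, `X`, `Y`, as well as the match/mismatch scores and gap costs, are assumptions made for illustration. A simple match/mismatch score stands in for a BLOSUM matrix, and which sequence the page draws on top or on the left is not assumed here.

```python
NEG = float("-inf")  # marks states that cannot be reached


def affine_global(a, b, match=2, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Fill three score matrices for global alignment with affine gaps.

    M[i][j]: best score of an alignment of a[:i], b[:j] ending with a[i-1] paired with b[j-1]
    X[i][j]: best score ending with a[i-1] aligned to a gap (gap in b)
    Y[i][j]: best score ending with b[j-1] aligned to a gap (gap in a)
    All parameter values are illustrative, not those used by the script.
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Assumption: a gap is either opened from a match state or extended;
            # a gap in one sequence directly followed by a gap in the other is not modelled.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
    return M, X, Y


M, X, Y = affine_global("HEA", "HA")
print(M[3][2], X[3][2], Y[3][2])  # the three scores of the bottom-right cell
```

The overall global score is the largest of the three values in the bottom-right cell. With linear gap penalties the three matrices collapse into one, since opening and extending a gap cost the same.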

The traceback is shown graphically. Green arrows correspond to the optimal alignment shown; blue arrows correspond to alternative optimal alignments; and red arrows correspond to the possible traceback directions for every given matrix cell.
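The relation between the arrows can be sketched in Python for the simplest case, global alignment with a linear gap penalty. This is an illustration under stated assumptions, not the script's own code: the function names, the direction labels `diag`, `up` and `left`, the match/mismatch scores, and the choice to put sequence `a` along the rows are all hypothetical. Every direction that attains a cell's maximum is recorded (roughly what the red arrows show), and following one recorded direction per cell from the bottom-right corner yields one optimal alignment (the green path). Taking a different direction at a tie yields an alternative optimal alignment (blue).

```python
def fill_with_pointers(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch fill that keeps every optimal traceback direction per cell."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    P = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: only gaps in b
        F[i][0] = i * gap
        P[i][0] = ["up"]
    for j in range(1, m + 1):  # first row: only gaps in a
        F[0][j] = j * gap
        P[0][j] = ["left"]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = {
                "diag": F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                "up": F[i - 1][j] + gap,      # a[i-1] against a gap
                "left": F[i][j - 1] + gap,    # b[j-1] against a gap
            }
            F[i][j] = max(candidates.values())
            P[i][j] = [d for d, v in candidates.items() if v == F[i][j]]
    return F, P


def one_traceback(a, b, P):
    """Follow the first recorded direction in each cell back to the origin."""
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        d = P[i][j][0]
        if d == "diag":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif d == "up":
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


F, P = fill_with_pointers("HEA", "HA")
print(one_traceback("HEA", "HA", P))
```

For local alignment the fill would additionally clamp each cell at zero and the traceback would start from the highest-scoring cell and stop at a zero; that variant is not shown here.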

Type in two amino acid sequences (using the amino acid letters ARNDCQEGHILKMFPSTWYV), select the alignment method, and press the Align button:

Sequence 1
Sequence 2
Alignment
Substitution matrix
Gap costs
Show traceback: No / Yes
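A sequence can be checked against the twenty accepted letters before it is submitted. The following Python sketch is only an illustration; the helper name `invalid_letters` is hypothetical, and converting the input to upper case is an assumption, since the page does not say how the script treats lower-case letters.

```python
AMINO_ACIDS = set("ARNDCQEGHILKMFPSTWYV")  # the twenty letters listed above


def invalid_letters(seq):
    """Return the sorted letters in seq that are not among the accepted amino acid letters."""
    # Assumption: input is compared case-insensitively.
    return sorted(set(seq.upper()) - AMINO_ACIDS)


print(invalid_letters("HEAGAWGHEE"))  # no invalid letters
print(invalid_letters("HEBX"))        # B and X are not accepted
```

Note that a DNA string such as ACGT passes this check, because A, C, G and T are also amino acid letters; the check cannot tell a protein from a nucleotide sequence.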

The source code is available: the Perl script and a supporting Perl module. The script now runs on a tiny Raspberry Pi Linux server.

The example sequences are from Chapter 2 of Durbin et al.: Biological sequence analysis. Cambridge University Press 1998.

Here is a collection of the various substitution matrices (BLOSUM, PAM, GONNET).


Peter Sestoft 2003-04-11, 2003-09-11, 2011-11-23, 2016-02-01