< Phylogenetics < Alignment Preparation
Global alignment methods attempt to align the entire length of two sequences. Without models for sequence evolution, these methods are very general and can be applied to any sequence (not just biological) similarity.
The first algorithm to discuss is the Needleman-Wunsch.
requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
sc_matrix[0][i] = gap * i
p_matrix[0][i] = 1
end for
for i=0 to length of seq2
sc_matrix[i][0] = gap * i
p_matrix[i][0] = 1
end for
for i=0 to length of seq2
for j=0 to length of seq1
subseq1 <= seq1[j]
subseq2 <= seq2[i]
if subseq1 = subseq2 then
diag <= sc_matrix[i][j] + match
else
diag <= sc_matrix[i][j] + mismatch
end if
up <= sc_matrix[i][j] + gap
left <= sc_matrix[i][j] + gap
if diag >= up then
if diag>= left then
sc_matrix[i][j] <= diag
p_matrix[i][j] <= 5
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
else
if diag>= left then
sc_matrix[i][j] <= up
p_matrix[i][j] <= 4
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
if p_matrix = 0
keep_going = false
end if
if p_matrix[i][j] = 5 then
al_seq1.add(seq1[i])
al_seq2.add(seq1[j])
i <= i - 1
j <= j - 1
else if p_matrix[i][j] = 1 then
al_seq1.add(seq1[i])
al_seq2.add("-")
i <= i - 1
else if p_matrix[i][j] = 4 then
al_seq1.add("-")
al_seq2.add(seq1[j])
j <= j - 1
end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.