There are two ways to parallelize the computation in the maximum likelihood method: parallelizing the topology search, and evaluating branch lengths site by site in parallel. These strategies were compared in earlier work [OOta et al., 1995, OOta, 1996] using nucleotide sequence data.
The most obvious computation that can be carried out in parallel is the topology search, because the evaluation of one topology is completely independent of that of any other. However, the search space grows faster than exponentially with the number of sequences if we execute an exhaustive search. For instance, 20 sequences produce a topology space of size about $2.2 \times 10^{20}$. Generally, the number of topologies of unrooted bifurcating trees for $n$ nucleotide sequences is given as [Cavalli-Sforza and Edwards, 1967]
\[
  \prod_{i=3}^{n} (2i-5) \;=\; \frac{(2n-5)!}{2^{\,n-3}\,(n-3)!} .
\]
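To give a feeling for how quickly the topology space grows, the count of unrooted bifurcating topologies can be evaluated directly as the product of the odd numbers (2i-5) for i = 3, ..., n, which is the standard form of this count. The following is a minimal Python sketch; the function name `num_unrooted_topologies` is ours, chosen only for illustration.

```python
def num_unrooted_topologies(n: int) -> int:
    """Number of unrooted bifurcating tree topologies for n labelled sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for i in range(3, n + 1):
        # The i-th sequence can be attached to any of the 2i-5 branches
        # of a tree that already contains i-1 sequences.
        count *= 2 * i - 5
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_topologies(n))
```

For n = 20 the product is 221643095476699771875, i.e. about 2.2 x 10^20, which is why an exhaustive search is out of reach even when every topology is evaluated on a separate processor.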
Therefore, it is necessary to restrict the search to a set of candidate topologies, one of which may give the maximum likelihood value. Several heuristic approaches have been proposed [Felsenstein, 1981, Adachi and Hasegawa, 1996, Saitou, 1996]. However, these approaches break the independence among subproblems, since the candidates examined at one step typically depend on the results obtained at earlier steps.
The evaluation for each site of the sequences can also be executed in parallel if we assume that mutations occur at each site independently. Furthermore, sites that show the same configuration (the same pattern of nucleotides across the set of sequences) can be combined and computed only once. This strategy is robust with respect to the independence of the subproblems: no heuristic approach to the topology search violates it. The computation time for each subproblem is also expected to be almost the same. On the other hand, the communication cost is higher than in the former strategy [OOta et al., 1995]. In practice, however, the difference can be negligible.
Therefore, we chose the latter parallel computation strategy: evaluating branch lengths site by site in parallel (Figure 1.4).
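The site-by-site strategy can be illustrated with a small Python sketch. It is not the implementation used in this work: as a simplifying assumption, the per-site likelihood is that of only two sequences separated by a branch of length `t` under the Jukes-Cantor model, and a thread pool stands in for the parallel processors. The names `compress_sites`, `site_likelihood` and `log_likelihood` are hypothetical. The sketch shows the two ideas from the text: identical site configurations are merged and evaluated once, and the remaining per-pattern evaluations are independent tasks.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def compress_sites(alignment):
    """Merge identical alignment columns into site patterns with counts."""
    columns = zip(*alignment)  # one tuple of nucleotides per site
    return Counter(columns)


def site_likelihood(pattern, t):
    """Assumed toy model: two sequences at distance t under Jukes-Cantor."""
    a, b = pattern
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    return 0.25 * p  # equilibrium frequency 1/4 times transition probability


def log_likelihood(alignment, t, workers=4):
    """Sum of per-pattern log-likelihoods, each weighted by its site count."""
    patterns = compress_sites(alignment)
    keys = list(patterns)
    # Each pattern is an independent subproblem; threads are a stand-in here
    # for the processors of a parallel machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        likes = list(pool.map(partial(site_likelihood, t=t), keys))
    return sum(patterns[k] * math.log(l) for k, l in zip(keys, likes))


alignment = ["ACGTAC", "ACGAAC"]
print(len(compress_sites(alignment)), "patterns for", len(alignment[0]), "sites")
print(log_likelihood(alignment, 0.1))
```

Because every pattern requires the same amount of work, the subproblems are naturally balanced. The weighted sum at the end is the only step that needs results from all workers, and this is where the communication cost of the strategy arises.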