Next: Tree superimposition Up: DeepForest: a Molecular Evolutionary Previous: Application programs

Tree reconstruction

The maximum likelihood method [Felsenstein, 1981] is known to be robust and reliable among the many methods for reconstructing molecular phylogenetic trees [Kuhner and Felsenstein, 1994; Saitou and Imanishi, 1989]. In this method, likelihood is defined as a measure of how closely a hypothesis agrees with the given data. We explore the hypothesis space and select the one or several hypotheses that give the maximum likelihood.

Unfortunately, this method has an extremely high computational cost [Felsenstein, 1993]. One good solution to this problem is parallel computation. The program fastDNAml [Olsen et al., 1994] implements the maximum likelihood method in a parallel environment; however, it cannot handle amino acid sequences. ProtML [Adachi and Hasegawa, 1996] is virtually the only application program for the maximum likelihood method that accepts amino acid data, but it is basically designed for sequential computation. Generally speaking, actual data analyses require more flexible parallel computation.

Our objectives are as follows:

The maximum likelihood method for inferring a phylogenetic tree is based on stochastic concepts. We assume a tree with a certain topology and calculate the likelihood of that tree given the actual sequence data.

The likelihood values are given by the following equation [Felsenstein, 1981].

\begin{displaymath}
l_{s_k}^{(k)} = \left[ \sum_{s_i} P_{s_k s_i}(v_i)\, l_{s_i}^{(i)} \right] \left[ \sum_{s_j} P_{s_k s_j}(v_j)\, l_{s_j}^{(j)} \right],
\end{displaymath} (1.8)

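where $l_{s_k}^{(k)}$ is the conditional likelihood at node $k$ given that the node is in state $s_k$ (a nucleotide or an amino acid), and $P_{s_k s_i}(v_i)$ is the probability that state $s_k$ changes to state $s_i$ along a branch of length $v_i$. $l_{s_k}^{(k)}$ is thus the product of the likelihood contributions of the two daughter nodes $i$ and $j$ (Figure 1.2).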


  
Figure 1.2: Recursive definition of the conditional likelihood $l_{s_k}^{(k)}$.
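
The recursion in equation 1.8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes nucleotide data and the Jukes-Cantor substitution model (chosen only because its transition probabilities have a simple closed form), and the function names `jc69_transition`, `leaf_likelihood`, and `combine` are hypothetical.

```python
import math

STATES = "ACGT"  # assumption: nucleotide data; amino acids would use 20 states


def jc69_transition(v):
    """Transition matrix P(v) under the Jukes-Cantor model (an assumed model),
    where v is the branch length in expected substitutions per site."""
    e = math.exp(-4.0 * v / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def leaf_likelihood(base):
    """Conditional likelihoods at a tip: 1 for the observed state, 0 otherwise."""
    return [1.0 if s == base else 0.0 for s in STATES]


def combine(l_i, v_i, l_j, v_j):
    """Equation 1.8: conditional likelihoods at node k from daughters i and j."""
    p_i = jc69_transition(v_i)
    p_j = jc69_transition(v_j)
    l_k = []
    for s_k in range(4):
        sum_i = sum(p_i[s_k][s_i] * l_i[s_i] for s_i in range(4))
        sum_j = sum(p_j[s_k][s_j] * l_j[s_j] for s_j in range(4))
        l_k.append(sum_i * sum_j)
    return l_k


# One site on the hypothetical rooted tree ((A:0.1, A:0.1):0.05, G:0.2).
inner = combine(leaf_likelihood("A"), 0.1, leaf_likelihood("A"), 0.1)
root = combine(inner, 0.05, leaf_likelihood("G"), 0.2)

# Likelihood of the site: average the root values over equal base frequencies.
site_likelihood = sum(0.25 * l for l in root)
print(site_likelihood)
```

Applying `combine` from the tips toward the root visits each internal node once, which is what makes the recursive definition practical for a fixed topology.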

To search for a stationary point of the likelihood surface, we differentiate the likelihood with respect to branch length and find the point at which the derivative is 0. Of course, there is no guarantee that this stationary point gives the largest likelihood value. It is known empirically that the stationary point corresponds to the maximum, but a rigorous theory for this calculation remains to be established.

To find the point at which the derivative is 0, an iterative procedure is applied. An appropriate initial value is substituted into an equation derived from equation 1.8, and the value obtained is then used as the new initial value; this step is repeated.
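
The idea of iterating toward a zero of the derivative can be illustrated on the smallest possible case. The sketch below is not the update equation derived from equation 1.8, which is not reproduced here. It swaps in Newton's method and assumes two aligned sequences under the Jukes-Cantor model, summarized by the number of sites and the number of differing sites. The function names and the finite-difference step are hypothetical choices.

```python
import math


def dlogl_dv(v, n_sites, n_diff):
    """Derivative of the two-sequence Jukes-Cantor log-likelihood with
    respect to the branch length v (assumed model, v > 0)."""
    e = math.exp(-4.0 * v / 3.0)
    same_term = -(n_sites - n_diff) * e / (0.25 + 0.75 * e)
    diff_term = n_diff * (e / 3.0) / (0.25 - 0.25 * e)
    return same_term + diff_term


def find_stationary_branch_length(n_sites, n_diff, v0=0.1, tol=1e-10, max_iter=100):
    """Newton iteration on dlogl_dv, reusing each result as the next start.
    Assumes 0 < n_diff < 0.75 * n_sites so that a positive root exists."""
    v = v0
    for _ in range(max_iter):
        g = dlogl_dv(v, n_sites, n_diff)
        h = 1e-4 * v  # relative step keeps v - h positive
        # Central finite difference approximating the second derivative.
        g_prime = (dlogl_dv(v + h, n_sites, n_diff)
                   - dlogl_dv(v - h, n_sites, n_diff)) / (2.0 * h)
        v_new = v - g / g_prime
        if v_new <= 0.0:
            v_new = v / 2.0  # fall back to halving if the step leaves the domain
        if abs(v_new - v) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical data: 100 aligned sites, 10 of which differ.
v_hat = find_stationary_branch_length(100, 10)
print(v_hat)
```

For this toy case the stationary point has a closed form, `-0.75 * log(1 - 4p/3)` with `p = n_diff / n_sites`, so the iteration can be checked against it. On a full tree no such closed form is available, which is why an iterative search is needed.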


Satoshi OOta
1999-03-06