Next: Database operation Up: DeepForest: a Molecular Evolutionary Previous: Tree reconstruction

   
Tree superimposition

When we have several gene trees, we may need to interpret them in biological terms, which is not always feasible: given gene trees are often incongruent with one another even when each of them is correct in terms of molecular evolution. Our objective is not only to obtain the evolutionary relationships of genes but also to derive biological knowledge from those relationships, such as the phylogeny of species and evolutionary differentiation.

So far, such biological relationships have been inferred from gene trees intuitively, which is a somewhat crude strategy. DeepForest provides a numerical evaluation for superimposing gene trees.

Many authors have proposed tree comparison methods (e.g., Foulds et al. 1979; Robinson and Foulds 1981; Zhang and Shasha 1989; Shasha et al. 1994).

However, our intention is not pairwise tree matching but the superimposition of multiple trees. Furthermore, internal nodes are unlabeled in our problem, since it is virtually impossible to know the correspondences among them. I have therefore developed an algorithm to superimpose multiple gene trees. In this algorithm, the topology distances [Foulds et al., 1979] between a candidate topology and each gene tree are computed, and their sum is assigned to the topology as a "negative" score, so that a smaller score means a better fit. This computation is repeated for all possible topologies. Finally, the topology (or topologies) with the smallest score is chosen as the most probable superimposed tree. General tree matching algorithms introduce pruning costs and depth-dependent weights for tree editing [Zhang and Shasha, 1989, Shasha et al., 1994]. To reduce computational cost, these are omitted in our method; that is, all pruning costs are 0 and all weights are 1.
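The scoring rule can be written compactly. The symbols here are introduced only for illustration: assume there are m gene trees G_1, ..., G_m, that d(T, G_i) is the topology distance with all pruning costs 0 and all weights 1, and that T ranges over the set of candidate topologies on n tissue classes.

```latex
S(T) = \sum_{i=1}^{m} d(T, G_i), \qquad
\hat{T} \in \operatorname*{arg\,min}_{T \in \mathcal{T}_n} S(T), \qquad
|\mathcal{T}_n| = (2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)
```

The last identity holds if the candidates are taken to be unrooted binary topologies on n >= 3 leaves, which is an assumption (the text says only "all possible topologies"). Under it there are 15 candidates for n = 5 and 2,027,025 for n = 10, so each distance computation has to stay cheap.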

The algorithm is summarized as follows:

1.
A list of all tissue classes that appear in the gene trees is generated.

2.
Tissue classes that appear only once are removed from the list, because such classes do not affect the scores when pruning costs are omitted.

3.
All possible topologies over the remaining tissue classes are generated.

4.
The topology distances between a generated topology and each gene tree are computed and summed to give that topology's score.

5.
Step 4 is repeated for every possible topology.

6.
The resulting scores are sorted in ascending order.
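The six steps can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the code of super. It assumes that each gene tree is a nested tuple whose leaves are tissue-class labels, that a class labels at most one leaf per tree, and that "appears only once" in step 2 means "occurs in only one gene tree". Because the exact distance of Foulds et al. (1979) is not spelled out here, the Robinson-Foulds (symmetric-difference of splits) distance on shared leaves is swapped in as the topology distance, and candidates are unrooted binary topologies built by stepwise addition. All function names and the example trees are illustrative.

```python
"""Hypothetical sketch of the superimposition steps (not the actual `super` program)."""
from collections import Counter


def leaves(node):
    """Set of tissue-class labels at the leaves of a nested-tuple tree."""
    if isinstance(node, tuple):
        return frozenset().union(*(leaves(child) for child in node))
    return frozenset([node])


def restrict(node, keep):
    """Prune leaves not in `keep`; internal nodes left with one child are collapsed."""
    if not isinstance(node, tuple):
        return node if node in keep else None
    kids = [k for k in (restrict(child, keep) for child in node) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def splits(tree):
    """Non-trivial bipartitions of `tree`, read as unrooted.

    Each split is stored as the side that does not contain a fixed reference leaf,
    so the two sides of a root edge map to the same split.
    """
    everything = leaves(tree)
    ref = min(everything)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = everything - below if ref in below else below
        if 1 < len(side) < len(everything) - 1:  # skip trivial splits
            found.add(side)
        return below

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance on shared leaves (stand-in for the topology distance)."""
    shared = leaves(t1) & leaves(t2)
    if len(shared) < 4:  # fewer than 4 leaves have no non-trivial splits
        return 0
    return len(splits(restrict(t1, shared)) ^ splits(restrict(t2, shared)))


def graft(node, leaf):
    """Yield copies of `node` with `leaf` attached to each edge at or below it."""
    yield (node, leaf)
    if isinstance(node, tuple):
        for i, child in enumerate(node):
            for new_child in graft(child, leaf):
                yield node[:i] + (new_child,) + node[i + 1:]


def all_topologies(labels):
    """Step 3: every unrooted binary topology, (2n-5)!! of them, by stepwise addition."""
    labels = sorted(labels)
    if len(labels) < 3:
        raise ValueError("need at least three tissue classes")
    trees = [tuple(labels[:3])]  # the only 3-leaf topology, as a trifurcating root
    for leaf in labels[3:]:
        trees = [root[:i] + (new,) + root[i + 1:]
                 for root in trees
                 for i, child in enumerate(root)
                 for new in graft(child, leaf)]
    return trees


def superimpose(gene_trees):
    """Steps 1-6: score each candidate by its summed distance, then sort."""
    # Steps 1-2: count in how many gene trees each class occurs; drop singletons.
    # rf_distance restricts gene trees to the candidate's leaves, so dropped
    # classes are ignored when scoring.
    counts = Counter(c for tree in gene_trees for c in leaves(tree))
    kept = [c for c, n in counts.items() if n > 1]
    # Steps 3-5: sum the distances to all gene trees for every candidate topology.
    scored = [(sum(rf_distance(top, g) for g in gene_trees), top)
              for top in all_topologies(kept)]
    # Step 6: ascending order, so the best (smallest) score comes first.
    scored.sort(key=lambda pair: pair[0])
    return scored


EXAMPLE_TREES = [  # hypothetical gene trees, for illustration only
    (("liver", "kidney"), ("heart", "muscle")),
    ((("liver", "kidney"), "heart"), "muscle"),
    (("liver", "heart"), ("kidney", ("muscle", "testis"))),
]

if __name__ == "__main__":
    for score, topology in superimpose(EXAMPLE_TREES):
        print(score, topology)
```

With these example trees, testis occurs in only one tree and is dropped, and the candidate that pairs heart with muscle (and kidney with liver) ranks first with a score of 2. Because the number of candidates grows as (2n-5)!!, exhaustive enumeration of this kind is practical only for a small number of tissue classes.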

A program named super was developed to implement this algorithm. Table 1.3 shows the input format for super.


  
Table 1.3: Input format for program super


Satoshi OOta
1999-03-06