When we have several gene trees, we may need to interpret them in biological terms, which is not always feasible. Gene trees are often incongruent with one another even when each of them is correct in terms of molecular evolution. Our objective is not only to obtain the evolutionary relationships of genes, but also to derive biological knowledge from those relationships: the phylogeny of species, evolutionary differentiation, and so on.
So far, such biological relationships have been inferred intuitively from gene trees. This strategy is somewhat crude, however. DeepForest provides a numerical evaluation for gene tree superimposition.
Tree comparison methods have been proposed by many authors (e.g., Foulds et al. 1979; Robinson and Foulds 1981; Zhang and Shasha 1989; Shasha et al. 1994).
However, our intention is not pairwise tree matching, but the superimposition of multiple trees. Furthermore, internal nodes are unlabeled in our problem, since it is virtually impossible to know the correspondence among them. I have therefore developed an algorithm to superimpose multiple gene trees. In my algorithm, the topology distances [Foulds et al., 1979] between a given topology and each gene tree are computed, and their sum is assigned to the topology as a "negative" score, meaning that a lower score indicates a better fit. This computation is repeated for all possible topologies. Finally, the topology (or topologies) with the smallest score is chosen as the most probable superimposed tree. General tree matching algorithms introduce pruning costs and depth-dependent weights for tree editing [Zhang and Shasha, 1989; Shasha et al., 1994]. To reduce the computational cost, these are omitted in this method; that is, all pruning costs are set to 0 and all weights to 1.
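The exhaustive scoring described above can be illustrated with a small sketch. It is not the code of the program super, and it rests on several assumptions that are not taken from the text: trees are unrooted and binary, every gene tree carries the same set of leaf labels, trees are written as nested Python tuples, and the topology distance is approximated by the symmetric difference of non-trivial bipartitions (the Robinson-Foulds style count), which may differ from the distance of Foulds et al. (1979). All function names (`leaf_set`, `splits`, `insert_leaf`, `all_rooted`, `superimpose`) are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])


def splits(tree):
    """Return the non-trivial bipartitions of the tree viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain the
    reference leaf (the smallest label), so equal splits compare equal.
    """
    leaves = leaf_set(tree)
    ref = min(leaves)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(leaves) - 2:
            found.add(side)
        return clade

    walk(tree)
    return found


def insert_leaf(tree, leaf):
    """Yield every rooted tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for t in insert_leaf(left, leaf):
            yield (t, right)
        for t in insert_leaf(right, leaf):
            yield (left, t)


def all_rooted(leaves):
    """Enumerate all rooted binary topologies on the given leaves."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees


def superimpose(gene_trees):
    """Score every candidate topology; return (best score, best topologies).

    Assumes all gene trees share one leaf set. The score is the sum of
    split distances to the gene trees, so smaller is better.
    """
    leaves = sorted(leaf_set(gene_trees[0]))
    gene_splits = [splits(t) for t in gene_trees]
    scored = []
    # Fixing the first leaf as an outgroup turns rooted trees on the
    # remaining leaves into all unrooted topologies on the full set.
    for rooted in all_rooted(leaves[1:]):
        candidate = (leaves[0], rooted)
        cand_splits = splits(candidate)
        score = sum(len(cand_splits ^ gs) for gs in gene_splits)
        scored.append((score, candidate))
    best = min(score for score, _ in scored)
    return best, [tree for score, tree in scored if score == best]


# Two gene trees support AB|CD and one supports AC|BD.
genes = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
best_score, best_trees = superimpose(genes)
print(best_score, best_trees)
```

In this toy input the topology grouping C with D (equivalently A with B) receives the smallest score, because it disagrees with only one of the three gene trees. The number of candidate topologies grows very quickly with the number of leaves, so a brute-force enumeration like this one is only practical for small trees.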
The algorithm is summarized as follows:
A program named super was developed to perform the above algorithm. Table 1.3 shows the input format for program super.