## Extending Tree Models to Splits Networks

David Bryant

In this chapter we take statistical models designed for trees and adapt them for splits networks, a more general class of mathematical structures. The models we propose provide natural swing-bridges between trees, filling in gaps in the probability simplex. There are many reasons why we might want to do this. Firstly, the splits networks provide a graphical representation of phylogenetic uncertainty. Data that is close to tree-like produces a network that is close to a tree, while noisy or badly modeled data produce complex networks. Secondly, models that incorporate several trees open up possibilities for new tests to assess the relative support for different trees, in both likelihood and Bayesian frameworks. Thirdly, by searching through network space rather than tree space we may well be able to avoid some of the combinatorial headaches that make searching for trees so difficult.

17.1 Trees, splits and splits networks

Splits are the foundation of phylogenetic combinatorics, and they will be the building blocks of our general statistical model. Recall (from Chapter 2) that a split S = {A, B} of a finite set X is an unordered partition of X into two non-empty blocks. A phylogenetic tree for X is a pair T = (T, 0) such that T is a tree with no vertices of degree two and 0 is a bijection from X to the leaves of T.

Removing an edge e from a phylogenetic tree divides the tree into two connected components, thereby inducing a split of the set of leaves that we say is the split associated to e. We use splits(T) to denote the sets associated to edges of T. The phylogenetic tree T can be reconstructed from the collection splits(T). The Splits Equivalence Theorem (Theorem 2.35) tells us that a collection S of splits is contained in splits(T) for some phylogenetic tree T if and only if the collection is pairwise compatible, that is, for all pairs of splits {A, B}, {A', B'} at least one of the intersections

is empty.

If we think of phylogenetic trees as collections of compatible splits then it be comes easy to generalize trees: we simply consider collections of splits that may not be pairwise compatible. This is the approach taken by Split Decomposition [Bandelt and Dress, 1992], Median Networks [Bandelt et al., 1995], Spec-troNet [Huber et al., 2002], Neighbor-Net [Bryant and Moulton, 2004], Consensus Networks [Holland et al., 2004] and Z-networks [Huson et al., 2004].

The usefulness of these methods is due to a particularly elegant graphical representation for general collections of splits: the splits network. This has been the basis of SplitsTree4 [Huson and Bryant, 2005] which implements many of the tree generalizations mentioned above (see Figure 2.8).

To define splits networks, we first need to discuss splits graphs. These graphs have multiple characterizations. We will work with three of these here.

For a graph G let dG denote the (unweighted) shortest path metric. A map ^ from a graph H to a graph G is an isometric embedding if (u, v) = dG(^(u), ^(v)) for all u, v € V(H). A graph G is a partial cube if there exists an isometric embedding from G to a hypercube. [Wetzel, 1995] called these graphs splits graphs. This terminology has persisted in the phylogenetics community, despite the potential for confusion with the graph-theoretic term 'splits graph' (a special class of perfect graphs). Refer to [Imrich and Klavzar, 2000] for a long list of characterizations for partial cubes.

[Wetzel, 1995] (see also [Dress and Huson, 2004]) characterized splits graphs in terms of isometric colorings. Let k be an edge coloring of the graph. For each pair u, v € V(G) let CK (u, v) denote the set of colors that appear on every shortest path from u to v. We say that k is an isometric coloring if dG(u, v) = |CK(u, v)| for all pairs u, v € V(G). In other words, k is isometric if the edges along any shortest path all have different colors, while any two shortest paths between the same pair of vertices have the same set of edge colors. A connected graph is a splits graph if and only if it has an isometric coloring [Wetzel, 1995].

A third characterization of splits graphs is due to [Winkler, 1984]. We define a relation 0 on pairs of edges ei = {ui,vi} and e2 = {u2,v2} in a graph G by ei©e2 ^ dG(ui,u2) + dG(vi,v2) = dG(ui,v2) + dG(vi,u2). (17.1)

This relation is an equivalence relation if and only if G is a splits graph.

Two edges ei and e2 in a splits graph have the same color in an isometric coloring if and only if the isometric embedding of the splits graph into the hypercube maps ei and e2 to parallel edges, if and only if ei0e2. Thus, a splits graph has, essentially, a unique isometric coloring and a unique isometric embedding into the hypercube. The partition of edges into color classes is completely determined by the graph.

Suppose now that we have a splits graph G and a map 0: X ^ V(G). Using the isometric embedding, one can quickly prove that removing all edges in a particular color class partitions the graph into exactly two connected (and convex) components. This in turn induces a split of X, via the map 0. A splits network is a pair N = (G, 0) such that

(ii) each color class induces a distinct split of X.

The set of splits induced by the different color classes is denoted splits(N).

It is time for two examples. The splits network on the left of Figure 17.1 corresponds to a collection of compatible splits - it is a tree. In this network, every edge is in a distinct color class. If we add the split {{2,6}, {1,3,4, 5}} we obtain the splits network on the right. There are four color classes in this graph that contain more than a single edge. These are the three horizontal pairs of parallel edges and the four edges marked in bold that induce the extra split. 