P44

This determinant of polytopes represents a parameterized family of assignment problems. Indeed, suppose the cost qj of assigning job i to worker j depends piecewise-linearly on a vector of three parameters w = (wx,wy,wz), namely qij = min{w ■ p : p € Pj}.

Thus the cost qj is determined by solving the linear programming problem with polytope Pj. The parametric assignment problem would be to solve the assignment problem simultaneously for all vectors w € R3. In other words, we wish to preprocess the problem specification so that the cost of an optimal assignment can be computed rapidly. This preprocessing amounts to computing the irredundant V-representation of the polytope obtained from the determinant. Then the cost of an optimal assignment can be computed as follows:

min{w ■ p : p € det((Pj)i<ij<4)}. Our discussion furnishes a higher-dimensional generalization of Remark 2.6:

Remark 2.28 The parametric assignment problem is solved by computing the determinant of the matrix of polytopes (Pj) in the polytope algebra.

We can similarly define the parametric shortest path problem on a directed graph. The weight of each edge is now a polytope Pj in Rd, and for a specific parameter vector w € Rd we recover the scalar edge weights by linear programming on that polytope: dj = min{w ■ p : p € Pj}. Then the shortest path from i to j is given by d("-1> = min{w ■ p : p € Pj"-1>>}, where Pj"-1> is the (i, j)-entry in the (n — 1)-th power of the matrix (Pj). Here matrix multiplication is carried out in the polytope algebra (Pd, ®, ©).

The Hungarian algorithm for assignments and the Floyd-Warshall algorithm for shortest paths can be extended to the parametric setting. Provided the number d of parameters is fixed, these algorithms still run in polynomial time. The efficient computation of such polytopes by dynamical programming using polytope algebra arithmetic along a graph is referred to as polytope propagation (see Chapters 5-8). We close this section by revisiting the case of alignments.

Remark 2.29 The problem of parametric alignment of two DNA sequences ct1 and ct2 is to compute the Newton polytopes NP(/oi ,o2) of the corresponding coordinate polynomial /oi,o2 of the pair hidden Markov model (2.13).

If some of the scores have been specialized then we compute Newton polytopes of polynomials in fewer unknowns. For instance, if w' = 0 then our task is to compute the Newton polytope NP($oi ,o2) of the specialized polynomial go-i,^. This can be done efficiently by running the Needleman-Wunsch Algorithm 2.13 in the polytope algebra and is the topic of Chapters 5-7.

Example 2.30 Returning to Example 2.19, we observe that the 14 points v1,...,v14 are the vertices of the Newton polytope P = NP(gACG,ACC). It is important to note that all of the 14 points corresponding to monomials in gACG,ACC are in fact vertices of P, which means that every possible alignment of ACG and ACC is an optimal alignment for some choice of parameters.

The polytope P is easily computed in polymake, which confirms that the polytope is six-dimensional. The f-vector is /(P) = (14, 51, 86, 78, 39,10). These numbers have an interpretation in terms of alignments. For example, there is an edge between two vertices in the polytope if for two different optimal alignments (containing different numbers of matches, mismatches, and gaps) the parameter regions which yield the optimal alignments share a boundary. In other words, the fact that the polytope has 51 edges tells us that there are precisely 51 "parameter boundaries", where an infinitesimal change in parameters can result in a different optimal alignment. The normal cones and their defining inequalities (like the seven in Example 2.19) characterize these boundaries, thus offering a solution to the parametric alignment problem. □

2.4 Trees and metrics

One of the important mathematical structures that arises in biology is the phylogenetic tree [Darwin, 1859, Felsenstein, 2003, Semple and Steel, 2003]. A phylogenetic tree is a tree T together with a labeling of its leaves. The number of combinatorial types of phylogenetic trees with the same leaves grows exponentially (Lemma 2.33). In phylogenetics a typical problem is to select a tree, based on data, from the large number of possible choices.

This section introduces some basic concepts regarding the combinatorics of trees that are important for phylogeny. The notion of tree space is related to the tropicalization principle introduced in Section 2.1 and will be revisited in Section 3.5. A widely used algorithm in phylogenetics, the neighbor-joining algorithm, is a method for projecting a metric onto tree space. This algorithm draws on a number of ideas in phylogenetics and serves as the focus of our presentation in this section. We begin by discussing a number of different, yet combinatorially equivalent, characterizations of trees.

A dissimilarity map on [n] = {1,2,..., n} is a function d : [n] x [n] ^ R such that d(i, i) = 0 and d(i, j) = d(j, i) > 0. The set of all dissimilarity maps on [n] is a real vector space of dimension (n), which we identify with R(2). A dissimilarity map d is called a metric on [n] if the triangle inequality holds:

d(i,j) < d(i,k) + d(k,j) for i,j,k € [n]. (2.27)

A dissimilarity map d can be written as a non-negative symmetric nx n matrix D = (dij) where dij = d(i, j) and dii = 0. The triangle inequality (2.27) can be expressed by matrix multiplication where the arithmetic is tropical.

Remark 2.31 The matrix D represents a metric if and only if D 0 D = D.

Proof The entry of the matrix D 0 D in row i and column j equals di1 0 d1j 0 ••• 0 din 0 dnj = min { difc + dfcj : 1 < k < n}. (2.28)

This quantity is less than or equal to dij = dii 0 dij = dij 0 djj, and it equals clij if and only if the triangle inequality dij < d+ d^j holds for all k. □

Note that, for weighted graph G on n nodes, the matrix D^n-1 represents the corresponding graph metric. This is the content of Proposition 2.2.

The set of all metrics on [n] is a full-dimensional convex polyhedral cone in R(2), called the metric cone. The metric cone has a distinguished subcone, known as the cut cone, which is the R>0-linear span of all metrics arising from all splits {A, B} of [n] into two disjoint non-empty subsets A and B. The split metric d{A,B} is defined by d{A,B}(i,j) = 0 if i,j € A or i,j € B, d{A,B}(i, j) = 1 if i € A, j € B or i € B,j € A. (2.29)

The cut cone is strictly contained in the metric cone if n > 6. This and many other results on metrics can be found in [Deza and Laurent, 1997].

A metric d is a tree metric if there exists a tree T with n leaves, labeled by [n] = {1,2,..., n}, and a non-negative length for each edge of T, such that the length of the unique path from leaf x to leaf y equals d(x,y) for all x,y € [n]. We sometimes write dy for the tree metric d which is derived from the tree T.

Example 2.32 Let n = 4 and consider the metric d given by the matrix

The metric d is a tree metric, as can be verified by examining Figure 2.4. □

The space of trees [Billera et al., 2001] is the following subset of the metric cone:

The structure of is best understood by separating the combinatorial types

Was this article helpful?

0 0

Post a comment