# Minimum Spanning Tree

## Introduction

The following algorithm to compute a minimum spanning tree is quite simple and quite interesting. It can be done by a person as well as by a computer.

Consider some nodes randomly placed that we wish to connect together with the least expenditure of material. The nodes might represent lots of different items. Perhaps internet devices that we wish to attach together to a single line.

The image on the left shows 15 nodes randomly scattered. Each node is labeled by a letter A-Z and they are different colors. On the right the same nodes are connected by yellow lines and the nodes themselves are yellow as well.

The 14 lines form a minimum spanning tree. That is, you cannot connect the nodes with less cable, string, pipe, or whatever.

## The Algorithm

The process is simple and pretty intuitive.

1. Find the 2 (unconnected) nodes that are closest to each other.
2. Connect them with a link unless the link would form a loop.
3. Repeat with the next 2 closest nodes. Stop when you are unable to not form a loop.

Obviously, the second condition is a little tricky. The way to do it easily is to assign each node to a group. All nodes reachable from each other belong to the same group. Watch the little video.

Animation. Return with the browser's Back button)

In this gif movie, groups are shown as colors. When each link is set, the two groups are merged into one. Now the rule for avoiding loops is easy. "Don't link two nodes that belong to the same group".

Initially, each node is in a group by itself. With each link, one group falls away and the process is done when there is just one group remaining. So it's easy to see that in a graph of N nodes we will finish with N-1 links, since we lose a group with each link created.

## The Code

As with the other projects in this group, the sarg, camera and pgcon modules are described in the Pglib writeup If you haven't already done so, you may want to read about them first.

The program code below is available as minspan.py. The whole package can be downloaded as a zip file.

The nodes are represented by Minnode instances and defined with

```06 class Minnode :
07     def __init__ (self, pos, label, group) :
08         self.pos = pos
09         self.label = label
10         self.group = group
11
12     def __str__ (self) :
13         return "<%s at %s in group %d>" % (self.label, self.pos, self.group)
14
15     __repr__ = __str__
16
```

The attributes of a node are simply its position, label (uppercase letter) and the group (an integer) that it is part of. By overriding both the __str__ and __repr__ methods a nicer description of the node is produced for printing.

The class Minspan is a singleton. The code for it follows:

```17 class Minspan :
18     def __init__ (self, pgcon, pixels, nNodes) :
19         letters    = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
20         self.pgcon   = pgcon            # pgame interface
21         self.nodes = [Minnode(randpos(20,pixels-20), letters[i], i+1)
22                                     for i in range(nNodes)]
24         self.pixels = pixels
25         for m in range(nNodes) :
26             for n in range(m+1,nNodes) :
27                 x = self.nodes[m]
28                 y = self.nodes[n]
29                 dist = euclDist2(x.pos,y.pos)
```

Attributes consist of a connection to our pgcon modules, the size of our square window in pixels and the number of nodes to produce. The nodes are produced using a list comprehension in line 21, assigning x and y position at random within the window and assigning the labels to upper case A,B,C... .

A list of plinks (potential links) is built in the double for loop (line 25). Each potential link is a tuple containing references to two nodes and a distance (actually distance squared) between them. When plinks is sorted our list is nicely in order of shortest to longest links.

Since every node can link to every other that we would have N*(N-1) potential links. But a link between A and B is the same as one from B to A. The little m+1 in line 26 keeps us from generating such duplicates or even trying to link a node to itself.

The list links starts out empty and is populated from the plinks using the rules above. While the (N-1) links are found from the N*(N-1)/2 plinks the animation is created. There are 2 frames for each link made, one (line 43) showing which 2 groups are to be merged and the other (line 48) after the merge has taken place.

```33
38             dst, nodA, nodB = plink
39             if nodA.group != nodB.group :
41                 masterGroup = nodA.group
42                 slaveGroup  = nodB.group
43                 self.display("Will link %s->%s" % (nodA.label, nodB.label))
45                 for node in self.nodes :
46                     if node.group == slaveGroup :
47                         node.group = masterGroup
48                 self.display("Grouped %s with %s" % (slaveGroup,masterGroup))
51
```

The last method of the Minspan class creates the display itself

```52     def display (self, banner) :
53         pgcon = self.pgcon
54         pgcon.newScreen()
55         for a,b in self.links :
56             if a.group != b.group : color = WHITE
57             else                  : color = colors[a.group%7]
58             pgcon.lineDraw(color, a.pos, b.pos, 1)
59         for node in self.nodes :
60             color = colors[node.group%7]
61             pgcon.textDraw(color, node.pos, node.label)
62         pgcon.textDraw(WHITE, (self.pixels/2,10), banner)
63         pgcon.camera.armed = 1
64         pgcon.writeScreen(.500)
65
```

Each call to the display method creates one frame in the animation. The interaction with the pgcon and camera modules is the same as in the kmeans project. First the links are drawn in the color of their group (or white if not yet assigned). Then the labels of the nodes are drawn on top as text. The camera is armed in case png images are to be created, either for their own use or later compacted into a gif file (or both). A 1/2 second wait between frames lets us keep up with the action.

Finally, 2 small helper functions generate random x/y positions on the screen for placement of nodes and calculate the Euclidean distance for comparison of link lengths. Actually, since these distance are only used for the comparison, we don't need to take the square root to calculate the actual distance. The distance squared is just fine.

```66 def euclDist2(a,b) :
67     return (a[0]-b[0])**2 + (a[1]-b[1])**2
68
69 def randpos(low,high) :
70     from random import randint
71     return (randint(low,high), randint(low,high))
72
```

This starts to get important as the number of nodes increases. At 20 nodes the potential links come to 190 (20*19/2). But if it's 1000 nodes we need 499500 potential links. Saving 499500 square root calculations is no longer so trivial.

This is an important point. How time and memory are consumed as the problem size grows is a very important part of algorithm design. In this case, the dominating factor is the time needed to compute the distances between all the vertices. It is roughly the square of the number of vertices.

One more point. How can we be sure that what we end up with really is a minimum tree. Although not a formal proof here is one way to get some intuition on this. Take the final solution as generated and try to improve it. Remove a link to make the one final group into two. Attempt to reconnect the two groups with a shorter link somewhere else. You can't because we carefully consider potential links from shortest to longest. If such a shorter link actually existed we would have used it instead.

And one last point. The graph that is generated is a tree - in the computer science sense where trees grow upside down. In your imagination pick up any vertex and raise it off the page. Pretend that the vertices are a bit sticky but are slowly pulled up by the edges connecting them, following in turn the vertex you chose. The vertex you chose is the root of the tree and, indeed, the tree is correctly upside down.

## Running the program

At the bottom of minspan.py main is called. sarg is used to get parameters from the command line or to supply defaults. A Minspan instance ms is created, nodes are created, links are made generating the graphic display.

```73 def main()  :
74     from pgcon import Pgcon
75     pgcon = Pgcon()
76     seed    = sarg.Int("seed", 0)
77     if seed : random.seed(seed)
78     else :
79         seed = random.randint(101,999)
80         print "Seed is", seed
81         random.seed(seed)
82     pixels  = sarg.Int("pixels",600)
83     nNodes  = sarg.Int("nodes",  20)
84     ms = Minspan(pgcon, pixels, nNodes)
87     pgcon.close()
88
89 if __name__ == "__main__" : main()
```

```\$python minspan.py nodes=15 pixels=300 seed=854  prefix=mns maxpics=50 gif=1
Just captured frame: 1, mns_001.png
Just captured frame: 2, mns_002.png
Just captured frame: 3, mns_003.png
.
.
Just captured frame: 27, mns_027.png
Just captured frame: 28, mns_028.png
Making uniform GIF
mns.gif has 28 frames
```

But simply "\$python minspan.py" will generate even more interesting and random displays. The arguments on the command line set the random number seed to get repeatable sequences, nodes, and pixels for size.

Arguments prefix, maxpics, gif control the pgcon and camera to generate images and an optional gif file. These work in exactly the same way as in the kmeans project and dijkstra project and are discussed in the writeup for the pglib project