16 December, 2011

In this article I give an informal definition of a graph and of the minimum spanning tree. Afterwards I describe Prim’s algorithm and then follow its execution on an example. Finally, the code in C is provided.

Graphs

Wikipedia gives one of the common definitions of a graph:
In computer science, a graph is a kind of data structure, specifically an abstract data type (ADT), that consists of a set of nodes and a set of edges that establish relationships (connections) between the nodes. The graph ADT follows directly from the graph concept from mathematics.
Informally, G=(V,E) consists of vertices, the elements of V, which are connected by edges, the elements of E. Formally, a graph, G, is defined as an ordered pair, G=(V,E), where V is a finite set and E is a set consisting of two element subsets of V.
This is a graph:
p1.png
It’s a set of nodes (A, B, C, D and E) and the edges (lines) that interconnect them.
An important thing to note about this graph is that the edges are bidirectional, i.e. if A is connected to B, then B is connected to A. This makes it an undirected graph.
A common extension is to attribute weights to the edges. This is what I’ve done with the previous graph:
p2

Minimum spanning trees

Basically a minimum spanning tree is a subset of the edges of the graph, so that there’s a path form any node to any other node and that the sum of the weights of the edges is minimum.
Here’s the minimum spanning tree of the example:
g3.png
Look at the above image closely. It contains all of the initial nodes and some of the initial edges. Actually it contains exactly n – 1 edges, where n is the number of nodes. It’s called a tree because there are no cycles.
You can think of the graph as a map, with the nodes being cities, the edges passable terrain, and the weights the distance between the cities.
It’s worth mentioning that a graph can have several minimum spanning trees. Think of the above example, but replace all the weight with 1. The resulting graph will have 6 minimum spanning trees.
Given a graph, find one of its minimum spanning trees.

Prim’s Algorithm

One of the classic algorithms for this problem is that found by Robert C. Prim. It’s a greedy style algorithm and it’s guaranteed to produce a correct result.
In the following discussion, let the distance from each node not in the tree to the tree be the edge of minimal weight between that node and some node in the tree. If there is no such edge, assume the distance is infinity (this shouldn’t happen).
The algorithm (greedily) builds the minimal spanning tree by iteratively adding nodes into a working tree:
  1. Start with a tree which contains only one node.
  2. Identify a node (outside the tree) which is closest to the tree and add the minimum weight edge from that node to some node in the tree and incorporate the additional node as a part of the tree.
  3. If there are less then n – 1 edges in the tree, go to 2
For the example graph, here’s how it would run:
Start with only node A in the tree.
g4.png
Find the closest node to the tree, and add it.
g5.png
Repeat until there are n – 1 edges in the tree.
g6.png
g7.png
g8.png

The Programme

The following programme just follows the algorithm. It runs in O(n2) time.
Here’s the code in C (prim.c):
#include <stdio.h>

/*
 The input file (weight.txt) look something like this
  4
  0 0 0 21
  0 0 8 17
  0 8 0 16
  21 17 16 0

 The first line contains n, the number of nodes.
 Next is an nxn matrix containg the distances between the nodes
 NOTE: The distance between a node and itself should be 0
*/

int n; /* The number of nodes in the graph */

int weight[100][100]; /* weight[i][j] is the distance between node i and node j;
   if there is no path between i and j, weight[i][j] should
   be 0 */

char inTree[100]; /* inTree[i] is 1 if the node i is already in the minimum
   spanning tree; 0 otherwise*/

int d[100]; /* d[i] is the distance between node i and the minimum spanning
  tree; this is initially infinity (100000); if i is already in
  the tree, then d[i] is undefined;
  this is just a temporary variable. It's not necessary but speeds
  up execution considerably (by a factor of n) */

int whoTo[100]; /* whoTo[i] holds the index of the node i would have to be
   linked to in order to get a distance of d[i] */

/* updateDistances(int target)
 should be called immediately after target is added to the tree;
 updates d so that the values are correct (goes through target's
 neighbours making sure that the distances between them and the tree
 are indeed minimum)
*/
void updateDistances(int target) {
 int i;
 for (i = 0; i < n; ++i)
  if ((weight[target][i] != 0) && (d[i] > weight[target][i])) {
   d[i] = weight[target][i];
   whoTo[i] = target;
  }
}

int main(int argc, char *argv[]) {
 FILE *f = fopen("dist.txt", "r");
 fscanf(f, "%d", &n);
 int i, j;
 for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++j)
   fscanf(f, "%d", &weight[i][j]);
 fclose(f);

 /* Initialise d with infinity */
 for (i = 0; i < n; ++i)
  d[i] = 100000;

 /* Mark all nodes as NOT beeing in the minimum spanning tree */
 for (i = 0; i < n; ++i)
  inTree[i] = 0;

 /* Add the first node to the tree */
 printf("Adding node %c\n", 0 + 'A');
 inTree[0] = 1;
 updateDistances(0);

 int total = 0;
 int treeSize;
 for (treeSize = 1; treeSize < n; ++treeSize) {
  /* Find the node with the smallest distance to the tree */
  int min = -1;
  for (i = 0; i < n; ++i)
   if (!inTree[i])
    if ((min == -1) || (d[min] > d[i]))
     min = i;

  /* And add it */
  printf("Adding edge %c-%c\n", whoTo[min] + 'A', min + 'A');
  inTree[min] = 1;
  total += d[min];

  updateDistances(min);
 }

 printf("Total distance: %d\n", total);

 return 0;
}

And here’s a sample input file (dist.txt). It’s the example graph:
5
0 10 0 5 0
10 0 5 5 10
0 5 0 0 0
5 5 0 0 20
0 10 0 20 0

The code’s commented and there shouldn’t be any problems.
Good luck. Always open to comments.

No comments:

Post a Comment