I’ll say it – applying graph theory can be as frustrating as it is satisfying. From DFS/BFS to Dijkstra’s, Kruskal’s and Prim’s algorithms, and more advanced techniques like combining SegTrees/BITs with graphs? Suffice it to say, it gets rough.

Anywho, let’s get started!


Graphs, the lingo.

Okay, so to explain graph algorithms, we need to be on the same page. If you're already familiar with graph terminology, skip this section of the article and jump to DFS. Otherwise, read on, and I'd also suggest watching MIT's ~85 minute lecture on Graph Theory.

So! The vocabulary in the field of graph theory -

Node - A point, or a vertex.

Edge - A line connecting two nodes/vertices, representing connectivity or a path between the two nodes.

Directed Edge - An edge which signifies only one-way connectivity, i.e. if A->B, it is possible to move from A to B but not from B to A.

Undirected Edge - No restrictions on movement exist. It is a two-way street.

Cyclic Graph - A graph containing at least one cycle, i.e. a set of nodes connected in such a manner that it is possible to return to the starting node without traversing any edge twice.

Graph - A collection of vertices and edges.

Tree - A collection of vertices and edges such that no cycle exists, i.e. a connected acyclic graph.

DAG - Directed Acyclic Graph. A directed graph with no cycles (not necessarily a tree, since a node can have multiple parents).

Root - Exclusive to trees, it is the topmost node of the tree. There can only be one root in any tree.

Child/Parent - If node X can only be reached through node Y directly above it, X is the child of Y and Y is the parent of X.

Ancestor - Similar to a parent, but across a larger distance: if getting to X from the root requires passing through Y, then Y is an ancestor of X.


Storing a graph

Now, two ways to store a graph as information.

Adjacency Matrix - A 2-D array adj, where adj[i][j] signifies the connectivity status of nodes i and j: if adj[i][j] = 0, then i and j are not connected, and if adj[i][j] = k, they are connected by an edge of weight k.

Adjacency List - An array of lists, where adj[i] stores the list of nodes that i is connected to, i.e. if j appears in adj[i], then i and j are connected.
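As a quick sketch, here's how both representations might look in Python for a small made-up weighted, undirected graph (the node numbering and weights are purely illustrative):

```python
# Nodes are numbered 0..3; edges are (u, v, weight) triples.
n = 4
edges = [(0, 1, 5), (1, 2, 3), (0, 3, 2)]

# Adjacency matrix: adj_matrix[i][j] = weight of edge i-j, 0 if absent.
adj_matrix = [[0] * n for _ in range(n)]
for u, v, w in edges:
    adj_matrix[u][v] = w
    adj_matrix[v][u] = w  # undirected: mirror the entry

# Adjacency list: adj_list[i] holds (neighbour, weight) pairs.
adj_list = [[] for _ in range(n)]
for u, v, w in edges:
    adj_list[u].append((v, w))
    adj_list[v].append((u, w))

print(adj_matrix[0][1])  # 5
print(adj_list[0])       # [(1, 5), (3, 2)]
```

The matrix gives O(1) edge lookups but takes O(V²) space; the list takes O(V + E) space, which is why it's usually preferred for sparse graphs.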




Depth First Search

Alright, so let’s get into DFS.

DFS is my first choice when it comes to “Find a route from A to B” or “Can I get from A to B via C without going through D?” type questions, or any situation that can be molded into one of these.

The idea is simple – You check ALL possible routes.

That’s how DFS works. Check all paths, stop when you find a path satisfying your condition.

Now there’s a catch – If the graph contains a cycle, you’ll HAVE to use a visited array to check whether a given node has been previously visited, to prevent an accidental infinite loop. If the graph is acyclic, you won’t loop forever, though marking visited nodes can still save repeated work.

Now, consider the following problem -

You’re driving a fancy car. This car has enough petrol to only last 60 km. Check if there’s a route from your home to a friend’s, and also check if the distance is within 60 km to see if you can drive over to their place. Assume it’s a small city and you know which road leads to whose house.

So now, you’ll have to run a depth-first search taking your house as the root, and in your recursive procedure, the base case would be reaching the node that is your friend’s home. Also, passing an extra parameter that keeps track of the current distance travelled will help you check if you can reach their home without needing a petrol refill.

The pseudocode would be as follows -

dfs(node, destination, distance):
   if distance > 60:
      return             //out of petrol on this route, abandon it
   if node equals destination:
      print "Friend is reachable"
      return
   mark node as visited
   for all unvisited neighbours of node:
      dfs(neighbour, destination, distance + edgeWeight[node][neighbour])
   unmark node           //so it can be part of other routes

And that’s the basics of depth first search. The recursion stops once the base case has been reached OR once every route has been explored, whichever happens first.
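As a runnable sketch of the petrol problem, here's one way the pseudocode above might look in Python. The road network, distances, and node numbering are all made-up assumptions:

```python
LIMIT = 60  # km of petrol, from the problem statement

def dfs(adj, node, destination, distance, visited):
    """Return True if destination is reachable within LIMIT km."""
    if distance > LIMIT:
        return False          # out of petrol on this route, backtrack
    if node == destination:
        return True
    visited.add(node)         # guard against cycles in the road network
    for neighbour, weight in adj[node]:
        if neighbour not in visited:
            if dfs(adj, neighbour, destination, distance + weight, visited):
                return True
    visited.remove(node)      # allow this node on other routes
    return False

# Home = 0, friend = 3; one short route (0-1-3, 50 km), one long (0-2-3, 70 km).
adj = {0: [(1, 20), (2, 30)], 1: [(3, 30)], 2: [(3, 40)], 3: []}
print(dfs(adj, 0, 3, 0, set()))  # True: 0 -> 1 -> 3 covers 50 km
```

Note that the node is un-marked on backtracking: because we care about the distance of each route, a node rejected on one path may still lie on a valid shorter path.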

Now, on to the next topic.


Breadth First Search

Similar to DFS, the difference being that in DFS paths are traversed depth-wise (choose one path, keep going till you reach the end), but in a BFS there are no “paths”, there are areas: at every step you process nodes equidistant from the start (root) node.

When thinking of BFS, I always picture something like the ripples of water, or a nuclear explosion. The circle of effect spreading? That’s the breadth first search. Here’s an example of what kind of problem a technique like this can solve -

Given an NxN grid consisting of either 0s,1s or 2s, the following nomenclature exists -

0 - an empty tile

1 - the location of a person

2 - the location of an alarm

Now, assume all people have their mobile phones and whenever the first person hears an alarm ring, they instantly alert everyone else through a text message.

The time a person at (x, y) takes to hear an alarm at (a, b) is equal to |x-a|+|y-b|

Find how long it takes for all people to be alerted if the building catches fire.

Now, this question serves as a nice example where BFS can be applied. The idea is to perform a BFS on all alarms and find the time to alert the closest person to each alarm. Find the global minimum of these times and you have your answer.

Which leads us to the next question, how would you implement this, and more generally, how would you implement BFS?

Well, I prefer the iterative implementation of BFS as I find it more intuitive to understand. A BFS generally uses a queue: you push in the start node, then repeatedly pop the front, process it, and push its unvisited neighbours to the back. Since nodes are processed in order of their distance from the start, the first time you reach a node is via a shortest path to it.

So in the above question, you’d have to call BFS on each alarm, and upon finding a person, calculate the time taken to reach that person and update the final answer accordingly.

That’d go as follows -

int bfs(int start):
    create an empty queue
    push in (start, 0)           //(cell, time taken so far)
    mark start as visited
    while the queue is not empty:
        (temp, time) = queue.front()
        queue.pop_front()
        if temp is a person:
            return time          //BFS guarantees this is the closest person
        for all unvisited neighbours of temp:
            mark neighbour as visited
            queue.push_back((neighbour, time + 1))

Calling the above method over every alarm will provide you with the minimum time each alarm takes to alert its closest person. Find the minimum of these and voila!
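Here's a sketch of the alarm problem in Python: a BFS from one alarm that stops at the first person it finds, then the minimum taken over all alarms. The grid is a made-up example:

```python
from collections import deque

def bfs(grid, start):
    """Return the number of steps from start to the nearest person (1)."""
    n = len(grid)
    visited = [[False] * n for _ in range(n)]
    queue = deque([(start, 0)])          # (cell, steps taken so far)
    visited[start[0]][start[1]] = True
    while queue:
        (x, y), steps = queue.popleft()
        if grid[x][y] == 1:
            return steps                 # first person found = closest person
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and not visited[nx][ny]:
                visited[nx][ny] = True
                queue.append(((nx, ny), steps + 1))
    return float("inf")                  # no person reachable

grid = [[0, 1, 0],
        [0, 0, 0],
        [2, 0, 0]]
alarms = [(x, y) for x in range(3) for y in range(3) if grid[x][y] == 2]
print(min(bfs(grid, a) for a in alarms))  # 3: alarm (2,0) -> person (0,1)
```

Since the grid has no obstacles, the BFS step count from (a, b) to (x, y) equals the Manhattan distance |x-a|+|y-b| from the problem statement.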



And that’s BFS for you!


Dijkstra's Algorithm



Dijkstra’s Algorithm is used to determine the shortest path from a given source to every other node. Its complexity is O(V²) [and can be optimized to O(E log V) with a priority queue] and the crux of it is -

What’s the least distance I must travel to reach node X, given that I already know the least distance I must travel to reach each of X’s neighbours?

Logically, getting to X from another node (say Y) REQUIRES you to travel through at least one of X’s neighbours. So if you can calculate the distance you must travel to reach all of X’s neighbours, you can then make an educated decision on which route to X is the shortest. It’s similar to dynamic programming, as you’re solving subproblems: first the distance from the source to a transit node, then from the transit node to the destination.

Obviously, to store the results of all these subproblems, you’ll need to create an array dist of size V, where dist[i] stores the shortest known distance from the source to node i. By the end of all the iterations, the array dist stores the shortest distance from the source to every other node.

So effectively, you repeat the following V times: pick the unvisited node with the smallest known distance, mark it as visited, and check each of its neighbours to see if routing through it gives them a shorter distance.

The pseudo code flows as -

dist[] = {inf, inf, inf, inf, ...}
dist[source] = 0
repeat V times:
   i = the unvisited node with the smallest dist[i]
   mark i as visited
   for all neighbours (j) of i:
      dist[j] = min(dist[j], dist[i] + len[i][j])
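As a sketch of the O(E log V) variant, here's Dijkstra's with a binary heap in Python. The graph and its weights are made up for illustration:

```python
import heapq

def dijkstra(adj, source):
    """Shortest distance from source to every node, via a min-heap."""
    dist = {node: float("inf") for node in adj}
    dist[source] = 0
    heap = [(0, source)]                 # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                     # stale entry, a shorter route was found
        for neighbour, weight in adj[node]:
            if d + weight < dist[neighbour]:
                dist[neighbour] = d + weight
                heapq.heappush(heap, (dist[neighbour], neighbour))
    return dist

# Directed graph: adj[u] = list of (v, weight) pairs.
adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
print(dijkstra(adj, 0))  # {0: 0, 1: 3, 2: 1, 3: 4}
```

The heap replaces the linear scan for the closest unvisited node, which is where the speed-up over the O(V²) version comes from.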

Floyd Warshall's Algorithm

The APSP (all-pairs shortest paths) algorithm that runs in O(V^3) and calculates the shortest distance between all pairs of nodes.

I prefer to think of its result as what you’d get by running Dijkstra’s from every vertex (which holds as long as there are no negative edges), and in fact, on sparse graphs, doing exactly that with an adjacency list is often faster.

The pseudo code flows as follows -


    ans[][] = the weight matrix   //ans[i][i] = 0, ans[i][j] = len[i][j] if an edge exists, else inf

    for all nodes (k): //intermediate node
        for all nodes (i): //start node
            for all nodes (j): //end node
                ans[i][j]=min(ans[i][j],ans[i][k]+ans[k][j])
            //NOTE: the innermost line is the same relaxation step Dijkstra's uses.
            //That is how similar they are.
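As a runnable sketch, here's the triple loop in Python on a small made-up directed graph, starting from the weight matrix:

```python
INF = float("inf")

# Weight matrix of a 4-node directed graph: 0 on the diagonal, INF where no
# edge exists. The edges and weights are purely illustrative.
ans = [[0,   3,   INF, 7],
       [8,   0,   2,   INF],
       [5,   INF, 0,   1],
       [2,   INF, INF, 0]]

n = len(ans)
for k in range(n):               # intermediate node
    for i in range(n):           # start node
        for j in range(n):       # end node
            ans[i][j] = min(ans[i][j], ans[i][k] + ans[k][j])

print(ans[0][3])  # 6: the route 0 -> 1 -> 2 -> 3 beats the direct edge of 7
```

After the k-th pass, ans[i][j] holds the shortest i-to-j distance using only nodes 0..k as intermediates, which is why the k loop must be the outermost one.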

I’ll be frank, I’ve never needed to use Floyd Warshall’s algorithm so far simply because having Dijkstra’s algorithm in my arsenal was good enough.


Minimum Spanning Trees

Firstly, a spanning tree is a tree that connects every node in a graph. A minimum spanning tree is a spanning tree whose sum of edge weights is the least across all spanning trees.

Now, there are two famous MST algorithms: Prim’s algorithm and Kruskal’s algorithm. I personally prefer Kruskal’s algorithm since I find it the easiest to understand, prove and implement, not to mention both algorithms have the same average time complexity O(E log V), so the difference is minor anyway.

Kruskal's Algorithm

Pre-requisite: Disjoint set union data structure 

Kruskal’s algorithm works by sorting edges by weight in ascending order, and adding each edge to the MST if it connects two components that aren’t yet joined. This makes sense intuitively: the smallest available edge between two components must be part of some MST, because if you’re going to connect a component to the rest of the tree, it’s optimal to do so with the cheapest edge available.

This is the proof of why Kruskal’s algorithm can find the MST –

Assume you have connected a node n to your spanning tree with an edge whose weight is greater than that of the cheapest edge connecting n to the tree. Let the length of this tree (the sum of all its edges) equal k. For it to be a MINIMUM spanning tree, k must be the least length across all spanning trees, but if you instead used the cheapest edge connecting n, the new tree’s length would be less than k, which is a contradiction. So we must always pick the cheapest edge available, which is exactly what Kruskal’s algorithm does.

The pseudo-code for this also requires the implementation of the disjoint set data structure and will therefore be uploaded on my GitHub profile, which I’ll link to here shortly.
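Until then, here's a minimal sketch in Python of Kruskal's algorithm with a bare-bones disjoint set union (path compression only; the example graph is made up):

```python
def kruskal(n, edges):
    """Return the total weight of the MST of n nodes, edges = (w, u, v)."""
    parent = list(range(n))

    def find(x):                         # find a node's component root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):        # edges by weight, ascending
        ru, rv = find(u), find(v)
        if ru != rv:                     # edge joins two components: keep it
            parent[ru] = rv
            total += w
    return total

# 4 nodes; the MST keeps the edges of weight 1, 2, 3 and skips the weight-4
# edge, since by then nodes 0 and 3 are already in the same component.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)]
print(kruskal(4, edges))  # 6
```

A production version would also do union by rank and return the chosen edges, but the core idea (sort, then keep only component-joining edges) is all here.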




So yeah! That’s the start of graph theory! The next article on graph theory would be mixing it with segTrees/BITs, and to prepare yourself for that, consider reading up on SegTrees. You can do that here! - Traversing Segment Trees.




Phew! Long article. Almost 2,000 words. K. I’m gonna sleep. Ciao.

Aditya