Alice has two friends, Bob and Charlie. Now, the three of them decide to meet up at a coffee shop.

Bob, now runs into a friend of his, Dexter.

And Charlie runs into a classmate of his, Esther.

Bob and Charlie invite their friends over to the same table. The size of the crowd goes from 3 to 5 and Alice greets her extended friends Dexter and Esther.

As the day progresses, Dexter sees a couple of his colleagues at another table and switches tables to join their table.

Can you think of an algorithm to store this data optimally, one which can tell you if two given people share the same table, are friends, extended friends or strangers to one another?

Cue Disjoint Sets.

In more formal terms, Disjoint sets (aka Union-Find) is a data structure that allows you to connect components and form larger components allowing you to efficiently handle queries asking “who is connected to who”.

Even better, is that the most efficient form of their implementation which I use (which is what we’ll be discussing here) is extremely small (~2 lines)

So! Having said that, let’s dive in!

This data structure is implemented with an auxiliary array that keeps track of a group’s leader. I’ll explain what this means in the following paragraphs.

Now, before we go further, let’s understand how we’d solve this without disjoint sets.

In the above example, the friends would represent the nodes of a graph and edges between nodes would indicate that those nodes (friends) were sharing a table. Queries could be handled by a simple breadth-first-search then.

But let’s try a different approach. Consider this example, where we introduce a formal term.

Let each table have a “leader”, a person whom only everybody on that table “admires’.

Now if two people have the same leader, it would imply they share the same table.

This is the concept of Disjoint Sets.

An array parent or leader defines the parent for every element, as is done in an hash map.

So a disjoint set is not a unique data structure but a way of thinking.

Its implementation involves only two methods -

join( )

find( )

These two suffice in performing standard operations on Disjoint sets.




join( )

This function is used to combine two disjoint sets. i.e. to unite them. Now, to maintain consistency, a certain heuristic (I’ll explain what this means in a moment) is followed to decide when joining two sets, which set is joined to which – i.e. who stays the overall leader and which leader loses his leadership status.

This heuristic depends is circumstantial. Generally, in terms of optimization, it’s faster to consider the group which is smaller, and strip that leader (let’s call him B) of his title and update his leader to the larger group’s leader (let’s call him A).

Now the question arises - what about B’s followers? Don’t they still believe B is their leader? Their parent array indices still state that their leader is B, right?

Well, this isn’t a problem as you’ll soon see in our find( ) function’s implementation.

But for the join function’s implementation, it’s as follows -

int join(int x, int y) {
	if(find(x)>find(y)) 
		swap(x, y);
	par[ find(y) ] = find(x);
}

The crux to gather from here is - joining two groups is performed by assigning the parent of one leader to the other leader (who is assigned to whom is circumstantial and depends on the question - I’ll link an example down below, a question that I solved recently. Made me feel good about this concept lol.)

That’s the essence of the join function.


Find( )

Now let’s get on to the find( ) part of union-find algorithms.

This implementation, allows you to find the leader of the group in O(1) on average which is pretty handy because you need to use the find( ) function even to unite two groups.

Personally, I know the drawbacks of the following implementation (it’s recursive) but I love the simplicity of it. It’s so beautiful to look at.

Okay, sorry, drifting away. Back.

Here is my implementation in C++.

int find(int x) {
	return par[ x ] = (par[ x ] == x)? x : find(par[ x ]);
}

So, let’s work through this. What’s happening here, is as follows.

My base case, is when the parent array and its index are the same. This is when I’m looking at a leader, or a person who has no leader.

Therefore he is his own leader. And that’s how I know I’ve either reached the top of a group or just one lonely dude.

Now, what if the parent array indicates the parent of my current index is not itself?

That’s my general case. This means he HAS to be a follower of somebody. How do I find who he’s following? Easy – parent[current_entity] literally points me to the current entity’s parent.

So, now that I know who’s the current person’s immediate leader, can I say I know this leader is the group’s leader? No. Well, at least not with my implementation of my join function.

Why? Because if a group merges with another group, the leader who became a follower of the main leader, still has his followers thinking that the overall leader is the same while a change was made.

So, the recursive nature of the implementation is necessary to identify the overall group’s leader.

Hence the recursion, hence the implementation as such.

I suppose that’s it? And here’s the question I was talking of -

http://codeforces.com/problemset/problem/893/C

And here’s my solution - http://codeforces.com/contest/893/submission/32615359

Well, that’s an overview of Disjoint sets, I really like this because usually it’s a very convenient substitute for standard graph algorithms like DFS or BFS and it never gives me issues with its efficiency.

So yeah, that’s that. Hope it helped.

Aditya