About Me

My photo
I'm a colonist who has declared war on machines and intend to conquer them some day. You'll often find me deep in the trenches fighting off bugs and ugly defects in code. When I'm not tappity-tapping at my WMD (also, known as keyboard), you'll find me chatting with friends, reading comics or playing a PC game.

Saturday, March 8, 2008

An introduction to the STL : Maps

(Reader Level : Beginner)
(Knowledge assumptions : std::string, std::vector)

Hello! Today we're going to discuss another popular data structure in the STL - Maps. Maps are basically associative arrays or hash tables where each element consists of a unique key that identifies a particular value. Once a key is stored in the map, then it cannot be changed. Only the value it identifies can be changed. If we want to associate a new key with a value, then the entire element(key + value) must be removed from the map and the new element inserted. To actually see the beauty of maps, let us attempt to solve a simple problem.

The Problem: Write a C++ program to find the character with most number of occurrences in a string.

Solution attempt #1 : Using Vectors
#include <iostream>
#include <string>
#include <vector>

const char VectorMethodOfCounting(const std::string& str)
{
std::vector<unsigned int> vCharCount(26);
char mostUsedChar = '\0';
unsigned int count = 0, currentIndex;

for(std::string::size_type i = 0; i < str.length(); ++i)
{
//Calculate the index of this character in the vector.
currentIndex = str[i] - 'a';
++vCharCount[currentIndex];

if(vCharCount[currentIndex] > count)
{
mostUsedChar = str[i];
count = vCharCount[currentIndex];
}
}

return mostUsedChar;
}

int main()
{
std::string myString = "thebrowncowjumpedoverthemoon";
std::cout<<VectorMethodOfCounting(myString);

return 0;
}
Basically, what we've done here is create a vector for all twenty-six characters in the english alphabet. This vector will maintain a count of the occurrences of each alphabet in our string. Then, we loop through the characters of the string. As we encounter, each character in the string, we determine its index in the vector by subtracting its ASCII value from that of 'a'. We also maintain a record of the character with the most occurrences till date. When we finish iterating through the string, we return this character.
This solution, of course, works but it has some pretty serious drawbacks. The first shortcoming is that it only works with lowercase strings. If the string were to contain uppercase characters or other special characters like spaces, then the solution fails. We could resize the vector to account for all 256 ASCII characters (or maybe all UNICODE characters) but then that would be wasted space if only a handful of characters were actually present in the string. Also, as we will soon see, this problem could have been solved more elegantly. I especially hate that part where we determined the index of the character in the vector.

Now, on to maps. If we analyse the problem more closely, we will come to the conclusion that what we really want to do is maintain a count of the number of occurrences for each character in the string. In other words, each unique character in the string is associated with a count. This concept could best be represented with a map where each element consists of a character as the key and an integer as the value. Let's try doing that.

Solution attempt #2 : Using Maps
#include <iostream>
#include <string>
#include <map>

const char MapMethodOfCounting(const std::string& str)
{
std::map<char, unsigned int> mapCharCount;
char mostUsedChar = '\0';
unsigned int count = 0;

for(std::string::size_type i = 0; i < str.length(); ++i)
{
++mapCharCount[str[i]];

if(mapCharCount[str[i]] > count)
{
mostUsedChar = str[i];
count = mapCharCount[str[i]];
}
}

return mostUsedChar;
}

int main()
{
std::string myString = "thebrowncowjumpedoverthemoon";
std::cout<<MapMethodOfCounting(myString);

return 0;
}
A little explanation is in order. In order to use maps, we need to include the pre-compiled header "map". The next thing we do is create a map named mapCharCount. The first char in the syntax is the data type of the keys and the unsigned int is the data type of the values. Everything else is pretty much the same with the exception of the indexing. Unlike our vector method, indexing into the map based on the key is easy. Our key is the character str[i], so we retrieve the value (the count) by mapCharCount[str[i]]. Pure elegance!

There's more to maps than meets the eye. For more information on maps, feel free to read up on the following links:

SGI Documentation on Maps
Member functions of STL Map

In my next post, I'll talk about another STL container - The Priority Queue.

No comments: