Sunday, July 31, 2011

Dijkstra's Algorithm on Scooters: Part 4 of Ann's Visit to the Cit of G'Raph

Dijkstra's algorithm finds the shortest path from a given starting node to all of the other nodes in the graph. It requires that the weights of all edges are non-negative. It operates by maintaining a set of "visited" nodes and continually updating the tentative distance to all of the unvisited nodes. At each iteration, the closest unvisited node is added to the visited set and the distances to its unvisited neighbors are updated.



"We start by drawing up a list of all the islands in the entire city G'Raph." began Edgar. "We say that they are all 'unvisited' and have a tentative distance of infinity. Except the starting island itself; we give that a distance of zero. You can always get from here to here without moving."

"But distances of infinity are obviously not correct." interrupted Ann. She had walked from the inn to the library this morning, and it had not been an infinite distance.

"Of course it isn't correct." agreed Florence. "Dijkstra's algorithm keeps finding SHORTER paths to each node. So the distance will keep going down as we update our list. But since we have not found any paths at this point, we use the largest distance that we can. That way, any path we find will be better."

That explanation sounded reasonable to Ann. However, she did not quite see where this was going.

"Then the fun begins." continued Edgar. "We keep visiting islands until we have visited them all! This is the part where we get to drive around on scooters."

He took a deep breath before launching into the explanation.
"Each time we finish visiting a new island, we follow the same procedure:
1) We find the unvisited island that has the closest distance to our starting point.
2) We go to the island on our scooters. Usually, Florence and I like to race. That adds some more excitement.
3) We find all of the islands connected to the one we are on -- all of its neighbors -- and update their tentative distances. For each unvisited neighbor, we compute how far it is from the starting island IF we go through the current island. That is: the distance to the current island plus the distance to its neighbor. If this new distance is shorter, then that becomes the island's new tentative distance.
4) We mark the current island as visited, knowing that we now have the shortest path to that island.
Then we just repeat the procedure for the next closest unvisited island." described Edgar.

"And at the end, all of the islands have been visited, and we have a list of the shortest distances to each one." added Florence.

Ann looked at them blankly. "But I don't understand…" she started.

"An example will help." exclaimed Edgar. "Let's say that we start at the library. That becomes our current node -- with distance zero. Everything else starts with an infinite distance."



"Then we look at the neighbors and update their tentative distances." he continued. "In this case the inn is at most 5 meters away and city hall is at most 18 meters away."


"Now we are done with the library's node. So we mark it as 'visited' and move to the next closest island -- the inn." he continued. "In this case it is a quick scooter drive and not much of a race. But it will get more exciting."



"Since the inn is the closest unvisited island, we know we have found the shortest path to it. So we can update both its distance and its path." he continued.

"Yay!" added Florence quietly from the side.

Edgar gave her a strange look. "Anyway... We do the same thing here. Updating the neighbor's distances. The inn has three neighboring islands to consider: the library (which is already visited), the dry cleaners, and the brewery."

"Then we again move onto the closest unvisited node. This time it is the brewery." Edgar explained. "We repeat the process there, updating the neighbors."


"Then onto city hall. Here is where things get very interesting. It is a good race from brewery to city hall." Edgar made driving motions with his hands as he explained this. "Also, on the algorithmic side, the distance from the library to the dry cleaners is shorter if we go past the inn instead past city hall. So we do NOT update the distance to the dry cleaners in this case, because we have already found a shorter path."


"Then onto the dry cleaners. And so forth." he concluded with a wave of his hand.



"Yes." agreed Ann. "I understand that part. You are expanding the set of visited nodes, and you are maintaining tentative distances to nodes along the unvisited frontier. Each time you find a shorter path to a node, you update its distance. But what I do not understand is…"

"Exactly! You are quick." proclaimed Edgar. "I bet that you are confused by the question of whether there could be a shorter path through other unvisited notes."

"No. I am not." declared Ann. "There cannot be a shorter path, because you are taking the CLOSEST unvisited node. If there was a shorter path, then you would have to go through another node. But we already know that that other node is further away, because it is not the CLOSEST. It would be like saying that it is 200 miles to Athens and 100 miles to Atlantis, but the shortest path to Atlantis is through Athens. It would not make any sense. What I am confused about is…"

"Why we add the distances?" ventured Edgar. "Well, that is simple. The distance of going from A to B through node C is the distance from A to C plus the distance from B to C."

By now Ann was aggravated at the interruptions. "No! I have walked before. I know that when you walk over two bridges the total distances is the sum of the bridges."

Ann continued before she could be interrupted again. "I understand the algorithm. It is all very clear. What I want to know is: Why do you have to use scooters? You already have all the distances between islands written on this map. You could do all of this without traveling to the islands physically."

Now Edgar looked at Ann blankly. "Why would we do that?"

Ann sighed. "Because it is a lot faster to just mark nodes on a map as 'visited' instead of actually visiting them. You could do this entire algorithm on paper without leaving the library."

Edgar looked at Florence for help. "But, scooters are the best part. What fun would it be without scooters?"

Florence agreed. "That is why we call them 'visited' nodes, because we get to visit them. That is the whole point."

"All I am saying is that you do not need to physically drive there if you already have a representation of the graph." Ann noted.

"I don't think you really understand the concepts yet." ventured Edgar. "Let me start again. Dijkstra's algorithm finds the shortest distance from any starting node to all other nodes in the graph…"

Ann sighed and sat quietly in the corner. As Edgar and Florence re-explained how to add unvisited neighbors to the visited list, Ann started to daydream about racing through G'Raph on a scooter. There was something odd appealing about that approach.

To be continued in Part 5: A Disagreement over Data Structures...


Follow Ann's visit to G'Raph from the beginning with Part 1: The City of G'Raph.


--------------


Also, check out the new illustrations added to the following stories:


Friday, July 29, 2011

Bridge Weights: Part 3 of Ann's Visit to G'Raph

Graphs can have either weighted or unweighted edges. In graphs with weighted edges, each edge is associated with a real-valued weight. These weights can have different interpretations, depending on what the graph represents. For example, the weights might indicate the distance between two nodes in a transportation graph.

Two of G'Raph's scholars, Edgar and Florence, met Ann at the library. They were both visibly excited. They rarely had opportunities to show off their latest algorithms. They babbled about path routing and clustering. Their excitement was contagious.

After settling into a small, over-crowded office in the basement of the library, the scholars immediately pulled out a large stack of maps. Each map showed the same set of islands -- the city of G'Raph. However, each map contained different labels next to the bridges.

"What are all these labels?" asked Ann.



"It depends on map," answered Florence. "This one is labeled with the distance between the islands. We use it for path planning."

"I understand labeling the bridges with their lengths. That would allow you to compute the shortest path between two islands, right? How else would you label an edge?" asked Ann.

"Distance is only one concern," said Edgar. He had been sitting quietly in the corner, drinking his ninth cup of coffee. "There are hundreds of other things you might want to consider. For example, this map is labeled with the smelliness of each bridge. We use it for planning tours. Some of the bridges cross particularly nasty bits of swamp, you know. We want to avoid those bridges for tours, so we give them a high cost."

"This one is labeled with the estimated length of time needed to cross the bridge," interrupted Florence. She hoped to prevent Edgar from launching into a coffee fueled lecture of unpleasantness of living in a swamp. "We use it to compute different evacuation scenarios."

"Evacuation?" asked Ann in surprise. "Why would you need to evacuate?"

"There was some trouble a few years ago with a dragon," answered Florence. "We did not account for the time that it actually took to cross an old bridge. It totally gummed up the evacuation. Ten people got stuck at the dry cleaners after the dragon burned out a bridge. They were there for a week."

"They did not stop complaining for three months after that," added Edgar.

Ann nodded. The prospect of being stuck at the dry cleaners for a week seemed rather unpleasant. "Which map do you use most?" she asked.

"This one, of course," answered Florence and Edgar at the same time. They both pointed to a map labeled with the distances between islands.

"You use it for path planning? How do you do that?" prompted Ann. She was eager to learn more about the graph algorithms.

"Dijkstra's algorithm!" they answered at the same time.

"What is that?" asked Ann. Her knowledge of graph algorithms was admittedly not what it should be.

"Oh you will love it," proclaimed Edgar. "It involves scooters."



To be continued in Part 4: Dijkstra's Algorithm on Scooters...


--------------


Follow Ann's visit to graph from the beginning with Part 1: The City of G'Raph.

Tuesday, July 26, 2011

Directed Graphs, Bridges, and the Mayor’s Office: Part 2 of Ann's Visit to G'Raph

The edges in a graph can either be directed or undirected. Directed edges indicate one way relationships, such as: There is a link from Node A to Node B. Undirected edges indicate bidirectional relationships, such as: Node A and Node B are linked. The distinction between undirected and directed edges is important to both the structure of the graph and how algorithms operate on the graph.

As Ann waited in the mayor's office, she studied the map of G'Raph that hung on the wall. The map itself was huge, taking up the entire south wall of the room. There was at least a thousand dirt islands, represented as tiny circles, and several thousand bridges, represented as lines. Each bridge was clearly labeled with some number, which Ann assumed was the distance between islands. At seeing the full extent of the city mapped out, Ann again wondered why anyone would build a city in such an inhospitable swamp.

"Ann!" greeted the mayor in a loud voice. "You have grown so much since I last saw you. It must have been at least ten years. How have you been?"

"I have been splendid. Thank you for asking. And yourself?" responded Ann. She had absolutely no recollection of ever meeting the mayor. Then again, given how young she was ten years ago, this was not a surprise.

Still, Ann was convinced that she should remember this particular man. His bushy gray muttonchops alone would be impossible to forget. They easily covered half of his face.

"I am wonderful," answered the mayor. "Life in G'Raph is most wonderful. How are you finding our humble town so far?"

"I have met the most charming and helpful people. However, I do have to admit that your islands pose a bit of a navigational challenge," answered Ann honestly.

The mayor laughed. He turned to look at the map on the wall. "Yes. That is a common feeling. Luckily, our best scholars have developed some wonderful algorithms. Once a person masters those, life becomes a lot easier."

"I suppose," replied Ann. She was thoroughly unconvinced.

"You know... It was not always this simple either," continued the mayor in a hushed tone. "My predecessor made such a wonderful mess of our bridge system. He actually made each bridge one way."

"Really?" Ann gasped. She could see that it would take a long time to navigate around the city using the bidirectional bridges. The thought of making all the paths one way was absurd.

"Yes," answered the mayor. "He thought it would streamline movement. Admittedly, it did make the traffic flow better over each bridge. But it also meant that you had to account for the direction of every bridge when planning your path. Worse though, since each bridge only went one direction, some paths became much too long. He even started building second bridges between some high traffic islands -- one bridge from here to there and another from there to here. Perhaps you wondered why there are two parallel bridges between the south market and the top hat shoppe?"

"I did not come that way, so I did not have an opportunity to wonder," Ann explained.

"Well, that is why. Before the second bridge, people had to take nine bridges to get to from south market to the top hat shoppe. It was horrendous!"

"The second bridge helped?" asked Ann.

"It did. However, building extra bridges is just too cumbersome. Unlike adding edges on this map, building a bridge takes real effort. Adding new edges in the physical world is not as simple as drawing a new line. Eventually, we returned to using two way bridges again."

Ann nodded in agreement. From what she had seen, each bridge probably took days to build. She could not imagine it being worth doubling that effort.

Finally, Ann decided to get to the real purpose of her visit. "Sir. My father has always spoken highly of the theorists of G'Raph. I am sure that you are aware of my question and the vaguely defined danger that the kingdom faces. Would it be possible for me to stay here and learn from your scholars?"

The mayor's face lit up in a wide smile. "Of course!" he proclaimed. "You are welcome to stay as long as you want. I assure you that you shall find our scholar's work most enlightening. I shall take you to the G'Raph library first thing tomorrow."

After thanking the mayor, Ann began her trek back to the inn. As she crossed the bridge from the dry cleaner to the inn, she was grateful that the bridge was undirected. She was too tired to find her way around G'Raph through an entirely different set of bridges.


To be continued in Bridge Weights: Part 3 of Ann's Visit to G'Raph...


--------------


Follow Ann's visit to graph from the beginning with Part 1: The City of G'Raph.

Sunday, July 24, 2011

The City of G’Raph: Part 1 in Ann’s Visit to G’Raph

A graph is defined by a set of nodes and edges. The edges define a relationship between two nodes, linking those nodes. In unweighted graphs edges are binary -- there is either an edge between two nodes or there is not. In weighted graphs, the edges have real-valued weights indicating the relationship or proximity between two nodes. Graphs can be used to represent a large number of real world systems, including: social networks, computer networks, and transportation systems.

Ann arrived at the city of G'Raph shortly before nightfall. Any hope she had vanished as as she surveyed the city. It looked more like a massive swamp than a proper city. Tiny dirt islands dotted the landscape, poking out of the mud like warts. On each dirt island stood a small structure -- usually a house or business. The islands were connected by what appeared to be an inconceivably large number of rope bridges.

Despite the large sign stating "Welcome to G'Raph: Land of the Bridges", Ann was certain that she was in the wrong place. This could not be the same G'Raph of which her father had spoken so highly. To hear him tell it, some of the best minds in the world had built G'Raph into an intellectual powerhouse. Could she really find any answers here?

Ann took a deep breath, immediately regretting that decision as the swamp air filled her lungs. She coughed a few times, shook her head in dismay, and walked toward the city. Her first stop was G'Raph city hall. Hopefully, the mayor could introduce her to someone to help with her quest.

Unfortunately, Ann quickly discovered that she could not simply walk toward city hall. It almost seemed as though the town was designed to be confusing. Each little dirt island had a only few bridges connecting it to neighbors. This arrangement required her to stop and think about the path that she was taking instead of just the direction that she was going.

Finally, after crossing thirty bridges, Ann stopped at a small farm to ask directions. The farmer, who had been harvesting his radish crop from the swamp mud, seemed eager to help.

"Excuse me, sir. Can you point me in the direction of city hall?" asked Ann.

"The direction? I guess it is that way. But you need to take that bridge over there." The farmer’s arms extended in different directions.

"Are you sure?" asked Ann. In her experience, the fastest way to city hall was to walk toward it.

"You're not from around here, are you?" asked the farmer.

Ann shook her head.

"It's the bridges," explained the farmer. "In order to get anywhere in G'Raph, you have to understand how the different islands are connected. Each bridge forms a link between two island nodes. So in order to get from node A to node B, you need to take the correct sequence of bridges."

"I go that way?" Ann asked tentatively.

"You need to go: from here to McFane's farm, to the brewery, to the inn, to the 24 hour dry cleaners, and then to city hall. That is 5 bridges." The farmer counted the steps on his fingers.

Ann stared back at him.

"Let me write it down for you," the farmer offered. He pulled out a scrap of paper and sketched a small map:


North West corner of G'Raph (not to scale)


"There you go," he said, offering the paper to Ann. "Each line is a bridge. Just follow the bridges from island to island, and you will be fine."

"Thank you," Ann said. Her head spun wildly. How did anyone find their way around here?

Little did Ann know, the very structure of G'Raph had pushed it's population to become a city of algorithmic thinkers. It was here in the city of G’Raph that Ann would find the first real clue to her quest.


To be continued in Directed Graphs, Bridges, and the Mayor’s Office...

Thursday, July 21, 2011

Ushers, Peanut Vendors, and Matrix Indices

A 2 dimensional matrix is similar to a 2-dimensional array. An M by N matrix has M rows, each of which has N columns. Thus the individual elements can be indexed directly by a row number and a column number.


In some programming languages, such as C and C++, matrices can easily be implemented as an array of arrays. For example, consider an M by N matrix X. In this case, X is a length M array of pointers. Each pointer allows you to access a different row -- a length N array. You can access the pointer to the i-th row using the notation X[i], and you can access the j-th column of the i-th row using the notation X[i][j].


Henry came from a long line of soccer stadium peanut vendors. As such, Henry was introduced to the concepts of matrices and matrix indices at a very early age. By the time he was six years old, Henry could accurately throw a bag of peanuts to any matrix location with his eyes closed. By the time he was seven, he could do that in high winds.


The reason behind Henry's proficiency in matrices was the layout of the of the East Alexandria soccer stadium, and specifically the A seating section. Henry's family had worked the A seating section for three generations. Every family member could tell you the smallest details of the section, such as which of the chairs were squeaky or where the best view was.


The A seating section had a remarkable resemblance of a rectangular matrix. It consisted of 25 rows of seats, and each row consisted of exactly 30 seats. As such, the seats were indexed by two numbers: a row number and column number. These were printed in bold letters on the ticket.



Henry's job at the stadium had two parts. First, before the game, he served as an usher and helped fans find their seats. Second, during the game, he served as a peanut vendor and made sure that every fan stayed sufficiently supplied with peanuts. Henry took his job quite seriously, and he had become very good at it.


As an usher, Henry excelled in randomly accessing matrix elements in order to guide people to their seats. "Row 3, Column 7? Right this way, sir!" Henry would start at the lower corner of the section and walk the patrons up to their seats. In this case, he first walked up to row 3 and then over to column 7.


As a peanut vendor, Henry became adept at traversing the matrix of seats while looking for hungry soccer fans. He would iterate over the entire matrix by walking past each seat in a row before moving down to the next row. Years of experience had taught Henry that for this particular stadium design it was much easier to move between seats in a row, rather than moving between rows in a column. Climbing over the chairs between rows turned out to be surprisingly difficult. Knowing whether it was easier to move between columns in a row or rows in a column was a very important consideration for a peanut vendor.


Henry became famous later in life when he published his domain expertise in a 5 volume series entitled "On Selling Peanuts and Traversing Stadium Seats". Despite becoming a canonical reference in the field, this work was best known for its bold introduction of a portable, steam-powered, peanut thrower. Henry was able to use the proceeds from these books to retire early and buy his own season tickets to the East Alexandria soccer team (row 1, column 9)!

Monday, July 18, 2011

Programming Languages, Compilers, and Pigeons

Machine language is a set of instructions that specifies the lowest level of computer operations, such as moving data around or adding two numbers. Luckily, modern programming languages hide this low-level complexity from programmers, allowing the programmers to work with a higher level language. A compiler is a program that translates the high level language into machine language.

Ann had always wondered about the intelligence of pigeons. They appeared nearly brainless as they paced her window sill, bobbing their heads. She could never believe that they were smart enough to comprise the core of the kingdom’s pigeon carrier communication network. However, it was not until the summer after her tenth birthday that Ann learned the frustration of working with these birds.

The true difficulty of working with pigeons consisted of trying to give orders. The pigeons only understood a simple set of ultra-low level instructions, called the Binary Instruction and Reference Directives Set (or BIRDS for short). This instruction set contained such commands as “Fly X meters” and “Turn Y degrees”. It completely lacked any high level concepts at all.

To make matters worse, a pigeon could only keep two values in its head at the same time -- one in each side of its brain. This led to complex sequences of instructions, such as: “Store the value 10 in X. Fly X meters.” Every value had to be moved into either the left half of the brain (X) or the right half (Y) before it could be used. Not only did you have to work with these low-level instruction, but you also had to worry about where to put the values for those instructions.

As a result, an instruction such as “Fly to Perth” became a series of 113 instructions measured in meters and degrees. Half of these instructions simply moved data into one of the available slots. A mistake in any one of the instructions could send the pigeon to the wrong city.

In order to form an efficient pigeon communications network, the kingdom's head pigeon master employed hundreds of skilled compilers. These compilers would work day and night translating scrawled instructions from the king's servants into precise BIRDS code. It was a job that took years of training and an obsessive attention to detail.

Thus, it greatly shocked both Ann and the castle’s pigeon master when King Fredrick insisted that Ann learn the skill. Shortly after Ann’s tenth birthday, King Fredrick commanded the pigeon master to spend the summer tutoring Ann in the art of compiling pigeon instructions by hand. Both Ann and the pigeon master assumed that they were being punished.

“I don't understand why I am doing this,” protested Ann after her father left. The pigeon master nodded in agreement, but he was wise enough not to question the king.

“Learning to compile pigeon instructions is an important skill,” commented the pigeon master. “Its value should not be underestimated. Pigeon code is at the heart of the kingdom's entire communication network. The network carries reports from all territories, messages from advisors, and the king’s own commands. You could say that the fate of the kingdom depends on the correct and efficient compilation of BIRDS instructions!” The pigeon master obviously felt passionate about his job.

“But it is so boring,” Ann objected. “Why can't you simply tell the pigeon to fly to Paris, retrieve a message, and come home.”

The pigeon master sighed. He had been over this already. “How does the pigeon know where Paris is? And how does it know what message to pick up? And how does it know how to get back? You have to tell the pigeon exactly what to do. Otherwise it will just look at you blankly and bob its head.”

“I am telling it 'Go to Paris.' That should be enough.” argued Ann.

“It would be enough IF the pigeon had some internal concept of 'going to paris'. But, most unfortunately, that is not one of the skills with which a pigeon is born.”

“Why not?” asked Ann.

The pigeon master looked shocked, as though he could not believe the question. “I do not get to control the knowledge with which pigeons are born,” he finally answered. “I only train the pigeons.”

“You could teach them about Paris,” Ann suggested. “I learned about Paris in school when I was five.”

“What if you wanted the pigeon to go to Atlantis? Or North Saskatoon? Should the pigeons memorize every possible path?” questioned the pigeon master.

“I guess that would be a little much,” agreed Ann. “Especially for birds that can only remember two things at a time. They do not seem up to that sort of challenge.”

“Exactly,” agreed the pigeon master. “The BIRDS code provides a minimal set of instructions that every pigeon knows. It is a building block that can be used to create arbitrarily more complex programs. However, the king and his advisors do not write in BIRDS code. They write in higher level concepts, such as 'Fly to Perth'. As compilers, our job is to transform these high level concepts into simple instructions that a pigeon can understand.”

He put the BIRDS manual in front of Ann and showed her how to translate a simple flight around the castle. Each seemingly simple instruction morphed into a list of agonizingly low-level details. Move this data here, then use it to compute this turn here. Ann struggled to keep the relevant context in her head.

Ann studied each of the BIRDS instructions, trying to memorize them all. The instructions themselves were simple. However, using them to efficiently perform a desired task was an art form.

After a few hours, the pigeon master told her that she was done for the day. As Ann stood up to leave, she thought of another question.

“Don't you ever wish that pigeons were smart enough to just understand what you mean to say?” she asked.

The pigeon master gave a long sigh. “Every single day,” he replied sadly.

Thursday, July 14, 2011

The Hamming Distance Option: Part 3 of Peter and the Postal Service

Hamming distance is a measure of the difference between strings of two equal length. Formally, the hamming distance between two strings is a count of the number of positions in which those strings differ. It is computed by traversing both strings, character by character, and counting the differing characters. Two strings with a hamming distance of zero are perfect matches.


Peter was shocked that the mailman had refused to deliver the letter to "Petee Fencer" because it was not a perfect match with his name. The 'e' at the end of the name was obviously supposed to be an 'r'. Sowhat if Peter had been the one to insist on perfect string matching? The mailman should have been able to pick up this error easily.


"Okay. New plan." started Peter. "Let me tell you about hamming distance…"


The mailman continued to smile politely. He was clearly enjoying this new game.


"Hamming distance is a measure of the distance between two strings. You go through each position and count how many of the characters are different. For example, the distance between "Petee" and "Peter" is 1 -- the last letters do not match."



"Okay." Agreed the mailman. "And the distance between 'Peter' and 'Puget' is 3, right? And the distance between 'Peter' and 'Smart' is 5, right?"



"Uh... yeah." confirmed Peter. Obviously the mailman was well versed in hamming distance. Peter wondered how he had had those two examples so readily available.


"What if the lengths of the strings are not equal?" the mailman asked. "Doesn't hamming distance apply to equal length strings?"


"Well... Yes. Technically, they are supposed to be equal length in order to compute hamming distance. I guess that we should be using a slightly different metric in that case..." commented Peter. He thought over the situation for a moment. "Let's just pretend they are the same length. Just add spaces to the end of shorter one until they are the same length."


"Really? Isn't that cheating?" The mailman gave Peter a shocked look. It was one thing to insist on using a specific metric to match mail, but it was another matter entirely to cheat with that metric.


Peter dismissed this concern. "Filter out all letters with hamming distance greater than 2."


"Are you sure?" the mailman asked.


Peter nodded.


"So you do not want anything to 'Mr P. Fencer'?" confirmed the mailman.


"Wait! Of course I do. That is obviously me. P is just an abbreviation." yelled Peter.


"But the hamming distance is much more than 2. In fact, it is 5. "Mr P. Fencer" and "Peter Fencer" do not match until the space before your last name." argued the mailman innocently.



"Yes, but…" started Peter.


"Nope." interrupted the mailman. "You told me to filter all names with a hamming distance greater than 2."


"Wait." Peter argued. "How about string edit distance?" he tried.


The mailman looked at him for a full minute. "Do you want the letter I have for 'Apprentice P. Fencer'? Or do you want to keep trying to help me improve my job?"


Peter sighed heavily. He had underestimated the complexities of string comparisons for mail delivery. He should have listened to the mailman's warning.


"If I apologize, can you go back to the algorithm that you were using before?" Peter asked.


The mailman looked at Peter for a moment. "I suppose I could. You know that it is not 100% accurate though, right? It even uses a few heuristics."


Peter nodded humbly.


From that day on, Peter started receiving all of his mail. He learned not to argue string comparison algorithms with postal carriers.


--------------


See how it started with Part 1 and Part 2 of Peter and the Postal Service.

Sunday, July 10, 2011

Incorrectly Delivered Mail and Comparing Strings: Part 2 of Peter and the Postal Service

Strings are sequences of characters. In order for two strings to be identical, every character must match between the two strings. Thus, you can check whether two strings are identical by iterating over each position in the string and comparing the corresponding characters.


"Excuse me!" Peter shouted while running down the road after the mailman. "This letter is not addressed to me."


"It isn't?" asked the mailman. He looked annoyed at the interruption to his routine.


"No. It is not." confirmed Peter. "My name is Peter Fencer. This letter is for Paul Fencer"


"Eh... Sorry. They are close." replied the mailman.


"Close does not count!" objected Peter.


"Huh?" asked the mailman.


"When comparing two strings, you need to compare every single character. The two strings match if and only if every character is the same!" argued Peter.


The mailman looked at Peter skeptically. "Are you sure that you want to argue this with me?" he asked.


"I most certainly do." continued Peter. "Here is the algorithm you should be using:

IF the two strings are different lengths THEN one of them has extra (unmatched) characters at the end, so reject the match.

Otherwise, FOR EACH position in the string: compare the corresponding characters in each string. IF they are not the same, reject.

And if you reach the end of the strings without rejecting, then you have a match."


"Alright." said the mailman. "I will use that test from now on. What string do you want me to use for comparison?"


Peter smiled. He pulled out his new return address stamp, stamped a blank piece of paper, and handed it to the mailman. "This." he declared.


The mailman nodded and left.


For two blissful weeks everything seemed to go perfectly. Peter never had to chase the mailman down the street to return a piece of incorrectly addressed mail. Peter felt pride that he helped the mailman improve his job performance.


Then, Peter noticed that he was missing a very important letter from the head of the library association. He had been assured that it would arrive last week. He approached the mailman the next day.


"Excuse me. I am waiting for a letter from the library association. Have you seen it?" he asked.


"No. I have not seen anything from the library association that matched your address." answered the mailman. "There was something for a Petee Fencer, but not for Peter Fencer."


"Wait!" cried Peter. "That is me! It is obviously a typo. The 'r' probably looked like an 'e' on some list."


"That cannot be right. It does not pass the algorithm you gave me." argued the mailman innocently.



"Yes, but…" started Peter. He wanted to scream.


The mailman stood there politely smiling.


"Okay. New plan." started Peter. "Let me tell you about hamming distance…"


---------------------------


Learn more about strings by reading about Peter and his new address stamp.


Other updates: Illustrations added to "Hunting Dragons With Binary Search" and "President of the Heap".