Tuesday, January 22, 2013

An Outline of a Successful Tower Defense Game – Part 1


Hi All,

Tower Defense (TD) is a popular genre of games for both the mobile and the PC world. The underlying idea is similar in almost all the TD games although some variations exist. The player is faced with a map which has an area (one ore more) from which critters (monsters) emerge and an area to which the critters are trying to get. In some variations of the game the critters try to steal some resource from the target area and bring it back to the area from which they emerged. The critters usually come in waves of ever increasing strength. The player must build defensive towers along the map to shoot or otherwise harm the critters so that they not achieve their goal. In most cases the player has one or more limited resource at his disposal but killing a critter grants the player a small amount of this resource. Sounds simple? It is simple! This is exactly why TD games are so popular, fun, and addictive.

During the past five years I have been playing numerous version of this simple concept. Over the time I came to realize that some of the TD games are much less fun than the others and I asked myself “why?”. In this blog post I try to all the properties that make a TD game fun and loved by players. The analysis is based mainly but not only on the most successful TD games on the internet: Kingdom Rush, Cursed Treasure: Don’t touch my gems!, The Protector series, The GemCraft series, Balloon TD series, Flash Element TD, Desktop TD, and many others.

If you want the key takeaway from this post then perhaps the two most important tips are:

  1. Don’t make random maps. Each map must be carefully planned for the upgrades available for the player in that level and the style of the game.
  2. Don’t confuse the player, too many tower types, too many upgrades, too complex construction scheme, too many critters, etc… In a successful TD game simplicity and clarity is the key.

Map Style

There are two main map styles:

  • Open map – The map is an open field of some discrete grid. The player can place towers on the field itself blocking the critters’ path effectively creating a maze of death through which the critters must pass. In this map type the critters keep trying to find the shortest route to the target zone while the player is not allowed to close the maze in such a way that no such route will exist. This map type allows a special technique called critter herding or juggling. With this technique you create a maze with two possible exits, when the critters rush toward the first exit you block it, when they rush toward the second exit you unblock the first and block the second and so on.
  • A path – The map contains a pre-defined path from the source of the critters to the target area. The path may have splits but the critters will always follow the path. Usually the path is somewhat wide and the critters can populate it in a non discrete fashion. In this map type the towers are built outside the path and may not block the path in any way.

DesktopTdCritters follow a simple maze in Desktop TD.

There are TD games (e.g. Canyon Defense) that have a mixed model but even in these games the path model is the main one.

My observation: It seems that the map style is not a huge factor in making a TD successful or not but it is clear that path based TD games are the vast majority of all the successful TD games. This observation supports the key takeaway (1) that a map must be carefully planned by the game developer and not left random to the player.

For both map styles there is a tower placement scheme. In some TD games it is possible to build towers in specific predefined areas/squares/places while other areas are blocked for constructions by some objects or scenery. Cursed Treasure, for example, allows building specific towers in specific land types thus controlling the tower placement but still allows for some freedom on where you may put the towers.

My observation: The best TD games limit the tower placement areas in some sense or the other. If the maps are planned carefully then it is possible to limit tower placement to specific locations, otherwise it is best to have some limitation which is reasonable for the map.

Towers and Upgrades

The main defense force of every TD game is the towers. Towers come in all shapes and sizes, from arrow shooting archers to fast firing machine guns, from monkeys throwing darts to giant rocket launchers, you name it. All TD games support upgrading the towers in one way or the other, some allow multiple upgrade paths which allow your tower to become more effective in certain locations on the map.

My observation: Don’t have more than 5 tower types. Most successful TD games have between 3 and 5 tower types. More tower types confuse the players and usually don’t contribute anything to gameplay.

BalloonTD5Balloon TD 5 has 17 tower types

Tower upgrades make towers more effective in their ability to dispatch critters. Upgrades usually affect the basic characteristics of the tower (e.g. range, firing speed, and damage) but may also allow the tower to create some special effect on the critters it hits (e.g. poison damage, slow, freeze, etc...). In most cases the upgrade is done directly on the tower by spending some resource but in some cases it can be done by attaching an external element (e.g. fire gem stone) or by another supporting tower. Some TD games allow the upgrade of a tower based on the tower experience points (XP) and not simply by using a player resource. The tower gains experience by killing critters and this experience can be spent on upgrading the tower. Note that this upgrades scheme requires careful map planning since it sometimes creates a large gap between tower capabilities over the entire map.

My observation: Don’t have multiple upgrade paths for the tower, at least not for the first couple of upgrades. At the end of the upgrade line you may allow two possible paths. For example a fast firing tower may upgrade fire speed and damage for 3 times and then chose to become a tower that specializes in long range or in an even faster firing rate.

My observation: Prefer upgrading the tower itself and not by adding some supporting tower or other type of attachments.

My observation: If you chose to make tower upgrades available based on the tower XP, make sure to have a sensible XP scheme and map planning. The Protector TD series makes excellent use of this upgrade type by providing carefully designed levels.

Bubble Tank TD  Upgrade hell of Bubble Tank TD 2, can you understand what is going on here?

Stay tuned for part 2 where I discuss critters and player upgrades (achievements).

Thanks for reading,


Saturday, December 15, 2012

A short update

Hi All,

It has been a long time since I posted on my blog. In the past weeks I was quite busy with various small projects and simply didn’t have the time to write a new post. Here is a quick review of the main things that I was working on.

  1. I fixed that annoying bug in npm where it could not parse a package.json file that had a BOM header. It was particularly annoying to Visual Studio users since it creates the BOM marker by default on any file you create. The main issue was that npm didn’t give a nice warning about it but instead failed to parse the package.json file. The fix was very small as you can see here.
  2. I developed an npm package called hashes which is an implementation of a HashTable and a HashSet in JavaScript. I tried to make the package as high quality as I could therefore it includes detailed documentation, tests, and extensibility points.
  3. I wrote a small application to help me catch the person who constantly turns off my air conditioner at work. (Just to clarify things, the aforementioned AC covers only two cubes, mine and my team mate’s. Needless to say that my team mate doesn’t turn the AC off.) The application uses a webcam to capture a short film when it detects changes in pre-defined areas of the frame. The code is written really badly so don’t take it as an example for anything.
  4. I uploaded some of my older projects to my GitHub. Feel free to fork them.
  5. Last and not least, I am involved in the Eclipse – Orion project currently mostly solving bugs but in the future I am planning to contribute features.

That’s it for now. Thank you for reading.


Saturday, October 20, 2012

Simply Explained: Multithreaded vs. Async as the hamburger restaurant



Recently I have been lurking around the Node.js IRC channel and I saw that many people start using Node.js without realizing the main differences between the multithreaded approach and the asynchronous approach. This post will explain this difference through a simple (non computer oriented) example.

Suppose you are starting a hamburger fast food restaurant. In your restaurant you have only one dish: A hamburger. It is made by strictly following these steps:

  1. When a customer comes in the clerk takes his order (10 sec).
  2. The clerk puts a meatball on the grill and roasts it (30 sec).
  3. The clerk puts the buns in the oven and warms them (30 sec).
  4. The clerk assembles the hamburger and serves it to the customer (10 sec).

This recipe may not result in the best of hamburgers but it’s a start. In our example we have two critical resources (marked bold), the grill and the oven. When each of these devices is in use, no one else can use them. The clerk represents our thread and the recipe our code. The clerk follows the recipe to the letter in the same way our code is executing on a CPU.

--- What happens in a single threaded environment?

We have one clerk. When he puts the meatball in the grill he has to wait near the grill (doing nothing) while the meatball is fully roasted, then he takes the meatball out and can proceed to the oven. He puts the buns in the oven and again… waits… Until the buns are warm. The entire hamburger making process takes 80 seconds (10 + 30 + 30 + 10).

This is not bad for a single customer. In fact, the fastest any customer can be served a hamburger is 80 seconds so in the case when only one customer comes to the restaurant we are doing a perfect job serving him fast. The problem starts when there are several customers coming in. Consider the case when two customers are coming in together. The clerk has to follow the recipe for both customers, the first customer  gets a hamburger after 80 seconds while the second customer after 160 seconds (since the clerk can start making the second hamburger only after he is done with the first). Each additional customer would have to wait additional 80 seconds for any other customer standing before him in the line. The next diagram depicts serving one customer by a single clerk.


To solve the aforementioned problem we usually use a multithreaded model. In our example we simply add another clerk. In this case when two customers come into the restaurant each customer will be served by a different clerk. On first glance you might think that now each customer gets a hamburger after 80 seconds but this is not the case. Remember the critical resources? Both clerks take the order at the same time but only one of the clerks can use the grill (while the other waits for the grill to be available). The diagram for two clerks serving two customers looks like this:


The first customer still gets the hamburger after 80 seconds but the second customer gets his after 110 seconds, 50 seconds faster than in the single threaded case. From this example we can deduce two things:

  1. If more than two customers come we need more clerks. Clearly, there is a limit to the number of clerks we can have (and that limit is probably not very high). More clerks mean a crowded kitchen and at some point they start bumping into each other and even taking the wrong meatballs from the grill. This happens in software too. The more threads you have the more time you computer spends on switching between them and the more time you spend on debugging various race conditions .
  2. No matter how many clerks we hire we still have only one grill and one oven so at some point adding clerks will have no added value (in fact it will just cause more problems, see item 1). This happens in software too. So what do we do? We buy more grills… errr… RAM, CPUs, storage, bandwidth etc... Our handling capacity is directly linked to the hardware upgrades we are making.

--- Where does the asynchronous model comes in?

In the asynchronous model we make a small change to our kitchen equipment. Instead of hiring more clerks we have only the one clerk but he doesn’t wait at the critical resources. Instead, the grill and the oven have an input box and an output box each. When the clerk wants to use the grill, for example, he puts the meatball in the input box and goes away to do anything else. When the meatball is ready (after 30 seconds) it drops into the output box and the grill rings a bell. The clerk can pickup the meatball and continue the recipe. The diagram for serving two customers in this model looks like this:

Drawing3 (3)

Note the colors: White – the clerk is actually doing some work, Grey – the clerk is not doing anything, Yellow – the machines are doing the actual work.

We can see that the second customer still gets his hamburger after 110 seconds with only one clerk. We can also see that during these 110 seconds the clerk was occupied for 40 seconds and the rest of the time he was waiting for additional customers to come in (we didn’t have that slack time in the multithreaded model).

--- Wait! What? Making the algorithm async didn’t make it faster?

No! Async will not make your algorithm run faster, if anything it will make it run slower.

--- Then why people use Node.js which works in the async model?

The answer is in those Grey areas in the last diagram. Think about the second example (with the multithreaded model), if another customer comes in and wants to order ice-cream (which takes 10 seconds to make and requires the ice-cream machine). The customer would have to wait 120 seconds to get the ice-cream simply because there is no clerk to take his order. In the async model it would take 30 seconds because the single clerk can take the order 20 seconds into handling the first two customers.

The main point here is that you can’t take the clerk from the second example and put it to work as the clerk in the third example.  He simply would not understand why the machines are making all those sounds. In the software world, you can’t take code written for a multithreaded server, translate it to JavaScript and let it run on Node.js because it doesn’t work in the async model. Doing so means that you are back to the single threaded model of the first example which would certainly cause severe degradation in the response times of your server.

So what should you do to make your Node.js server as responsive as possible? (Not as fast as possible!)

  1. Use the async framework that Node.js provides for you and avoid using the sync methods.
  2. Make your code async by implementing your own async calls or using the calls of Node.js APIs.
  3. Make the non async code section as short as possible. Look at the ice-cream example. If we could reduce the “Take order” step into 5 steps that take 2 second each then the third customer could get the ice-cream faster (14 seconds).

Conclusion, Node.js with its async nature is a great tool but as with any other tool it requires understanding of how and when it should be used. Node.js is not a magic solution that makes any server run faster but a framework that allows you to build a server with improved response times for high number of users without considerably increasing the hardware capacity.

Thank you for reading. Please write your thoughts in the comments.


Saturday, September 1, 2012

Async ForEach–The basic three levels of async loops


Hi All,

Today I am going to continue my series about async. This time I will talk about loops or more specifically about for each loops over a collection. I present async for each loops by through three examples where each example adds the asynchronous part to a different section of the loop.

Non-Async loop

This is the most basic type of loop and is the same loop that is provided by all sorts of frameworks (for example underscoreJS has a forEach method). In this case it is up to the user to make the code asynchronus and the loop function itself does not provide any facilities for it.

Non async loop
Code Snippet
  1. App.eachSync = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     };
  5.     for (var i = 0, l = collection.length; i < l; i++) {
  6.         iterator.call(context, collection[i], token);
  7.         if (token.canceled) {
  8.             break;
  9.         }
  11.     }
  12. };
(for brevity I have omitted all the error handling code)
The loop function takes three arguments:

  • collection – The collection to iterate over.
  • iterator – The function that will be called for each iteration.
  • context – The pointer to “this” operator within the iterator function.

The iterator takes two arguments

  • The item of the current iteration
  • A token which can be used to transfer data between the iteration (in this example used for breaking out of the loop).

If the user wants to run something asynchronously then she must add the asynchronous code calls within the iterator method.


The next obvious step from the previous version is that the loop calls the iterator asynchronously. In the following code the loop runs synchronously but schedules each call to the iterator to run asynchronously. In this case, if the collection contains many items, the loop itself may make your thread stuck.

A loop that calls the iterator asynchronously
Code Snippet

  1. App.eachAsync1 = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     };
  5.     for (var i = 0, l = collection.length; i < l; i++) {
  6.         setTimeout(function (index) {
  7.             if (!token.canceled) {
  8.                 iterator.call(context, collection[index], token);
  9.             }
  10.         }, 0, i);
  11.     }
  12. };


Another approach to the async loop problem would be to make the loop itself asynchronous. To do this we need to save the current iteration nu,ber within the scope and call an intermediate method iteratively. The intermediate method will schedule itself and call the iterator. The code would look something like this: 

Both the loop and the iterator calls are async
Code Snippet

  1. App.eachAsync2 = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     },
  5.     totalLength = collection.length,
  6.     innerFunction = function (index) {
  7.         setTimeout(function (innerIndex) {
  8.             if (!token.canceled) {
  9.                 iterator.call(context, collection[innerIndex], token);
  10.             }
  11.         }, 0, index);
  12.         index++;
  13.         if (index < totalLength) {
  14.             setTimeout(innerFunction, 0, index);
  15.         }
  16.     }
  18.     innerFunction(0);
  19. };

At first, the innerFunction is called with the index 0. Next, it schedules the iterator call and itself until we reach the end of the collection. Note that it is possible to run the iterator directly and not schedule it.


As always with async we gain non blocking code on the expense of time. The code runs slower but in chunks, each chunk block the thread for a short while but allows other things to run in the thread in between the chunks. I ran the three loops on a code that prints the numbers 1 to 1000 to the console. As expected the performance degradated noticeably from one method to the next.

To quote Isaac Z. Schlueter (and I hope I am quoting right) – Async is something that you do not when you want to make things go faster but when you can’t make things go fast enough.


Thank you for reading,


Friday, August 10, 2012

Async Mutex – Control flow in asynchronous functionality


Hi All,

In this post I am going to continue talking about asynchronous execution. Having read about async one might think that mutual exclusion and flow control structures are no longer required but this is not the case. For async execution you need similar structures with the async twist to them. In this post I am going to present a simple case where an async mutex is needed and used.

The scenario is quite simple. Through async calls you acquire some data structure, calculate some value and add it to the data structure (which affects all subsequent calculations). In my example I would like to calculate the first 7 elements of the Fibonacci series. For the sake of the argument I will assume that each operation must be performed asynchronously.

(Sorry C# guys, this code is in JavaScript).

 var data = [0, 1];

function retrieveData(callback) {
var x = data[data.length - 1];
var y = data[data.length - 2];
setTimeout(function() {
callback(x, y);
}, 0);

function calculateSum(x, y, callback) {
var sum = x + y;
setTimeout(function() {
}, 0);

function addSum(sum, callback) {
setTimeout(callback, 0);

//Adds count fibonnacci elementd to data
function fibonacci(count) {
var i;
for ( i = 0; i < count; i++) {
retrieveData(function(x, y) {
calculateSum(x, y, function(sum) {
addSum(sum, function() {

So what do we have here…  The next Fibonacci number is calculated through three async calls. The first call to retrieveData takes the last two numbers on the array and returns them through an async call to a callback. Think of it as retrieving a record from a remote repository or database. Next we have an async call that does the “logic”. It calculates the sum of the two numbers through the calculateSum function. The sum itself is then passed to a callback which adds it back to the original array, again through an async call. When I run this code on V8 I get the following result:


5 x [0, 1, 1, 1, 1, 1, 1]

5 times the array you see above. But why? Lets look at the loop and try to trace what happens. The loop runs 5 times without interruptions (we are not multi-threaded). Each run it schedules the call to retrieveData on the event loop. Next, the function fibonacci exits and retrieveData is run five times in a row. At this point all five execution scenarios run on the same two values of the array – 0 and 1. Next, calculateSum and addSum run (each runs five times in a row). The end result is now clear. What we need here is to tell the event loop that it can run the functions asynchronously but only one execution may run within the loop. To this end I have implemented an async mutex:

   App.AsyncMutex = function() {
this.queue = [];
this.ownerToken = null;
this.token = 0;

App.AsyncMutex.prototype.enter = function(callback) {
if (this.ownerToken !== null) {
} else {
this.ownerToken = this.token;

App.AsyncMutex.prototype.leave = function(token) {
if (this.ownerToken === null || this.ownerToken !== token) {
throw new Error("Owner token mismatch. Expected " + this.ownerToken + " but received " + token);

var callback;

if (this.queue.length > 0) {
this.ownerToken = this.token;
callback = this.queue.shift();
} else {
this.ownerToken = null;

The async mutex implementation is very simple. To protect some code simply wrap it with a call to mutex.enter. In the callback you receive a token which you use when you are done with the protected code and you call mutex.leave(token). The mutex assures that only one executer is within the protected code and adds all the others to a queue. Once the executer is done it runs the next executer in the list.

To use the mutex we need only a tiny code change:

      function fibonacci(count) {
var i, mux = new App.AsyncMutex();
for ( i = 0; i < count; i++) {
mux.enter(function(token) {
retrieveData(function(x, y) {
calculateSum(x, y, function(sum) {
addSum(sum, function() {
Run the code again and the result is:
[0, 1, 1] 

[0, 1, 1, 2]

[0, 1, 1, 2, 3]

[0, 1, 1, 2, 3, 5]

[0, 1, 1, 2, 3, 5, 8]

Exactly as we want it to be.



Thank you for reading.



Friday, August 3, 2012

Simply Explained – Async


Hi All,

The blog is back after a somewhat long break. The break was due to two major changes in my life, the first is that I moved to a new apartment and had to deal with lots of new apartment stuff. The second is that I made a career change of 180 degrees (in terms of technological stack) and moved from Windows-.NET-WPF to Linux-Javascript-NodeJS. This means that you will probably see a change in content in future posts.


Modern applications are becoming more demanding in CPU resources. The CPU manufacturers used to deal with this demand by increasing the clock speed of the processor and by improving the internal architecture. In recent years (~10) this trend has changed to building multi-code processors and the dependence on multi threaded designs to speed up the application. In the world of .NET dealing with threads was usually done by experts as there are many pitfalls such as race conditions and mutual exclusion to be considered. In the last two versions of .NET framework this entire area was dramatically improved by introducing many new classes and the TPL framework and integration into the language in C# 5. You can read more about it in Pavel’s blog here and in this nice post that summarizes some of the key features of multi threaded libraries in .NET 4.

Parallel to the trend of multi-threaded design another trend evolved (or rather forced). The event loop design. It is not new by any means and in fact Windows is using this concept from the very first versions. The basic idea is that there is a queue of events and a process that de-queues the events one by one and handles them. The events may come from external sources like IO or internally from the process. In this post I will demonstrate how to implement a simple factorial method in the three paradigms:

Simple Code

In this paradigm we don’t do anything special and just write the code ignoring the implications of a long running method. The code would look something like that:

The simple way.
Code Snippet
  1. private BigInteger Factorial(int n)
  2. {
  3.   BigInteger result = 1;
  4.   for (int i = 2; i <= n; i++)
  5.   {
  6.     result = result * i;
  7.   }
  9.   return result;
  10. }


This is how you would use it.
Code Snippet
  1. myLabel.Content = Factorial(50000).ToString();

The main problem here is that when this code is run, everything else doesn’t. The UI will appear to be stuck since it is not being refreshed and no interactions are possible with the process.

What you would probably do in this case is jump straight to the multi-threaded paradigm.


In this paradigm we want to call our calculation on a background thread and once it is done let it report the result on the UI thread (updating the label). The code would be something like this:

Running the calculation in the background.
Code Snippet
  1. Thread thread = new Thread(new ParameterizedThreadStart(ThreadedFactorial));
  2. thread.IsBackground = true;
  3. thread.Start(50000);


The code that runs on the background thread.
Code Snippet
  1. public void ThreadedFactorial(object n)
  2. {
  3.   BigInteger result = Factorial(Convert.ToInt32(n));
  4.   this.Dispatcher.Invoke(new Action<string>(PrintResult), new object[] { result.ToString() });
  5. }


The result is reported on the UI thread.
Code Snippet
  1. public void PrintResult(string result)
  2. {
  3.   myLabel.Content = result;
  4. }

This implementation solves the issues with the simple implementation and is quite simple. The problems begin when there are resources shared between threads. You have to use locks, semaphores, and other thread synchronization constructs to make things work right. When the code gets larger you find yourself thinking “but what if the context switch is between the “if” statement and the assignment” and end up adding a lock on the entire method. We all been there. (And I will not start with how hard it is to debug concurrency issues.)

One more thing to take into account is that some environments don’t have threads (not .NET). Your browser running JavaScript probably does it on a single thread (the JavaScript not the browser).

The next paragraph shows how to write the same functionality on a single threaded system.


The dispatcher is a “magical” object. It allows you not only to sync calls from different threads but also to schedule calls on the same thread (the UI thread in our case). Note the second parameter of the method Dispatcher.BeginInvoke is DispatcherPriority which allows you to add your method execution to a specific priority within the event loop of the UI thread. If you are able to break the calculation (or any other task) to small enough pieces then you can schedule those pieces one after the other on the event loop. This will insure that other events such as repaints and IO inputs are still handled by the process since they are interlaced within your calculation. The code may look something like this:


Calling the factorial method. Note the last parameter is the callback that should be run once the method is finished. Passing the callback as the last parameter is a common practice in this paradigm.
Code Snippet
  1. PartialFactorial(1, 1, 50000, new Action<string>(PrintResult));


The partial factorial method.
Code Snippet
  1. public void PartialFactorial(BigInteger result, int current, int end, Action<string> callback)
  2. {
  3.   BigInteger tempResult = result;
  4.   int i;
  5.   for (i = current; i <= current + 100 && i <= end; i++)
  6.   {
  7.     tempResult = tempResult * i;
  8.   }
  10.   if (i <= end)
  11.   {
  12.     this.Dispatcher.BeginInvoke(new Action<BigInteger, int, int, Action<string>>(PartialFactorial), System.Windows.Threading.DispatcherPriority.Background, new object[] { tempResult, i, end, callback });
  13.   }
  14.   else
  15.   { callback(tempResult.ToString()); }
  16. }

There are a couple things to notice here. First, I perform only 100 calculations and then schedule the rest by invoking the same method with the new parameters. The scheduling is done with the Background priority which is after all the IO and rendering is already finished. If the calculation is complete I call the callback provided for me by the original caller. I could schedule the callback call as well.

The main advantage here is that you don’t need to deal with any threads as everything is done on a single thread. There are several disadvantages that I should mention:

  1. The code is a bit awkward. One can get used to this but it will always seem a bit clogged compared to the clean code of the multi-threaded approach. This is mainly due to the specific example I give here. Many processes can be broken into very small work units without making the flow so hard to understand.
  2. There is a loss of call stacks. Every time you schedule a method to the event-loop you are basically starting a new call stack. This is for better and for worse (no stack overflow as in recursion).
  3. The framework has to support this approach. In the example above one of the most time consuming methods was actually converting the BigInt to string (if you talk binary you know why it takes so long). There was no simple way to break this method to smaller bits so the UI is stuck while in this method. Frameworks like NodeJS which are built on the principle of event loop will make sure that no method blocks for that long.


I ran a small performance test on all three methods. And the event loop approach was a little slower than the other two but not by a factor. For a factorial of 50000 it took the first two methods ~7.5 seconds to complete while the event loop method took ~8 seconds. I tried different number of iterations within the PartialFactorial methods. Values above 300 made the UI cough a little bit. With a value of 100 there were no visible penalties.

Thank you for reading,


Friday, April 27, 2012

LongString: When things become unequal (comparing two long strings) – Part 2


Hi All,

In Part 1 of the LongString post I have shown how we can tackle the problem of running out of memory when manipulating long string using a class similar to the LongString. In Part 2 I will discuss a problem which arises from my implementation of LongString class when implementing the IEquatable<LongString> interface. As you may recall, the LongString is an aggregation of regular strings which are less or equal than a given value. The direct result of this definition is that two equal long strings may have a completely different representation (even if the given value is the same). I will discuss editing operations in Part 3 of this series but you may consider the case where for the given value of 3 the long string (“AB”,”CD”) is equal to the long string (“ABC”,”D”) but they have a different internal representation.

Implementing Equality

The first thing I did was to check how the framework string implements the equality method. You can see the code below but essentially since the strings are represented as a continues memory block the code simply goes over all the bytes and bitwise compares them (with a small optimization which I will not discuss).

  1. private unsafe static bool EqualsHelper(string strA, string strB)
  2.     {
  3.       int i = strA.Length;
  4.       if (i != strB.Length)
  5.       {
  6.         return false;
  7.       }
  8.       char* ptr = &strA.m_firstChar;
  9.       char* ptr2 = &strB.m_firstChar;
  10.       while (i >= 12)
  11.       {
  12.         if (*(long*)ptr != *(long*)ptr2 || *(long*)(ptr + (IntPtr)8 / 2) != *(long*)(ptr2 + (IntPtr)8 / 2) || *(long*)(ptr + (IntPtr)16 / 2) != *(long*)(ptr2 + (IntPtr)16 / 2))
  13.         {
  14.         IL_7D:
  15.           while (i > 0 && *(int*)ptr == *(int*)ptr2)
  16.           {
  17.             ptr += (IntPtr)4 / 2;
  18.             ptr2 += (IntPtr)4 / 2;
  19.             i -= 2;
  20.           }
  21.           return i <= 0;
  22.         }
  23.         ptr += (IntPtr)24 / 2;
  24.         ptr2 += (IntPtr)24 / 2;
  25.         i -= 12;
  26.       }
  27.       goto IL_7D;
  28.     }

I wanted to implement something similar and therefore my first implementation was like this:

  1. public bool EqualsVerySlow(LongString other)
  2. {
  3.   if (Length != other.Length) return false;
  4.   for (int i = 0; i < Length; i++)
  5.   {
  6.     if (this[i] != other[i])
  7.     { return false; }
  8.   }
  9.   return true;
  10. }

But then I discovered that I didn’t implement the indexer yet so I simply use the method from Part 1 which returns the internal indices (within the array of strings and in a string) GetItemIndexForCharIndex which resulted in this simple code:

  1. public char this[int index]
  2. {
  3.   get
  4.   {
  5.     Tuple<int, int> innerIndex = GetItemIndexForCharIndex(index);
  6.     return _items[innerIndex.Item1][innerIndex.Item2];
  7.   }
  8. }

The main problem here is that the implementation of GetItemIndexForCharIndex calculates the indices by going over the entire array of strings. This is OK for the indexer but it is not very good when used in a loop therefore it is time to implement the enumerator. A simple enumerator which “remembers” the current position and moves forward should be enough for this case.

  1. class LongStringEnumerator : IEnumerator<char>
  2. {
  3.   LongString _innerString;
  4.   int _currentIndex;
  5.   int _currentCharIndex;
  7.   public LongStringEnumerator(LongString target)
  8.   {
  9.     _innerString = target;
  10.     _currentIndex = 0;
  11.     _currentCharIndex = 0;
  12.   }
  14.   public char Current
  15.   {
  16.     get
  17.     { return _innerString._items[_currentIndex][_currentCharIndex]; }
  18.   }
  20.   public void Dispose(){}
  22.   object System.Collections.IEnumerator.Current
  23.   {
  24.     get
  25.     { return Current; }
  26.   }
  28.   public bool MoveNext()
  29.   {
  30.     _currentCharIndex++;
  31.     if (_currentCharIndex >= _innerString._items[_currentIndex].Length)
  32.     {
  33.       _currentCharIndex = 0;
  34.       _currentIndex++;
  35.       if (_currentIndex >= _innerString._items.Count)
  36.       {
  37.         return false;
  38.       }
  39.     }
  40.     return true;
  41.   }
  43.   public void Reset()
  44.   {
  45.     _currentCharIndex = 0;
  46.     _currentIndex = 0;
  47.   }
  48. }

Now I can fix the Equal code to use the enumerator instead of the indexer.

  1. public bool Equals(LongString other)
  2. {
  3.   if (Length != other.Length)
  4.     return false;
  6.   LongStringEnumerator enumerator = new LongStringEnumerator(this);
  7.   LongStringEnumerator enumerator2 = new LongStringEnumerator(other);
  9.   do
  10.   {
  11.     if (enumerator.Current != enumerator2.Current)
  12.       return false;
  13.     enumerator2.MoveNext();
  14.   }
  15.   while (enumerator.MoveNext());
  17.   return true;
  18. }

One question remains… How bad is this solution? I have run some performance tests and it turns out that this equals method is about 30 to 50 times slower than the equals method of a regular string. In absolute terms, this equals method takes up to 20ms when comparing two 1,000,000 char long strings with only one different character (in a random position). The original method is so fast that it cannot be measured in ms Smile.


The main problem with the long string is going to be performance (in terms of time) compared to the regular string. We see that if an algorithm required mainly comparisons then the LongString is not a good choice. Still, remember that the LongString was designed for a very specific scenario in which it should perform better. In the next part I will implement some editing methods.

A small note about concurrency. Just out of curiosity I have implemented the slow method using the TPL:

  1. public bool EqualsSlow(LongString other)
  2. {
  3.   if (Length != other.Length)
  4.     return false;
  6.   bool isEqual = true;
  7.   Parallel.For(0, Length, (i, loopState) =>
  8.   {
  9.     if (this[i] != other[i])
  10.     {
  11.       isEqual = false;
  12.       loopState.Break();
  13.     }
  15.   });
  17.   return isEqual;
  18. }

This was indeed faster in some cases over the very slow implementation but I cannot compete with the concurrency done in the CPU (which will be used to run the original Equals method of string). This implementation may give you some ideas on how to improve the Equals implementation of LongString (e.g. run a forward and a backward enumerators in separate tasks).

Thank you for reading,