Monday 21 October 2013

String.concat vs + Operator in the Java String Class

Introduction

Concatenation of Strings is very easy in Java - all you need is a '+'. It can't get any easier than that, right? Unfortunately there are a few pitfalls. One thing you should remember from your first Java lessons is a small albeit important detail: String objects are immutable. Once constructed they cannot be changed anymore.
Whenever you "change" the value of a String you create a new object and make that variable reference this new object. Appending a String to another existing one is the same kind of deal: a new String containing the stuff from both is created and the old one is dropped.
You might wonder why Strings are immutable in the first place. There are two very compelling reasons for it:
  1. Immutable basic types make things easier. If you pass a String to a function you can be sure that its value won't change.
  2. Security. With mutable Strings one could bypass security checks by changing the value right after the check. (Same thing as the first point, really.)
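To make the "new object" point concrete, here is a minimal sketch. The class and method names are made up for illustration; the only library behavior it relies on is that '+' and String.concat() return a new String and leave the original alone.

```java
// Minimal sketch: "changing" a String really creates a new object.
class ImmutabilityDemo {
    // Returns the original value after a second reference to it was "modified".
    static String originalAfterAppend() {
        String original = "abc";
        String alias = original;   // both variables reference the same object
        alias = alias + "def";     // builds a new String; 'original' is untouched
        return original;
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = s.concat("def"); // concat returns a new String
        System.out.println(s);      // the original value is unchanged
        System.out.println(t);      // the concatenated value
        System.out.println(originalAfterAppend());
    }
}
```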

The performance impact of String.concat()

Each time you append something via '+' (String.concat()) a new String is created, the old stuff is copied, the new stuff is appended, and the old String is thrown away. The bigger the String gets the longer it takes - there is more to copy and more garbage is produced.
Creating a String with a length of 65536 (character by character) already takes about 22 seconds on an AMD64 X2 4200+. The following diagram illustrates how steeply the required time grows (roughly quadratically, because each append copies everything accumulated so far):
Figure 1: StringBuilder vs StringBuffer vs String.concat
StringBuilder and StringBuffer are also shown, but at this scale they lie right on the x-axis. As you can see String.concat() is slow. Amazingly slow in fact. It's so bad that the guys over at FindBugs added a detector for String.concat inside loops to their static code analysis tool.
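A rough way to see where the time goes is to count how many chars get written. The sketch below is a simplified cost model, not a benchmark: it assumes every append of one char rebuilds the whole String, and it ignores the occasional resize of a builder. The class and method names are hypothetical.

```java
// Simplified cost model (an assumption, not a measurement).
class CopyCost {
    // Appending one char to a String of length k produces a new String of
    // length k + 1, so roughly k + 1 chars are written for that step.
    static long charsWrittenByRepeatedConcat(int n) {
        long total = 0;
        for (int k = 0; k < n; k++) {
            total += k + 1;
        }
        return total;
    }

    // With a builder each char is written once (resizes ignored).
    static long charsWrittenByBuilder(int n) {
        return n;
    }

    public static void main(String[] args) {
        int n = 65536;
        System.out.println(charsWrittenByRepeatedConcat(n)); // n * (n + 1) / 2
        System.out.println(charsWrittenByBuilder(n));
    }
}
```

Under this model a 65536-char String costs about two billion char writes when built with repeated concatenation, compared to 65536 with a builder.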

When to use '+'

Using the '+' operator for concatenation isn't bad per se though. It's very readable and it doesn't necessarily affect performance. Let's take a look at the kind of situations where you should use '+'.
a) Multi-line Strings:
String text=
    "line 1\n"+
    "line 2\n"+
    "line 3";
Since Java doesn't feature a proper multi-line String construct like other languages, this kind of pattern is often used. If you really have to you can embed massive blocks of text this way and there are no downsides at all. The compiler creates a single String out of this mess and no concatenation happens at runtime.
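You can check the "single String" claim without reading byte code. The sketch below relies on the language rule that compile-time constant Strings are interned, so the concatenated constant and an equivalent single literal end up being the same object. The class name is made up for illustration.

```java
// Sketch: literal-only concatenation is folded at compile time.
class ConstantFoldingDemo {
    // All operands are literals, so the whole expression is a constant.
    static final String TEXT =
        "line 1\n" +
        "line 2\n" +
        "line 3";

    // Compares references on purpose: constant Strings with the same
    // content are interned and therefore share one object.
    static boolean sameObjectAsSingleLiteral() {
        return TEXT == "line 1\nline 2\nline 3";
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAsSingleLiteral());
    }
}
```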
b) Short messages and the like:
System.out.println("x:"+x+" y:"+y);
The compiler transforms this to:
System.out.println((new StringBuilder()).append("x:").append(x).append(" y:").append(y).toString());
Looks pretty silly, doesn't it? Well, it's great that you don't have to write that kind of code yourself. ;)
If you're interested in byte code generation: According to Arno Unkrig (the amazing dude behind Janino) the optimal strategy is to use String.concat() for 2 or 3 operands, and StringBuilder for 4 or more operands (if available - otherwise StringBuffer). Sun's compiler always uses StringBuilder/StringBuffer though. Well, the difference is pretty negligible.

When to use StringBuilder and StringBuffer

This one is easy to remember: use 'em whenever you assemble a String in a loop. If it's a short piece of example code, a test program, or something completely unimportant you won't necessarily need that though. Just keep in mind that '+' isn't always a good idea.
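As a small sketch of the loop case: one StringBuilder is created up front, everything is appended to it, and toString() is called once at the end. The class, the method and the comma-separated output format are just an illustration.

```java
// Sketch: assemble a String in a loop with a single StringBuilder.
class JoinDemo {
    // Builds "0,1,2,...,n-1"; returns "" for n <= 0.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');   // separator between entries
            }
            sb.append(i);         // append(int) converts the number for us
        }
        return sb.toString();     // the only String created for the result
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(5));
    }
}
```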

StringBuilder and StringBuffer compared

StringBuilder is rather new - it was introduced with Java 1.5. Unlike StringBuffer it isn't synchronized, which makes it a tad faster:
Figure 2: StringBuilder vs StringBuffer
As you can see the graphs are sort of straight with a few bumps here and there caused by re-allocation. Also StringBuilder is indeed quite a bit faster. Use that one if you can.

Initial capacity

Both - StringBuilder and StringBuffer - allow you to specify the initial capacity in the constructor. Of course this was also a thing I had to experiment with. Creating a 0.5 MB String 50 times with different initial capacities:
Figure 3: StringBuilder and StringBuffer with different initial capacities
The step size was 8 and the default capacity is 16. So, the default is the third dot. 16 chars is pretty small and as you can see it's a very sensible default value.
If you take a closer look you can also see that there is some kind of rhythm: the best initial capacities (local optimum) are always a power of two. And the worst results are always just before the next power of two. The perfect results are of course achieved if the required size is used from the very beginning (shown as dashed lines in the diagram) and no resizing happens at all.

Some insight

That "PoT beat" is of course specific to Sun's implementations of StringBuilder and StringBuffer. Other implementations may show a slightly different behavior. However, if these particular implementations are taken as target one can derive two golden rules from these results:
  1. If you set the capacity use a power of two value.
  2. Do not use the String/CharSequence constructors ever. They set the capacity to the length of the given String/CharSequence + 16, which can be virtually anything.
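The capacities involved can be inspected with capacity(). The sketch below uses the values documented for the StringBuilder constructors (16 by default, the given value for the int constructor, length + 16 for the String constructor). The class and method names are made up, and the capacity of 64 is an arbitrary power of two picked for the example.

```java
// Sketch: initial capacities of the different StringBuilder constructors.
class CapacityDemo {
    static int defaultCapacity() {
        return new StringBuilder().capacity();          // documented default
    }

    static int explicitCapacity(int capacity) {
        return new StringBuilder(capacity).capacity();  // exactly what you asked for
    }

    static int capacityFromString(String s) {
        return new StringBuilder(s).capacity();         // s.length() + 16
    }

    // Alternative to the String constructor: set the capacity first, then append.
    static int explicitThenAppend(String s) {
        return new StringBuilder(64).append(s).capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(explicitCapacity(1024));
        System.out.println(capacityFromString("hello"));
        System.out.println(explicitThenAppend("hello"));
    }
}
```

The last method shows one way to follow both rules at once: pick the capacity yourself and append the initial content afterwards.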

Benchmarking method

In order to get meaningful results I took care of a few things:
  • VM warmup
  • separate runs for each test
  • each sample is the median of 5 runs
  • inner loops were inside of each bench unit

Some examples from the discussion:
What does the compiler do here?
String s = "This ";
s += "is another ";
s += "concatenated ";
s += "string";
Theoretically a compiler could fold all of this into a single String literal.
In practice you'll get either this:
String s = "This ";
s = (new StringBuilder()).append(s).append("is another ").toString();
s = (new StringBuilder()).append(s).append("concatenated ").toString();
s = (new StringBuilder()).append(s).append("string").toString();

or that:
String s = "This ";
s = s.concat("is another ");
s = s.concat("concatenated ");
s = s.concat("string");
How many String objects are created by the following chunk of code?
String s1 = "Hello"+"world";
String s2 = s1+"Java";
If you look at the compiled code, you can easily guess:
String s1 = "Helloworld";
String s2 = (new StringBuilder(String.valueOf(s1))).append("Java").toString();
Here is one way to determine how many String objects are created.
The answer is 3.
You can view the disassembled result with:
javap -verbose YourClass
The Constant pool includes:
...
const #17 = Asciz       Helloworld;
...
const #30 = Asciz       Java;
...
This means that two Strings ("Helloworld" and "Java") are compile-time constant expressions, which are interned into the constant pool automatically.
The code:
Code:
 Stack=3, Locals=3, Args_size=1
 0:   ldc     #16; //String Helloworld
 2:   astore_1
 3:   new     #18; //class java/lang/StringBuilder
 6:   dup
 7:   aload_1
 8:   invokestatic    #20; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
 11:  invokespecial   #26; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
 14:  ldc     #29; //String Java
 16:  invokevirtual   #31; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 19:  invokevirtual   #35; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 22:  astore_2
 23:  return
This shows that s2 is built at runtime via StringBuilder.append() and toString().
To make this more interesting: javac can optimize the code by constant folding. Try to guess how many Strings are created by the following code:
final String s1 = "Hello" + "world";
String s2 = s1 + "Java";
"final" makes s1 a constant, which lets javac compute the value of s2 at compile time and intern it. So the number of Strings created here is 2.
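The difference between the two variants can also be observed from plain Java, without javap. The sketch below compares references on purpose: a String computed at runtime is a new object, while a compile-time constant is interned and shares its object with an equal literal. The class and method names are made up for illustration.

```java
// Sketch: non-final vs. final local variable in a concatenation.
class FinalFoldingDemo {
    // s1 is an ordinary variable, so s1 + "Java" is evaluated at runtime
    // and yields a newly created String.
    static boolean nonFinalIsSameObjectAsLiteral() {
        String s1 = "Hello" + "world";
        String s2 = s1 + "Java";
        return s2 == "HelloworldJava";
    }

    // s1 is a constant, so s1 + "Java" is folded by the compiler and the
    // result is interned like any other literal.
    static boolean finalIsSameObjectAsLiteral() {
        final String s1 = "Hello" + "world";
        String s2 = s1 + "Java";
        return s2 == "HelloworldJava";
    }

    public static void main(String[] args) {
        System.out.println(nonFinalIsSameObjectAsLiteral());
        System.out.println(finalIsSameObjectAsLiteral());
    }
}
```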