Under the Hood of Java Strings Concatenating Performance

>> Sunday, September 19, 2010

Introduction

In my previous post I covered some of the more sophisticated Java string management issues (Collators, Normalizer, interning, substring). A few readers of that post left comments asking about the performance of StringBuilder vs. StringBuffer vs. the built-in concatenation (+) operator. To close the subject, I decided to elaborate on Java string concatenation.

Is It Important?
In Java we talk a lot about string concatenation, which raises the question of how much it really matters. As always, the answer depends on the scenario. If we just want to take two strings and print a message composed of them, it probably doesn't matter: we can use the + operator or the String.concat() method and get it done. It matters more when we process a large amount of data and perform the concatenation inside a loop.



Dynamic vs. Static Strings Concatenating

A static string concatenation means that all of the substrings building the final string are known at compile time. In that case we should use the "+" operator, since the compiler performs the concatenation at compile time and there is no run-time penalty. The following example shows a Java code sample and the corresponding bytecode generated by the compiler:



(a) String s1 = "aa" + "bb"; 

(b) final String s2 = "aa";  
(c) final String s3 = "bb"; 
(d) String s4 = s2 + s3;


// The code above, decompiled using 'javap -c'.
//
// The numbers on the left are the offset within the main method. All of the 
// comments below are generated by the javap utility and not added by me.
public static void main(java.lang.String[]);
 Code:
 0: ldc #16; //String aabb
 2: astore_1
 3: ldc #18; //String aa
 5: astore_2
 6: ldc #20; //String bb
 8: astore_3
 9: ldc #16; //String aabb
 11: astore 4
 13: return


The disassembled bytecode illustrates how the compiler performs the static string concatenation:

  • Offsets 0, 2: Load constant number 16 (the string aabb) from the constant pool and store it in local variable number 1 (this is s1). If we look at the corresponding line in the Java class (line (a)) we can see that the compiler translated "aa" + "bb" into "aabb".
  • Offsets 3, 5: Load constant number 18 (the string aa) from the constant pool and store it in local variable number 2 (this is s2). The corresponding line in the code is (b).
  • Offsets 6, 8: Load constant number 20 (the string bb) from the constant pool and store it in local variable number 3 (this is s3). The corresponding line in the code is (c).
  • Offsets 9, 11: This is probably the most interesting thing the compiler does in this example. First it recognizes that s2 and s3 are final variables initialized with constants, so it treats line (d) as equivalent to String s4 = "aa" + "bb";. It can then handle it as a static concatenation and replace "aa" + "bb" with "aabb". It doesn't stop there: it looks up "aabb" in the class's constant pool and, since it already created that constant earlier, it reuses the same entry that was used for line (a). The outcome is that both s1 and s4 are references to the same instance (in plain English, s1 == s4 is true).
If s2 and s3 weren't final references, the compiler would have generated totally different code, using StringBuilder (I am using Java SE 1.6; versions earlier than 1.5 used StringBuffer):



// Notice that s2 and s3 are not final anymore!
(a) String s2 = "aa";
(b) String s3 = "bb";
(c) String s4 = s2 + s3;

// The code above, decompiled using 'javap -c'.
//
// The numbers on the left are the offset within the main method. All of the 
// comments below are generated by the javap utility and not added by me.
public void dynamicStrings();
 Code:
 0: ldc #18; //String aa
 2: astore_1
 3: ldc #20; //String bb
 5: astore_2
 6: new #30; //class java/lang/StringBuilder
 9: dup
 10: aload_1
 11: invokestatic #32; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
 14: invokespecial #38; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
 17: aload_2
 18: invokevirtual #41; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 21: invokevirtual #45; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 24: astore_3
 25: return


I won't analyze the bytecode as before, but the general idea is that the compiler generates the equivalent of the following code:

s4 = new StringBuilder(String.valueOf(s2)).append(s3).toString()


So first conclusion: when concatenating static strings it is totally safe to use the "+" operator.
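The identity claim above (s1 == s4) can be checked directly. The following is only a small sketch: the class and method names are made up for illustration and are not part of the original example. It compares the final and non-final variants side by side.

```java
// Hypothetical demo class; the class and method names are illustrative only.
class ConstantFoldingDemo {

    // Both operands are compile-time constants (final variables initialized
    // with literals), so the whole expression is folded into the literal "aabb".
    static boolean finalOperandsShareInstance() {
        String s1 = "aa" + "bb";
        final String s2 = "aa";
        final String s3 = "bb";
        String s4 = s2 + s3;
        return s1 == s4; // same constant pool entry
    }

    // Without 'final' the concatenation is performed at run time
    // and produces a newly created String object.
    static boolean nonFinalOperandsShareInstance() {
        String s1 = "aa" + "bb";
        String s2 = "aa";
        String s3 = "bb";
        String s4 = s2 + s3;
        return s1 == s4; // different instances, although s1.equals(s4)
    }

    public static void main(String[] args) {
        System.out.println("final operands:     " + finalOperandsShareInstance());
        System.out.println("non-final operands: " + nonFinalOperandsShareInstance());
    }
}
```

The first method should report true and the second false. Of course, real code should compare strings with equals(); the == comparison is used here only to expose what the compiler did.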




StringBuilder and StringBuffer

When we use dynamic strings the compiler cannot precalculate the concatenation result for us. Instead, as illustrated in the bytecode sample above, it uses StringBuilder. This is not that bad if we only do the operation once or twice, but if we loop again and again over such code it will have a dramatic effect on performance. Think of the following example:



String result = "";
for (int t=0; t<10000; ++t ) {
 result = result + getSomeString();
}




If we generalize the bytecode sample we have seen before, the compiler will generate something similar to this:


String result = "";
for (int t=0; t<10000; ++t ) {
 result = new StringBuilder(String.valueOf(result)).append(getSomeString()).toString();
}


Obviously this is not the most efficient way to get our task done. On every iteration the code generated by the compiler instantiates a new StringBuilder, copies the whole accumulated result into it, and instantiates a new String. If I wrote the code myself I could have done it much more efficiently:



StringBuilder sb = new StringBuilder(); // If we know the final size the builder will need we can
 // pre-allocate the buffer using new StringBuilder(int)
for (int t=0; t<10000; ++t ) {
 sb.append(getSomeString());
}
String result = sb.toString();


Second conclusion: dynamic strings concatenating should be done 'manually' (explicit use of the StringBuilder/StringBuffer) and not by the compiler (the + operator).
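As a rough sketch of this conclusion, here is a small helper that concatenates a whole list with one explicitly managed StringBuilder. The class name, the method name concatAll and the assumption that the list contains no null elements are mine, for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class ConcatAll {

    // Concatenates all parts using a single StringBuilder.
    // Assumes the list contains no null elements.
    static String concatAll(List<String> parts) {
        // First pass: compute the final length so the buffer is allocated once.
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        // Second pass: append into the pre-sized builder.
        StringBuilder sb = new StringBuilder(total);
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatAll(Arrays.asList("aa", "bb", "cc")));
    }
}
```

The extra pass over the list is cheap compared to reallocating the buffer, but it is only worth it when the parts are already available in memory.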



So what is the difference between StringBuilder and StringBuffer? StringBuffer is a synchronized class: all of its methods are synchronized, so it is the one to use in a multithreaded environment (when more than one thread accesses the same StringBuffer instance). Usually string concatenation is done by a single thread, and in that scenario StringBuilder should be used.
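To illustrate the multithreaded case, here is a minimal sketch in which two threads append to the same StringBuffer instance. The class and method names are invented for this example. Because append() is synchronized, no append is lost and the final length is always twice the number of appends per thread.

```java
// Hypothetical demo class; names are illustrative only.
class SharedBufferDemo {

    // Two threads append single characters to one shared StringBuffer.
    static int appendFromTwoThreads(final int appendsPerThread) throws InterruptedException {
        final StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < appendsPerThread; i++) {
                shared.append('x'); // synchronized, so appends never interleave badly
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared.length();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(appendFromTwoThreads(10000));
    }
}
```

Replacing the StringBuffer with a StringBuilder in this sketch would make the result unpredictable, since StringBuilder gives no guarantees when it is shared between threads.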

Some More StringBuilder/StringBuffer Tips
  • Pre-allocate - if we know in advance the final size of our buffer (or have a good estimate) we can pre-allocate the internal buffer at construction (as in new StringBuilder(128)). By default StringBuilder and StringBuffer use a 16-character buffer, which has to be reallocated when we exceed its capacity. When the buffer is expanded the new size is (current size + 1)*2 (so 16, 34, 70, ...). By the way, the constructors that take a String or a CharSequence pre-allocate the internal buffer to the length of the argument + 16 characters.
  • When appending characters there is no need to convert them into Strings: buff.append('a') is more efficient than buff.append("a"). The difference is minor but it might matter when the operation is performed in a loop.
  • Similar to the one above: the buffers accept all kinds of primitive types (boolean, char, int, long, float, double). So use buff.append(5) and not buff.append(5+"").
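The tips above can be tried out with the capacity() method. This is only a sketch with made-up class and method names. The initial capacities (16, and the argument length + 16) are documented behavior, while the exact growth step is an implementation detail of the JDK in use, so the sketch just prints it.

```java
// Hypothetical demo class; names are illustrative only.
class BuilderCapacityDemo {

    // Capacity of a builder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a builder created from an initial String.
    static int capacityFor(String initial) {
        return new StringBuilder(initial).capacity();
    }

    // Capacity after appending the given number of chars to a default builder.
    // The growth policy is not part of the documented API.
    static int capacityAfterAppending(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    // Appends a char and primitives directly, without converting them to Strings first.
    static String appendPrimitives() {
        StringBuilder sb = new StringBuilder(32);
        sb.append('a').append(5).append(true).append(2.5);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("default capacity:      " + defaultCapacity());
        System.out.println("capacity for \"abc\":    " + capacityFor("abc"));
        System.out.println("capacity after 17 chars: " + capacityAfterAppending(17));
        System.out.println(appendPrimitives());
    }
}
```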

There is a Third Way - String.concat()

The java.lang.String concat() method is another way to concatenate strings. This method should be pretty efficient when concatenating a small number of strings (I usually use it when concatenating two strings; for more than two I use StringBuilder). The concat() method builds a char buffer of exactly the size of the destination string, fills the buffer from the two original strings' underlying buffers (using System.arraycopy(), which is considered very efficient) and returns a new string based on the newly allocated buffer. Here is the method code (taken from JDK 1.6, the comments are mine):


public String concat(String str) {
 int otherLen = str.length();
 if (otherLen == 0) {
  return this;
 }
 char buf[] = new char[count + otherLen];
 getChars(0, count, buf, 0); // Uses System.arraycopy()
 str.getChars(0, otherLen, buf, count); // Uses System.arraycopy()
 return new String(0, count + otherLen, buf);
}


When concatenating a small number of strings, this method seems to offer the best trade-off between coding and performance:

  • Coding: code using the concat() method is less verbose and more intuitive than code using StringBuilder.
  • Performance: the number of temporary objects and buffers allocated by the concat() method is smaller than the number allocated by the code the compiler generates for the + operator.
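A short sketch of how concat() behaves in practice, including two differences from the + operator that follow from the method code above: an empty argument returns the original instance, and a null argument throws a NullPointerException (the + operator would append the text "null" instead). The class and method names are made up for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class ConcatDemo {

    // Plain two-string concatenation.
    static String joinTwo(String a, String b) {
        return a.concat(b);
    }

    // concat("") returns the receiver itself, without allocating anything.
    static boolean emptyArgumentReturnsSameInstance(String a) {
        return a.concat("") == a;
    }

    // The + operator converts a null operand to the text "null".
    static String plusWithNull(String a) {
        String nothing = null;
        return a + nothing;
    }

    // concat(null) fails because the method calls a method on its argument.
    static boolean concatWithNullThrows(String a) {
        try {
            a.concat(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(joinTwo("aa", "bb"));
        System.out.println(emptyArgumentReturnsSameInstance("aa"));
        System.out.println(plusWithNull("aa"));
        System.out.println(concatWithNullThrows("aa"));
    }
}
```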




A Benchmark

To complete this post I ran a short benchmark in which I perform 100,000 string concatenations (so this compares the case of many concatenations) with each of the methods illustrated above (I assumed a single-threaded model, so I didn't include StringBuffer in the benchmark). I wrapped the tests in a loop which executes them 20 times. Here is the code (I removed the timing statements to make it clearer); the results are below:



for (int x=0; x<20; ++x) {
 // Using the + operator (s2 is the String being appended; it is defined outside this snippet)
 String s3 = "";
 for (int t=0; t<100000; ++t) {
  s3 = s3 + s2;
 }

 // Using String.concat()
 s3 = "";
 for (int t=0; t<100000; ++t) {
  s3 = s3.concat(s2);
 }

 // Using StringBuilder
 s3 = "";
 StringBuilder sb = new StringBuilder(s3);
 for (int t=0; t<100000; ++t) {
  sb.append(s2);
 }
 s3 = sb.toString();

}


Benchmark Results



Looking at the results, we can see that for mass string concatenation StringBuilder has a huge performance advantage. We can also see the huge difference between the + operator and the String.concat() method.
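For readers who want to repeat the measurement, here is a self-contained sketch of the same comparison with the timing statements put back. It is not the exact code I ran: the class name, the appended string "ab", the smaller iteration count and the use of System.nanoTime() are assumptions made for this sketch. It is a naive timing loop rather than a rigorous benchmark, so the numbers will vary with the JVM and the machine.

```java
// Hypothetical benchmark sketch; names and parameters are illustrative only.
class ConcatBenchmarkSketch {

    // Repeated concatenation with the + operator.
    static String withPlus(String piece, int n) {
        String s = "";
        for (int t = 0; t < n; ++t) {
            s = s + piece;
        }
        return s;
    }

    // Repeated concatenation with String.concat().
    static String withConcat(String piece, int n) {
        String s = "";
        for (int t = 0; t < n; ++t) {
            s = s.concat(piece);
        }
        return s;
    }

    // Repeated append to a single StringBuilder.
    static String withBuilder(String piece, int n) {
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < n; ++t) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String piece = "ab"; // assumed value of the appended string
        int n = 20000;       // assumed iteration count, smaller than in the post
        for (int round = 0; round < 5; ++round) {
            long t0 = System.nanoTime();
            String a = withPlus(piece, n);
            long t1 = System.nanoTime();
            String b = withConcat(piece, n);
            long t2 = System.nanoTime();
            String c = withBuilder(piece, n);
            long t3 = System.nanoTime();
            System.out.printf("round %d: + %d ms, concat %d ms, StringBuilder %d ms (length %d)%n",
                    round, (t1 - t0) / 1000000, (t2 - t1) / 1000000, (t3 - t2) / 1000000,
                    a.length() + b.length() + c.length());
        }
    }
}
```

Running several rounds, as in the original loop of 20, gives the JIT compiler a chance to warm up; the first round is usually the least representative.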

7 comments:

Anonymous November 18, 2012 at 7:23 AM  

while doing,

(a) String s2 = "aa";
(b) String s3 = "bb";
(c) String s4 = s2 + s3;

is converted to :
s4 = new StringBuilder(String.valueOf(s2)).append(s3).toString()

In your bytecode, line 11 shows the use of valueOf().

But when I ran the same program I did not see any valueOf() in the bytecode.

i.e. it is converting it to something like:

str1 = new StringBuilder().append(str1).append(str2).toString();

I am also using java se 6.
Why so?

Eyal Lupu November 24, 2012 at 1:27 PM  

Hi,
I have no explanation for that – I just tried your code (and this time actually with Java 7) and I do see the valueOf in the bytecode.
