As programmers, we often have to add or concatenate different strings. This string concatenation is an essential topic in Java. This is because if the string concatenation is not done efficiently, it will affect the program’s performance.
To understand the effect of this performance, let’s discuss a problem with LeetCode. The problem is Step By Step Directions From A Binary Tree Node to Another.
In this problem, a binary tree is given. After giving the source and destination, we are asked to return the shortest path from the source to the destination.
Since only the root of the tree was given, my approach was first to take the edge out of the tree and bring the info of the traditional graph. Then keep track of the path while figuring out the shortest path by implementing BFS. At the end of all, find the path and return it. Simple, but I got Time Limit Exceeded (or TLE. This verdict is given when someone got failed to pass the constrained runtime of the problem) after submitting! Why! Missed something in Constraint? No, not at all. The algorithm is straightforward BFS, so the complexity is also OK. So, maybe, there is a problem with the code, and I started looking at the code again, and then the problem came out.
While concatenating the path, I was concatenating regular strings with the + operator. What’s wrong with the + operator? That is what I will try to write now.
In Java, strings are immutable. This means that a string object cannot be changed. A new string object will be created, whatever change we do to that string.
In the above problem, since I was concatenating the directions of edges with the + operator, a new string object was created for each concatenation. During object creation through string concatenation, new memories (string objects and byte arrays) are taken from the heap, the characters of the previous string are copied, and characters of the new string are added. When you have to concat many strings through a loop, using + operator is not a good approach at all, because every concatenation has these tasks. As a result, my solution got TLE.
In the case of StringBuilder, new objects are not created during each concatenation. StringBuilder first takes a memory of a certain capacity. If you need to append a new string, it dynamically creates a new byte array with the required memory, copies the contents of the StringBuilder there, then copies the contents of the new string. The process is such that these tasks aren’t done too many times. For example, if the initial capacity is 16, then to become a StringBuilder’s size 10^6, a new byte array creation and copying there can be happened at most 16 times (according to Oracle Java-11).
So it turns out that, with the + operator, for each concatenation, a byte array is created with a new size, contents are copied, and a new string object is created. But in the case of StringBuilder, these tasks are only on demand. So when many strings are needed together, StringBuilder is a better option. For this reason, my solution got accepted after using StringBuilder.
Showing some numbers to understand how much performance has been achieved using StringBuilder instead of the + operator. I have seen the accepted solution of 1.431 seconds to this problem. So I am assuming that this problem’s time limit is at least 1.5 seconds. This means that the solution of the string concatenation with the + operator could not finish the execution in 1.5 seconds. Since actual data is unavailable from Leetcode, I am assuming it finished in 1.5 seconds (reduced, it should take more). StringBuilder’s solution took 0.461 seconds. That means, a 67.93% performance gain! Actually, the performance gain is much better than this. As Leetcode doesn’t give us info about the actual time taken by a TLE solution, we assumed a time that is much fewer than the actual.
Now let’s get some actual values. Let’s concat some random strings with the + operator, then concat those same strings with other ways and compare them using Java Microbenchmark Harness or JMH. I will compare by concatenating 100 thousand strings of 10 lengths. Let me first declare two variables:
private String getRandomString(Random random) {
var leftLimit = 48; // numeral '0'
var rightLimit = 122; // letter 'z'
var targetStringLength = 10;
return random.ints(leftLimit, rightLimit + 1)
.filter(i -> (i <= 57 || i >= 65) && (i <= 90 || i >= 97))
.limit(targetStringLength)
.collect(StringBuilder::new, StringBuilder::appendCodePoint,
StringBuilder::append)
.toString();
}
Now writing a method to concat the string with the + operator
private int concatWithPlusOperator(List list) {
var string = "";
for (var i = 0;i
Let’s write the method of concat strings with StringBuilder:
private int concatWithStringBuilder(List list) {
var stringBuilder = new StringBuilder();
for (var i = 0;i
Comparing two more ways to concat String. Here the first one is StringJoiner (introduced in Java 8):
private int concatWithStringJoiner() {
var stringJoiner = new StringJoiner(",");
for (var string : randomStringList)
stringJoiner.add(string);
return stringJoiner.toString().length();
}
And the second one is String.join() (also introduced in Java 8):
private int concatWithStringJoin() {
var string = String.join(",", randomStringList);
return string.length();
}
Now here is a static block to store the generated random strings in the list:
static {
var random = new Random();
for (var i = 0; i < TOTAL_STRING; i++) {
randomStringList.add(getRandomString(random));
}
}
Finally, writing the benchmark method for concatenation with the + operator:
@Fork(value = 10, warmups = 5)
@Benchmark
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public void doBenchMarkOfPlusOperator() {
int len = concatWithPlusOperator(randomStringList);
}
and the benchmark method for concatenation with StringBuilder:
@Fork(value = 10, warmups = 5)
@Benchmark
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public void doBenchMarkOfStringBuilder() {
int len = concatWithStringBuilder(randomStringList);
}
Benchmark method for StringJoiner:
@Fork(value = 10, warmups = 5)
@Benchmark
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public void doBenchMarkOfStringJoiner() {
int len = concatWithStringJoiner();
}
And the last one is the benchmark method for String.join():
@Fork(value = 10, warmups = 5)
@Benchmark
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public void doBenchMarkOfStringJoin() {
int len = concatWithStringJoin();
}
Let’s see the Benchmark results:
Benchmark Mode Cnt Score Error Units
BenchmarkRunner.doBenchMarkOfPlusOperator avgt 50 4970.270 ± 68.968 ms/op
BenchmarkRunner.doBenchMarkOfStringBuilder avgt 50 1.454 ± 0.023 ms/op
BenchmarkRunner.doBenchMarkOfStringJoin avgt 50 1.987 ± 0.017 ms/op
BenchmarkRunner.doBenchMarkOfStringJoiner avgt 50 1.961 ± 0.039 ms/op
See, it takes 1.454, 1.987 and 1.961 milliseconds on average with StringBuilder, String.join() and StringJoiner respectively, whereas to do the same concatenation of 100 thousand strings of 10 lengths, the + operator is taking 4970.270 milliseconds on average! How inefficient the plus operator way is to contact String objects!
You will get the full code here: github-repo
By the way, none of the StringBuilder, StringJoiner, and String.join() is thread-safe. If thread safety is concerned, StringBuffer should be used. Since it is thread-safe, StringBuffer is slower than StringBuilder.
StringJoiner and String.join() are useful to have a concise code when we need to make a string separated by a specific delimiter, from a string collection.