
Monday, January 27, 2020

Overloading '+' vs. StringBuilder | String in java | String is immutable in java


String manipulation is one of the most common activities in computer programming. In this post we will look at String immutability, String concatenation, operator overloading, and StringBuilder.





Immutable Strings

Objects of the String class are immutable. Every method of the String class that appears to modify a String actually creates and returns a brand new String object containing the modification. The original String is left untouched.

Let's see an example:
package blog;

public class Immutable {
 public static String uperCase(String str) {
  return str.toUpperCase();
 }

 public static void main(String[] args) {
  String original = "santosh";
  System.out.println(original); // santosh
  String modified = uperCase(original); // refers to a new String object
  System.out.println(modified); // SANTOSH
  System.out.println(original); // still santosh
 }
}
Output
santosh
SANTOSH
santosh


When original is passed to uperCase(), what is actually passed is a copy of the reference to original. The object this reference is connected to stays in a single physical location. Only the references are copied as they are passed around.

Looking at the definition of uperCase(), you can see that the reference that's passed in has the name str, and it exists only for as long as the body of uperCase() is being executed.

When uperCase() completes, the local reference str vanishes. uperCase() returns the result, which is the original string with all the characters set to uppercase.

More precisely, it returns a reference to the result. That reference points to a new object, and the original String is left unchanged.
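The same holds for other String methods that look like they modify the text. Here is a minimal sketch; the class name ImmutableDemo and the helper modify() are made up for illustration, while toUpperCase(), replace() and concat() are standard String methods:

```java
class ImmutableDemo {
    // Calls three "modifying" methods and returns their results,
    // followed by the original so the caller can see it is unchanged.
    static String[] modify(String original) {
        String upper = original.toUpperCase();        // new String object
        String replaced = original.replace('s', 'z'); // new String object
        String joined = original.concat("!");         // new String object
        return new String[] { upper, replaced, joined, original };
    }

    public static void main(String[] args) {
        for (String s : modify("santosh")) {
            System.out.println(s);
        }
        // prints SANTOSH, zantozh, santosh!, santosh (one per line)
    }
}
```

None of the three calls touches the object that original refers to, which is why the last line printed is still santosh.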

Overloading '+' vs. StringBuilder

Since String objects are immutable, you can alias a particular String as many times as you want. Because a String is read-only, there is no possibility that one reference will change something that will affect the other references.
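To make the aliasing point concrete, here is a small sketch that contrasts a String alias with a StringBuilder alias. The class name AliasDemo and the variable names are only illustrative:

```java
class AliasDemo {
    // Returns { a, b, x } after "changing" the aliases b and y.
    static String[] run() {
        String a = "papaya";
        String b = a;            // b is an alias for the same String object
        b = b + " juice";        // builds a NEW String; a is unaffected

        StringBuilder x = new StringBuilder("papaya");
        StringBuilder y = x;     // y is an alias for the same mutable object
        y.append(" juice");      // mutates the shared object; x sees the change

        return new String[] { a, b, x.toString() };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0]); // papaya
        System.out.println(r[1]); // papaya juice
        System.out.println(r[2]); // papaya juice
    }
}
```

With String, the alias can only be pointed at a different object. With a mutable class such as StringBuilder, a change made through one reference is visible through every other reference.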

Immutability can have efficiency issues. A case in point is the operator '+' that has been overloaded for String objects. 

Overloading means that an operation has been given an extra meaning when used with a particular class.

The '+' and '+=' for String are the only operators that are overloaded in Java, and Java does not allow the programmer to overload any others.

Note: C++ allows the programmer to overload operators at will. Because this can often be a complicated process, the Java designers decided that operator overloading shouldn't be included in Java.


The '+' operator allows you to concatenate Strings:

package blog;

public class Concatenation {
 public static void main(String args[]) {
  String apple = "apple ";
  String str = "papaya " + apple + "etc " + 40;
  System.out.println(str);
 }
}

Output
papaya apple etc 40
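One detail of the example above is worth spelling out: the int 40 is converted to text automatically, and a chain of '+' is evaluated from left to right. The sketch below shows how the position of the first String operand changes the result. The class and method names are made up for illustration:

```java
class ConcatOrderDemo {
    // 1 + 2 is integer addition first, then the sum is concatenated.
    static String numbersFirst() {
        return 1 + 2 + " apples";
    }

    // Once the left side is a String, every following '+' concatenates.
    static String stringFirst() {
        return "apples " + 1 + 2;
    }

    // '+=' is the other overloaded operator; it also accepts non-String operands.
    static String plusEquals() {
        String s = "papaya";
        s += ' ';
        s += 40;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(numbersFirst()); // 3 apples
        System.out.println(stringFirst());  // apples 12
        System.out.println(plusEquals());   // papaya 40
    }
}
```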

You could imagine how this concatenation might work. The String "papaya " could have a method append() that creates a new String object containing "papaya " concatenated with the contents of apple. The new String object would then create another new String that added "etc ", and so on.

This would certainly work, but it requires the creation of a lot of String objects just to put together this new String, and then you have a bunch of intermediate String objects that need to be garbage collected. I suspect that the Java designers tried this approach first. I also suspect that they discovered it delivered unacceptable performance.
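String has no append() method, but the existing String.concat() behaves the way the imagined method would, so it can stand in for it. The following sketch (class and variable names are made up) writes out each intermediate object explicitly:

```java
class NaiveConcatDemo {
    // Builds the same text as "papaya " + apple + "etc " + 40,
    // creating one new String object per step.
    static String build(String apple) {
        String step1 = "papaya ".concat(apple);          // intermediate String #1
        String step2 = step1.concat("etc ");             // intermediate String #2
        String step3 = step2.concat(String.valueOf(40)); // final String
        return step3;
    }

    public static void main(String[] args) {
        System.out.println(build("apple ")); // papaya apple etc 40
    }
}
```

Here step1 and step2 become garbage as soon as the method returns. Each step also copies all the characters accumulated so far, which is the cost the text describes.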

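To see what really happens, you can disassemble the compiled class using the javap tool that comes as part of the JDK. Here's the command line: javap -c Concatenation

The -c flag prints the JVM bytecode. After we strip out the parts we are not interested in and do a bit of editing, here are the relevant byte codes. Note that this listing is what javac emits when targeting Java 8 or earlier. Since Java 9, javac compiles '+' on Strings to an invokedynamic call instead, so a newer compiler will not show StringBuilder here.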


public class Concatenation {

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String apple
       2: astore_1
       3: new           #3                  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      10: ldc           #5                  // String papaya
      12: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      15: aload_1
      16: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: ldc           #7                  // String etc
      21: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      24: bipush        40
      26: invokevirtual #8                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      29: invokevirtual #9                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      32: astore_2
      33: getstatic     #10                 // Field java/lang/System.out:Ljava/io/PrintStream;
      36: aload_2
      37: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      40: return
}


If you have experience with assembly language, this may look familiar to you. Instructions like dup and invokevirtual are the Java Virtual Machine (JVM) equivalent of assembly language. If you have never seen assembly language, don't worry about it. The important part to notice is the introduction of the java.lang.StringBuilder class by the compiler. There was no mention of StringBuilder in the source code, but the compiler decided to use it anyway, because it is much more efficient than creating an intermediate String for every '+'.

In this case, the compiler creates a StringBuilder object to build the String str, and calls append() four times, once for each of the pieces. Finally, it calls toString() to produce the result, which it stores as str.
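Written out by hand, the bytecode above corresponds roughly to the viaBuilder() method in this sketch. The class and method names are made up. The apple value is passed as a parameter here so that both methods can be called with the same input:

```java
class ExplicitConcatDemo {
    // What the source code says.
    static String viaPlus(String apple) {
        return "papaya " + apple + "etc " + 40;
    }

    // What the listed bytecode does: one StringBuilder,
    // four append() calls, one toString().
    static String viaBuilder(String apple) {
        return new StringBuilder()
                .append("papaya ")
                .append(apple)
                .append("etc ")
                .append(40)       // the append(int) overload
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("apple "));    // papaya apple etc 40
        System.out.println(viaBuilder("apple ")); // papaya apple etc 40
    }
}
```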


Before you assume that you should just use String everywhere and that the compiler will make everything efficient, let's look a little more closely at what the compiler is doing. 

Here is an example that produces a String result in two ways: using String, and by hand-coding with StringBuilder:
package blog;

public class WithStringBuilder {
	public String implicit(String fields[]) {
		String result = "";
		for (int i = 0; i < fields.length; i++) {
			result += fields[i];
		}
		return result;
	}

	public String explicit(String fields[]) {
		StringBuilder result = new StringBuilder();
		for (int i = 0; i < fields.length; i++) {
			result.append(fields[i]);
		}
		return result.toString();
	}
}

Now if you run javap -c WithStringBuilder, you can see the simplified code for the two different methods. First, implicit():
public java.lang.String implicit(java.lang.String[]);
    Code:
       0: ldc           #2                  // String
       2: astore_2
       3: iconst_0
       4: istore_3
       5: iload_3
       6: aload_1
       7: arraylength
       8: if_icmpge     38
      11: new           #3                  // class java/lang/StringBuilder
      14: dup
      15: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      18: aload_2
      19: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      22: aload_1
      23: iload_3
      24: aaload
      25: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: invokevirtual #6                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      31: astore_2
      32: iinc          3, 1
      35: goto          5
      38: aload_2
      39: areturn


Notice 8: and 35:, which together form a loop. 8: does an "integer compare greater than or equal to" of the operands on the stack and jumps to 38: when the loop is done. 35: is a goto back to the beginning of the loop, at 5:.

The important thing to note is that the StringBuilder construction happens inside this loop, which means you are going to get a new StringBuilder object every time you pass through the loop.
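Translated back into Java, the bytecode for implicit() behaves roughly like the sketch below. The class name, the method name implicitExpanded() and the variable tmp are made up for illustration; the compiler never gives the temporary builder a name:

```java
class ImplicitLoopDemo {
    // Hand-written equivalent of what "result += fields[i]" compiles to
    // in the bytecode listing for implicit().
    static String implicitExpanded(String[] fields) {
        String result = "";
        for (int i = 0; i < fields.length; i++) {
            StringBuilder tmp = new StringBuilder(); // new builder on EVERY pass
            tmp.append(result);                      // copy everything built so far
            tmp.append(fields[i]);                   // add the next piece
            result = tmp.toString();                 // new String on every pass
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(implicitExpanded(new String[] { "a", "b", "c" })); // abc
    }
}
```

Each pass allocates a StringBuilder and a String, and re-copies all the characters accumulated so far, so the total work grows much faster than the number of fields.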

Here are the byte codes for explicit():

public java.lang.String explicit(java.lang.String[]);
    Code:
       0: new           #3                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
       7: astore_2
       8: iconst_0
       9: istore_3
      10: iload_3
      11: aload_1
      12: arraylength
      13: if_icmpge     30
      16: aload_2
      17: aload_1
      18: iload_3
      19: aaload
      20: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      23: pop
      24: iinc          3, 1
      27: goto          10
      30: aload_2
      31: invokevirtual #6                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: areturn


Not only is the loop code shorter and simpler, the method only creates a single StringBuilder object. Creating an explicit StringBuilder also allows you to preallocate its size if you have extra information about how big it might need to be, so that it does not need to constantly reallocate the buffer.
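As a sketch of the preallocation idea, the method below first adds up the lengths of the pieces and passes the total to the StringBuilder(int capacity) constructor. The class and method names are made up, and the code assumes that no element of fields is null:

```java
class PreallocatedBuilderDemo {
    // Concatenates all fields using a builder sized up front.
    // Assumes fields contains no null elements.
    static String join(String[] fields) {
        int total = 0;
        for (String f : fields) {
            total += f.length();                  // exact size of the result
        }
        StringBuilder result = new StringBuilder(total); // initial capacity
        for (String f : fields) {
            result.append(f);                     // never needs to grow the buffer
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "papaya ", "apple ", "etc" }));
        // prints: papaya apple etc
    }
}
```

The extra pass over the array is cheap compared with repeated buffer growth for large inputs. For a handful of short strings, the default capacity is usually fine.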


Thus, when you create a toString() method, if the operations are simple ones that the compiler can figure out on its own, you can generally rely on the compiler to build the result in a reasonable fashion. But if looping is involved, you should explicitly use a StringBuilder in your code and then convert it to a String using toString().

StringBuilder was introduced in Java SE5. Prior to this, Java used StringBuffer, which ensures thread safety by synchronizing its methods and so was significantly more expensive.
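Here is a minimal sketch of a toString() that involves a loop. The Basket class and its contents are hypothetical and exist only to show the pattern:

```java
class Basket {
    private final String[] fruits;

    Basket(String... fruits) {
        this.fruits = fruits;
    }

    // Looping is involved, so a single StringBuilder is created
    // outside the loop and reused for every element.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("Basket[");
        for (int i = 0; i < fruits.length; i++) {
            if (i > 0) {
                sb.append(", ");   // separator between elements
            }
            sb.append(fruits[i]);
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new Basket("apple", "papaya")); // Basket[apple, papaya]
    }
}
```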


Stay tuned for more on Strings in Java.

