Inside the String class, a character array holds the string's contents. The instance variable is defined as:
private final char value[];
String has two constructors that create a string from a char array:
public String(char value[])
public String(char value[], int offset, int count)
Note that String creates a new array from the arguments and copies the contents into it; it does not use the argument's character array directly.
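The copying behaviour can be checked with a small experiment. The following is a minimal sketch; the class and method names are made up for illustration. It builds a String from a char array and then changes the array.

```java
class StringCopyDemo {
    // Builds a String from a char array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'x'; // changes only the caller's array, not the String
        return s;
    }

    // Uses the (value, offset, count) constructor: 3 chars starting at index 1.
    static String middle() {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        return new String(chars, 1, 3);
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
        System.out.println(middle());          // ell
    }
}
```

Because the constructor copies the contents, the later write to chars[0] is not visible through the String.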
Most String methods operate on this character array internally. For example:
The length() method returns the length of the array.
The substring() method calls the constructor String(char value[], int offset, int count) to create a new string from the arguments.
indexOf() searches this array for the character or substring.
The implementation of these methods is mostly straightforward, so we will not go through them here.
String also has several methods related to this char array:
Returns the char at the specified index:
public char charAt(int index)
Returns the char array corresponding to the string:
public char[] toCharArray()
Note that a copy of the array is returned, not the original array.
Copies the characters in the specified range of the string into the destination array at the specified position:
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin)
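As a rough illustration of these three methods, here is a short sketch; the class and helper names are hypothetical. It shows that the array returned by toCharArray() is an independent copy, and how getChars() places a range of characters into a destination array.

```java
import java.util.Arrays;

class CharAccessDemo {
    // Modifies the array returned by toCharArray(); the String itself is untouched.
    static String[] copyIsIndependent() {
        String s = "hello";
        char[] copy = s.toCharArray();
        copy[0] = 'J';
        return new String[] {s, new String(copy)};
    }

    // Copies the chars at indexes 1..3 ("ell") into dst starting at dst[2].
    static String copyRange() {
        char[] dst = new char[5];
        Arrays.fill(dst, '-');
        "hello".getChars(1, 4, dst, 2);
        return new String(dst);
    }

    public static void main(String[] args) {
        System.out.println("hello".charAt(1));                  // e
        System.out.println(String.join(",", copyIsIndependent())); // hello,Jello
        System.out.println(copyRange());                        // --ell
    }
}
```

Note that srcEnd is exclusive: getChars(1, 4, ...) copies three characters, not four.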
Handling characters by code point
Like Character, the String class also provides methods that process a string by code point:
public int codePointAt(int index)
public int codePointBefore(int index)
public int codePointCount(int beginIndex, int endIndex)
public int offsetByCodePoints(int index, int codePointOffset)
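The difference between char indexes and code points only shows up with supplementary characters. The sketch below uses a sample string chosen for illustration: the letter a, the supplementary character U+1F600 (written as a surrogate pair), and the letter b. The class name is hypothetical.

```java
class CodePointDemo {
    // 'a', then U+1F600 as the surrogate pair D83D DE00, then 'b'
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        // length() counts chars, so the supplementary character counts twice.
        System.out.println(TEXT.length());                           // 4
        // codePointCount() counts whole code points.
        System.out.println(TEXT.codePointCount(0, TEXT.length()));   // 3
        // The code point that starts at char index 1.
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));     // 1f600
        // The code point that ends just before char index 3.
        System.out.println(Integer.toHexString(TEXT.codePointBefore(3))); // 1f600
        // Char index reached after moving 2 code points forward from index 0.
        System.out.println(TEXT.offsetByCodePoints(0, 2));           // 3
    }
}
```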
Encoding conversion
Internally, String handles characters as UTF-16BE: a BMP character is represented by one char (two bytes), and a supplementary character by two chars (four bytes). Section 6 introduced the various encodings; different encodings may target different character sets, use different numbers of bytes, and have different binary representations. How are these different encodings handled? How are they converted to and from Java's internal representation?
Java uses the Charset class to represent the various encodings. It has two commonly used static methods:
public static Charset defaultCharset()
public static Charset forName(String charsetName)
The first method returns the system's default encoding. For example, on my computer, executing the following statement:
System.out.println(Charset.defaultCharset().name());
outputs UTF-8.
The second method returns the Charset object for the given encoding name. Valid charset names include US-ASCII, ISO-8859-1, windows-1252, GB2312, GBK, GB18030, Big5, and UTF-8. For example:
Charset charset = Charset.forName("GB18030");
The String class provides the following methods, which return the byte representation of the string in a given encoding:
public byte[] getBytes()
public byte[] getBytes(String charsetName)
public byte[] getBytes(Charset charset)
The first method takes no encoding parameter and uses the system default encoding; the second takes an encoding name; the third takes a Charset.
The String class has the following constructors, which create a string from bytes and an encoding, that is, they build Java's internal representation from the byte representation in a given encoding:
public String(byte bytes[])
public String(byte bytes[], int offset, int length)
public String(byte bytes[], int offset, int length, String charsetName)
public String(byte bytes[], int offset, int length, Charset charset)
public String(byte bytes[], String charsetName)
public String(byte bytes[], Charset charset)
Besides converting encodings through String methods, the Charset class also has methods for encoding and decoding, which this section does not cover. The important point is that Java's internal representation differs from the various encodings, but the two can be converted to each other.
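A round trip through getBytes() and the byte-array constructor might look like the sketch below. It is only an illustration: the class name is made up, and it uses UTF-8, UTF-16BE and ISO-8859-1 from StandardCharsets because every Java platform is required to support them, whereas charsets such as GB18030 may not be available everywhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Two Chinese characters (U+8001, U+9A6C), written as escapes so the
    // encoding of the source file does not matter.
    static final String TEXT = "\u8001\u9A6C";

    // Encodes TEXT to bytes and decodes it again with the same charset.
    static String roundTrip(Charset cs) {
        byte[] bytes = TEXT.getBytes(cs);
        return new String(bytes, cs);
    }

    // Decodes UTF-8 bytes with the wrong charset, which garbles the text.
    static String decodeWithWrongCharset() {
        byte[] bytes = TEXT.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The same two characters take a different number of bytes per encoding.
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_8).length);    // 6
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_16BE).length); // 4
        System.out.println(roundTrip(StandardCharsets.UTF_8).equals(TEXT));  // true
        System.out.println(decodeWithWrongCharset().equals(TEXT));           // false
        // forName() looks a charset up by name.
        System.out.println(Charset.forName("UTF-8").equals(StandardCharsets.UTF_8)); // true
    }
}
```

Decoding must use the same encoding that produced the bytes; otherwise the result is a different string.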
Immutability
Like the wrapper classes, String is immutable: once an object is created, there is no way to modify it. The String class is declared final and cannot be inherited from; the internal char array value is also final and cannot be reassigned after initialization.
String provides many methods that appear to modify the string, but they actually work by creating a new String object; the original String object is not modified. For example, let's look at the code of the concat() method:
public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}
The Arrays.copyOf method creates a new character array and copies the original contents into it, and then a new String is created with new. We will cover the Arrays class in more detail in later chapters.
As with the wrapper classes, being immutable makes programs simpler, safer, and easier to understand. However, if a string is modified frequently and every modification creates a new string, performance suffers; in that case consider two other Java classes, StringBuilder and StringBuffer, which we cover in the next section.
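The effect of immutability is easy to observe from calling code. The sketch below uses an illustrative class name. It calls several "modifying" methods and shows that the original string is unchanged unless the returned value is used.

```java
class ImmutabilityDemo {
    // Calls methods that look like modifications but ignores their results.
    static String afterIgnoredCalls() {
        String s = "hello";
        s.concat(" world");  // result discarded
        s.toUpperCase();     // result discarded
        s.replace('l', 'L'); // result discarded
        return s;            // still "hello"
    }

    // concat() with an empty argument returns the same object (return this).
    static boolean emptyConcatReturnsSameObject() {
        String s = "hello";
        return s.concat("") == s;
    }

    public static void main(String[] args) {
        System.out.println(afterIgnoredCalls());              // hello
        System.out.println("hello".concat(" world"));         // hello world
        System.out.println(emptyConcatReturnsSameObject());   // true
    }
}
```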
String constants
String constants in Java are special: besides being assignable directly to a String variable, a constant behaves like a String object itself, and String methods can be called on it directly. Let's look at the code:
System.out.println("老马说编程".length());
System.out.println("老马说编程".contains("老马"));
System.out.println("老马说编程".indexOf("编程"));
In fact, these constants are objects of type String. In memory they are stored in a shared area called the string constant pool, which holds all constant strings; each constant is stored only once and is shared by all users. When a string is used in constant form, the corresponding String object in the constant pool is what is used.
For example, let's look at the code:
String name1 = "老马说编程";
String name2 = "老马说编程";
System.out.println(name1 == name2);
The output is true. Why? You can think of the constant "老马说编程" as having a corresponding String object in the constant pool. Suppose that object is named laoma; the code above is then effectively similar to:
String laoma = new String(new char[]{'老', '马', '说', '编', '程'});
String name1 = laoma;
String name2 = laoma;
System.out.println(name1 == name2);
There is actually only one String object, and all three variables point to it, so name1 == name2 is naturally true.
Note that if the value is not assigned directly from a constant but created with new, == no longer returns true. See the following code:
String name1 = new String("老马说编程");
String name2 = new String("老马说编程");
System.out.println(name1 == name2);
The output is false. Why? The code above is similar to:
String laoma = new String(new char[]{'老', '马', '说', '编', '程'});
String name1 = new String(laoma);
String name2 = new String(laoma);
System.out.println(name1 == name2);
The code of the String constructor that takes a String argument is as follows:
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
hash is another instance variable of the String class; it holds the cached hashCode value, which we cover below.
As you can see, name1 and name2 point to two different String objects, although the value fields inside the two objects point to the same char array.
So name1 == name2 is false, but name1.equals(name2) is true.
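The two cases can be compared side by side in one small program. This is a minimal sketch with an illustrative class name and an arbitrary sample string.

```java
class StringIdentityDemo {
    // Two identical literals refer to the same object in the constant pool.
    static boolean literalsAreSameObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // Each new String(...) creates a distinct object, so == compares false.
    static boolean newObjectsAreSameObject() {
        String a = new String("hello");
        String b = new String("hello");
        return a == b;
    }

    // equals() compares contents, so it is true for distinct objects too.
    static boolean newObjectsHaveEqualContent() {
        String a = new String("hello");
        String b = new String("hello");
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(literalsAreSameObject());      // true
        System.out.println(newObjectsAreSameObject());    // false
        System.out.println(newObjectsHaveEqualContent()); // true
    }
}
```

This is why string contents should be compared with equals() rather than ==.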
hashCode
We just mentioned the instance variable hash, which is defined as follows:
private int hash; // Default to 0
It caches the value of the hashCode() method: the first time hashCode() is called, the result is stored in hash, and subsequent calls return the stored value directly.
Let's look at the hashCode method of the String class:
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
If the cached hash is not 0, it is returned directly; otherwise the hash is computed from the contents of the character array using the formula:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
Here s is the string, s[0] is the first character, n is the length of the string, and s[0]*31^(n-1) is the value of the first character multiplied by 31 raised to the power n-1.
Why compute it this way? In this formula the hash value depends on the value of every character, and because each position is multiplied by a different power of 31, it also depends on the position of each character. The choice of 31 probably has two reasons. First, it produces well-dispersed hashes, that is, different strings generally have different hash values. Second, it is efficient to compute: 31*h equals 32*h-h, that is (h<<5)-h, so the multiplication can be replaced by a cheaper shift and subtraction.
In Java, hashCode is generally implemented along these lines.
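To make the formula concrete, the sketch below recomputes the hash by hand in two ways, once with the multiplication and once with the shift-and-subtract form, and compares both with String.hashCode(). The class and method names are illustrative only.

```java
class HashDemo {
    // Straightforward form of the loop: h = 31 * h + c for each char.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same computation with 31 * h rewritten as (h << 5) - h.
    static int shiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // For "ab": 'a' * 31 + 'b' = 97 * 31 + 98 = 3105
        System.out.println(polynomialHash("ab"));                        // 3105
        System.out.println("ab".hashCode());                             // 3105
        System.out.println(shiftHash("hello") == "hello".hashCode());    // true
    }
}
```

Both forms rely on int arithmetic wrapping around on overflow, so they agree even for long strings.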
Regular expressions
Some methods in the String class accept not an ordinary string argument but a regular expression. What is a regular expression? It can be thought of as a string that expresses a rule, generally used for text matching, searching, replacing, and so on. Regular expressions are rich and powerful and are a fairly large topic, which we introduce separately in later chapters.
Java has dedicated classes for regular expressions, such as Pattern and Matcher, but for simple cases the String class offers more concise operations. The String methods that accept regular expressions are:
Split the string:
public String[] split(String regex)
Check whether the string matches:
public boolean matches(String regex)
Replace substrings:
public String replaceFirst(String regex, String replacement)
public String replaceAll(String regex, String replacement)
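A brief sketch of these four methods follows. The sample strings and patterns are chosen only for illustration, and the class name is hypothetical.

```java
import java.util.Arrays;

class RegexMethodsDemo {
    public static void main(String[] args) {
        // Split on a comma followed by optional whitespace.
        String[] parts = "a, b,c".split(",\\s*");
        System.out.println(Arrays.toString(parts));               // [a, b, c]

        // matches() must match the whole string, not just a part of it.
        System.out.println("2024".matches("\\d+"));               // true
        System.out.println("20a4".matches("\\d+"));               // false

        // Replace only the first run of digits, then every run of digits.
        System.out.println("a1b22c333".replaceFirst("\\d+", "#")); // a#b22c333
        System.out.println("a1b22c333".replaceAll("\\d+", "#"));   // a#b#c#
    }
}
```

Because the argument is a regular expression, characters such as . and | have special meaning and need escaping when a literal match is intended.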
Summary
This section introduced the String class: its basic usage, internal implementation, and encoding conversion. It also analyzed its immutability, string constants, and the implementation of hashCode.
We noted that String is inefficient when a string is modified frequently, and mentioned the StringBuilder and StringBuffer classes. We also saw that strings can be manipulated directly with + and +=, which are backed by the StringBuilder class.