Thinking logic of computer programs (29) - Analyzing String
The previous section described Character, the wrapper class for a single character. String operations are probably the most common operations in computer programs. In Java, strings are represented by the String class, which this section describes in detail.
The basic usage of strings is relatively simple and straightforward. Let's take a look.
Basic usage
You can define a String variable with a string constant (literal).
String name = "programming";
You can also create a String with new.
String name = new String("programming");
You can use the + and += operators directly on a String, for example:
String name = "lauma";
name += "programming";
String description = ", exploring the nature of programming";
System.out.println(name + description);
Output: laumaprogramming, exploring the nature of programming
The String class contains many methods to facilitate String operations.
Determines whether the string is empty, that is, whether its length is 0.
public boolean isEmpty()
Returns the string length.
public int length()
Returns a substring. beginIndex is inclusive and endIndex is exclusive.
public String substring(int beginIndex)
public String substring(int beginIndex, int endIndex)
Searches for a character or substring in the string and returns the index of the first occurrence, or -1 if it is not found.
public int indexOf(int ch)
public int indexOf(String str)
Searches for a character or substring starting from the end and returns the index of the last occurrence, or -1 if it is not found.
public int lastIndexOf(int ch)
public int lastIndexOf(String str)
Determines whether the string contains the specified character sequence. Recall that CharSequence is an interface, which String implements.
public boolean contains(CharSequence s)
Determines whether a string starts with a substring.
public boolean startsWith(String prefix)
Determines whether a string ends with a substring.
public boolean endsWith(String suffix)
Compares with another string to see whether the content is the same.
public boolean equals(Object anObject)
Compares with another string, ignoring case, to see whether the content is the same.
public boolean equalsIgnoreCase(String anotherString)
String also implements the Comparable interface, so strings can be compared for ordering.
public int compareTo(String anotherString)
The ordering comparison can also ignore case.
public int compareToIgnoreCase(String str)
All characters are converted to uppercase characters. The new string is returned, and the original string remains unchanged.
public String toUpperCase()
All characters are converted to lowercase characters. The new string is returned, and the original string remains unchanged.
public String toLowerCase()
Returns the string after the current string and the parameter string are merged. The original string remains unchanged.
public String concat(String str)
Replaces a single character. A new string is returned, and the original string remains unchanged.
public String replace(char oldChar, char newChar)
Replaces a character sequence. A new string is returned, and the original string remains unchanged.
public String replace(CharSequence target, CharSequence replacement)
Removes whitespace at the beginning and end, and returns a new string. The original string remains unchanged.
public String trim()
Splits the string around the separator and returns an array of the resulting substrings. The original string remains unchanged.
public String[] split(String regex)
For example, split "hello,world" by commas:
String str = "hello,world";
String[] arr = str.split(",");
arr[0] is "hello", and arr[1] is "world".
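To make the methods listed above more concrete, here is a minimal sketch that calls several of them on the same "hello,world" string. The class name StringBasicsDemo and the helper describe are hypothetical names chosen only for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; only the String methods themselves come from the JDK.
class StringBasicsDemo {
    // Builds a short report from length(), toUpperCase(), indexOf() and substring().
    static String describe(String s) {
        int comma = s.indexOf(',');
        return "length=" + s.length()
                + ", upper=" + s.toUpperCase()
                + ", firstComma=" + comma
                + ", afterComma=" + s.substring(comma + 1);
    }

    public static void main(String[] args) {
        String str = "hello,world";
        System.out.println(describe(str));
        System.out.println(str.startsWith("hello"));           // true
        System.out.println(str.endsWith("world"));             // true
        System.out.println(str.contains("lo,wo"));             // true
        System.out.println(str.lastIndexOf('o'));              // 7
        System.out.println(str.replace('l', 'L'));             // heLLo,worLd
        System.out.println("  padded ".trim());                // padded
        System.out.println("apple".compareTo("banana") < 0);   // true
        System.out.println("Hello".equalsIgnoreCase("hELLO")); // true
        System.out.println(Arrays.toString(str.split(",")));   // [hello, world]
        System.out.println(str);                               // hello,world (unchanged)
    }
}
```

Every call returns a new value, and str itself still prints as hello,world at the end.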
From the caller's perspective, that covers the basic usage of String. Next, we look inside String to understand how it is implemented.
Inside String
Encapsulating a character array
Internally, the String class uses a character array to represent the string. The instance variable is defined as follows (this is the Java 8 and earlier source; since Java 9 the field is a byte[]):
private final char value[];
String has two constructors that create a String from a char array.
public String(char value[])
public String(char value[], int offset, int count)
Note that String creates a new array and copies the contents of the parameter into it, rather than using the parameter's character array directly.
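The copying behavior can be observed from the outside. The following is a small sketch, with the hypothetical class name CharArrayCopyDemo, that changes the source array after the String has been created and shows that the String is unaffected.

```java
// Hypothetical demo class showing that the char[] constructors copy their input.
class CharArrayCopyDemo {
    // Creates a String from an array, then modifies the array afterwards.
    static String buildThenMutate() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);
        chars[0] = 'J'; // only the caller's array changes
        return s;       // the String keeps its own copy
    }

    // Uses the (value, offset, count) constructor: 3 chars starting at index 1.
    static String fromRange() {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        return new String(chars, 1, 3);
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // java
        System.out.println(fromRange());       // ell
    }
}
```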
Most methods in String operate on this character array. For example:
- The length() method returns the length of this array.
- The substring() method calls the constructor String(char value[], int offset, int count) with the appropriate arguments to create a new String.
- The indexOf() method searches for a character or substring in this array.
Most of these methods are straightforward to implement, so we will not go into details.
Some methods in String relate directly to the char array:
Returns the char at the specified index.
public char charAt(int index)
Returns the char array corresponding to the string.
public char[] toCharArray()
Note that the returned result is a copy of the array, not the original array.
Copies the characters in the specified range of the string to the specified position in the target array.
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin)
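As a brief illustration of these three methods, the sketch below modifies the array returned by toCharArray() and copies a range with getChars(). The class and method names (CharAccessDemo, mutateCopy, copyRange) are hypothetical.

```java
// Hypothetical demo class for charAt, toCharArray and getChars.
class CharAccessDemo {
    // Modifies the array returned by toCharArray(); the source String is not affected.
    static String mutateCopy(String s) {
        char[] copy = s.toCharArray();
        copy[0] = 'X';
        return new String(copy);
    }

    // Copies the chars in [begin, end) into a new array using getChars.
    static String copyRange(String s, int begin, int end) {
        char[] dst = new char[end - begin];
        s.getChars(begin, end, dst, 0);
        return new String(dst);
    }

    public static void main(String[] args) {
        String s = "hello,world";
        System.out.println(s.charAt(4));          // o
        System.out.println(mutateCopy(s));        // Xello,world
        System.out.println(s);                    // hello,world (unchanged)
        System.out.println(copyRange(s, 6, 11));  // world
    }
}
```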
Processing characters by code point
Similar to Character, the String class also provides some methods to process strings by code point.
public int codePointAt(int index)
public int codePointBefore(int index)
public int codePointCount(int beginIndex, int endIndex)
public int offsetByCodePoints(int index, int codePointOffset)
These methods are very similar to those introduced in the analysis of Character, so we will not go into detail in this section.
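Although the details are skipped here, a short sketch may help show how char indexes and code points differ. It assumes a sample string containing the supplementary character U+1F600, written with the surrogate pair \uD83D\uDE00; the class name CodePointDemo is hypothetical.

```java
// Hypothetical demo class: a string with one supplementary character between 'a' and 'b'.
class CodePointDemo {
    // 'a', then U+1F600 as a surrogate pair (two chars), then 'b'.
    static final String SAMPLE = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        String s = SAMPLE;
        System.out.println(s.length());                                    // 4 chars
        System.out.println(s.codePointCount(0, s.length()));               // 3 code points
        System.out.println(Integer.toHexString(s.codePointAt(1)));         // 1f600
        System.out.println(Integer.toHexString(s.codePointBefore(3)));     // 1f600
        // Moving 2 code points forward from index 0 skips 'a' and the surrogate pair.
        System.out.println(s.offsetByCodePoints(0, 2));                    // 3
    }
}
```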
Encoding conversion
Internally, String processes characters as UTF-16BE: a BMP character uses one char (two bytes), and a supplementary character uses two chars (four bytes). We introduced various encodings in Section 6. Different character sets may use different encodings, with different numbers of bytes and different binary representations. How do we handle these encodings, and how are they converted to and from Java's internal representation?
Java uses the Charset class to represent encodings. It has two commonly used static methods:
public static Charset defaultCharset()
public static Charset forName(String charsetName)
The first method returns the default encoding of the system. For example, on my computer, run the following statement:
System.out.println(Charset.defaultCharset().name());
The output is UTF-8.
The second method returns the Charset object for the given encoding name, which corresponds to the encodings we introduced in Section 6. The charset name can be US-ASCII, ISO-8859-1, windows-1252, GB2312, GBK, GB18030, Big5, UTF-8, and so on. For example:
Charset charset = Charset.forName("GB18030");
The String class provides the following methods to return the byte representation of the string in a given encoding:
public byte[] getBytes()
public byte[] getBytes(String charsetName)
public byte[] getBytes(Charset charset)
The first method has no encoding parameter and uses the system default encoding. The second method takes the encoding name, and the third takes a Charset.
The String class also has the following constructors, which create a String from bytes and an encoding. That is to say, they build Java's internal representation from the byte representation in the given encoding.
public String(byte bytes[])
public String(byte bytes[], int offset, int length)
public String(byte bytes[], int offset, int length, String charsetName)
public String(byte bytes[], int offset, int length, Charset charset)
public String(byte bytes[], String charsetName)
public String(byte bytes[], Charset charset)
Besides the String methods, the Charset class also has some methods for encoding and decoding, which are not described in this section. The important point is that Java's internal representation differs from the various encodings, but they can be converted to each other.
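The conversion in both directions can be sketched as follows. This example assumes the constants in java.nio.charset.StandardCharsets (available since Java 7) rather than charset names, and uses the single BMP character U+9A6C as sample data. The class and method names (EncodingDemo, byteCount, roundTrip) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encode a String to bytes and decode it back.
class EncodingDemo {
    // Number of bytes the string occupies in the given encoding.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Encodes with getBytes(Charset) and decodes with new String(byte[], Charset).
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String s = "\u9a6c"; // one BMP character, one char in Java
        System.out.println(byteCount(s, StandardCharsets.UTF_8));     // 3
        System.out.println(byteCount(s, StandardCharsets.UTF_16BE));  // 2
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s)); // true
        // ISO-8859-1 cannot represent this character, so it is replaced with '?'.
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1)); // ?
    }
}
```

The last line shows that conversion is only lossless when the target encoding can represent every character in the string.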
Immutable
Like the wrapper classes, String is an immutable class: once an object is created, there is no way to modify it. The String class is declared final and cannot be inherited. The internal char array value is also final, so it cannot be reassigned after initialization.
The String class provides many methods that appear to modify the string. They are actually implemented by creating a new String object, and the original String object is not modified. For example, let's look at the code of the concat() method:
public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}
The Arrays.copyOf method creates a new character array containing a copy of the original content, and a new String is then created with new. For more information about the Arrays class, see the subsequent sections.
As with the wrapper classes, defining String as an immutable class makes programs simpler, safer, and easier to understand. However, if a string is modified frequently and every modification creates a new string, performance suffers. In that case, consider two other Java classes, StringBuilder and StringBuffer. We will introduce them in the next section.
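A minimal sketch of what immutability looks like to the caller: each "modifying" method returns a new String and leaves the original as it was. The class name ImmutabilityDemo and the helper transform are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class: "modifying" methods return new objects.
class ImmutabilityDemo {
    // Returns the original string followed by three derived strings.
    static String[] transform(String original) {
        String upper = original.toUpperCase();
        String joined = original.concat("!");
        String replaced = original.replace('a', 'o');
        return new String[] {original, upper, joined, replaced};
    }

    public static void main(String[] args) {
        String s = "java";
        System.out.println(Arrays.toString(transform(s))); // [java, JAVA, java!, jovo]
        // Concatenating an empty string returns the same object (the "return this" branch).
        System.out.println(s.concat("") == s);              // true
        // A non-empty concat always yields a different object.
        System.out.println(s.concat("!") == s);             // false
    }
}
```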
String constants
String constants in Java are special. Besides being assigned directly to String variables, they can call the methods of String directly, just like a String object. Let's look at the code:
System.out.println("programming".length());
System.out.println("programming".contains("Old Horse"));
System.out.println("programming".indexOf("programming"));
In fact, these constants are String objects. In memory, they are stored in a shared area called the string constant pool, which holds all constant strings. Each constant is stored only once and is shared by all users. When a string is used in the form of a constant, the corresponding String object in the constant pool is used.
For example, let's look at the code:
String name1 = "programming";
String name2 = "programming";
System.out.println(name1 == name2);
The output is true. Why? We can think of the constant pool as holding a String object for the constant "programming". Suppose it is named laoma; the code above is then roughly equivalent to:
String laoma = new String(new char[]{'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g'});
String name1 = laoma;
String name2 = laoma;
System.out.println(name1 == name2);
In fact, there is only one String object. All three variables point to this object, so name1 == name2 is naturally true.
Note that if the strings are created with new instead of being assigned directly from a constant, == no longer returns true. See the following code:
String name1 = new String("programming");
String name2 = new String("programming");
System.out.println(name1 == name2);
The output is false. Why? The code above is roughly equivalent to:
String laoma = new String(new char[]{'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g'});
String name1 = new String(laoma);
String name2 = new String(laoma);
System.out.println(name1 == name2);
The String constructor that takes a String parameter is implemented as follows:
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
hash is another instance variable in the String class that caches the hashCode value. We will introduce it later.
It can be seen that name1 and name2 point to two different String objects, but the value fields of these two objects point to the same char array. The memory layout is roughly as follows:
Therefore, name1 == name2 is false, but name1.equals(name2) is true.
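The difference between the two cases can be checked with a short sketch. The class and method names (ConstantPoolDemo and its three helpers) are hypothetical, and "programming" is just sample data.

```java
// Hypothetical demo class comparing string constants with strings created by new.
class ConstantPoolDemo {
    // Two identical constants refer to the same object in the constant pool.
    static boolean literalsShareObject() {
        String a = "programming";
        String b = "programming";
        return a == b;
    }

    // Each new expression creates a separate String object.
    static boolean newObjectsAreSame() {
        String a = new String("programming");
        String b = new String("programming");
        return a == b;
    }

    // equals compares content, so it is true even for separate objects.
    static boolean newObjectsAreEqual() {
        String a = new String("programming");
        String b = new String("programming");
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject()); // true
        System.out.println(newObjectsAreSame());   // false
        System.out.println(newObjectsAreEqual());  // true
    }
}
```

This is why string content should be compared with equals rather than ==.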
HashCode
We just mentioned the hash instance variable. Its definition is as follows:
private int hash; // Default to 0
It caches the value of the hashCode () method. That is to say, when hashCode () is called for the first time, the result will be saved in the hash variable, and the saved value will be directly returned if it is called later.
Let's take a look at the hashCode method of the String class. The code is as follows:
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
If the cached hash value is not 0, it is returned directly. Otherwise, the hash value is calculated from the content of the character array. The formula is:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
Here s is the string, s[0] is its first character, n is the string length, and s[0]*31^(n-1) means the value of the first character multiplied by 31 raised to the power n-1.
Why is this formula used? In this formula, the hash value depends on the value of every character, and because each position is multiplied by a different factor, it also depends on the position of every character. 31 is probably used for two reasons. On the one hand, it produces well-scattered hashes, that is, different strings generally have different hash values. On the other hand, it is efficient to compute: 31 * h is equivalent to 32 * h - h, that is, (h << 5) - h, so the multiplication can be replaced by more efficient shift and subtraction operations.
In Java, the above ideas are generally used to implement hashCode.
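The formula can be verified against the real method with a small sketch. The class name HashCodeDemo and the helpers manualHash and shiftMatchesMultiply are hypothetical; only String.hashCode() itself comes from the JDK.

```java
// Hypothetical demo class: recompute the hash by hand and compare with String.hashCode().
class HashCodeDemo {
    // Applies h = 31 * h + c for every char, which expands to the formula above.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Checks that the shift-and-subtract form gives the same int result as multiplying by 31.
    static boolean shiftMatchesMultiply(int h) {
        return 31 * h == (h << 5) - h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("ab"));                          // 97 * 31 + 98 = 3105
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        System.out.println(shiftMatchesMultiply(123456789));           // true, even with overflow
    }
}
```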
Regular Expression
Some methods in the String class accept regular expressions rather than ordinary string parameters. What is a regular expression? It can be understood as a string that expresses a rule, generally used for text matching, searching, replacement, and so on. Regular expressions are rich and powerful, and they are a large topic that we will introduce separately in subsequent chapters.
Java has dedicated classes for regular expressions, such as Pattern and Matcher. In simple cases, however, the String class provides more concise operations. The methods in String that accept regular expressions are:
Splitting a string
public String[] split(String regex)
Checking for a match
public boolean matches(String regex)
String replacement
public String replaceFirst(String regex, String replacement)
public String replaceAll(String regex, String replacement)
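Here is a brief sketch of these four methods with very simple patterns. The class and method names (RegexMethodsDemo, splitWords, isDigits, maskDigits, maskFirstDigitRun) and the sample inputs are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class for the regex-accepting methods of String.
class RegexMethodsDemo {
    // Splits on one or more commas or whitespace characters.
    static String[] splitWords(String s) {
        return s.split("[,\\s]+");
    }

    // matches() must match the entire string, not just a part of it.
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // Replaces every digit with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Replaces only the first run of digits with '#'.
    static String maskFirstDigitRun(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("a, b  c,d"))); // [a, b, c, d]
        System.out.println(isDigits("2024"));                         // true
        System.out.println(isDigits("20a4"));                         // false
        System.out.println(maskDigits("tel 123"));                    // tel ###
        System.out.println(maskFirstDigitRun("a12b34"));              // a#b34
        // "." is a regex metacharacter, so it must be escaped to split on a literal dot.
        System.out.println("a.b".split(".").length);                  // 0
        System.out.println("a.b".split("\\.").length);                // 2
    }
}
```

The last two lines show why it matters that split takes a regular expression: an unescaped "." matches every character.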
Summary
This section described the String class: its basic usage, its internal implementation, encoding conversion, its immutability, string constants, and the implementation of hashCode.
We mentioned that the String class is relatively inefficient when strings are modified frequently, and we mentioned the StringBuilder and StringBuffer classes. We also saw that strings can be operated on directly with + and +=, which are implemented with the StringBuilder class behind the scenes.
Let's take a look at these two classes in the next section.
----------------
This is an original article from the Public Account "lauma says programming", which explores the essence of Java programming and computer technology from entry level to advanced. All rights reserved.