Thinking Logic of Computer Programs (29) - Anatomy of String

Source: Internet
Author: User

The previous section covered Character, the wrapper class for a single character. This section covers strings. String manipulation is probably the most common operation in computer programs, and the class that represents strings in Java is String, which this section describes in detail.

The basic usage of String is simple and straightforward, so let's start there.

Basic usage

You can define a String variable with a constant:

    String name = "Lao Ma says programming";

You can also create a String with new:

    String name = new String("Lao Ma says programming");

String supports the + and += operators directly, for example:

    String name = "Lao Ma";
    name += " says programming";
    String description = ", exploring the nature of programming";
    System.out.println(name + description);

The output is: Lao Ma says programming, exploring the nature of programming

The String class includes a number of methods that make strings easy to work with.

Determine whether the string is empty

    public boolean isEmpty()

Get the string length

    public int length()

Take a substring

    public String substring(int beginIndex)
    public String substring(int beginIndex, int endIndex)

Find a character or substring in the string; returns the index of the first occurrence, or -1 if not found

    public int indexOf(int ch)
    public int indexOf(String str)

Search for a character or substring from the end; returns the index of the last occurrence, or -1 if not found

    public int lastIndexOf(int ch)
    public int lastIndexOf(String str)

Determine whether the string contains the specified character sequence. Recall that CharSequence is an interface, and String implements it

    public boolean contains(CharSequence s)

Determine whether the string begins with a given substring

    public boolean startsWith(String prefix)

Determine whether the string ends with a given substring

    public boolean endsWith(String suffix)

Compare with another string to see whether the content is the same

    public boolean equals(Object anObject)

Compare with another string, ignoring case, to see whether the content is the same

    public boolean equalsIgnoreCase(String anotherString)

String also implements the Comparable interface, so strings can be compared for order

    public int compareTo(String anotherString)

Order comparison can also ignore case

    public int compareToIgnoreCase(String str)

Convert all characters to uppercase; returns a new string, the original string is unchanged

    public String toUpperCase()

Convert all characters to lowercase; returns a new string, the original string is unchanged

    public String toLowerCase()

String concatenation; returns the current string merged with the argument string, the original string is unchanged

    public String concat(String str)

String replacement of a single character; returns a new string, the original string is unchanged

    public String replace(char oldChar, char newChar)

String replacement of a character sequence; returns a new string, the original string is unchanged

    public String replace(CharSequence target, CharSequence replacement)

Remove leading and trailing spaces; returns a new string, the original string is unchanged

    public String trim()

Split the string; returns an array of the separated substrings, the original string is unchanged

    public String[] split(String regex)

For example, to split "Hello,world" on the comma:

    String str = "Hello,world";
    String[] arr = str.split(",");

arr[0] is "Hello" and arr[1] is "world".
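To see several of the methods listed above working together, here is a small sketch. The class name BasicUsageDemo and the helper extension are made up for illustration; only the String methods themselves come from the JDK.

```java
import java.util.Arrays;

class BasicUsageDemo {
    // Hypothetical helper: the text after the last '.' in a file name,
    // or an empty string when the name contains no dot.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        String t = "  Hello,world  ".trim();                        // "Hello,world"
        System.out.println(t.isEmpty());                            // false
        System.out.println(t.length());                             // 11
        System.out.println(t.substring(6));                         // world
        System.out.println(t.substring(0, 5));                      // Hello
        System.out.println(t.indexOf('o'));                         // 4
        System.out.println(t.lastIndexOf("o"));                     // 7
        System.out.println(t.contains("lo,w"));                     // true
        System.out.println(t.startsWith("Hello"));                  // true
        System.out.println(t.equalsIgnoreCase("HELLO,WORLD"));      // true
        System.out.println("apple".compareTo("banana") < 0);        // true
        System.out.println(t.replace('l', 'L'));                    // HeLLo,worLd
        System.out.println(Arrays.toString(t.split(",")));          // [Hello, world]
        System.out.println(extension("report.final.pdf"));          // pdf
    }
}
```

Every call returns a new value and leaves t itself untouched, which is the behavior the method descriptions above keep pointing out.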

Having covered the basic usage of String from the caller's point of view, let's take a closer look at its internals.

Inside String

Encapsulating a character array

Inside the String class, a character array represents the string. The instance variable is defined as:

    private final char value[];

String has two constructors that create a String from a char array:

    public String(char value[])
    public String(char value[], int offset, int count)

Note that String creates a new array based on the arguments and copies the contents into it; it does not use the character array passed in directly.
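The copying behavior can be observed from the outside. The following is a minimal sketch (the class and method names are invented for this example): it changes the source array after constructing the String and checks that the String is unaffected.

```java
class ConstructorCopyDemo {
    // Builds a String from a char array, then mutates the array.
    // Because the constructor copied the contents, the String keeps its value.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'x';
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate());       // abc

        char[] word = {'h', 'e', 'l', 'l', 'o'};
        // offset 1, count 3: takes word[1], word[2], word[3]
        System.out.println(new String(word, 1, 3));  // ell
    }
}
```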

Most methods in String also operate on this character array internally. For example:

    • The length() method returns the length of the array
    • The substring() method calls the constructor String(char value[], int offset, int count) to create a new String based on its arguments
    • indexOf() searches this array for the character or substring

The implementations of these methods are mostly straightforward, so we will not go through them.

String also has methods that relate directly to this char array:

Return the char at the specified index

    public char charAt(int index)

Return the char array corresponding to the string

    public char[] toCharArray()

Note that a copy of the array is returned, not the original array.

Copy the characters in the specified range of the string's char array into the destination array, starting at the specified position

    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin)
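A short sketch of these three methods follows. The reverse helper and the class name are hypothetical, and the reversal is only meant for strings of BMP characters, since it swaps individual char values.

```java
class CharArrayAccessDemo {
    // Reverses a string by editing the copy returned by toCharArray().
    // The original string cannot be affected because the array is a copy.
    static String reverse(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.charAt(1));      // e
        System.out.println(reverse(s));       // olleh
        System.out.println(s);                // hello (unchanged)

        char[] dst = {'-', '-', '-', '-', '-', '-'};
        // Copies s[1..3] ("ell") into dst starting at index 2.
        s.getChars(1, 4, dst, 2);
        System.out.println(new String(dst));  // --ell-
    }
}
```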

Handling characters by code point

Similar to Character, the String class also provides methods for handling strings by code point.

    public int codePointAt(int index)
    public int codePointBefore(int index)
    public int codePointCount(int beginIndex, int endIndex)
    public int offsetByCodePoints(int index, int codePointOffset)

These methods are very similar to those described in the section on Character, so we will not repeat them here.
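As a quick illustration of how char indexes and code points differ, the sketch below builds a string containing one supplementary character. The choice of U+1F600 and the names CodePointDemo and countCodePoints are assumptions made for this example.

```java
class CodePointDemo {
    // Number of code points in the whole string (not the number of chars).
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 is a supplementary character, stored as two chars.
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";

        System.out.println(s.length());                                  // 4
        System.out.println(countCodePoints(s));                          // 3
        System.out.println(Integer.toHexString(s.codePointAt(1)));       // 1f600
        System.out.println(Integer.toHexString(s.codePointBefore(3)));   // 1f600
        // Moving 2 code points forward from index 0 skips 'a' and the pair.
        System.out.println(s.offsetByCodePoints(0, 2));                  // 3
    }
}
```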

Encoding conversion

Internally, String handles characters as UTF-16BE: a BMP character uses one char (two bytes), and a supplementary character uses two chars (four bytes). Section 6 introduced various encodings, which may cover different character sets, use different numbers of bytes, and have different binary representations. How are these different encodings handled? How are they converted to and from Java's internal representation?

Java uses the Charset class to represent the various encodings. It has two commonly used static methods:

    public static Charset defaultCharset()
    public static Charset forName(String charsetName)

The first method returns the system default encoding. For example, on my computer, executing the following statement:

    System.out.println(Charset.defaultCharset().name());

outputs UTF-8.

The second method returns the Charset object for a given encoding name. Corresponding to the encodings introduced in section 6, the charset name can be US-ASCII, ISO-8859-1, windows-1252, GB2312, GBK, GB18030, Big5, or UTF-8, for example:

    Charset charset = Charset.forName("GB18030");

The String class provides the following methods, which return the byte representation of the string in a given encoding:

    public byte[] getBytes()
    public byte[] getBytes(String charsetName)
    public byte[] getBytes(Charset charset)

The first method takes no encoding parameter and uses the system default encoding, the second takes the encoding name, and the third takes a Charset.
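The effect of the encoding on the number of bytes can be checked directly. This is a small sketch using only charsets that every Java platform is required to support; the class name and the byteLength helper are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Number of bytes the string occupies in the given encoding.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String bmp = "\u4E2D";                                   // a BMP character
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(byteLength(bmp, StandardCharsets.UTF_8));               // 3
        System.out.println(byteLength(bmp, StandardCharsets.UTF_16BE));            // 2
        System.out.println(byteLength(supplementary, StandardCharsets.UTF_16BE));  // 4
        System.out.println(byteLength("A", StandardCharsets.US_ASCII));            // 1

        // The name-based overload declares a checked exception for unknown names.
        System.out.println(bmp.getBytes("UTF-8").length);                          // 3

        // Looking a charset up by name yields an equal Charset object.
        System.out.println(Charset.forName("UTF-8").equals(StandardCharsets.UTF_8)); // true
    }
}
```

The UTF-16BE results line up with the internal representation described earlier: two bytes for a BMP character and four for a supplementary one.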

The String class has the following constructors, which create a String from bytes and an encoding, that is, they build Java's internal representation from the byte representation in a given encoding.

    public String(byte bytes[])
    public String(byte bytes[], int offset, int length)
    public String(byte bytes[], int offset, int length, String charsetName)
    public String(byte bytes[], int offset, int length, Charset charset)
    public String(byte bytes[], String charsetName)
    public String(byte bytes[], Charset charset)

Besides converting encodings through the methods in String, the Charset class also has methods for encoding and decoding, which this section does not cover. The key point is that Java's internal representation differs from the various encodings, but the two can be converted to each other.
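A round trip through bytes makes the conversion concrete. The sketch below is only illustrative (the sample text and the roundTrip helper are made up): it encodes a string, decodes it with the same charset, and then decodes it with the wrong charset to show how garbled text arises.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTripDemo {
    // Encodes with the given charset and decodes with the same one.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";                            // "café"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                          // 5: the accented letter takes two bytes

        // Decoding with the charset used for encoding restores the text.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));  // true

        // Decoding the same bytes as ISO-8859-1 turns each byte into one char.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());                       // 5
        System.out.println(wrong.equals(original));               // false

        // The offset/length constructors decode only part of the byte array.
        System.out.println(new String(utf8, 0, 3, StandardCharsets.UTF_8));  // caf
    }
}
```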

Immutability

Like the wrapper classes, String is immutable: once an object is created, it cannot be modified. The String class is declared final and cannot be inherited, and the internal char array value is also final and cannot be reassigned after initialization.

The String class provides many methods that look as if they modify the string, but they actually work by creating a new String object; the original String object is not modified. For example, here is the code of the concat() method:

    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }

A new character array is created with Arrays.copyOf, which also copies the original content, and then a new String is created with new. We will cover the Arrays class in more detail in later sections.

As with the wrapper classes, defining the class as immutable makes programs simpler, safer, and easier to understand. However, if a string is modified frequently and every modification creates a new string, performance suffers. In that case, consider the two other classes Java provides, StringBuilder and StringBuffer, which we will cover in the next section.
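The following sketch shows immutability from the caller's side. The names ImmutabilityDemo and appendSuffix are invented for the example; the behavior of concat with an empty argument is the documented one.

```java
class ImmutabilityDemo {
    // Returns a new string; the argument s is never changed.
    static String appendSuffix(String s, String suffix) {
        return s.concat(suffix);
    }

    public static void main(String[] args) {
        String original = "hello";
        String upper = original.toUpperCase();
        String joined = appendSuffix(original, " world");

        System.out.println(original);  // hello (unchanged by both calls)
        System.out.println(upper);     // HELLO
        System.out.println(joined);    // hello world

        // With an empty argument, concat returns the same object.
        System.out.println(appendSuffix(original, "") == original);  // true
    }
}
```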

Constant strings

String constants in Java are special. Besides being directly assignable to a String variable, a constant behaves like a String object itself, and you can call String methods on it directly. Let's look at the code:

    System.out.println("Lao Ma says programming".length());
    System.out.println("Lao Ma says programming".contains("Lao Ma"));
    System.out.println("Lao Ma says programming".indexOf("programming"));

In fact, these constants are String objects. In memory they are kept in a shared place called the string constant pool, which holds all constant strings; each constant is stored only once and shared by all users. When a string is used in the form of a constant, the corresponding String object in the constant pool is used.

For example, look at this code:

    String name1 = "Lao Ma says programming";
    String name2 = "Lao Ma says programming";
    System.out.println(name1 == name2);

The output is true. Why? You can think of "Lao Ma says programming" as having a corresponding String object in the constant pool. Suppose it is named laoma; the code above is then effectively similar to:

    // the five characters of the original string "老马说编程"
    String laoma = new String(new char[]{'老', '马', '说', '编', '程'});
    String name1 = laoma;
    String name2 = laoma;
    System.out.println(name1 == name2);

There is actually only one String object, and all three variables point to it, so name1 == name2 is naturally true.

Note that if the value is not assigned directly from a constant but created with new, == does not return true. See the following code:

    String name1 = new String("Lao Ma says programming");
    String name2 = new String("Lao Ma says programming");
    System.out.println(name1 == name2);

The output is false. Why? The code above is similar to:

    // the five characters of the original string "老马说编程"
    String laoma = new String(new char[]{'老', '马', '说', '编', '程'});
    String name1 = new String(laoma);
    String name2 = new String(laoma);
    System.out.println(name1 == name2);

The code of the constructor in the String class that takes a String argument is as follows:

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

hash is another instance variable in the String class. It holds the cached hashCode value, which we will cover shortly.

As you can see, name1 and name2 point to two different String objects, but the value fields inside the two objects point to the same char array. In memory, each variable refers to its own String object, and both objects refer to one shared char array.

So name1 == name2 is false, but name1.equals(name2) is true.
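The difference between constants and objects created with new can be checked with a few lines. This is a minimal sketch; the class name, the sameObject helper, and the shortened sample text are chosen just for the example.

```java
class ConstantPoolDemo {
    // Reference comparison: true only when both variables point to one object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String name1 = "Lao Ma";              // constant, taken from the pool
        String name2 = "Lao Ma";              // same constant, same pooled object
        String name3 = new String("Lao Ma");  // a new object
        String name4 = new String("Lao Ma");  // another new object

        System.out.println(sameObject(name1, name2));  // true
        System.out.println(sameObject(name3, name4));  // false
        System.out.println(sameObject(name1, name3));  // false
        System.out.println(name3.equals(name4));       // true: same content
    }
}
```

This is why string content should be compared with equals rather than ==.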

hashCode

We just mentioned the instance variable hash, which is defined as follows:

    private int hash; // Default to 0

It caches the value of the hashCode() method: the first time hashCode() is called, the result is stored in hash, and later calls return the saved value directly.

Let's look at the hashCode method of the String class:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

If the cached hash is not 0, it is returned directly; otherwise the hash is calculated from the contents of the character array:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Here s is the string, s[0] is the first character, n is the length of the string, and s[0]*31^(n-1) is the value of the first character multiplied by 31 raised to the power n-1.

Why use this calculation? In this formula the hash value depends on the value of every character, and each position is multiplied by a different factor, so the hash value also depends on the position of every character. The number 31 is probably used for two reasons. On one hand, it produces well-scattered hashes, so different strings generally have different hash values. On the other hand, it is efficient to compute: 31*h is equivalent to 32*h-h, that is (h<<5)-h, so the multiplication can be replaced by the cheaper shift and subtraction.

In Java, hashCode is commonly implemented along these lines.
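The formula can be verified by computing it by hand and comparing with the built-in result. The sketch below assumes nothing beyond the formula above; polynomialHash is a hypothetical helper that leaves out the caching.

```java
class HashCodeDemo {
    // Same polynomial as String.hashCode(), without the cached field.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "ab": 'a' is 97 and 'b' is 98, so 97*31 + 98 = 3105.
        System.out.println(polynomialHash("ab"));  // 3105
        System.out.println("ab".hashCode());       // 3105

        // Position matters: swapping the characters changes the hash.
        System.out.println(polynomialHash("ba"));  // 3135

        // 31*h equals (h << 5) - h, including when int arithmetic overflows.
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h);  // true
    }
}
```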

Regular expressions

Some methods in the String class accept not an ordinary string argument but a regular expression. What is a regular expression? You can think of it as a string that expresses a rule, generally used for matching, finding, and replacing text. Regular expressions are rich and powerful and a fairly large topic, which we will introduce separately in later sections.

Java has dedicated classes for regular expressions, such as Pattern and Matcher, but for simple cases the String class offers more concise operations. The methods in String that accept a regular expression are:

Split the string

    public String[] split(String regex)

Check whether the string matches

    public boolean matches(String regex)

String replacement

    public String replaceFirst(String regex, String replacement)
    public String replaceAll(String regex, String replacement)
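Here is a brief sketch of these four methods. The sample inputs, the class name, and the maskDigits helper are made up for illustration. It also shows a common pitfall: because split takes a regular expression, a literal dot has to be escaped.

```java
import java.util.Arrays;

class RegexMethodsDemo {
    // Hypothetical helper: replaces every run of digits with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    public static void main(String[] args) {
        String line = "a1b22c333d";

        // Split on runs of digits.
        System.out.println(Arrays.toString(line.split("\\d+")));          // [a, b, c, d]

        // matches() must match the whole string, not just a part of it.
        System.out.println("2024-01-15".matches("\\d{4}-\\d{2}-\\d{2}")); // true
        System.out.println("2024-01".matches("\\d{4}"));                  // false

        System.out.println(line.replaceFirst("\\d+", "#"));               // a#b22c333d
        System.out.println(maskDigits(line));                             // a#b#c#d

        // An unescaped dot matches every character, leaving nothing behind.
        System.out.println("a.b.c".split("\\.").length);                  // 3
        System.out.println("a.b.c".split(".").length);                    // 0
    }
}
```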

Summary

In this section we introduced the String class: its basic usage, internal implementation, and encoding conversion, followed by an analysis of its immutability, constant strings, and the hashCode implementation.

We also noted that String is inefficient when a string is modified frequently, and mentioned the StringBuilder and StringBuffer classes. We saw as well that strings can be manipulated directly with + and +=, and behind those operators is the StringBuilder class.

Let's take a look at these two classes in the next section.

