The public int indexOf(int ch) question in Java's String methods

Source: Internet
Author: User
Tags string indexof

When you need the position at which a character first appears in a string, Java's String class provides several useful APIs:

int indexOf(int ch)
      // Returns the index within this string of the first occurrence of the specified character.
int indexOf(int ch, int fromIndex)
      // Returns the index within this string of the first occurrence of the specified character, starting the search at the specified index.
int indexOf(String str)
      // Returns the index within this string of the first occurrence of the specified substring.
int indexOf(String str, int fromIndex)
      // Returns the index within this string of the first occurrence of the specified substring, starting the search at the specified index.

As an example:

class Demo1 {
    public static void main(String[] args) {
        System.out.println("Hello World".indexOf('l'));
        // The return value is 2
        System.out.println("Hello World".indexOf('o', 5));
        // The return value is 7
    }
}

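
The substring overloads and the not-found case can be sketched the same way. This is a minimal illustration; the class name OverloadDemo and its helper methods are hypothetical and not part of the original example.

```java
// Illustrative only: OverloadDemo and its method names are hypothetical.
class OverloadDemo {
    static final String TEXT = "Hello World";

    // First occurrence of the substring "World".
    static int firstWorld() {
        return TEXT.indexOf("World");
    }

    // First occurrence of the substring "o" at or after index 5.
    static int oFromFive() {
        return TEXT.indexOf("o", 5);
    }

    // A character that does not occur in the string yields -1.
    static int missing() {
        return TEXT.indexOf('z');
    }

    public static void main(String[] args) {
        System.out.println(firstWorld()); // 6
        System.out.println(oFromFive());  // 7
        System.out.println(missing());    // -1
    }
}
```
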
We are used to calling this method with a character to find where that character first occurs. Look carefully at the JDK's declaration, though: in public int indexOf(int ch), the parameter type is int, not the char we might expect. In Java an int is 4 bytes and a char is 2 bytes. A char argument is widened to int automatically, so the call compiles either way, but why did the JDK not simply declare public int indexOf(char ch)? That is the question for today.
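
The automatic char-to-int conversion can be seen in a small sketch. The class name WideningDemo and its methods are hypothetical; the only assumption about the data is that 'l' is U+006C, which is 108 in decimal.

```java
// Illustrative only: WideningDemo is a hypothetical class name.
class WideningDemo {
    // The char literal is widened to int before the call.
    static int viaChar() {
        return "Hello World".indexOf('l');
    }

    // Passing the numeric value of 'l' (U+006C, decimal 108) directly.
    static int viaInt() {
        return "Hello World".indexOf(108);
    }

    public static void main(String[] args) {
        int widened = 'l';             // implicit widening conversion
        System.out.println(widened);   // 108
        System.out.println(viaChar()); // 2
        System.out.println(viaInt());  // 2
    }
}
```

Both calls resolve to the same indexOf(int) method, so they return the same index.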

First, a Unicode code point does not always fit in 2 bytes. In the 4-byte UCS-4 form, a code point is exactly the size of an int. The first byte is called the group, the second the plane, the third the row, and the fourth the cell. The characters in plane 0 of group 0 can be expressed in only 2 bytes and cover most commonly used characters. For convenience, Unicode gives this plane a name: the Basic Multilingual Plane (BMP). The BMP runs from U+0000 to U+FFFF, 65,536 code points in total. Every ASCII character is also in Unicode with the same code value. Characters outside the BMP need more than 16 bits, so it is wrong to assume that a single char can hold any Unicode character.
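
The BMP boundary is exposed through constants and helper methods on java.lang.Character. The sketch below checks a few code points against it; the class name PlaneDemo and the sample code points (U+4E2D and U+1F600) are chosen here for illustration only.

```java
// Illustrative only: PlaneDemo is a hypothetical class name.
class PlaneDemo {
    // True when the code point lies in the BMP (U+0000 to U+FFFF).
    static boolean isBmp(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    // Number of char values needed to store the code point: 1 or 2.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isBmp('A'));           // true
        System.out.println(isBmp(0x4E2D));        // true
        System.out.println(isBmp(0x1F600));       // false
        System.out.println(charsNeeded(0x4E2D));  // 1
        System.out.println(charsNeeded(0x1F600)); // 2
        // The first code point outside the BMP.
        System.out.println(Character.MIN_SUPPLEMENTARY_CODE_POINT == 0x10000); // true
    }
}
```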

Let's look at the source code of String.indexOf(int ch, int fromIndex), which indexOf(int ch) calls with fromIndex set to 0:

    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }

The Javadoc for the parameter reads "@param ch a character (Unicode code point)": the argument ch is a Unicode code point.
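
Because the parameter is a code point, indexOf can be asked for a character that no single char could express. The sketch below is a hedged illustration: the class name SupplementarySearchDemo and the sample code point U+1F600 are assumptions made for this example.

```java
// Illustrative only: SupplementarySearchDemo is a hypothetical class name.
class SupplementarySearchDemo {
    // U+1F600 lies outside the BMP, so it cannot be written as a char literal.
    static final int SAMPLE = 0x1F600;

    // "a", then the supplementary character (stored as two chars), then "b".
    static final String TEXT =
            "a" + new String(Character.toChars(SAMPLE)) + "b";

    // Searches for the code point; the result is a char index.
    static int indexOfSample() {
        return TEXT.indexOf(SAMPLE);
    }

    // 'b' comes after the two chars of the supplementary character.
    static int indexOfB() {
        return TEXT.indexOf('b');
    }

    public static void main(String[] args) {
        System.out.println(indexOfSample()); // 1
        System.out.println(indexOfB());      // 3
    }
}
```

The returned values are char indexes, which is why 'b' is found at index 3 rather than 2.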

What is a code point? A code point is the numeric value assigned to a character in the Unicode code table. Java uses char to represent Unicode characters because Unicode was originally designed as a 16-bit encoding, so at first a char could represent every Unicode character. Unicode later grew well beyond 65,536 characters, and by Unicode 4.0 a char could no longer represent all of them. A char can only hold values from 0x0000 to 0xFFFF (the 0x00 row contains the Latin letters and symbols). In other words, a single char cannot represent a supplementary character; in a String such a character is stored as a pair of char values called a surrogate pair.
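
The gap between char values and code points shows up as soon as a string contains a supplementary character. This is a minimal sketch; the class name SurrogateDemo and the sample code point U+1F600 are assumptions for illustration.

```java
// Illustrative only: SurrogateDemo is a hypothetical class name.
class SurrogateDemo {
    static final int CODE_POINT = 0x1F600;

    // A string holding exactly one supplementary character.
    static final String S = new String(Character.toChars(CODE_POINT));

    // Number of char values in the string.
    static int charLength() {
        return S.length();
    }

    // Number of Unicode code points in the string.
    static int codePointLength() {
        return S.codePointCount(0, S.length());
    }

    // The first char is only half of the character.
    static boolean startsWithHighSurrogate() {
        return Character.isHighSurrogate(S.charAt(0));
    }

    // Reading the value back as an int recovers the full code point.
    static int roundTrip() {
        return S.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(charLength());                     // 2
        System.out.println(codePointLength());                // 1
        System.out.println(startsWithHighSurrogate());        // true
        System.out.println(Integer.toHexString(roundTrip())); // 1f600
    }
}
```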

In Java, all Unicode code points are represented by int. The 21 low-order (least significant) bits of the int hold the code point, and the 11 high-order (most significant) bits must be zero. So an int can represent the supplementary characters that a char cannot. We can also see that indexOf() has an if/else statement that calls indexOfSupplementary() when ch is at or above Character.MIN_SUPPLEMENTARY_CODE_POINT, that is, when it lies outside the BMP. That method handles the out-of-range case. The point to remember is that indexOf(int ch) takes a Unicode code point rather than a Java char, and code points in Java are represented as 32-bit values, which is why the parameter is int instead of char.
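
To make the two branches concrete, here is a rough sketch of how a code-point search could be written with public APIs only. It is not the JDK's private indexOfSupplementary(); the class SupplementaryIndexSketch and the method indexOfCodePoint are hypothetical names introduced for this illustration.

```java
// Illustrative only: a hypothetical helper, not the JDK implementation.
class SupplementaryIndexSketch {
    // Returns the char index of the first occurrence of codePoint in s
    // at or after fromIndex, or -1 if it does not occur.
    static int indexOfCodePoint(String s, int codePoint, int fromIndex) {
        if (!Character.isValidCodePoint(codePoint)) {
            return -1; // negative or above U+10FFFF
        }
        int start = Math.max(fromIndex, 0);
        if (Character.isBmpCodePoint(codePoint)) {
            // BMP case: the code point is a single char.
            for (int i = start; i < s.length(); i++) {
                if (s.charAt(i) == codePoint) {
                    return i;
                }
            }
            return -1;
        }
        // Supplementary case: look for the surrogate pair.
        char hi = Character.highSurrogate(codePoint);
        char lo = Character.lowSurrogate(codePoint);
        for (int i = start; i < s.length() - 1; i++) {
            if (s.charAt(i) == hi && s.charAt(i + 1) == lo) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String face = new String(Character.toChars(0x1F600));
        String s = "x" + face + "y" + face;
        System.out.println(indexOfCodePoint(s, 0x1F600, 0));  // 1
        System.out.println(indexOfCodePoint(s, 0x1F600, 2));  // 4
        System.out.println(indexOfCodePoint(s, 'y', 0));      // 3
        System.out.println(indexOfCodePoint(s, 0x110000, 0)); // -1
    }
}
```

A char parameter could not even express the supplementary branch, since the value being searched for does not fit in 16 bits.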
