Java string truncation by bytes (without cutting a Chinese character in half): Method 2 is the one used in the project
Method 1 is borrowed from someone else's code; personally, I find Method 1 more concise.
package everyday;
import java.io.UnsupportedEncodingException;
/**
* * Title:
Write a function that intercepts a string, enter it as a string and number of bytes, and output a string that is truncated by bytes. But to ensure that Chinese characters are not truncated half, such as "I abc" 4, should be cut to "I ab", input "I ABC Han def", 6, should be output as "I abc" rather than "I abc+ Han half."
GB2312, GBK, gb18030,cp936, and CNS11643 all meet the requirements-Chinese is 2 bytes, and English is 11 bytes.
Because Chinese is converted to byte bytes, the length of the converted bytes will not pass, as encoding is UTF-8, and a Chinese string converted to byte takes three bytes.
*
*/
public class Learncsplit {
/**
* Method 1, simpler than Method 2
* @param str
* target string
* @param length1
* maximum length in bytes
* @param code
* name of the encoding to use
* @return the longest prefix of str that fits in length1 bytes
* @throws UnsupportedEncodingException
* if the encoding is not supported
*/
private static String substring(String str, int length1, String code) throws UnsupportedEncodingException {
if (str == null) {
return null;
}
StringBuilder sb = new StringBuilder();
int currentlength = 0;
for (char c : str.toCharArray()) {
// add the byte length of this character in the given encoding
currentlength += String.valueOf(c).getBytes(code).length;
if (currentlength <= length1) {
sb.append(c);
} else {
// the next character would exceed the limit, so stop before it
break;
}
}
return sb.toString();
}
public static void Main (string[] args) throws Unsupportedencodingexception {
//stringbuilder sb=null;//Thread is unsafe, high performance
String str= "I abc Han def";
int length1=3;
int length2=6;
String [] codes=new string[]{"GB2312", "GBK", "GB18030", "CP936", "CNS11643", "UTF-8"};
For (String code:codes) {
System.out.println (New StringBuilder (). Append ("with"). Append (code)
. Append ("encoded intercept string--" ""). Append (str). Append ( "" ")
. Append (length1). Append ("The result of a byte is" ")
. Append (substring (str,length1,code)). Append ("" "). toString ());
System.out.println (New StringBuilder (). Append ("with"). Append (code)
. Append ("encoded intercept string--" ""). Append (str). Append ( "" ")
. Append (length2). Append ("The result of a byte is" ")
. Append (substring (str,length2,code)). Append ("" "). toString ());
}
// The above is Method 1
String value= "Urumqi Test and Test Development Resource Service Co., Ltd. Dabancheng branch 1A2B3";
// Count the bytes
int countbytes = conutbyte(value);
// The field length is known to be 40 bytes
if (countbytes > 40) {
value = substr(value, 0, 40);
System.out.println("Output string truncated to the field length: " + value);
}
}
/**
* Counts the bytes of a string in GB18030
* @param value
* @return the byte length, or 0 if value is null
*/
private static int conutbyte(String value) {
if (value == null) {
return 0;
}
byte[] bs;
try {
bs = value.getBytes("GB18030");
int lenbs = bs.length;
return lenbs;
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return 0;
}
/**
* Truncates a string to a byte length (GB18030)
* @param str
* @param begin starting byte offset
* @param zdcd field length in bytes
* @return
*/
private static String substr(String str, int begin, int zdcd) {
if (str == null) {
return str;
}
String str2;
str = getsubstring(str, zdcd); // truncate to the specified byte length without returning half a Chinese character, e.g. 20
zdcd = conutbyte(str); // recount the bytes, e.g. 19
// e.g. "I'm going to bad fun 123 I'm going to"
byte[] bs;
try {
bs = str.getBytes("GB18030");
// zdcd is passed as the length, so this only works when begin == 0 (as in main)
str2 = new String(bs, begin, zdcd, "GB18030");
return str2;
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return "";
}
/**
* <b>Truncates a string to the specified byte length without returning half a Chinese character</b>
* @param str
* @param zdcd maximum length in bytes
* @return
*/
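private static String getsubstring(String str, int zdcd) {
if (zdcd <= 0) {
return "";
}
int count = 0;
int offset = 0;
char[] c = str.toCharArray();
for (int i = 0; i < c.length; i++) {
// heuristic: a char above 256 counts as a 2-byte (Chinese) character, anything else as 1 byte
if (c[i] > 256) {
offset = 2;
count += 2;
} else {
offset = 1;
count++;
}
// exactly zdcd bytes: keep characters 0..i
if (count == zdcd) {
return str.substring(0, i + 1);
}
// a 2-byte character would cross the limit: drop it
if (count == zdcd + 1 && offset == 2) {
return str.substring(0, i);
}
}
// the whole string fits within zdcd bytes
return str;
}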
}
Console output Results:
Intercept a string with GB2312 encoding--"I am ABC def" The result of 3 bytes is "I A"
Using GB2312 encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Intercept a string with GBK encoding--"I am ABC def" The result of 3 bytes is "I A"
Using GBK encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Intercept a string with GB18030 encoding--"I am ABC def" The result of 3 bytes is "I A"
Using GB18030 encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Intercept a string with CP936 encoding--"I am ABC def" The result of 3 bytes is "I A"
Using CP936 encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Intercept a string with CNS11643 encoding--"I am ABC def" The result of 3 bytes is "I A"
Using CNS11643 encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Intercept string with UTF-8 encoding--"I abc def" 3 bytes result is "I"
Using UTF-8 encoding to intercept a string--"I abc def" 6 bytes result is "I abc"
Output a string specifying the length of a field: Urumqi Test and Test Development Resource Service Co., Ltd.
Method 3: truncate a string to a specified number of characters
public class Characterssplit {
public static void main(String[] args) {
String value= "Ulu a Muzzi co-sheng Human Resources Services Limited liability company Dabancheng Branch 1a2b3";//24+6+2=32
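value = getsubstring(value, value.toCharArray().length); // the length equals the char count, so the whole string is returned
value = getsubstring(value, 89); // 89 exceeds the length, but no exception is thrown; the whole string is returned
//value = value.substring(0, 6); // String.substring: with 89 instead of 6 this would throw an index-out-of-bounds exception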
System.out.println (value);
}
/**
* Description: truncates a string to a specified length.
* Unlike String.substring, it does not throw an index-out-of-bounds exception when the string is shorter than the requested length.
*/
public static String getsubstring(String source, int len) {
if (source == null || source.isEmpty()) {
return "";
}
if (source.length() <= len) { // e.g. 32 <= 32
// source.length() equals value.toCharArray().length
return source;
}
return source.substring(0, len);
}
}
Run output: Ulu a Muzzi co-sheng Human Resources Services Limited liability company Dabancheng Branch 1a2b3
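Method 3 counts `char`s, and a Java `char` is a UTF-16 code unit, so a character outside the Basic Multilingual Plane (an emoji, or a rare CJK ideograph) occupies two `char`s and `substring(0, len)` can cut it in half. Below is a minimal sketch of a code-point-aware variant using the standard `String.codePointCount` and `String.offsetByCodePoints` methods; the class name `CodePointTruncate` is hypothetical, and returning an empty string for `null` or a non-positive `len` is my own design choice.

```java
// Hypothetical helper class, not part of the original project
public class CodePointTruncate {

    /**
     * Returns at most len code points from the start of source.
     * Like Method 3 it never throws when source is shorter than len,
     * and it also never cuts a surrogate pair in half.
     */
    public static String truncate(String source, int len) {
        if (source == null || len <= 0) {
            return "";
        }
        int count = source.codePointCount(0, source.length());
        if (count <= len) {
            return source;
        }
        // Convert the code-point limit into a char index
        int end = source.offsetByCodePoints(0, len);
        return source.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "ab\uD83D\uDE00cd"; // "ab", one emoji stored as a surrogate pair, "cd"
        System.out.println(truncate(s, 3).length()); // 4 chars: a, b and both halves of the emoji
        System.out.println(truncate(s, 89).equals(s)); // no exception when len exceeds the length
    }
}
```

Here `truncate(s, 3)` keeps all four `char`s of `a`, `b` and the emoji, whereas `s.substring(0, 3)` would end on a lone high surrogate.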