[Code] Using regular expressions to extract a fixed-length string (which may include Chinese characters) from a specified start position in the source string [version 4]
(Aside: the Chinese encodings are complicated and not entirely tidy. The high byte is 0xA1-0xFE (0xFF is excluded because 0xFF, i.e. 255, has a special role in the Telnet protocol), and the low byte is 0x40-0xFE; GBK extended the high byte to 0x81-0xFE for compatibility with Unicode.)
How to tell whether the cut leaves half of a Chinese character at the last byte:
If the cut falls in the middle of a Chinese character, the last byte of the result is a high byte, with a value of 0x81 or greater.
Only the high byte is constrained this way (0x81 or greater); the low byte can be almost anything, so looking at a single byte in isolation is not enough.
A complete Chinese character: [0x81-0xFE][0x40-0xFE]
Therefore, the regular expression walks the string from the beginning, taking one character at a time: either a Chinese character (two bytes) or a non-Chinese character (one byte).
If the cut split a Chinese character, the final match is a single byte that looks like a non-Chinese character but is really the high byte of a Chinese character.
So checking whether that final single byte lies in [0x81-0xFE] tells you whether the string was truncated incorrectly.
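The check described above can be sketched on its own, separately from the extraction functions in the listing. This is a minimal illustration only: the helper name `ends_with_half_char` and the sample bytes are assumptions made for the example and are not part of the original post. It assumes GBK-encoded input and byte-oriented matching (no `/u` modifier).

```php
<?php
// Hypothetical helper (not from the original post): reports whether a
// GBK byte string ends in the middle of a double-byte character.
function ends_with_half_char(string $bytes): bool
{
    // Walk the string from the start: each token is either a high byte
    // followed by one more byte, or a single byte.
    preg_match_all('/[\x81-\xFE].|./s', $bytes, $m);
    $tokens = $m[0];
    if (count($tokens) === 0) {
        return false; // empty input: nothing could have been cut
    }
    $last = $tokens[count($tokens) - 1];
    // A lone high byte as the final token means a character was split.
    return strlen($last) === 1 && preg_match('/^[\x81-\xFE]$/', $last) === 1;
}

// Sample data: "A", two double-byte characters (D6 D0 and CE C4), then "B".
$gbk = "A\xD6\xD0\xCE\xC4B";
var_dump(ends_with_half_char(substr($gbk, 0, 2))); // cut after the high byte D6
var_dump(ends_with_half_char(substr($gbk, 0, 3))); // cut after the complete pair D6 D0
```

Scanning from the start matters because a low byte can also fall in 0x81-0xFE, so the last byte alone does not say whether it is a high byte or the second half of a complete character.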
<?php
//---------------------------------------------------------------
// File name: preg_substr.php
// Description: uses a regular expression to extract a string of a given length from the source string, starting at the specified start position.
//---------------------------------------------------------------
/// Function description
/// Function name: preg_substr
/// Function version: fourth revision
/// Purpose: uses a regular expression to extract a string of a given length from the source string, starting at the specified start position.
/// Parameters:
/// $strSource: source string
/// $intStart: start position. Defaults to 0, the beginning of the string.
/// $intLen: length to extract. Defaults to 32.
function preg_substr($strSource, $intStart = 0, $intLen = 32)
{
    is_int($intLen) ? 0 : die("len isn't an integer");
    is_int($intStart) ? 0 : die("start isn't an integer");
    if ($intStart >= 0 && $intLen > 0 && @preg_match('/^(.{' . $intStart . '})(.{0,' . $intLen . '})/si', $strSource)) {
        // $regs[1] is the skipped prefix, $regs[2] is the candidate result.
        @preg_match('/^(.{' . $intStart . '})(.{0,' . $intLen . '})/si', $strSource, $regs);
        // If the prefix ends on a lone high byte, move the start back by one byte.
        @preg_match_all('/([\x81-\xFE].|.)/sim', $regs[1], $regs1, PREG_PATTERN_ORDER);
        @preg_match('/^[\x81-\xFE]$/', $regs1[1][count($regs1[1]) - 1]) ? $intStart-- : 0;
        @preg_match('/^(.{' . $intStart . '})(.{0,' . $intLen . '})/si', $strSource, $regs);
        // If the result ends on a lone high byte, shorten the length by one byte.
        @preg_match_all('/([\x81-\xFE].|.)/sim', $regs[2], $regs1, PREG_PATTERN_ORDER);
        @preg_match('/^[\x81-\xFE]$/', $regs1[1][count($regs1[1]) - 1]) ? $intLen-- : 0;
        @preg_match('/^(.{' . $intStart . '})(.{0,' . $intLen . '})/si', $strSource, $regs);
        $strResult = $regs[2];
    } else {
        $strResult = "";
    }
    return $strResult;
}
// Variant that cuts with substr() and uses the regular expressions only for the boundary checks.
function preg_substr2($strSource, $intStart = 0, $intLen = 32)
{
    is_int($intLen) ? 0 : die("len isn't an integer");
    is_int($intStart) ? 0 : die("start isn't an integer");
    $strResult = ""; // returned unchanged when the arguments are out of range
    if ($intStart >= 0 && $intLen >= 0)
    {
        // If the skipped prefix ends on a lone high byte, move the start back by one byte.
        $strResult = substr($strSource, 0, $intStart);
        @preg_match_all('/([\x81-\xFE].|.)/sim', $strResult, $regs, PREG_PATTERN_ORDER);
        if (@preg_match('/^[\x81-\xFE]$/', $regs[1][count($regs[1]) - 1], $regs)) {
            $intStart--;
        }
        // If the result ends on a lone high byte, drop that byte.
        $strResult = substr($strSource, $intStart, $intLen);
        @preg_match_all('/([\x81-\xFE].|.)/sim', $strResult, $regs, PREG_PATTERN_ORDER);
        if (@preg_match('/^[\x81-\xFE]$/', $regs[1][count($regs[1]) - 1], $regs)) {
            $strResult = substr($strSource, $intStart, --$intLen);
        }
    }
    return $strResult;
}
$strHTML = <<<HTML
AB