WordPress 4.2.2 Patch: Truncation in 4.2.1
Vulnerability Analysis
One of the fixes listed for this patch addresses the XSS issue that bypassed the 4.2.1 patch. The following walks through, step by step, how each of the truncation methods used to produce the XSS is handled by the repeated fixes.
The patch introduces two switch variables.
wp422/wp-includes/wp-db.php, line 2612:
$truncate_by_byte_length = 'byte' === $value['length']['type'];
$needs_validation = true;
For a comment, whose column length is measured in bytes, both switches start out true: $truncate_by_byte_length is set from the column's length type, and $needs_validation is initialised to true. $truncate_by_byte_length decides whether the byte length is checked, and $needs_validation decides whether the value must be validated. Their roles divide as follows:
$truncate_by_byte_length: enforces the byte-length limit. Comment content is checked here.
$needs_validation: checks that multi-byte characters fall within the permitted range and that the length is within limits. Only fields containing multi-byte characters go through this check.
First, consider $truncate_by_byte_length:
wp422/wp-includes/wp-db.php, line 2626:
if ( $truncate_by_byte_length ) {
    mbstring_binary_safe_encoding();
    if ( false !== $length && strlen( $value['value'] ) > $length ) {
        $value['value'] = substr( $value['value'], 0, $length );
    }
    reset_mbstring_encoding();

    if ( ! $needs_validation ) {
        continue;
    }
}
This fixes the 4.2.1 problem, where the length measured with mb_strlen (in characters) was compared against a limit specified in bytes. The 4.2.2 code measures the length with strlen (in bytes) and cuts off the excess, so the 4.2.1 patch bypass that relied on comparing two different length units no longer works. The mbstring_binary_safe_encoding() / reset_mbstring_encoding() pair around the check keeps strlen and substr counting bytes even when mbstring function overloading is enabled.
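To make the unit mismatch concrete, here is a small Python sketch (Python rather than PHP, so it runs standalone). It models the 4.2.1-style check as counting characters, as mb_strlen does, and the 4.2.2 fix as measuring and cutting bytes, as strlen and substr do. LIMIT and both function names are hypothetical; the real comment column is a TEXT column that allows 65535 bytes.

```python
# Sketch only: models the two length checks, not WordPress code.
LIMIT = 10  # hypothetical column limit, measured in bytes


def check_by_chars(value: str, limit: int) -> bool:
    """4.2.1-style check: count characters (like mb_strlen) against a byte limit."""
    return len(value) <= limit


def truncate_by_bytes(raw: bytes, limit: int) -> bytes:
    """4.2.2-style fix: measure bytes (like strlen) and cut the excess (like substr)."""
    if len(raw) > limit:
        return raw[:limit]
    return raw


comment = "é" * 8              # 8 characters, but 16 bytes in UTF-8
raw = comment.encode("utf-8")

print(check_by_chars(comment, LIMIT))      # True: passes the character count
print(len(raw) > LIMIT)                    # True: yet exceeds the byte limit
print(len(truncate_by_bytes(raw, LIMIT)))  # 10: cut before it reaches the column
```

If a byte cut lands inside a multi-byte character, the value is not pure ASCII, so $needs_validation is still set and the validation step that follows strips the dangling partial sequence.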
For more background, see the article on the WordPress 4.2.1 stored XSS patch bypass.
Next, consider how the code decides whether to run the multi-byte validation represented by $needs_validation.
wp422/wp-includes/wp-db.php, line 2506:
protected function check_ascii( $string ) {
    if ( function_exists( 'mb_check_encoding' ) ) {
        if ( mb_check_encoding( $string, 'ASCII' ) ) {
            return true;
        }
    } elseif ( ! preg_match( '/[^\x00-\x7F]/', $string ) ) {
        return true;
    }
    return false;
}
This function checks whether the string is pure ASCII, that is, whether it consists only of single-byte characters.
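The same two code paths can be sketched in Python on raw bytes: a strict ASCII decode stands in for mb_check_encoding( $string, 'ASCII' ), and a bytes regex mirrors the preg_match fallback. Both function names are hypothetical.

```python
import re

# Any byte outside the 7-bit ASCII range (same class as the PHP fallback regex).
NON_ASCII = re.compile(rb"[^\x00-\x7F]")


def check_ascii_codec(raw: bytes) -> bool:
    """Primary path: a strict ASCII decode stands in for mb_check_encoding."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


def check_ascii_regex(raw: bytes) -> bool:
    """Fallback path: true when no byte lies outside 0x00-0x7F."""
    return NON_ASCII.search(raw) is None


for sample in (b"hello <b>", "café".encode("utf-8")):
    print(sample, check_ascii_codec(sample), check_ascii_regex(sample))
```

Both paths agree on every input; the PHP code only uses the regex when the mbstring extension is unavailable.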
wp422/wp-includes/wp-db.php, line 2620:
( ! isset( $value['ascii'] ) && $this->check_ascii( $value['value'] ) ) {
    $truncate_by_byte_length = true;
    $needs_validation = false;
}
If the value is ASCII (single-byte characters only), $needs_validation is switched off and the multi-byte validation is skipped.
(The multi-byte validation exists to stop the earlier attack in which four-byte UTF-8 characters were truncated by the database, producing an XSS.)
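The flag logic from the excerpts above can be summarised in a Python sketch. It assumes the PHP value array is modelled as a dict with the same keys ('value', 'length', 'ascii') and the value as bytes; decide_flags and the sample 'char' length type are hypothetical. It also assumes the byte flag starts off false when no byte-based length is given, and it models only the ASCII shortcut shown in the excerpt.

```python
from typing import Tuple


def decide_flags(value: dict) -> Tuple[bool, bool]:
    length = value.get("length")
    # On only when the column length is measured in bytes (assumed false otherwise).
    truncate_by_byte_length = isinstance(length, dict) and length.get("type") == "byte"
    needs_validation = True

    raw = value["value"]
    is_ascii = all(b < 0x80 for b in raw)  # same test as check_ascii
    # Mirrors: ! isset( $value['ascii'] ) && $this->check_ascii( $value['value'] )
    if "ascii" not in value and is_ascii:
        truncate_by_byte_length = True
        needs_validation = False
    return truncate_by_byte_length, needs_validation


print(decide_flags({"value": b"hello", "length": {"type": "byte", "length": 65535}}))
print(decide_flags({"value": "héllo".encode("utf-8"), "length": {"type": "byte", "length": 65535}}))
```

The first call yields (True, False): pure ASCII is only cut by bytes. The second yields (True, True): a non-ASCII comment is cut by bytes and then still goes through the multi-byte validation.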
wp422/wp-includes/wp-db.php, line 2638:
// utf8 can be handled by regex, which is a bunch faster than a DB lookup.
if ( ( 'utf8' === $charset || 'utf8mb3' === $charset || 'utf8mb4' === $charset ) && function_exists( 'mb_strlen' ) ) {
    $regex = '/
        (
            (?: [\x00-\x7F]                  # single-byte sequences   0xxxxxxx
            |   [\xC2-\xDF][\x80-\xBF]       # double-byte sequences   110xxxxx 10xxxxxx
            |   \xE0[\xA0-\xBF][\x80-\xBF]   # triple-byte sequences   1110xxxx 10xxxxxx * 2
            |   [\xE1-\xEC][\x80-\xBF]{2}
            |   \xED[\x80-\x9F][\x80-\xBF]
            |   [\xEE-\xEF][\x80-\xBF]{2}';

    if ( 'utf8mb4' === $charset ) {
        $regex .= '
            |    \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3
            |    [\xF1-\xF3][\x80-\xBF]{3}
            |    \xF4[\x80-\x8F][\x80-\xBF]{2}
        ';
    }

    $regex .= '){1,40}                          # ...one or more times
        )
        | .                                  # anything else
        /x';
    $value['value'] = preg_replace( $regex, '$1', $value['value'] );

    if ( false !== $length && mb_strlen( $value['value'], 'UTF-8' ) > $length ) {
        $value['value'] = mb_substr( $value['value'], 0, $length, 'UTF-8' );
    }
    continue;
}
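The filtering step ports to Python almost verbatim, because Python's re module accepts the same byte ranges and verbose-mode comments in a bytes pattern. This is a sketch, not WordPress code: strip_invalid_utf8 is a hypothetical name, and a callback returning group 1 (or nothing) stands in for preg_replace( $regex, '$1', ... ).

```python
import re
from typing import Optional

# Well-formed UTF-8 sequences of one to three bytes (same ranges as the PHP regex).
UP_TO_THREE_BYTES = rb"""
    [\x00-\x7F]                    # single-byte sequences   0xxxxxxx
  | [\xC2-\xDF][\x80-\xBF]         # double-byte sequences   110xxxxx 10xxxxxx
  | \xE0[\xA0-\xBF][\x80-\xBF]     # triple-byte sequences   1110xxxx 10xxxxxx * 2
  | [\xE1-\xEC][\x80-\xBF]{2}
  | \xED[\x80-\x9F][\x80-\xBF]
  | [\xEE-\xEF][\x80-\xBF]{2}
"""

# Extra alternatives, added only for utf8mb4 columns.
FOUR_BYTES = rb"""
  | \xF0[\x90-\xBF][\x80-\xBF]{2}  # four-byte sequences     11110xxx 10xxxxxx * 3
  | [\xF1-\xF3][\x80-\xBF]{3}
  | \xF4[\x80-\x8F][\x80-\xBF]{2}
"""


def strip_invalid_utf8(raw: bytes, charset: str, length: Optional[int]) -> str:
    body = UP_TO_THREE_BYTES + (FOUR_BYTES if charset == "utf8mb4" else b"")
    # Group 1 captures a run of valid sequences; "." matches any other single byte.
    pattern = re.compile(rb"((?:" + body + rb"){1,40})|.", re.VERBOSE)
    # Keep group 1, drop the lone "anything else" bytes.
    cleaned = pattern.sub(lambda m: m.group(1) or b"", raw)
    text = cleaned.decode("utf-8")  # only well-formed sequences remain
    # Re-check the length in characters, like mb_strlen / mb_substr.
    if length is not None and len(text) > length:
        text = text[:length]
    return text


sample = b"ok\xff" + "\U0001F600".encode("utf-8") + "<b>é".encode("utf-8")
print(strip_invalid_utf8(sample, "utf8", None))    # ok<b>é
print(strip_invalid_utf8(sample, "utf8mb4", 3))
```

With a utf8 column the stray 0xFF byte and every byte of the four-byte emoji are removed; with utf8mb4 the emoji survives and only the character-length cut applies.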
In this step, unless the charset is utf8mb4 (which can store four-byte characters without truncation), the regex keeps only well-formed UTF-8 sequences of at most three bytes; every other byte, including four-byte sequences, overlong encodings and UTF-16 surrogate code points, is removed. After this filtering, the length of the multi-byte string is checked again with mb_strlen and, if it is too long, cut with mb_substr. This fix targets the truncation caused by inserting four-byte UTF-8 characters and other invalid byte sequences into the database, in 4.2 and later versions.
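Why stripping matters can be illustrated with a hypothetical model of a utf8 (three-byte) MySQL column that, outside strict mode, cuts the value off at the first four-byte character. The model does not talk to MySQL; both function names and the sample string are assumptions, and strip_four_byte works on decoded text, so it is a simplification of the byte-level regex.

```python
# Hypothetical model, not MySQL itself: a utf8 (three-byte) column in
# non-strict mode keeps only the text before the first four-byte character.
def model_utf8_column_store(text: str) -> str:
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:  # code points above U+FFFF need four bytes in UTF-8
            return text[:i]
    return text


# Simplified view of the 4.2.2 filter for a non-utf8mb4 column:
# drop four-byte characters and keep everything else.
def strip_four_byte(text: str) -> str:
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


submitted = "<b>hi\U0001F600 there</b>"
print(model_utf8_column_store(submitted))  # <b>hi
print(strip_four_byte(submitted))          # <b>hi there</b>
```

Markup that was filtered as a whole can be cut in the middle by the column, which is the kind of mismatch a crafted comment can turn into an XSS; removing the offending characters before the query means what is stored matches what was filtered.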