Analysis of the PHP kernel function natsort

Source: Internet
Author: User
Today I found that PHP has a natural sorting function: natsort. This was the first time I had heard of an algorithm called "natural sorting", so I was curious and looked it up in the official manual (http://us.php.net/manual/en/function.natsort.php):

    bool natsort ( array &$array )

This function implements a sort algorithm that orders alphanumeric strings in the way a human being would while maintaining key/value associations. This is described as a "natural ordering". An example of the difference between this algorithm and the regular computer string sorting algorithms (used in sort()) can be seen in the example below.

According to the official manual, the following results can be obtained:

img1.png img2.png img10.png img12.png
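For contrast, here is a minimal C sketch of how a plain byte-wise comparison orders the same four names. It only illustrates the "regular computer string sorting" the manual mentions; the helper names cmp_bytewise and sort_bytewise are made up for this example and are not part of PHP.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise comparison of two C strings. */
static int cmp_bytewise(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts an array of names in place using strcmp. */
static void sort_bytewise(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], cmp_bytewise);
}
```

Sorting img12.png, img10.png, img2.png and img1.png this way yields img1.png, img10.png, img12.png, img2.png: the character '1' sorts before '2', so img10.png and img12.png land ahead of img2.png.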

Obviously, natural ordering is well suited to sorting similarly named files. Judging from the results, the algorithm appears to strip the non-numeric parts at the head and tail of each string and then sort by the remaining numeric part. Is that really so? Let's take a look at the PHP source code.

The following code is excerpted from ext/standard/array.c:

    static int php_array_natural_general_compare(const void *a, const void *b, int fold_case) /* {{{ */
    {
        Bucket *f, *s;
        zval *fval, *sval;
        zval first, second;
        int result;

        f = *((Bucket **) a);
        s = *((Bucket **) b);

        fval = *((zval **) f->pData);
        sval = *((zval **) s->pData);
        first = *fval;
        second = *sval;

        /* Non-string values are copied and converted to strings first. */
        if (Z_TYPE_P(fval) != IS_STRING) {
            zval_copy_ctor(&first);
            convert_to_string(&first);
        }
        if (Z_TYPE_P(sval) != IS_STRING) {
            zval_copy_ctor(&second);
            convert_to_string(&second);
        }

        result = strnatcmp_ex(Z_STRVAL(first), Z_STRLEN(first), Z_STRVAL(second), Z_STRLEN(second), fold_case);

        /* Release the temporary string copies. */
        if (Z_TYPE_P(fval) != IS_STRING) {
            zval_dtor(&first);
        }
        if (Z_TYPE_P(sval) != IS_STRING) {
            zval_dtor(&second);
        }

        return result;
    }
    /* }}} */

    static int php_array_natural_compare(const void *a, const void *b TSRMLS_DC) /* {{{ */
    {
        return php_array_natural_general_compare(a, b, 0);
    }
    /* }}} */

    static void php_natsort(INTERNAL_FUNCTION_PARAMETERS, int fold_case) /* {{{ */
    {
        zval *array;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &array) == FAILURE) {
            return;
        }

        if (fold_case) {
            /* php_array_natural_case_compare (not shown in this excerpt)
               is the case-insensitive counterpart used by natcasesort. */
            if (zend_hash_sort(Z_ARRVAL_P(array), zend_qsort, php_array_natural_case_compare, 0 TSRMLS_CC) == FAILURE) {
                return;
            }
        } else {
            if (zend_hash_sort(Z_ARRVAL_P(array), zend_qsort, php_array_natural_compare, 0 TSRMLS_CC) == FAILURE) {
                return;
            }
        }

        RETURN_TRUE;
    }
    /* }}} */

    /* {{{ proto void natsort(array &array_arg)
       Sort an array using natural sort */
    PHP_FUNCTION(natsort)
    {
        php_natsort(INTERNAL_FUNCTION_PARAM_PASSTHRU, 0);
    }
    /* }}} */

Although this was my first time reading the PHP kernel code, years of code-reading experience made it easy to see that the core of the natural sorting algorithm is the function strnatcmp_ex (located in ext/standard/strnatcmp.c):

    /* {{{ compare_right
     */
    static int
    compare_right(char const **a, char const *aend, char const **b, char const *bend)
    {
        int bias = 0;

        /* The longest run of digits wins.  That aside, the greatest
           value wins, but we can't know that it will until we've scanned
           both numbers to know that they have the same magnitude, so we
           remember it in BIAS. */
        for(;; (*a)++, (*b)++) {
            if ((*a == aend || !isdigit((int)(unsigned char)**a)) &&
                (*b == bend || !isdigit((int)(unsigned char)**b)))
                return bias;
            else if (*a == aend || !isdigit((int)(unsigned char)**a))
                return -1;
            else if (*b == bend || !isdigit((int)(unsigned char)**b))
                return +1;
            else if (**a < **b) {
                if (!bias)
                    bias = -1;
            } else if (**a > **b) {
                if (!bias)
                    bias = +1;
            }
        }

        return 0;
    }
    /* }}} */

    /* {{{ compare_left
     */
    static int
    compare_left(char const **a, char const *aend, char const **b, char const *bend)
    {
        /* Compare two left-aligned numbers: the first to have a
           different value wins. */
        for(;; (*a)++, (*b)++) {
            if ((*a == aend || !isdigit((int)(unsigned char)**a)) &&
                (*b == bend || !isdigit((int)(unsigned char)**b)))
                return 0;
            else if (*a == aend || !isdigit((int)(unsigned char)**a))
                return -1;
            else if (*b == bend || !isdigit((int)(unsigned char)**b))
                return +1;
            else if (**a < **b)
                return -1;
            else if (**a > **b)
                return +1;
        }

        return 0;
    }
    /* }}} */

    /* {{{ strnatcmp_ex
     * call in array.c: strnatcmp_ex(Z_STRVAL(first), Z_STRLEN(first), Z_STRVAL(second), Z_STRLEN(second), fold_case);
     */
    PHPAPI int strnatcmp_ex(char const *a, size_t a_len, char const *b, size_t b_len, int fold_case)
    {
        char ca, cb;
        char const *ap, *bp;
        char const *aend = a + a_len,
                   *bend = b + b_len;
        int fractional, result;

        if (a_len == 0 || b_len == 0)
            return a_len - b_len;

        ap = a;
        bp = b;
        while (1) {
            ca = *ap; cb = *bp;

            /* skip over leading spaces or zeros */
            while (isspace((int)(unsigned char)ca) || (ca == '0' && (ap+1 < aend) && (*(ap+1)!='.')))
                ca = *++ap;

            while (isspace((int)(unsigned char)cb) || (cb == '0' && (bp+1 < bend) && (*(bp+1)!='.')))
                cb = *++bp;

            /* process run of digits */
            if (isdigit((int)(unsigned char)ca)  &&  isdigit((int)(unsigned char)cb)) {
                fractional = (ca == '0' || cb == '0');

                if (fractional)
                    result = compare_left(&ap, aend, &bp, bend);
                else
                    result = compare_right(&ap, aend, &bp, bend);

                if (result != 0)
                    return result;
                else if (ap == aend && bp == bend)
                    /* End of the strings. Let caller sort them out. */
                    return 0;
                else {
                    /* Keep on comparing from the current point. */
                    ca = *ap; cb = *bp;
                }
            }

            if (fold_case) {
                ca = toupper((int)(unsigned char)ca);
                cb = toupper((int)(unsigned char)cb);
            }

            if (ca < cb)
                return -1;
            else if (ca > cb)
                return +1;

            ++ap; ++bp;
            if (ap >= aend && bp >= bend)
                /* The strings compare the same.  Perhaps the caller
                   will want to call strcmp to break the tie. */
                return 0;
            else if (ap >= aend)
                return -1;
            else if (bp >= bend)
                return 1;
        }
    }
    /* }}} */
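To see how the two digit comparators differ, the sketch below restates their logic over NUL-terminated strings instead of pointer/end pairs. The names is_digit, digits_right and digits_left are hypothetical simplifications written for this note, not functions from strnatcmp.c, and they assume the end of each string is marked by the terminating NUL.

```c
#include <ctype.h>

/* Returns nonzero when c is a decimal digit; the cast avoids passing a
 * negative char value to isdigit. */
static int is_digit(char c)
{
    return isdigit((unsigned char)c);
}

/* Hypothetical simplification of compare_right: both digit runs are
 * treated as right-aligned integers. A longer run wins; for runs of equal
 * length the first differing digit, remembered in bias, decides. */
static int digits_right(const char *a, const char *b)
{
    int bias = 0;
    for (;; a++, b++) {
        if (!is_digit(*a) && !is_digit(*b))
            return bias;
        if (!is_digit(*a))
            return -1;
        if (!is_digit(*b))
            return +1;
        if (bias == 0 && *a != *b)
            bias = (*a < *b) ? -1 : +1;
    }
}

/* Hypothetical simplification of compare_left: both digit runs are
 * treated as left-aligned (fractional) digits, so the first differing
 * digit decides immediately. */
static int digits_left(const char *a, const char *b)
{
    for (;; a++, b++) {
        if (!is_digit(*a) && !is_digit(*b))
            return 0;
        if (!is_digit(*a))
            return -1;
        if (!is_digit(*b))
            return +1;
        if (*a != *b)
            return (*a < *b) ? -1 : +1;
    }
}
```

With these helpers, digits_right("10", "2") returns +1 because the longer run wins, which is why img10.png sorts after img2.png. For the pair "05" and "5", digits_left returns -1 (the first digit decides) while digits_right returns +1 (the longer run wins), which shows why strnatcmp_ex needs the fractional flag to choose between them.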

From the strnatcmp_ex function:

    while (isspace((int)(unsigned char)ca) || (ca == '0' && (ap+1 < aend) && (*(ap+1)!='.')))
        ca = *++ap;

    while (isspace((int)(unsigned char)cb) || (cb == '0' && (bp+1 < bend) && (*(bp+1)!='.')))
        cb = *++bp;
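The effect of this loop can be checked in isolation. The sketch below wraps the same condition in a standalone helper; skip_spaces_and_zeros is a hypothetical name, and unlike the original loop it also checks p < end before every read, which is an assumption added here for safety rather than something the PHP source does.

```c
#include <ctype.h>

/* Hypothetical helper (not part of PHP): returns the first position in
 * [p, end) that the skip loop in strnatcmp_ex would not step over.
 * Whitespace is always skipped; a '0' is skipped only when another
 * character follows it and that character is not '.'. */
static const char *skip_spaces_and_zeros(const char *p, const char *end)
{
    while (p < end &&
           (isspace((unsigned char)*p) ||
            (*p == '0' && p + 1 < end && p[1] != '.')))
        p++;
    return p;
}
```

For "007" the helper stops at "7", and for "  42" it stops at "42". For "0.5" and for a lone "0" it does not move, so a zero before a decimal point or at the very end of the string is kept. For "0a" it stops at "a", so a zero is dropped even when no digit follows it.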

Therefore, I think whitespace characters at the front of the string (starting from the current position) and '0' characters in front of a number are skipped and not compared. The comparison result should be

http://us.php.net/manual/en/function.natsort.php

http://sourcefrog.net/projects/natsort/example-out.txt

Summary: I understood that the former should be greater than the latter, but on my PHP 5.2.9 the former is smaller than the latter. The reason is not clear yet; it may be a bug in 5.2.9, or I may not have understood the source code. Next time I set up an environment, I will test this and digest it further.

Two important data structures in array.c are worth noting:

Bucket: http://www.phpchina.cn/bbs/viewthread.php?tid=88505

zval: http://www.laruence.com/2008/08/22/412.html
