Today I wrote a script that counts how many times each word appears in a plain English text (TXT) file. The code is as follows:
<?php
/**
 * Count how many times each word appears in any plain English text file.
 * Created by PhpStorm.
 * User: paul
 * Date: 2016/11/5
 * Time: 23:18
 */

$content = file_get_contents('4/youth.txt');
$res = count_word($content, 1);
print_r($res);

/**
 * Count how many times each word appears in an English text string.
 * @param string $string the text to analyse
 * @param int $lower 1: case-insensitive, 0: case-sensitive
 * @return array word => number of occurrences
 */
function count_word($string, $lower = 0)
{
    $string = trim($string);
    if ($lower) {
        $string = strtolower($string);
    }
    // Filter out some punctuation
    $string = str_replace([';', ',', '.', '?', '“', '”', '–', '-', '!', ':', '(', ')', '…', '"', '(', ')', '。', '\r', '\n'], ' ', $string);
    $array = explode(' ', $string);
    $res = array();
    foreach ($array as $value) {
        // For words such as I’ll, you’re, master’s, drop the apostrophe and
        // everything after it, keeping only I, you, master
        if (strpos($value, '’') !== false) {
            $value = strstr($value, '’', true);
        }
        if (strpos($value, "'") !== false) {
            $value = strstr($value, "'", true);
        }
        // Filter out empty values
        if (empty($value) === true) {
            continue;
        }
        if (array_key_exists($value, $res)) {
            $res[$value]++;
        } else {
            $res[$value] = 1;
        }
    }
    // Sort by count, highest first
    array_multisort($res, SORT_DESC, SORT_NUMERIC);
    return $res;
}
After running it, I found that when one word is followed by a line break and then another word, the two words are counted as a single word, as shown below:
Array (
[repression] => 1
[thoroughness] => 1
[bleached] => 1
[tow] => 1
[inspired] => 1
[uniformwell] => 1
[panamas] => 1
[caps when
] => 1
)
The code already replaces \r and \n, and the TXT file was not edited with the text editor that ships with Windows; it was opened in Sublime with the encoding set to UTF-8. Even so, the problem still occurs.
Solution: I solved this by asking on SegmentFault and looking up some material. The cause is that an escape sequence is only interpreted inside double quotes, not single quotes. The same applies to variable references inside a string. For example:
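<?php
$aa = 'Hello\r\nI am not good';
echo $aa;
$bb = "Hello\r\nI am not good";
echo $bb;
Output (the single-quoted string prints \r\n literally, while the double-quoted string produces a real line break):
Hello\r\nI am not goodHello
I am not good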
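The same difference shows up in the replacement step of the word counter. The following is a minimal sketch, assuming a made-up sample string (`$text` and the other variable names are hypothetical and not taken from the script), that compares `str_replace` with single-quoted and double-quoted search strings:

```php
<?php
// Hypothetical sample text: two words separated by a Windows line break.
$text = "caps\r\nwhen";

// Single quotes: '\r' and '\n' are two-character strings (a backslash plus
// a letter), so nothing in $text matches and the line break survives.
$single = str_replace(['\r', '\n'], ' ', $text);

// Double quotes: "\r" and "\n" are the real CR and LF characters,
// so each of them is replaced by a space.
$double = str_replace(["\r", "\n"], ' ', $text);

var_dump($single === $text);          // bool(true): the text is unchanged
var_dump(explode(' ', $double));      // "caps", "", "when"
var_dump(strlen('\n'), strlen("\n")); // int(2) and int(1)
```

The empty string between "caps" and "when" comes from the two adjacent spaces that replace \r\n, which is why the script needs its check for empty values.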
So the word-count script is modified to:
<?php
/**
 * Count how many times each word appears in any plain English text file.
 * Created by PhpStorm.
 * User: paul
 * Date: 2016/11/5
 * Time: 23:18
 */

$content = file_get_contents('4/youth.txt');
$res = count_word($content, 1);
print_r($res);

/**
 * Count how many times each word appears in an English text string.
 * @param string $string the text to analyse
 * @param int $lower 1: case-insensitive, 0: case-sensitive
 * @return array word => number of occurrences
 */
function count_word($string, $lower = 0)
{
    $string = trim($string);
    if ($lower) {
        $string = strtolower($string);
    }
    // Filter out some punctuation marks (note: line-break characters such as
    // \r and \n must be written in double quotes, not single quotes)
    $string = str_replace([';', ',', '.', '?', '“', '”', '–', '-', '!', ':', '(', ')', '…', '"', '(', ')', '。', "\r", "\n"], ' ', $string);
    $array = explode(' ', $string);
    $res = array();
    foreach ($array as $value) {
        // For words such as I’ll, you’re, master’s, drop the apostrophe and
        // everything after it, keeping only I, you, master
        if (strpos($value, '’') !== false) {
            $value = strstr($value, '’', true);
        }
        if (strpos($value, "'") !== false) {
            $value = strstr($value, "'", true);
        }
        // Filter out empty values
        if (empty($value) === true) {
            continue;
        }
        if (array_key_exists($value, $res)) {
            $res[$value]++;
        } else {
            $res[$value] = 1;
        }
    }
    // Sort by count, highest first
    array_multisort($res, SORT_DESC, SORT_NUMERIC);
    return $res;
}
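As an alternative sketch, not the approach the script above takes, the splitting can also be done with a regular expression, so that spaces, tabs, line breaks, and punctuation are all handled in one step and no list of characters has to be maintained. The function name `count_word_regex` is hypothetical, and the sketch assumes the input is valid UTF-8, since the `u` modifier makes `preg_split` fail on invalid input.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function count_word_regex(string $string, bool $lower = false): array
{
    if ($lower) {
        $string = strtolower($string);
    }
    // Split on any run of characters that is not an ASCII letter or an
    // apostrophe. This covers spaces, "\r", "\n" and punctuation at once.
    $tokens = preg_split("/[^A-Za-z'’]+/u", $string, -1, PREG_SPLIT_NO_EMPTY);

    $res = [];
    foreach ($tokens as $token) {
        // Keep only the part before an apostrophe: "i'll" becomes "i".
        $word = preg_replace('/[\'’].*$/u', '', $token);
        if ($word === '') {
            continue;
        }
        $res[$word] = ($res[$word] ?? 0) + 1;
    }
    arsort($res, SORT_NUMERIC); // highest count first, keys preserved
    return $res;
}

// Usage: the line break between "Caps" and "when" no longer glues them together.
print_r(count_word_regex("Caps\r\nwhen I'll see caps.", true));
```

In this usage example "caps" is counted twice, while "when", "i", and "see" are each counted once.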