Recently, while using an app to memorize words, I noticed that each word is listed together with words similar to it. I wondered how that feature works, so I put together a simple example in PHP.
The general idea is as follows:
1. Build an English word library and store the words in Redis (a database would also work).
2. Take the user's word and generate similar words.
Let's look at the first step. My approach was to find a very large text file on the internet, big.txt. It contains tens of thousands of words; a regular expression extracts each word, and the words are then stored in Redis.
The problem is that reading a large file in PHP and running a regular expression over all of it is very memory-intensive. So instead I read the text one line at a time, extract the words from that line, and store them in the library.
The core code is as follows:
public function word($perLine)
{
    // Extract every run of letters on the line as a word.
    preg_match_all('/[a-z]+/i', $perLine, $matches);
    if ($matches[0]) {
        foreach ($matches[0] as $key => $value) {
            $word = strtolower($value);
            // Redis keeps one counter per word: increment it if present, otherwise start at 1.
            if ($this->redisobj->exists($word)) {
                $this->redisobj->incr($word);
            } else {
                $this->redisobj->set($word, 1);
            }
        }
    }
}

/**
 * Reads the file from line X to line Y and passes each line to word()
 * @param string $filename  file name
 * @param int    $startLine number of the first line to read
 * @param int    $endLine   number of the last line to read
 * @param string $method    file open mode
 */
public function getfilelines($filename, $startLine = 1, $endLine = 50, $method = 'rb')
{
    $content = array();
    $count = $endLine - $startLine;
    // Determine the PHP version (SplFileObject requires PHP >= 5.1.0)
    if (version_compare(PHP_VERSION, '5.1.0', '>=')) {
        $fp = new SplFileObject($filename, $method);
        $fp->seek($startLine - 1); // go to the Nth line; seek() counts from 0
        for ($i = 0; $i <= $count; ++$i) {
            $lineContent = $fp->current(); // current() returns the current line
            $this->word($lineContent);
            $fp->next(); // move to the next line
        }
    }
}
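To try the word-counting logic without a Redis server, here is a minimal self-contained sketch. It assumes a plain PHP array as a stand-in for Redis, and the function names countWordsInLine and countWordsInFile are made up for this illustration, not part of the original class. It walks the file one line at a time, so only a single line is held in memory at once.

```php
<?php
/**
 * Adds the words found in one line of text to a frequency table.
 * $counts is a plain array standing in for Redis (an assumption of this sketch).
 */
function countWordsInLine($line, array &$counts)
{
    // Same pattern as the Redis version: each run of ASCII letters is one word.
    if (preg_match_all('/[a-z]+/i', $line, $matches)) {
        foreach ($matches[0] as $value) {
            $word = strtolower($value);
            $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
        }
    }
}

/**
 * Streams a whole file line by line and returns an array of word => count.
 */
function countWordsInFile($filename)
{
    $counts = array();
    $file = new SplFileObject($filename, 'rb');
    foreach ($file as $line) { // SplFileObject yields one line per iteration
        countWordsInLine($line, $counts);
    }
    return $counts;
}

// Usage: count the words of a single line.
$demo = array();
countWordsInLine("The quick brown fox. The END", $demo);
// $demo['the'] is now 2, and $demo['end'] is 1
```

Swapping the array for Redis only changes the two lines that read and write the counter, which is why the array version is convenient for checking the regular expression first.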
Building the library this way can take a long time, depending on the machine; on my computer it ran for almost a couple of hours. In short, we now have a word library of our own. It does not contain every English word, but it is enough for testing.
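With the library in place, step 2 (generating similar words) could be sketched as follows. The text so far does not say how similar words are produced, so this is only one possible approach and an assumption on my part: generate every string that is one edit away from the user's word (one letter deleted, swapped with its neighbour, replaced, or inserted) and keep the ones that exist in the library. The names editsOne and similarWords are hypothetical, and a plain array again stands in for Redis.

```php
<?php
/**
 * Returns every string exactly one edit away from $word
 * (one deletion, neighbour swap, replacement, or insertion).
 */
function editsOne($word)
{
    $letters = range('a', 'z');
    $edits = array();
    $len = strlen($word);
    for ($i = 0; $i <= $len; $i++) {
        $left = (string) substr($word, 0, $i);
        $right = (string) substr($word, $i);
        if ($right !== '') {
            $edits[$left . substr($right, 1)] = true; // delete one letter
            if (strlen($right) > 1) {
                // swap two neighbouring letters
                $edits[$left . $right[1] . $right[0] . substr($right, 2)] = true;
            }
            foreach ($letters as $c) {
                $edits[$left . $c . substr($right, 1)] = true; // replace one letter
            }
        }
        foreach ($letters as $c) {
            $edits[$left . $c . $right] = true; // insert one letter
        }
    }
    unset($edits[$word]); // the word itself does not count as a similar word
    return array_keys($edits);
}

/**
 * Keeps only the candidates found in the library, most frequent first.
 * $library maps word => count and stands in for Redis (an assumption).
 */
function similarWords($word, array $library)
{
    $found = array();
    foreach (editsOne(strtolower($word)) as $candidate) {
        if (isset($library[$candidate])) {
            $found[$candidate] = $library[$candidate];
        }
    }
    arsort($found); // sort by count, highest first
    return array_keys($found);
}

// Usage with a tiny hand-made library:
$library = array('cat' => 5, 'cart' => 2, 'bat' => 7, 'at' => 1, 'act' => 3, 'dog' => 9);
$similar = similarWords('cat', $library);
// $similar is array('bat', 'act', 'cart', 'at')
```

Sorting by the stored count puts the most common words first, which is the reason for keeping a counter per word in the library rather than just a set of words.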
PHP matches similar words