Teach you how to create a keyword matching project (search engine) ---- Day 21
Guest appearances: hacker form artifacts, database issues
Object-oriented sublimation: Object-oriented cognition - a first look at new, Object-oriented imagination - sleepwalking (1), Object-oriented cognition - how to find a class
Server load balancing: Server load balancing - concepts, Server load balancing - configuration and implementation (Nginx)
Remarks: the articles I still owe are: Object-oriented cognition - class conversion, Object-oriented thinking - sleepwalking (2), Server load balancing - file service strategy, and Teach you how to create a keyword matching project (search engine). That's a lot. Can I take a break?
Day 21
Starting point: Teach you how to create a keyword matching project (search engine) ---- Day 1
Review: Experts teach you how to create a keyword matching project (search engine) ---- Day 20
Today there is a piece of theory to understand: test-driven programming. I mentioned this concept earlier, in Teach you how to create a keyword matching project (search engine) ---- Day 11.
Today, Shuai had fun putting this idea to use.
Okay, the main text begins.
As the story goes, Shuai handed the business word-splitting method he had written to the boss, who was very pleased with it.
However, business word splitting can only draw on a limited set of phrases, and as the amount of data it has to handle grows, so does the computing time.
The boss raised the question of whether another word-splitting extension could be used to make up for these shortcomings.
After all, what professionals build is more reliable.
Speaking from experience, the boss recommended that Shuai learn how to use SCWS.
SCWS is the abbreviation of Simple Chinese Word Segmentation (i.e., a simple Chinese word segmentation system).
http://www.xunsearch.com/scws/index.php
Of course, Shuai was very happy to hear this, because it meant new things to learn.
Shuai installed SCWS by following the SCWS installation documentation.
He installed the PHP extension and tried writing some test code:
```php
<?php
class TestSCWS
{
    public static function split($keyword)
    {
        if (!extension_loaded("scws")) {
            throw new Exception("scws extension load fail");
        }
        $so = scws_new();
        $so->set_charset('utf8');
        $so->send_text($keyword);
        $ret = array();
        // get_result() returns the next batch of words, or false when done
        while ($res = $so->get_result()) {
            foreach ($res as $tmp) {
                if (self::isValidate($tmp)) {
                    $ret[] = $tmp;
                }
            }
        }
        $so->close();
        return $ret;
    }

    // Skip single-character line breaks returned by SCWS
    public static function isValidate($scws_words)
    {
        if ($scws_words['len'] == 1 && ($scws_words['word'] == "\r" || $scws_words['word'] == "\n")) {
            return false;
        }
        return true;
    }
}

var_dump(TestSCWS::split("xxl dress"));
```
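The filtering rule in isValidate can be checked without the scws extension installed. The sketch below is a hypothetical test: the sample tokens are hand-made and carry only the `word` and `len` keys that isValidate reads (real get_result() entries may carry more fields), and the function names are made up for illustration.

```php
<?php
// Same rule as isValidate(): reject a one-byte token that is just "\r" or "\n".
function isValidToken(array $token): bool
{
    return !($token['len'] == 1 && ($token['word'] === "\r" || $token['word'] === "\n"));
}

// Hypothetical batch shaped like one get_result() call; the values are made up.
function sampleBatch(): array
{
    return [
        ['word' => 'xxl', 'len' => 3],
        ['word' => "\n", 'len' => 1],
        ['word' => 'dress', 'len' => 5],
    ];
}

// Keep the words of the tokens that pass the filter, in order.
function collectWords(array $batch): array
{
    $words = [];
    foreach ($batch as $token) {
        if (isValidToken($token)) {
            $words[] = $token['word'];
        }
    }
    return $words;
}

print_r(collectWords(sampleBatch()));
```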
The test passed, just as he had hoped, and Shuai was very happy. He asked the boss: I'll use SCWS, boss. What should I do next?
Unflustered, the boss said to Shuai: First, write a ScwsSplitter class that splits the keywords.
Shuai was delighted to have learned something new, and felt the boss was treating him very well.
The code is as follows:
```php
class ScwsSplitter
{
    public $keyword;

    public function split()
    {
        if (!extension_loaded("scws")) {
            throw new Exception("scws extension load fail");
        }
        $keywordEntity = new KeywordEntity($this->keyword);
        $so = scws_new();
        $so->set_charset('utf8');
        $so->send_text($this->keyword);
        while ($res = $so->get_result()) {
            foreach ($res as $tmp) {
                if ($this->isValidate($tmp)) {
                    $keywordEntity->addElement($tmp["word"]);
                }
            }
        }
        $so->close();
        return $keywordEntity;
    }

    // Skip single-character line breaks returned by SCWS
    public function isValidate($scws_words)
    {
        if ($scws_words['len'] == 1 && ($scws_words['word'] == "\r" || $scws_words['word'] == "\n")) {
            return false;
        }
        return true;
    }
}
```
He ran to the boss again and said: I've written the SCWS word segmentation code.
The boss admired Shuai's efficiency, too.
Then he said: What if we use both at the same time: business word segmentation first, then SCWS word segmentation for the remaining words? Do you have a good solution, Shuai?
Shuai asked: Why? Isn't SCWS alone enough?
The boss: What do we do when SCWS can't handle our business-specific terms?
Shuai said: When I read the documentation, I saw settings for dictionary and rule files. Can't we use those?
The boss said: That works, but then we have to make sure the operations staff can maintain those files once we hand them over.
Shuai: .......
After a moment of silence, Shuai reasoned that both classes were already written, so using them together was the fastest solution. He agreed: Okay, I'll go back and change it....
First, following the idea of test-driven programming, Shuai wrote the entry code:
```php
class SplitterApp
{
    public static function split($keyword, $cid)
    {
        $keywordEntity = new KeywordEntity($keyword);

        # Business word segmentation, using the dictionary for category $cid
        $termSplitter = new TermSplitter($keywordEntity);
        $seg = new DBSegmentation();
        $seg->cid = $cid;
        $termSplitter->setDictionary($seg->transferDictionary());
        $termSplitter->split();

        # SCWS word segmentation on what is left
        $scwsSplitter = new ScwsSplitter($keywordEntity);
        $scwsSplitter->split();

        # Processing of remaining words or phrases:
        # split on the same ":" separator that str_replace inserts
        $elementWords = $keywordEntity->getElementWords();
        $remainKeyword = str_replace($elementWords, ":", $keywordEntity->keyword);
        $remainElements = explode(":", $remainKeyword);
        foreach ($remainElements as $element) {
            if (!empty($element)) {
                $keywordEntity->addElement($element);
            }
        }
        return $keywordEntity;
    }
}
```
Shuai cheered: with the test entry point in place, why worry about anything else?
First, KeywordEntity's getElementWords method needed to be handled.
```php
class KeywordEntity
{
    public $keyword;
    public $elements = array();

    public function __construct($keyword)
    {
        $this->keyword = $keyword;
    }

    // Record a word; repeated words accumulate their occurrence count
    public function addElement($word, $times = 1)
    {
        if (isset($this->elements[$word])) {
            $this->elements[$word]->times += $times;
        } else {
            $this->elements[$word] = new KeywordElement($word, $times);
        }
    }

    // All recognized words, longest first
    public function getElementWords()
    {
        $elementWords = array_keys($this->elements);
        usort($elementWords, function ($a, $b) {
            return UTF8::length($b) - UTF8::length($a);
        });
        return $elementWords;
    }

    /**
     * @desc calculate UTF8 string weight
     * @param string $word
     * @return float
     */
    public function calculateWeight($word)
    {
        $element = $this->elements[$word];
        return round(strlen($element->word) * $element->times / strlen($this->keyword), 3);
    }
}

class KeywordElement
{
    public $word;
    public $times;

    public function __construct($word, $times)
    {
        $this->word = $word;
        $this->times = $times;
    }
}
```
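To see what getElementWords and calculateWeight actually return, here is a self-contained sketch that mirrors their logic. The author's UTF8 helper class isn't shown, so `utf8Length()` is a hypothetical stand-in that counts characters with a `/./us` regex, and `DemoKeywordEntity` (also hypothetical) stores plain counts instead of KeywordElement objects. Note that the weight uses `strlen`, i.e. byte lengths, so in UTF-8 a typical Chinese character (3 bytes) counts three times as much as an ASCII letter.

```php
<?php
// Hypothetical stand-in for the author's UTF8::length(): counts UTF-8 characters.
function utf8Length(string $s): int
{
    return preg_match_all('/./us', $s);
}

// Minimal mirror of KeywordEntity, keeping plain counts instead of KeywordElement objects.
class DemoKeywordEntity
{
    public $keyword;
    public $times = [];   // word => number of occurrences

    public function __construct(string $keyword)
    {
        $this->keyword = $keyword;
    }

    public function addElement(string $word, int $times = 1): void
    {
        $this->times[$word] = ($this->times[$word] ?? 0) + $times;
    }

    // Recognized words, longest (in characters) first
    public function getElementWords(): array
    {
        $words = array_map('strval', array_keys($this->times));
        usort($words, function ($a, $b) {
            return utf8Length($b) - utf8Length($a);
        });
        return $words;
    }

    // Share of the keyword's bytes covered by this word, rounded to 3 places
    public function calculateWeight(string $word): float
    {
        return round(strlen($word) * $this->times[$word] / strlen($this->keyword), 3);
    }
}

$entity = new DemoKeywordEntity("xxl dress");
$entity->addElement("xxl");
$entity->addElement("dress");
print_r($entity->getElementWords());
echo $entity->calculateWeight("dress"), "\n";
```

Here "dress" covers 5 of the 9 bytes of "xxl dress", so its weight is 5/9 rounded to 0.556. The longest-first order matters later, because it is the order in which str_replace cuts recognized words out of the keyword.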
Second, the word splitting itself. First, extract the common parts into a base class, Splitter. What methods does it need?
1. An abstract split method
2. A method that gets the parts of the keyword that have not been split yet
3. A method that decides whether a part still needs splitting
Based on this, Shuai wrote the following code:
```php
abstract class Splitter
{
    /**
     * @var KeywordEntity $keywordEntity
     */
    public $keywordEntity;

    public function __construct($keywordEntity)
    {
        $this->keywordEntity = $keywordEntity;
    }

    public abstract function split();

    /**
     * Get the parts of the keyword that are not yet split, filtered by isSplit()
     *
     * @return array
     */
    public function getRemainKeywords()
    {
        $elementWords = $this->keywordEntity->getElementWords();
        $remainKeyword = str_replace($elementWords, ":", $this->keywordEntity->keyword);
        $remainElements = explode(":", $remainKeyword);
        $ret = array();
        foreach ($remainElements as $element) {
            if ($this->isSplit($element)) {
                $ret[] = $element;
            }
        }
        return $ret;
    }

    /**
     * Whether a part still needs splitting
     *
     * @param $element
     * @return bool
     */
    public function isSplit($element)
    {
        if (UTF8::isPhrase($element)) {
            return true;
        }
        return false;
    }
}
```
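The core trick in getRemainKeywords is str_replace with an array of search words, followed by explode. The sketch below uses a hypothetical function, `remainKeywords`, and, unlike the original, simply trims pieces and drops empty ones instead of calling UTF8::isPhrase. It shows why the longest-first order from getElementWords matters: str_replace applies the search words one after another, so a short word replaced first can break up a longer one. It also assumes the keyword itself contains no ":".

```php
<?php
// Hypothetical sketch of getRemainKeywords(): cut every recognized word out of
// the keyword by replacing it with ":", then split on ":".
// Assumes the keyword itself contains no ":".
function remainKeywords(string $keyword, array $elementWords): array
{
    // str_replace() applies the search words one after another, in array order
    $marked = str_replace($elementWords, ":", $keyword);
    $pieces = array_map('trim', explode(":", $marked));
    // Drop empty pieces left by adjacent or trailing separators
    return array_values(array_filter($pieces, function ($piece) {
        return $piece !== '';
    }));
}

print_r(remainKeywords("dresses", ["dress", "dresses"]));   // short word first: "es" is left behind
print_r(remainKeywords("dresses", ["dresses", "dress"]));   // longest first: nothing is left
```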
Next, he implemented the business splitting algorithm and the SCWS splitting algorithm. Shuai smiled: a little thing like this is easy.
```php
class TermSplitter extends Splitter
{
    private $dictionary = array();

    public function setDictionary($dictionary = array())
    {
        // Longest phrases first, so they are matched before their substrings
        usort($dictionary, function ($a, $b) {
            return UTF8::length($b) - UTF8::length($a);
        });
        $this->dictionary = $dictionary;
    }

    public function getDictionary()
    {
        return $this->dictionary;
    }

    /**
     * Split the keyword into dictionary phrases, recording each match on $this->keywordEntity
     */
    public function split()
    {
        foreach ($this->dictionary as $phrase) {
            $remainKeyword = implode(":", $this->getRemainKeywords());
            // preg_quote() keeps regex metacharacters in a phrase from breaking the pattern
            $matchTimes = preg_match_all("/" . preg_quote($phrase, "/") . "/u", $remainKeyword, $matches);
            if ($matchTimes > 0) {
                $this->keywordEntity->addElement($phrase, $matchTimes);
            }
        }
    }
}

class ScwsSplitter extends Splitter
{
    public function split()
    {
        if (!extension_loaded("scws")) {
            throw new Exception("scws extension load fail");
        }
        // Only the parts the business splitter did not recognize go to SCWS
        $remainElements = $this->getRemainKeywords();
        foreach ($remainElements as $element) {
            $so = scws_new();
            $so->set_charset('utf8');
            $so->send_text($element);
            while ($res = $so->get_result()) {
                foreach ($res as $tmp) {
                    if ($this->isValidate($tmp)) {
                        $this->keywordEntity->addElement($tmp['word']);
                    }
                }
            }
            $so->close();
        }
    }

    /**
     * @param array $scws_words
     * @return bool
     */
    public function isValidate($scws_words)
    {
        if ($scws_words['len'] == 1 && ($scws_words['word'] == "\r" || $scws_words['word'] == "\n")) {
            return false;
        }
        return true;
    }
}
```
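The dictionary pass in TermSplitter can be tried in isolation. The sketch below is a simplified, hypothetical version (`dictionarySplit` is not from the original): instead of calling getRemainKeywords on every iteration, it replaces each matched phrase with ":" as it goes, which has the same effect of hiding already-matched text from shorter phrases. It also shows why escaping the phrase matters: without preg_quote, a phrase like "3.5" would be read as a pattern and would match "305".

```php
<?php
// Hypothetical, simplified version of the TermSplitter::split() dictionary pass.
function dictionarySplit(string $keyword, array $dictionary): array
{
    // Longest phrases first (character count via the /u regex)
    usort($dictionary, function ($a, $b) {
        return preg_match_all('/./us', $b) - preg_match_all('/./us', $a);
    });
    $found = [];          // phrase => number of matches
    $remain = $keyword;
    foreach ($dictionary as $phrase) {
        // preg_quote() escapes metacharacters such as "." or "+" in the phrase
        $pattern = '/' . preg_quote($phrase, '/') . '/u';
        $count = preg_match_all($pattern, $remain);
        if ($count > 0) {
            $found[$phrase] = $count;
            // Hide the matched text from the shorter phrases that follow
            $remain = preg_replace($pattern, ':', $remain);
        }
    }
    return $found;
}

print_r(dictionarySplit("wide dress xl dress", ["dress", "wide dress"]));
```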
Shuai finally finished all the code, and he was glad to share the UML diagram with everyone.
Shuai's growth is truly amazing. After looking it over, the boss praised him three times.
To test the code, Shuai wrote the following test code:
```php
class SplitterAppTest
{
    public static function split($keyword)
    {
        $keywordEntity = new KeywordEntity($keyword);

        # Business word segmentation, with the test dictionary
        $termSplitter = new TermSplitter($keywordEntity);
        $seg = new TestSegmentation();
        $termSplitter->setDictionary($seg->transferDictionary());
        $termSplitter->split();

        # SCWS word segmentation
        $scwsSplitter = new ScwsSplitter($keywordEntity);
        $scwsSplitter->split();

        # Processing of leftover words or phrases:
        # split on the same ":" separator that str_replace inserts
        $elementWords = $keywordEntity->getElementWords();
        $remainKeyword = str_replace($elementWords, ":", $keywordEntity->keyword);
        $remainElements = explode(":", $remainKeyword);
        foreach ($remainElements as $element) {
            if (!empty($element)) {
                $keywordEntity->addElement($element);
            }
        }
        return $keywordEntity;
    }
}

SplitterAppTest::split("Dress xl wide dress");
```
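SplitterAppTest runs the pipeline but checks nothing. In a test-driven style, the expected result is written down first as an assertion. The sketch below shows that shape under two assumptions: a hypothetical `DictionarySource` interface stands in for whatever class provides `transferDictionary()` (DBSegmentation or TestSegmentation in the original), and splitting the leftovers on whitespace stands in for SCWS, so the test runs without the database or the extension.

```php
<?php
// Hypothetical interface standing in for DBSegmentation / TestSegmentation.
interface DictionarySource
{
    public function transferDictionary(): array;
}

// Test double: a fixed in-memory dictionary instead of a database query.
class FixedDictionary implements DictionarySource
{
    private $phrases;

    public function __construct(array $phrases)
    {
        $this->phrases = $phrases;
    }

    public function transferDictionary(): array
    {
        return $this->phrases;
    }
}

// Pipeline under test: dictionary phrases first (longest first, by byte length
// for simplicity), then the leftovers split on whitespace as a stand-in for SCWS.
function splitKeyword(string $keyword, DictionarySource $source): array
{
    $words = [];
    $remain = $keyword;
    $dictionary = $source->transferDictionary();
    usort($dictionary, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($dictionary as $phrase) {
        if ($phrase === '') {
            continue;   // substr_count() rejects an empty needle
        }
        $count = substr_count($remain, $phrase);
        if ($count > 0) {
            $words[$phrase] = $count;
            $remain = str_replace($phrase, ' ', $remain);
        }
    }
    foreach (preg_split('/\s+/', $remain, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $words[$word] = ($words[$word] ?? 0) + 1;
    }
    return $words;
}

// The expectation is written first; the code above is then made to satisfy it.
$result = splitKeyword("dress xl wide dress", new FixedDictionary(["dress", "wide dress"]));
assert($result === ["wide dress" => 1, "dress" => 1, "xl" => 1]);
```

Injecting the dictionary source is what lets the same pipeline run against the database in production and a fixed list in tests, which is the role TestSegmentation plays above.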
Shuai chuckled to himself, thinking that one day he would get the better of you.