I recently built a demo web page for automatic error correction: nfabo.cn. When a query contains typos, a search engine tries to correct them by matching similar pinyin.
The search engine converts the mistyped words back to pinyin and replaces them with a known query that has the same pinyin.
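A minimal sketch of this same-pinyin lookup, assuming a tiny hand-written character table and query list. `CHAR_TO_PINYIN`, `KNOWN_QUERIES`, and the frequencies are made-up illustration data, not the demo's real data or code, and every character is given a single reading here:

```python
from collections import defaultdict

# Hypothetical toy data: one reading per character (polyphones are ignored here).
CHAR_TO_PINYIN = {"天": "tian", "气": "qi", "汽": "qi", "预": "yu", "报": "bao"}
# Hypothetical known queries with made-up frequencies.
KNOWN_QUERIES = {"天气": 100, "天气预报": 50}


def pinyin_key(text):
    """Return the pinyin of each character as a tuple, or None if a character is unknown."""
    key = []
    for ch in text:
        reading = CHAR_TO_PINYIN.get(ch)
        if reading is None:
            return None
        key.append(reading)
    return tuple(key)


def build_index(known):
    """Map each pinyin key to the known queries that are pronounced that way."""
    index = defaultdict(list)
    for query, freq in known.items():
        index[pinyin_key(query)].append((freq, query))
    return index


INDEX = build_index(KNOWN_QUERIES)


def correct(query):
    """Return a known query with the same pinyin, preferring the most frequent one."""
    if query in KNOWN_QUERIES:
        return query  # already a known query, nothing to fix
    candidates = INDEX.get(pinyin_key(query))
    if not candidates:
        return None  # no known query shares this pinyin
    return max(candidates)[1]
```

Here `correct("天汽")` looks up the key `("tian", "qi")` and finds the known query `天气`.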
However, when a wrong character is a polyphone (a character with more than one reading), and especially when the input contains several such characters, search engines basically give up. Or they correct using only the single most frequent reading.
This is because considering every possible combination of pinyin can cause an exponential explosion in extreme cases!
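To make the explosion concrete, here is a sketch of the brute-force enumeration, assuming a small hypothetical `READINGS` table. It illustrates the problem only, not the algorithm used in the demo. With k readings per character and n polyphones in the query, the number of pinyin keys to look up is k^n:

```python
from itertools import product

# Hypothetical toy table of readings per character (tones omitted).
READINGS = {
    "行": ["xing", "hang"],
    "长": ["chang", "zhang"],
    "重": ["zhong", "chong"],
    "乐": ["le", "yue"],
    "银": ["yin"],
}


def all_pinyin_keys(text):
    """Brute force: every combination of readings, one reading per character."""
    return list(product(*(READINGS[ch] for ch in text)))


def count_pinyin_keys(text):
    """Number of combinations, computed without enumerating them."""
    total = 1
    for ch in text:
        total *= len(READINGS[ch])
    return total
```

A two-character query such as `银行` yields only 2 keys, but a query of 20 two-reading characters already yields 2^20 = 1,048,576 keys, each of which would need its own lookup.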
My algorithm overcomes this exponential explosion problem.
- The demo page currently contains only 8 million phrases plus their word frequencies. The data is not very clean.
- The algorithm runs entirely in memory and uses 360 MB of memory. With this data volume, a traditional brute-force implementation would need dozens of GB of memory to reach the same performance.
- The server is a rented cloud virtual host with a single core, 3 times slower than my 2009 notebook computer.
Error correction based on edit distance: finding, among the known search terms, the term with the smallest edit distance to the user's query can also be solved efficiently with my algorithm (there is no demo page for this yet).
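For reference, the edit-distance problem itself can be sketched with the standard Levenshtein dynamic program and a linear scan over the known terms. This is a naive baseline, not the efficient algorithm mentioned above, and `nearest_term` is a hypothetical helper name:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def nearest_term(query, known_terms):
    """Naive baseline: scan every known term and keep the closest one."""
    return min(known_terms, key=lambda term: edit_distance(query, term))
```

The scan costs one full distance computation per known term, which is what makes the naive approach expensive on a list of millions of terms.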