How do I look up Chinese characters by their pinyin?
I read a Chinese .txt file and convert the contents of each line to pinyin.
That part is simple and fast. But when I then use pinyin to find the Chinese characters, the lookup turns out to be surprisingly slow.
What a headache!
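define("DIR", dirname(__FILE__));
// Read hanzi.txt one entry per line, without trailing newlines
$Titleline = file(DIR . "/hanzi.txt", FILE_IGNORE_NEW_LINES);
$TC = count($Titleline);
$pinyin = "Woaini";
$Title1 = null;
// Linear scan: py() converts each line until a match is found
for ($i = 0; $i < $TC; $i++) {
    $x = strcmp($pinyin, py($Titleline[$i]));
    if ($x == 0) {
        $Title1 = $Titleline[$i];
        break;
    }
}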
That is roughly what the code looks like.
py() is my function that converts Chinese characters to pinyin.
The problem: once hanzi.txt grows beyond about 1-3 MB, the lookup becomes very slow.
How can I deal with this?
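The slowdown comes from the loop itself: every query calls py() on every line until it finds a match, so each lookup repeats the whole conversion. A minimal sketch of the usual fix, assuming py() returns lowercase pinyin without separators: convert each line once, store the results in a PHP associative array (PHP arrays are hash tables) keyed by pinyin, and answer each query with a single key lookup. The small py() below is a hypothetical stand-in with a hard-coded map so the example runs; the real py() would replace it.

```php
<?php
// HYPOTHETICAL stand-in for the asker's py(): converts each character via a
// tiny hard-coded table so the sketch runs. Assumes UTF-8 input.
function py(string $text): string
{
    static $map = ['我' => 'wo', '卧' => 'wo', '爱' => 'ai', '你' => 'ni', '他' => 'ta'];
    $out = '';
    // Split the UTF-8 string into single characters.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out .= $map[$ch] ?? $ch;
    }
    return $out;
}

// Convert every line to pinyin ONCE; later lookups are a single
// hash-table key probe instead of a full scan with py() per line.
function build_index(array $lines): array
{
    $index = [];
    foreach ($lines as $line) {
        // Homophones share a key, so each key holds a list of lines.
        $index[py($line)][] = $line;
    }
    return $index;
}

// In practice: $lines = file(DIR . '/hanzi.txt', FILE_IGNORE_NEW_LINES);
$lines = ['我爱你', '卧爱你', '他爱你'];
$index = build_index($lines);

$pinyin = 'Woaini';
// Assumes py() emits lowercase, so normalize the query the same way.
$hits = $index[strtolower($pinyin)] ?? [];
```

Building the index still costs one py() call per line, but only once instead of on every query.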
Replies (solutions)
First of all, because of homophones (many characters with different meanings share the same pronunciation),
a pinyin string alone does not identify a unique Chinese character, so looking up characters by pinyin is of limited use.
Organize the data with multiple hash tables so it can be queried efficiently.
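One way to read "multiple hash tables", offered as an assumption rather than the replier's exact design: keep one table keyed by the full pinyin and another keyed by the first letter of each syllable, with every key mapping to a list of lines so homophones are kept together instead of overwriting each other. The sample entries are hypothetical and assume the syllables were already produced by py().

```php
<?php
// Hypothetical sample data: each entry pairs a line of text with its
// pinyin syllables (in practice produced by the asker's py()).
$entries = [
    ['你好', ['ni', 'hao']],
    ['拟好', ['ni', 'hao']],   // homophone of 你好
    ['你们', ['ni', 'men']],
];

// Table 1: full pinyin ("nihao") => list of matching lines.
// Table 2: first letter of each syllable ("nh") => list of matching lines.
$byFull = [];
$byAbbr = [];
foreach ($entries as [$text, $syllables]) {
    $byFull[implode('', $syllables)][] = $text;
    $abbr = '';
    foreach ($syllables as $s) {
        $abbr .= $s[0];
    }
    $byAbbr[$abbr][] = $text;
}

// Each lookup is one hash probe; a missing key yields an empty list.
function lookup(array $table, string $key): array
{
    return $table[$key] ?? [];
}

$homophones = lookup($byFull, 'nihao');  // both 你好 and 拟好
$byInitials = lookup($byAbbr, 'nm');     // 你们
```

Which extra tables are worth building depends on how users type their queries; each added table costs memory but keeps lookups constant-time.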
I will try to keep homophone collisions to a minimum.
But how do you organize the data with multiple hash tables so it can be queried efficiently? Could you give an example?
I can't use a database; I can only work with the txt file.
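Without a database, the index itself can live in a plain file. A minimal sketch, assuming a hypothetical cache path in the system temp directory and hypothetical helper names save_index and load_index: serialize() the pinyin index once, and on later requests unserialize() it instead of running py() over hanzi.txt again.

```php
<?php
// Write the pinyin index to a plain file so it only has to be built once.
function save_index(array $index, string $path): void
{
    // serialize() turns the nested array into one string; LOCK_EX avoids
    // two requests writing the file at the same time.
    file_put_contents($path, serialize($index), LOCK_EX);
}

// Read the index back; returns an empty array if the file is missing or bad.
function load_index(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    // Only plain arrays/strings are expected, so refuse to create objects.
    $data = unserialize(file_get_contents($path), ['allowed_classes' => false]);
    return is_array($data) ? $data : [];
}

$cache = sys_get_temp_dir() . '/hanzi_index.txt';  // hypothetical path
$index = ['nihao' => ['你好', '拟好'], 'nimen' => ['你们']];
save_index($index, $cache);

// Later requests: load the prebuilt index instead of scanning hanzi.txt.
$loaded = load_index($cache);
$hits = $loaded['nihao'] ?? [];
```

To keep the cache in sync, rebuild it whenever hanzi.txt is newer than the cache file, for example by comparing filemtime() of the two.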
You can use a trie.
According to the Chinese Phonetic Alphabet (Hanyu Pinyin), every syllable is made up only of an initial and a final; adding the tone makes it even better.
At most, a three-level hash table will do.
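The trie suggestion can be sketched as exactly that three-level hash table: initial, then final, then tone. This is an illustration under assumptions: syllables are written in Hanyu Pinyin with a trailing tone digit (zhong1), ü is typed as v, and the helper names split_syllable, trie_insert and trie_find are hypothetical. A query without a tone digit returns the characters for all tones.

```php
<?php
// Hanyu Pinyin initials; two-letter ones first so "zh" wins over "z".
const INITIALS = ['zh', 'ch', 'sh', 'b', 'p', 'm', 'f', 'd', 't', 'n', 'l',
    'g', 'k', 'h', 'j', 'q', 'x', 'r', 'z', 'c', 's', 'y', 'w'];

// Split a syllable like "zhong1" into [initial, final, tone].
// Tone 0 means "not given"; syllables without an initial get ''.
function split_syllable(string $syl): array
{
    $tone = 0;
    if (preg_match('/^([a-z]+)([1-5])$/', $syl, $m)) {
        [$syl, $tone] = [$m[1], (int) $m[2]];
    }
    foreach (INITIALS as $ini) {
        $len = strlen($ini);
        if (strncmp($syl, $ini, $len) === 0 && strlen($syl) > $len) {
            return [$ini, substr($syl, $len), $tone];
        }
    }
    return ['', $syl, $tone];
}

// Level 1: initial, level 2: final, level 3: tone => list of characters.
function trie_insert(array &$trie, string $syl, string $char): void
{
    [$ini, $fin, $tone] = split_syllable($syl);
    $trie[$ini][$fin][$tone][] = $char;
}

// Walk the three levels; without a tone, merge the lists of all tones.
function trie_find(array $trie, string $syl): array
{
    [$ini, $fin, $tone] = split_syllable($syl);
    $byTone = $trie[$ini][$fin] ?? [];
    if ($tone !== 0) {
        return $byTone[$tone] ?? [];
    }
    return $byTone ? array_merge(...array_values($byTone)) : [];
}

// Hypothetical sample data.
$trie = [];
foreach ([['zhong1', '中'], ['zhong4', '重'], ['zhong1', '钟'],
          ['ai4', '爱'], ['ni3', '你']] as [$syl, $ch]) {
    trie_insert($trie, $syl, $ch);
}

$withTone = trie_find($trie, 'zhong1');  // 中, 钟
$anyTone  = trie_find($trie, 'zhong');   // 中, 钟, 重
```

A longer query such as woaini would first have to be segmented into syllables; each syllable is then one walk down the three levels.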
That is the moderator's experience talking; as a beginner, I don't know how to actually implement it...