Introduction
Recently I have been studying Bo's vocabulary list, which contains about 12,000 words.
Memorizing so many words in a short time is very challenging. My own habits are as follows:
- Split an unfamiliar long word into familiar short words, then make a sentence from the short words that includes the meaning of the new word;
- Read more example sentences containing the new word to reinforce its meaning in context;
- Use a word that looks very similar to the new word as a memory aid;
- Review several times on an Ebbinghaus-style schedule to reduce forgetting.
The third method relies on "near words", that is, words with a similar shape (spelling).
I find this method very useful: it is like a friend introducing you to new friends.
In addition, when memorizing a word it is easy to confuse it with its near words, so it is better to look up the near words at the same time.
Putting the near words together helps both memory and discrimination.
For example:
complexity, complicacy, complicate, complicity, simplicity
Although many English learners and research institutions have published word lists on the internet, unfortunately I have not found a dictionary that offers a near-word search function.
Dictionary software such as Kingsoft supports wildcard characters, but searching for near words with wildcards is very troublesome.
So I wrote a small program to help me memorize words.
Method
This problem is really an approximate string matching problem [4]. It is the same problem a spell checker [3] solves when it automatically suggests corrections based on word similarity.
Regular expressions could also be used, but you would have to define a large number of transformation rules yourself, which is very troublesome.
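To make the cost of the regular expression approach concrete, here is a minimal sketch in Python (not the Matlab used for the program in this post). The function names `one_edit_alternatives` and `one_edit_regex` are hypothetical helpers introduced only for this illustration. It builds every rule needed to match words within a single edit of a given word; covering a larger distance would multiply the number of rules.

```python
import re


def one_edit_alternatives(word):
    """Return the regex alternatives that cover `word` and every string
    one insertion, deletion, or substitution away from it."""
    alts = {re.escape(word)}
    n = len(word)
    for i in range(n):
        head, tail = re.escape(word[:i]), re.escape(word[i + 1:])
        alts.add(head + "." + tail)  # substitute the character at position i
        alts.add(head + tail)        # delete the character at position i
    for i in range(n + 1):
        # insert one arbitrary character before position i
        alts.add(re.escape(word[:i]) + "." + re.escape(word[i:]))
    return sorted(alts)


def one_edit_regex(word):
    """Compile the alternatives into one pattern; use it with fullmatch()."""
    return re.compile("|".join(one_edit_alternatives(word)))


pattern = one_edit_regex("simple")
print(len(one_edit_alternatives("simple")), "rules for a single edit")
for candidate in ["sample", "simply", "simpler", "temple"]:
    print(candidate, bool(pattern.fullmatch(candidate)))
```

Even for a six-letter word and a distance of one, the pattern already needs a rule per position and per edit type, which is why an edit distance computation is the more practical choice.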
Here we mainly compute the edit distance [1] between two words.
I wrote a toy program in Matlab that implements the near-word search using the classic Levenshtein distance [2] algorithm (for details of the algorithm, see Wikipedia and related articles). Word-to-word matching uses dynamic programming, so it is fast.
In addition, the Jaro-Winkler distance [5] and phonetic distance [6] could be considered. I only use the Levenshtein distance here.
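As a brief aside on the phonetic option, Soundex [6] reduces a word to its first letter plus three digits, so words that sound alike tend to get the same code. The following is a minimal Python sketch (not Matlab, and not part of the program in this post) of the commonly described American Soundex rules. The function name `soundex` and the sample words are assumptions made for illustration.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for digit, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of `word` (letters only)."""
    word = word.lower()
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters


for w in ["complexity", "complicity", "simplicity"]:
    print(w, soundex(w))
```

A phonetic code like this groups words by sound rather than by spelling, so it complements an edit distance instead of replacing it.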
An intuitive example of the Levenshtein algorithm:
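The dynamic programming table can be printed directly to see how the distance is built up. This is a minimal Python sketch (not the Matlab code of this post). The names `levenshtein_table` and `levenshtein` and the word pair "kitten" and "sitting" are chosen only for illustration. Entry `table[i][j]` is the distance between the first `i` characters of one word and the first `j` characters of the other.

```python
def levenshtein_table(s1, s2):
    """Build the (len(s1)+1) x (len(s2)+1) dynamic programming table."""
    m, n = len(s1), len(s2)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i  # delete i characters to reach the empty prefix
    for j in range(n + 1):
        table[0][j] = j  # insert j characters starting from the empty prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table


def levenshtein(s1, s2):
    """The distance is the bottom-right entry of the table."""
    return levenshtein_table(s1, s2)[-1][-1]


for row in levenshtein_table("kitten", "sitting"):
    print(row)
print("distance:", levenshtein("kitten", "sitting"))
```

For this pair the distance is 3: substitute k with s, substitute e with i, and insert g at the end.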
Usage
- Provide the word you want to query;
- Provide the similarity threshold n (n is an edit distance, that is, the number of single-character editing operations, insertion, deletion, and substitution, needed to turn one word into the other);
- Provide the dictionary you want to search (a txt word list is used here).
The program then traverses the word list and calculates the edit distance between the query word and each word in the dictionary. Finally, the threshold is used to keep only the most similar words.
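The traverse-and-filter step can be sketched as follows, in Python rather than Matlab. The names `edit_distance`, `find_similar_words`, and `max_dist`, and the small in-memory word list are assumptions made for illustration. Note that this sketch treats the threshold as inclusive (`<=`), whereas the Matlab main function in this post uses a strict `<`.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def find_similar_words(query, word_list, max_dist=3):
    """Return (distance, word) pairs with distance <= max_dist, closest first."""
    matches = []
    for word in word_list:
        d = edit_distance(query, word)
        if d <= max_dist:
            matches.append((d, word))
    return sorted(matches)


# Hypothetical stand-in for a word list loaded from a txt file.
words = ["complexity", "complicacy", "complicate",
         "complicity", "simplicity", "banana"]
for dist, word in find_similar_words("complexity", words, max_dist=3):
    print(dist, word)
```

Sorting by distance puts the closest look-alikes first, which is convenient when a long word list returns many matches.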
Effect
The word list is the College English Test level 4 and level 6 (CET-4/6) vocabulary.
The edit distance threshold is set to 3.
The search finds four simple near words in the level 4 and level 6 vocabulary (counting the query word itself):
Code
Main function:
%% this is the main function
% Input :
%   wordToMatch - input word
%   distThresh  - edit distance threshold, usually use 3
%   dicPath     - file path of the word list file, txt format with every
%                 line of a single word
%
% Output :
%   command window output
%
% Created by visionfans @ 2011.07.20
function findSimilarWords(wordToMatch, distThresh, dicPath)
global word;
word = wordToMatch;

%% check parameters
switch nargin
    case 0
        error('Wrong arguments!');
    case 1
        distThresh = 3;
        dicPath = '46.txt';
    case 2
        % only the word list path is missing, so use the default list
        dicPath = '46.txt';
end

%% load word list
wordList = loadWordList(dicPath);

%% calculate edit distance
editDist = cellfun(@calcEditDist, wordList);

%% filter the similar words (distance below the threshold)
similarWords = wordList(editDist < distThresh);

%% display results
fprintf('There are %d similar words with "%s" : \n', length(similarWords), word);
cellfun(@(x) fprintf('\t%s\n', x), similarWords);
end
Levenshtein distance calculation function:
%% this function is used to calculate the Levenshtein Edit Distance
%
% S1 and S2 are two words you want to calculate their edit distance
%
% Created by visionfans @ 2011.07.20
function dist = calcEditDist(s1, s2)
global word;
if nargin == 1
    s2 = word;   % compare against the query word set by findSimilarWords
end

%% calculate the edit distance with DP
m = length(s1);
n = length(s2);
if m*n == 0
    dist = Inf;  % an empty string (e.g. a blank line) never matches
    return;
end

% table(i,j) is the distance between the first i-1 characters of s1 and
% the first j-1 characters of s2, so the table needs (m+1) x (n+1) entries
table = zeros(m+1, n+1);
table(:,1) = 0:m;
table(1,:) = 0:n;
for i = 2:m+1
    for j = 2:n+1
        if s1(i-1) == s2(j-1)
            table(i,j) = table(i-1,j-1);
        else
            table(i,j) = 1 + min([table(i-1,j), table(i,j-1), table(i-1,j-1)]);
        end
    end
end

%% set result
dist = table(m+1, n+1);
end
Dictionary loading function:
%% this function is used to load the dictionary file
% The dictionary file is a text file with the format of every line be a
% single word.
%
% You can find a word list file with adequate common words here:
% Kevin's Word List Page - http://wordlist.sourceforge.net/
%
% Created by visionfans @ 2011.07.20
function wordList = loadWordList(dictPath)
fprintf('Loading word list ...\n');
fid = fopen(dictPath);
if fid == -1
    error('Cannot open word list file: %s', dictPath);
end
wordList = {};
i = 1;
tline = fgetl(fid);
while ischar(tline)
    wordList{i,1} = tline;   % one word per line
    tline = fgetl(fid);
    i = i+1;
end
fclose(fid);
end
Supplement
There are many more comprehensive dictionary files, which can be found in [7].
Thanks to YNYS, a network engineer at Jukuu, for providing the word list.
The material from this article is available at [8]. If you are interested, you can try it out.
---------------------------------------------------------------------------
This program is for personal use, so I have not bothered to rewrite it in C.
References
[1] Edit distance, http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
[2] Levenshtein distance - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Levenshtein_distance
[3] How to Write a Spelling Corrector, http://norvig.com/spell-correct.html
[4] Approximate string matching - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Approximate_string_matching
[5] Jaro-Winkler distance - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Jaro-Winkler_distance
[6] Soundex - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Soundex
[7] Dictionary & Glossary Links; Downloadable Word Lists, http://www.net-comber.com/wordurls.html
[8] Vocabulary, http://download.csdn.net/source/3455828