While going through some old code over the past few days, I found that the comments written earlier contain a lot of misspelled words. These are not outlandish mistakes, so a tool should be able to correct most of them automatically. Writing a spell-checker script in Python is easy, and it is even easier if you take advantage of ready-made tools such as aspell/ispell.
Key points
1. Given a misspelled word, call aspell -a to get a list of candidate correct words, then use edit distance to filter that list further and pick a more accurate word. For example, running aspell -a and entering 'hella' gives the following results:
Hell, Helli, Hello, heal, Heall, he'll, Hells, Heller, Ella, Hall, Hill, Hull, hall, Heel, hill, Hula, hull, Helga, Helsa, Bella, Della, Mella, Sella, Fella, Halli, Hally, Hilly, Holli, Holly, Hallo, hilly, holly, Hullo, Hell's, hell's
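A minimal sketch of how one reply line from aspell -a might be turned into a Python list. It assumes the ispell-compatible pipe format, in which a misspelled word produces a line of the form "& word count offset: suggestion, suggestion, ...". The function name parse_aspell_line and the sample line are illustrative and not part of the original script.

```python
def parse_aspell_line(line):
    """Return the suggestions contained in one `aspell -a` reply line.

    Assumes the ispell-compatible format:
        & <word> <count> <offset>: <sugg1>, <sugg2>, ...
    Lines for correct words ("*") or for words without suggestions ("#")
    yield an empty list.
    """
    if not line.startswith("&") or ":" not in line:
        return []
    _, _, tail = line.partition(":")
    # Suggestions are separated by ", ", so strip the padding around each one.
    return [s.strip() for s in tail.split(",") if s.strip()]


# Hypothetical sample line, shortened from the output shown above.
sample = "& hella 5 0: hell, Helli, hello, heal, he'll"
print(parse_aspell_line(sample))
```

Stripping each item matters: splitting on "," alone leaves a leading space on every suggestion, which would break any later comparison of first letters.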
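2. What is edit distance (also called Levenshtein distance)? It is the number of single-character edits needed to turn one word into another (strictly speaking, counting a swap of two adjacent characters as one edit is the Damerau-Levenshtein variant). Here the idea is used in reverse: given a word, generate every string that can be reached by inserting, deleting, swapping, or replacing a single character, and look for correct spellings among them. For example, applying these insertions, deletions, swaps, and replacements to 'hella' produces: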
'helkla', 'hjlla', 'hylla', 'hellma', 'khella', 'iella', 'helhla', 'hellag', 'hela', 'vhella', 'hhella', 'hell', 'heglla', 'hvlla', 'hellaa', 'ghella', 'hellar', 'heslla', 'lhella', 'helpa', 'hello', ...
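To make the notion of distance concrete, here is a small sketch that computes the distance between two words directly instead of enumerating candidates. It assumes the "optimal string alignment" variant, where an insertion, deletion, replacement, or swap of two adjacent characters each costs 1. The function name edit_distance is illustrative and not part of the original script.

```python
def edit_distance(a, b):
    """Optimal-string-alignment distance between strings a and b.

    Insertions, deletions, substitutions and swaps of two adjacent
    characters each count as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j                      # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution (or match)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("hella", "hello"))   # one replacement
print(edit_distance("hella", "holly"))   # two replacements
```

Every string in the list above is at distance 1 from 'hella' under this definition, so testing membership in that set and testing for distance 1 are two views of the same filter.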
3. Combining the two sets above with a little theory can improve accuracy. Generally speaking, misspellings are unintentional slips, so the probability that a word is completely wrong is very small, and the first letter of a word is usually not misspelled. Words whose first letter does not match, such as 'Sella', 'Mella', 'khella', and 'iella', could therefore be removed from the sets above. Here VPSee does not delete these words but moves them to the end of the queue (lowering their priority), so that words beginning with other letters are considered only when no word beginning with h matches.
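The reordering described in this point can be sketched as a stable partition of the candidate list. The function name demote_other_initials is hypothetical, and the comparison is case-sensitive here, which is an assumption: with a lowercase input, capitalized candidates such as 'Hall' are also pushed to the back.

```python
def demote_other_initials(word, candidates):
    """Move candidates whose first letter differs from word's to the end.

    The relative order inside each group is preserved, so the ranking
    that the suggestion source produced is kept within each group.
    """
    same = [c for c in candidates if c[:1] == word[:1]]
    other = [c for c in candidates if c[:1] != word[:1]]
    return same + other


print(demote_other_initials("hella", ["Sella", "hell", "Mella", "hello"]))
```

Building two new lists avoids removing items from a list while iterating over it, which silently skips elements in Python.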
4. The program uses the external tool aspell. How can a Python program capture the input and output of an external program and process them? Python 2.4 introduced the subprocess module, and subprocess.Popen can be used for this.
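A minimal sketch of the subprocess.Popen pattern, independent of aspell: start a child process, write text to its standard input, and read back its standard output. The helper name run_with_input is illustrative, and the child used here is a stand-in Python one-liner so that the example runs without aspell installed.

```python
import subprocess
import sys


def run_with_input(args, text):
    """Start `args`, send `text` to its stdin, and return its stdout."""
    child = subprocess.Popen(args,
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             universal_newlines=True)  # exchange str, not bytes
    # communicate() writes the input, closes stdin and waits for the child.
    stdout, _ = child.communicate(text)
    return stdout


# Stand-in child process: a Python one-liner that upper-cases its input.
# For the spell checker the argument list would be ["aspell", "-a"].
echo_upper = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_with_input(echo_upper, "hella\n"))
```

Using communicate() rather than separate write and read calls avoids deadlocks when the child produces a lot of output.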
5. Google guru Peter Norvig wrote "How to Write a Spelling Corrector", which is worth a read. A guru is a guru: 21 lines of Python solve the spelling problem without any external tools, needing only a dictionary file read in advance. The edits1 function in the program below is copied from that article.
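The dictionary-only idea can be illustrated with a simplified sketch in the spirit of that article. This is not Norvig's actual code: it looks only one edit away, and a toy word list stands in for the dictionary file. The names make_corrector and ALPHABET are illustrative.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one insertion, deletion, swap or replacement away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def make_corrector(words):
    """Build a correct() function from an iterable of known words."""
    counts = Counter(w.lower() for w in words)

    def correct(word):
        if word in counts:
            return word                      # already a known word
        known = [w for w in edits1(word) if w in counts]
        # Prefer the candidate seen most often in the word list.
        return max(known, key=counts.get) if known else word

    return correct


# Toy word list standing in for a real dictionary or corpus file.
correct = make_corrector("hello hello hello hell help world".split())
print(correct("hella"))
```

Word frequency acts as the tie-breaker here: both 'hell' and 'hello' are one edit away from 'hella', and the more frequent one wins.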
Code
#!/usr/bin/python
# A simple spell checker

import sys, subprocess, signal

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def found(word, args, cwd=None, shell=True):
    child = subprocess.Popen(args, shell=shell, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, cwd=cwd,
                             universal_newlines=True)
    (stdout, stderr) = child.communicate(word + "\n")
    for line in stdout.splitlines():
        # a suggestion line looks like "& word count offset: cand1, cand2, ..."
        # (the version banner and replies for correct words are skipped)
        if line.startswith("&") and ":" in line:
            # remove left part until ":"
            left, candidates = line.split(":", 1)
            candidates = [c.strip() for c in candidates.split(",")]
            # making an error in the first letter of a word is less
            # probable, so we move those candidates to the tail of
            # the queue, giving them lower precedence
            same = [c for c in candidates if c[:1] == word[:1]]
            other = [c for c in candidates if c[:1] != word[:1]]
            return same + other
    return None

# Copied from http://norvig.com/spell-correct.html
def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +                      # deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] +  # swap
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] +  # replacement
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])   # insertion

def correct(word):
    candidates1 = found(word, 'aspell -a')
    if not candidates1:
        print("no suggestion")
        return
    candidates2 = edits1(word)
    # keep the aspell suggestions that are exactly one edit away
    candidates = []
    for cand in candidates1:
        if cand in candidates2:
            candidates.append(cand)
    if not candidates:
        print("suggestion: %s" % candidates1[0])
    else:
        # max() picks the lexicographically largest of the remaining words
        print("suggestion: %s" % max(candidates))

def signal_handler(signum, frame):
    sys.exit(0)

if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while True:
        line = sys.stdin.readline()
        if not line:  # end of input
            break
        correct(line.strip())
A simpler approach
Of course, calling a suitable module directly from the program is the simplest option. A library called PyEnchant supports spell checking; once PyEnchant and Enchant are installed, it can be imported directly in a Python program:
>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'helot', 'help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>
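As a rough sketch of how the check and suggest calls could be applied to a whole comment line, the helper below replaces each misspelled word with its first suggestion. The name correct_text is hypothetical, and FakeDict is a stand-in with the same two methods so that the example runs without PyEnchant installed.

```python
import re


def correct_text(text, checker):
    """Replace each misspelled word in `text` with its first suggestion.

    `checker` is any object with check(word) and suggest(word) methods,
    which are the two enchant.Dict calls used in the session above.
    """
    def fix(match):
        word = match.group(0)
        if checker.check(word):
            return word
        suggestions = checker.suggest(word)
        # Leave the word untouched when there is nothing to suggest.
        return suggestions[0] if suggestions else word

    # A word is a run of letters, optionally with one inner apostrophe.
    return re.sub(r"[A-Za-z]+(?:'[A-Za-z]+)?", fix, text)


class FakeDict(object):
    """Hypothetical stand-in so the example runs without PyEnchant."""
    known = {"say", "hello", "world"}

    def check(self, word):
        return word.lower() in self.known

    def suggest(self, word):
        return ["hello"] if word.lower() == "helo" else []


print(correct_text("say helo world", FakeDict()))
```

With PyEnchant installed, passing enchant.Dict("en_US") in place of FakeDict() should work the same way. Taking the first suggestion blindly can introduce wrong words, so reviewing the changes before committing them is advisable.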