Implementing a word spell checker in Python
Over the past few days I found many spelling mistakes in the comments of code I had written earlier. The mistakes are not outrageous, and a tool should be able to correct most of them automatically. Writing a spell-checking script in Python is easy, and it is even easier with ready-made tools such as aspell and ispell.
Key Points
1. Given a misspelled word, call aspell -a to obtain a list of candidate corrections, then use edit distance to narrow them down to the most likely ones. For example, running aspell -a and entering 'hella' gives the following suggestions:
Hell, Helli, hello, heal, Heall, he'll, hells, Heller, Ella, Hall, Hill, Hull, hall, heel, hill, hula, hull, Helga, Helsa, bella, Della, Mella, Sella, fella, Halli, Hally, Hilly, Holli, Holly, hallo, hilly, holly, hullo, Hell's, hell's
2. What is edit distance, also known as Levenshtein distance? It is the number of single-character insert, delete, and replace operations (here a swap of two adjacent characters is allowed as well) needed to turn one word into another. Applying these operations to a given word generates all of its nearby spellings. For example, applying them repeatedly to 'hella' produces:
'helkla', 'hjla', 'hyler', 'hellma', 'khella', 'iella', 'helhla', 'hellag', 'ha', 'vhella', 'hhella', 'hell', 'hegler', 'hvlla', 'hellaa', 'ghella', 'hellar', 'hesler', 'lhela', 'helpa', 'hello', ...
3. Combining the two sets above with a little common sense improves the accuracy of the check. A typo made by accident rarely turns a word into something completely different, and the first letter of a word is usually typed correctly. Words in the sets above whose first letter does not match, such as 'sella', 'mella', 'khella', and 'iella', could therefore simply be removed. VPSee does not delete them here; instead they are taken out of the queue and appended to its end, where they have lower priority. Words starting with other letters are thus considered only when no word starting with h matches.
4. The program relies on the external tool aspell. How can a Python program capture the input and output of an external program so that it can process them? The subprocess module, introduced in Python 2.4, handles this with subprocess.Popen.
5. Peter Norvig of Google, a true expert, wrote an article called How to Write a Spelling Corrector that is well worth reading. An expert is an expert: 21 lines of Python solve the spelling problem without any external tools, and you only need to read in a dictionary file in advance. The edits1 function in the program below is copied from his article.
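To make the notion of edit distance from point 2 concrete, here is a minimal sketch of the classic dynamic-programming computation. The function name levenshtein is an illustration and does not appear in the original program. It counts only inserts, deletes, and replacements, so a swap of two adjacent letters costs 2 here, whereas edits1 treats a swap as a single edit.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and
    replacements needed to turn string a into string b."""
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into '' takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # replace ca, or keep it
        previous = current
    return previous[-1]


# A few of the aspell suggestions for 'hella' listed in point 1
for candidate in ["hello", "hell", "hallo"]:
    print(candidate, levenshtein("hella", candidate))
# hello 1
# hell 1
# hallo 2
```

Sorting the aspell suggestions by such a distance would be one way to rank them; the program below instead checks membership in the set of all words one edit away.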
Code
#!/usr/bin/env python3
# A simple spell checker built on `aspell -a`
import signal
import subprocess
import sys

alphabet = 'abcdefghijklmnopqrstuvwxyz'


def found(word, args, cwd=None, shell=True):
    """Return aspell's suggestions for word, or None if it offers none."""
    child = subprocess.Popen(args, shell=shell, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, cwd=cwd,
                             universal_newlines=True)
    stdout, _ = child.communicate(word + "\n")
    # a misspelled word yields a line of the form
    # "& word count offset: suggestion, suggestion, ..."
    for line in stdout.splitlines():
        if line.startswith("&") and ": " in line:
            # drop the left part up to the colon, keep the suggestions
            candidates = line.split(": ", 1)[1].split(", ")
            # making an error on the first letter of a word is less
            # probable, so candidates with a different first letter are
            # moved to the tail of the queue, giving them lower priority
            head = [c for c in candidates if c[0] == word[0]]
            tail = [c for c in candidates if c[0] != word[0]]
            return head + tail
    return None


# copied from http://norvig.com/spell-correct.html
def edits1(word):
    n = len(word)
    return set(
        # deletions
        [word[0:i] + word[i+1:] for i in range(n)] +
        # swaps of two adjacent characters
        [word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)] +
        # replacements
        [word[0:i] + c + word[i+1:] for i in range(n) for c in alphabet] +
        # insertions
        [word[0:i] + c + word[i:] for i in range(n+1) for c in alphabet])


def correct(word):
    candidates1 = found(word, 'aspell -a')
    if not candidates1:
        print("no suggestion")
        return
    candidates2 = edits1(word)
    # keep the aspell suggestions that are also one edit away from word
    candidates = [c for c in candidates1 if c in candidates2]
    if not candidates:
        print("suggestion: %s" % candidates1[0])
    else:
        # candidates keeps the priority order built in found(),
        # so the first match is the preferred one
        print("suggestion: %s" % candidates[0])


def signal_handler(signum, frame):
    sys.exit(0)


if __name__ == '__main__':
    signal.signal(signal.SIGINT, signal_handler)
    while True:
        try:
            line = input()
        except EOFError:
            break
        correct(line.strip())
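The combination described in point 3 can be tried without aspell installed. The sketch below hard-codes the suggestion list quoted in point 1 and intersects it with the words one edit away from 'hella'. The helper names prioritize_first_letter and combine are hypothetical and only serve the illustration; edits1 is a restatement of the function from Norvig's article. The reordering builds two new lists rather than removing items from a list while looping over it, since that pattern can skip elements.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Suggestions quoted in the article for `aspell -a` with input 'hella'
ASPELL_HELLA = (
    "Hell, Helli, hello, heal, Heall, he'll, hells, Heller, Ella, Hall, "
    "Hill, Hull, hall, heel, hill, hula, hull, Helga, Helsa, bella, Della, "
    "Mella, Sella, fella, Halli, Hally, Hilly, Holli, Holly, hallo, hilly, "
    "holly, hullo, Hell's, hell's"
).split(", ")


def edits1(word):
    """All strings one delete, swap, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def prioritize_first_letter(word, candidates):
    """Stable reordering: candidates sharing word's first letter come first."""
    same = [c for c in candidates if c[:1] == word[:1]]
    other = [c for c in candidates if c[:1] != word[:1]]
    return same + other


def combine(word, candidates):
    """Prioritized candidates that are also exactly one edit away from word."""
    near = edits1(word)
    return [c for c in prioritize_first_letter(word, candidates) if c in near]


print(combine("hella", ASPELL_HELLA))
# ['hello', 'hells', 'bella', 'fella']
```

The comparison is case-sensitive, as in the program above, so capitalized suggestions such as 'Hell' never match: edits1 only inserts lowercase letters.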
Simpler Method
Of course, the easiest approach is to call a spell-checking module directly from the program. The PyEnchant library supports spell checking. After installing PyEnchant and Enchant, you can import it directly in a Python program:
>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>
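The check() and suggest() calls from this session can be wrapped in a small helper. This is only a sketch: first_suggestion is a hypothetical name, and FixedDict is a stand-in class, invented here so the example runs without PyEnchant installed. It mimics the two enchant.Dict methods used above and returns the first few results shown in the session.

```python
def first_suggestion(checker, word):
    """Return word if checker accepts it, else its first suggestion or None.

    checker is any object with check(word) and suggest(word) methods,
    for example an enchant.Dict instance.
    """
    if checker.check(word):
        return word
    suggestions = checker.suggest(word)
    return suggestions[0] if suggestions else None


class FixedDict:
    """Stand-in offering the same check()/suggest() methods as enchant.Dict."""

    def __init__(self, known, suggestions):
        self.known = set(known)
        self.suggestions = suggestions

    def check(self, word):
        return word in self.known

    def suggest(self, word):
        return list(self.suggestions.get(word, []))


demo = FixedDict(["Hello"], {"Helo": ["He lo", "He-lo", "Hello"]})
print(first_suggestion(demo, "Hello"))  # Hello
print(first_suggestion(demo, "Helo"))   # He lo
```

With PyEnchant installed, passing enchant.Dict("en_US") instead of the stand-in should behave the same way. Note that the first suggestion is not necessarily the intended word, which is why filtering by edit distance can still be useful.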