I recently wrote a Python script that annotates text with the Tagme API and parses the JSON data it returns. I ran into a lot of problems along the way and learned a few new things, which are summarized here.
1. CSV file processing
A CSV file is a plain-text file made up of rows and columns, and the delimiter can vary as needed. Excel shows the data in separate columns only when the delimiter is a comma ','.
Python's csv module provides the reader and writer functions for reading and writing data in CSV format.
csv.reader(csvfile, dialect='excel', **fmtparams)
csv.writer(csvfile, dialect='excel', **fmtparams)
For csv.reader, csvfile can be any object that supports iteration, such as a file object or a list. For csv.writer, it can be any object with a write() method.
If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.
In Python 2, the csv module does not support Unicode input; all input should be UTF-8 encoded or ASCII.
Official documentation: https://docs.python.org/2/library/csv.html
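As a rough illustration of the reader and writer functions, here is a minimal round-trip sketch. It is written in Python 3 syntax, where text files are opened with newline='' instead of the 'b' flag. The helper names write_rows and read_rows and the sample rows are made up for this example, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_rows(rows, delimiter=","):
    """Serialize a list of rows (lists of strings) to CSV text."""
    buf = io.StringIO(newline="")  # newline="" leaves line endings to the csv module
    writer = csv.writer(buf, dialect="excel", delimiter=delimiter)
    writer.writerows(rows)
    return buf.getvalue()


def read_rows(text, delimiter=","):
    """Parse CSV text back into a list of rows."""
    buf = io.StringIO(text, newline="")
    return list(csv.reader(buf, dialect="excel", delimiter=delimiter))


# Hypothetical sample data; the second row contains the delimiter itself.
rows = [["id", "text"], ["1", "hello, world"], ["2", "caf\u00e9"]]
csv_text = write_rows(rows)
parsed = read_rows(csv_text)
```

A field that contains the delimiter is quoted by the writer and unquoted again by the reader, so parsed should match rows.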
2. Character encoding
The default character encoding in Python 2 is ASCII, so an exception is raised when the text being processed contains characters outside the ASCII range: UnicodeEncodeError: ... : ordinal not in range(128).
One workaround is to change the default encoding of Python 2, which can be done directly in the program:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
However, this approach can introduce subtle bugs into the program. See the following post for details:
http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
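A commonly suggested alternative to changing the default encoding is to encode and decode explicitly at the points where text enters or leaves the program. The sketch below only illustrates that idea and is not taken from the linked post; the helper names to_bytes and to_text and the sample string are made up for this example.

```python
def to_bytes(text, encoding="utf-8"):
    """Encode a Unicode string to bytes with an explicit encoding."""
    return text.encode(encoding)


def to_text(data, encoding="utf-8"):
    """Decode bytes back to a Unicode string with an explicit encoding."""
    return data.decode(encoding)


# Hypothetical sample containing non-ASCII characters (written with escapes).
sample = u"Z\u00fcrich caf\u00e9"
raw = to_bytes(sample)
restored = to_text(raw)

# Encoding the same text as ASCII reproduces the error described above.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Naming the encoding at each boundary keeps the behavior independent of the interpreter-wide default.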
3. JSON processing
Python provides the json module, which can be used to parse a JSON-formatted string or file.
json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)
Serializes an object as a JSON-formatted stream and writes it to the file object fp.
json.dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)
Serializes an object to a JSON-formatted string.
json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Loads a JSON-formatted file object as a Python object.
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Loads a JSON-formatted string as a Python object.
Official documentation: https://docs.python.org/2.7/library/json.html?highlight=json
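The four functions can be tried together in a short round trip. This is a minimal sketch in Python 3 syntax, which no longer has the encoding parameter, so it is left out here. The record dictionary is made-up sample data, not an actual Tagme response, and an in-memory buffer stands in for a real file.

```python
import io
import json

# Hypothetical sample object; "title" contains a non-ASCII character.
record = {"id": 1, "title": "caf\u00e9", "tags": ["a", "b"]}

# String round trip: dumps() serializes, loads() parses.
text = json.dumps(record, sort_keys=True)
loaded = json.loads(text)

# File-object round trip: dump() writes to fp, load() reads from fp.
buf = io.StringIO()
json.dump(record, buf, ensure_ascii=False, indent=2)
buf.seek(0)  # rewind before reading back
from_file = json.load(buf)
```

With the default ensure_ascii=True, non-ASCII characters are written as \uXXXX escapes; passing ensure_ascii=False keeps them as-is in the output.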
4. Traceback
Python provides the traceback module for working with exception stack traces. It gives specific information about the current exception, such as where it occurred, the statement that raised it, and the exception type.
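import sys
import traceback
traceback.print_exc(file=sys.stdout)  # print the exception information to the terminal
fp = open("error.txt", 'w')
traceback.print_exc(file=fp)  # write the error information to a file
fp.close()
traceback.format_exc()  # return the error information as a string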
For more on the Python traceback module, see this blog post: http://www.tuicool.com/articles/f2uumm
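These calls are meant to be used inside an except block, while an exception is being handled. A minimal sketch, assuming we want to capture the traceback as a string and carry on; the function safe_divide is made up for this example.

```python
import traceback


def safe_divide(a, b):
    """Return (result, error_text); error_text is the formatted traceback on failure."""
    try:
        return a / b, None
    except ZeroDivisionError:
        # format_exc() returns what print_exc() would have printed.
        return None, traceback.format_exc()


result, error_text = safe_divide(1, 0)
ok_result, no_error = safe_divide(4, 2)
```

The returned string includes the "Traceback (most recent call last):" header, the failing line, and the exception type, so it can be logged or written to a file later.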
5. Formatted output
See: http://www.pythondoc.com/pythontutorial3/inputoutput.html
6. File renaming
import os
os.rename(src, dst)
src is the current name of the file, and dst is the new name.
When renaming on Windows, if a file with the new name already exists, 'WindowsError: [Error 183]' is raised.
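One way to avoid this error is to check whether the target name is already taken before renaming. The sketch below is only an illustration: the helper rename_no_overwrite and the file names are made up, and the check is not atomic, so another process could still create the target between the check and the rename.

```python
import os
import tempfile


def rename_no_overwrite(src, dst):
    """Rename src to dst; return False without renaming if dst already exists."""
    if os.path.exists(dst):
        return False
    os.rename(src, dst)
    return True


# Demonstration in a temporary directory with hypothetical file names.
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "old.txt")
new_path = os.path.join(workdir, "new.txt")

with open(old_path, "w") as f:
    f.write("data")
first = rename_no_overwrite(old_path, new_path)   # target does not exist yet

with open(old_path, "w") as f:
    f.write("more data")
second = rename_no_overwrite(old_path, new_path)  # target now exists
```

Returning False leaves the decision to the caller, who can pick another name or remove the existing file first.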