Some things I learned about working with text data in Python

Source: Internet
Author: User
Tags: python, script

I recently wrote a Python script that annotated text with the Tagme API and parsed the returned JSON data. Along the way I ran into a number of problems and learned a few new things, which I summarize here.

1. CSV file processing

A CSV file is a plain-text file made up of rows and columns, and its delimiter can vary as needed. However, when the file is opened in Excel, it is split into separate columns only if the delimiter is a comma (',').

Python's csv module provides the reader and writer functions for reading and writing data in CSV format:

csv.reader(csvfile, dialect='excel', **fmtparams)

csv.writer(csvfile, dialect='excel', **fmtparams)
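A minimal round trip through both functions, written for Python 3. It assumes an in-memory io.StringIO buffer in place of a real file, made-up sample rows, and a ';' delimiter passed as a format parameter to override the comma that the 'excel' dialect uses by default.

```python
import csv
import io

# Made-up sample rows; the last one contains the delimiter on purpose.
rows = [["entity", "score"], ["Python", "0.87"], ["a;b", "0.50"]]

# csv.writer accepts any object with a write() method; StringIO stands in for a file.
buf = io.StringIO()
writer = csv.writer(buf, dialect="excel", delimiter=";")  # fmtparam overrides the dialect's ','
writer.writerows(rows)
written = buf.getvalue()
print(written)  # the field a;b is quoted because it contains the delimiter

# csv.reader parses the text back; it must be given the same delimiter.
buf.seek(0)
parsed = list(csv.reader(buf, delimiter=";"))
print(parsed == rows)  # True
```

Any fmtparams not given (quoting, line terminator, and so on) keep the values defined by the dialect.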

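For csv.reader, csvfile can be any object that supports the iterator protocol and returns a string on each iteration, such as a file object or a list. For csv.writer, it can be any object with a write() method.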
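Because the reader only needs an iterable of strings, a plain list works as input. A small Python 3 sketch with made-up lines, including a quoted field that contains a comma:

```python
import csv

# Each list item is treated as one line of CSV text.
lines = ["name,language", "annotator,Python", '"Doe, Jane",Java']
records = list(csv.reader(lines))

for record in records:
    print(record)
```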

If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.

In Python 2, the csv module does not support Unicode input, so all input should be UTF-8 encoded or printable ASCII.
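The UTF-8 restriction applies to Python 2. In Python 3 the csv module reads and writes Unicode text (str) directly, and the documented requirement is to open the file with newline='', which takes the place of the 'b' flag. A minimal sketch, assuming a throwaway file in a temporary directory and made-up rows:

```python
import csv
import os
import tempfile

rows = [["word", "language"], ["café", "French"], ["文本", "Chinese"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.csv")
    # Text mode with an explicit encoding; newline='' lets the csv module
    # handle line endings itself (the Python 3 counterpart of the 'b' flag).
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, encoding="utf-8", newline="") as f:
        loaded = list(csv.reader(f))

print(loaded == rows)  # True
```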

Official documentation: https://docs.python.org/2/library/csv.html

2. Character encoding

The default character encoding in Python 2 is ASCII, so when the text being processed contains characters outside the ASCII range, an exception is raised: UnicodeEncodeError: ... : ordinal not in range(128).

One workaround is to change Python 2's default encoding directly in the program:

import sys
reload(sys)  # Python 2 only: site.py removes sys.setdefaultencoding at startup; reload restores it
sys.setdefaultencoding('utf-8')

However, this approach can introduce subtle bugs into the program. For details, see:

http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
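Python 3 has no sys.setdefaultencoding, so the workaround above is Python 2 only. A common alternative is to encode and decode explicitly with a named codec wherever text crosses an I/O boundary. This Python 3 sketch first reproduces the error and then does the conversion explicitly; the sample string is arbitrary.

```python
text = "café"  # arbitrary sample containing one non-ASCII character

# Encoding with the ASCII codec fails on the 'é', reproducing the error above.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)  # ... ordinal not in range(128)

# Explicit conversion with a named codec: str -> bytes -> str.
data = text.encode("utf-8")
print(data)  # b'caf\xc3\xa9'
print(data.decode("utf-8") == text)  # True
```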

3. JSON processing

Python provides a json module for parsing JSON-formatted strings and files, and for serializing Python objects to JSON.

json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)

Serializes obj as a JSON-formatted stream and writes it to fp, a file-like object that supports .write().

json.dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)

Serializes obj to a JSON-formatted string.
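A short Python 3 sketch of both serialization functions, using made-up data. It does not pass the Python 2-only encoding argument, and it uses an io.StringIO buffer in place of a real file.

```python
import io
import json

record = {"title": "Café", "tags": ["python", "json"], "count": 3}  # sample data

# json.dumps: object -> JSON string. ensure_ascii=False keeps 'é' as-is;
# with the default ensure_ascii=True it would be escaped as \u00e9.
text = json.dumps(record, ensure_ascii=False, sort_keys=True)
print(text)  # {"count": 3, "tags": ["python", "json"], "title": "Café"}

# json.dump: object -> JSON written to a file-like object (an in-memory buffer here).
buf = io.StringIO()
json.dump(record, buf, indent=2, sort_keys=True)
print(buf.getvalue())
```

sort_keys=True makes the output deterministic, which helps when diffing JSON files.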

json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserializes fp (a file-like object that supports .read() and contains a JSON document) into a Python object.

json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])

Deserializes s (a str or unicode instance containing a JSON document) into a Python object.
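A Python 3 sketch of both deserialization functions on made-up JSON, also showing two of the optional hooks from the signatures: parse_float and object_pairs_hook. An io.StringIO buffer stands in for a file.

```python
import io
import json
from collections import OrderedDict
from decimal import Decimal

# json.loads: JSON string -> Python objects (objects -> dict, arrays -> list, true -> True).
data = json.loads('{"name": "Python", "versions": [2.7, 3.12], "stable": true}')
print(data["name"], data["versions"], data["stable"])

# json.load: JSON read from a file-like object.
fp = io.StringIO('{"price": 1.10, "qty": 2}')
# parse_float=Decimal keeps the exact decimal text instead of a binary float;
# object_pairs_hook=OrderedDict builds an OrderedDict that keeps the document's key order.
order = json.load(fp, parse_float=Decimal, object_pairs_hook=OrderedDict)
print(order)
```

Plain dicts keep insertion order from Python 3.7 on, so object_pairs_hook=OrderedDict mattered mostly in Python 2.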

Official documentation: https://docs.python.org/2.7/library/json.html?highlight=json

4. Traceback

Python's traceback module handles exception stack traces. It can report details about the current exception, such as where it occurred, the statement that raised it, and the exception type.

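import sys
import traceback

# The following calls are meant to be made inside an except block.
traceback.print_exc(file=sys.stdout)  # print the exception information to the terminal

fp = open("error.txt", 'w')
traceback.print_exc(file=fp)  # write the exception information to a file

traceback.format_exc()  # return the exception information as a string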
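A self-contained Python 3 sketch of the same calls inside an except block. The failing function parse_count is hypothetical, and an io.StringIO buffer stands in for error.txt.

```python
import io
import traceback


def parse_count(value):
    """Hypothetical helper that fails on non-numeric input."""
    return int(value)  # raises ValueError for input such as "abc"


try:
    parse_count("abc")
except ValueError:
    # format_exc() returns the traceback text as a string instead of printing it.
    message = traceback.format_exc()
    # print_exc(file=...) writes the same information to any file-like object.
    log = io.StringIO()
    traceback.print_exc(file=log)

print(message.splitlines()[-1])  # ValueError: invalid literal for int() with base 10: 'abc'
```

Capturing the text with format_exc() is handy when the error should go into a log record rather than straight to a stream.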

For more on the traceback module, see this blog post: http://www.tuicool.com/articles/f2uumm

5. Formatted output

See the input and output chapter of the Python 3 tutorial: http://www.pythondoc.com/pythontutorial3/inputoutput.html
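A small sketch of two formatting styles that exist in both Python 2.7 and Python 3, str.format and printf-style %, laying out made-up (entity, score) pairs as aligned columns:

```python
# Hypothetical annotation results: (entity, score)
rows = [("Python", 0.8731), ("JSON", 0.5)]

# str.format: name left-aligned in 10 columns, score right-aligned in 6 with 2 decimals.
formatted = ["{0:<10}{1:>6.2f}".format(entity, score) for entity, score in rows]

# The same layout with printf-style % formatting.
percent = ["%-10s%6.2f" % (entity, score) for entity, score in rows]

print("\n".join(formatted))
print(formatted == percent)  # True
```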

6. File renaming

import os
os.rename(src, dst)

src is the current file name and dst is the new file name.

On Windows, if a file with the new name already exists, the call fails with 'WindowsError: [Error 183]'. On Unix, an existing dst file is silently replaced if the user has permission.
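A minimal Python 3 sketch of a more predictable rename. rename_file is a hypothetical helper, not part of the os module: it refuses to overwrite an existing target unless asked, and otherwise calls os.replace (Python 3.3+), which overwrites dst on both Windows and Unix. The exists-then-rename check leaves a small race window if another process creates dst in between.

```python
import os
import tempfile


def rename_file(src, dst, overwrite=False):
    """Hypothetical helper: rename src to dst, refusing to clobber dst unless overwrite=True."""
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(dst)
    os.replace(src, dst)  # overwrites an existing dst on Windows and Unix alike


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "old.txt")
    dst = os.path.join(tmp, "new.txt")
    for path, text in ((src, "old"), (dst, "new")):
        with open(path, "w") as f:
            f.write(text)

    try:
        rename_file(src, dst)
        refused = False
    except FileExistsError:
        refused = True  # dst already exists, so nothing was renamed

    rename_file(src, dst, overwrite=True)
    remaining = sorted(os.listdir(tmp))
    with open(dst) as f:
        content = f.read()

print(refused, remaining, content)  # True ['new.txt'] old
```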

