Selenium + Python parameterization: reading a TXT file

Last Update:2015-12-02 Source: Internet

Author: User


Overview

The Selenium modularization article showed why parameterization is necessary. This article explains how to read test data from an external TXT file.

How to open a file

Either of the following two functions opens a file:

1. open(file_name, access_mode)

file_name: the path and name of the file;

access_mode: the access mode. The possible values are listed below; if the argument is omitted, the default is r:

r: read;
w: write;
a: append;
+: read and write;
b: binary access;

2. file()

In Python 2 the built-in file() function is equivalent to open() (file() no longer exists in Python 3), as the documentation states:

>>> help(open)

open(...)

open(name[, mode[, buffering]]) -> file object

Open a file using the file() type, returns a file object. This is the
preferred way to open a file. See file.__doc__ for further information.
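The access modes listed above can be tried out with a short script. This is a minimal sketch written for Python 3 (an assumption; the article's own listings target Python 2), and the scratch file users.txt is a hypothetical name created only for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "users.txt")

# 'w' creates the file (or truncates an existing one) for writing.
with open(path, "w") as fp:
    fp.write("admin,admin\n")

# 'a' appends to the end of the file instead of overwriting it.
with open(path, "a") as fp:
    fp.write("guest,guest\n")

# No mode given: the default is 'r' (read).
with open(path) as fp:
    text = fp.read()

# 'r+' opens an existing file for both reading and writing.
with open(path, "r+") as fp:
    first_line = fp.readline()

# Adding 'b' gives binary access: the data comes back as raw bytes.
with open(path, "rb") as fp:
    raw = fp.read()

print(repr(text))
print(repr(raw))
```

Using the with statement closes the file automatically, even if an exception is raised inside the block.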

Reading an English TXT file

Python provides several methods for reading the contents of a file:

read(): reads the entire file into one string
readlines(): reads the entire file into a list of lines
readline(): reads one line per call
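The difference between the three methods is easiest to see side by side. The following is a small Python 3 sketch (the article's own listings are Python 2); the file name and its three lines of sample data are assumptions made for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file with three "name,password" lines.
path = os.path.join(tempfile.mkdtemp(), "users.txt")
with open(path, "w") as fp:
    fp.write("admin,admin\nguest,guest\ntest,test\n")

with open(path) as fp:
    whole = fp.read()        # the entire file as a single string

with open(path) as fp:
    lines = fp.readlines()   # a list of lines, each keeping its newline

with open(path) as fp:
    first = fp.readline()    # one line per call
    second = fp.readline()   # the next call returns the next line

print(repr(whole))
print(lines)
print(repr(first), repr(second))
```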

Now suppose the TXT file stores the login names and passwords used as test data, one comma-separated pair per line:

admin,admin
guest,guest
test,test

Data laid out like this is best read line by line, as in the following example:

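# coding: utf-8
# Python 2 script

def str_reader_txt(address):
    users = []
    pwds = []
    # Read all lines; the with block closes the file afterwards.
    with open(address, 'r') as fp:
        lines = fp.readlines()
    for data in lines:
        # Each line has the form "name,password".
        name, pwd = data.split(',')
        # Remove the trailing newline and other whitespace characters.
        name = name.strip('\t\r\n')
        pwd = pwd.strip('\t\r\n')
        users.append(name)
        pwds.append(pwd)
        print "user:%s (len(%d))" % (name, len(name))
        print "pwd:%s (len(%d))" % (pwd, len(pwd))
    return users, pwds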

The script above reads the TXT file line by line with readlines() and uses split() to separate each line into a user name and a password. Note that every line read ends with a newline character, which has to be removed with strip().
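To show how the parsed lists feed a parameterized test, here is a minimal Python 3 sketch (the article's listing is Python 2). The names parse_credentials, run_login_cases and fake_login are hypothetical; fake_login only stands in for a real Selenium login step, which is not shown in this article.

```python
def parse_credentials(lines):
    """Split 'name,password' lines into two parallel lists."""
    users, pwds = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, pwd = line.split(",", 1)
        users.append(name.strip())
        pwds.append(pwd.strip())
    return users, pwds


def run_login_cases(users, pwds, login):
    """Call the supplied login step once per (user, password) pair."""
    return [login(user, pwd) for user, pwd in zip(users, pwds)]


def fake_login(user, pwd):
    # Hypothetical stand-in: a real step would drive the browser with
    # Selenium and report whether the login succeeded.
    return user == pwd


users, pwds = parse_credentials(["admin,admin\n", "guest,guest\r\n", "test,test"])
results = run_login_cases(users, pwds, fake_login)
print(users, pwds, results)
```

Keeping the parsing separate from the login step means the same test body runs unchanged for every row of the data file.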

Reading a Chinese TXT file

In real testing, however, the user names and passwords may be in Chinese. Does the test still pass? Change the user names in the test TXT file to Chinese (rendered in English here), so that the content becomes:

Administrator,admin
Guest,guest
Tester,test

Running the script above on this file shows that it cannot handle Chinese: the characters come out garbled. There are two solutions:

Method One

# coding: utf-8
# Python 2 script

def str_reader_txt(address):
    users = []
    pwds = []
    with open(address, 'r') as fp:
        lines = fp.readlines()
    for data in lines:
        print type(data)               # type before decoding
        data = data.decode("GB18030")  # handle the Chinese encoding problem
        print type(data)               # type after decoding
        name, pwd = data.split(',')
        name = name.strip('\t\r\n')
        pwd = pwd.strip('\t\r\n')
        users.append(name)
        pwds.append(pwd)
        print "user:%s (len(%d))" % (name, len(name))
        print "pwd:%s (len(%d))" % (pwd, len(pwd))
    return users, pwds

This method calls decode("GB18030") on each line before splitting it, which converts the bytes read from the file into Unicode so that the Chinese names are no longer garbled.

Explanation

In Python 2, "Unicode" generally refers to a unicode object; for example, the unicode object for '哈哈' is u'\u54c8\u54c8'.
A str is a byte array: the stored form of a unicode object after it has been encoded (as UTF-8, GBK, cp936, GB2312 and so on). It is only a stream of bytes with no other meaning; to display it as meaningful content, you must decode it with the correct encoding.
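The relationship between text and its encoded bytes can be checked directly. The sketch below assumes Python 3, where the names differ from the Python 2 terms used in this article: Python 2's unicode is Python 3's str, and Python 2's str (a byte array) is Python 3's bytes.

```python
text = "\u54c8\u54c8"  # the two-character string 哈哈

# Encoding turns text into bytes; the result depends on the encoding.
gb_bytes = text.encode("gb18030")    # two bytes per character here
utf8_bytes = text.encode("utf-8")    # three bytes per character here

# Decoding with the encoding that produced the bytes restores the text.
assert gb_bytes.decode("gb18030") == text
assert utf8_bytes.decode("utf-8") == text

# Decoding with a different encoding does not: these GB18030 bytes are
# not valid UTF-8, so the attempt raises UnicodeDecodeError.
try:
    wrong = gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    wrong = None
    print("cannot decode as UTF-8:", exc)

print(len(text), len(gb_bytes), len(utf8_bytes))
```

The same two characters occupy a different number of bytes in each encoding, which is why the bytes alone carry no meaning until the right encoding is applied.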

The script above prints type(data) before and after decode(): the type is str before decoding and unicode after.

So when a file is opened with the built-in open(), read() returns a str:

read(): the result is a str; if the content contains Chinese, it must be converted to unicode by calling decode() with the correct encoding before it displays correctly.
write(): if the argument is unicode, encode() it with the encoding you want the file to have; if it is a str in some other encoding, first decode() it with that str's own encoding to get unicode, then encode() it with the target encoding.

Method Two (recommended)

Specifying GB18030 when the file is opened means the content can be used directly after reading. This method also works for both Chinese and English TXT files.

# coding: utf-8
# Python 2 script
import codecs

def str_reader_txt(address):
    users = []
    pwds = []
    # codecs.open() decodes while reading, so every line is already unicode.
    # It replaces the plain open(address, 'r') used earlier.
    with codecs.open(address, 'r', "GB18030") as fp:
        lines = fp.readlines()
    for data in lines:
        name, pwd = data.split(',')
        name = name.strip('\t\r\n')
        pwd = pwd.strip('\t\r\n')
        users.append(name)
        pwds.append(pwd)
        print "user:%s (len(%d))" % (name, len(name))
        print "pwd:%s (len(%d))" % (pwd, len(pwd))
    return users, pwds

Note: codecs.getreader achieves the same effect, as follows:

# coding: utf-8
# Python 2 script
import codecs

def str_reader_txt_csv(address):
    users = []
    pwds = []
    with open(address, 'rb') as f:
        # codecs.getreader() returns a reader class that wraps the byte
        # stream and decodes it as GB18030.
        reader = codecs.getreader('GB18030')(f)
        for data in reader:
            name, pwd = data.split(',')
            name = name.strip('\t\r\n')
            pwd = pwd.strip('\t\r\n')
            users.append(name)
            pwds.append(pwd)
    return users, pwds

Explanation

The codecs module provides an open() function that takes an encoding; reading from a file opened this way returns unicode.

When writing, if the argument is unicode, it is encoded with the encoding given to codecs.open() and then written.

If the argument is a str, Python 2 first decodes it to unicode implicitly (with the default encoding, normally ASCII, so a str containing Chinese bytes raises UnicodeDecodeError) and then proceeds as above. Compared with the built-in open(), this approach is less prone to encoding problems and is the recommended one.
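For readers on Python 3, the built-in open() accepts an encoding argument directly, so the same idea needs no codecs.open(). The following is a sketch under that assumption, not the article's original method; read_credentials, the file name and the Chinese sample names are all hypothetical.

```python
import os
import tempfile


def read_credentials(address, encoding="gb18030"):
    """Read 'name,password' lines from a text file in the given encoding."""
    users, pwds = [], []
    # open() decodes while reading, so each line is already text.
    with open(address, "r", encoding=encoding) as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, pwd = line.split(",", 1)
            users.append(name.strip())
            pwds.append(pwd.strip())
    return users, pwds


# Build a hypothetical GB18030-encoded test file with Chinese user names.
path = os.path.join(tempfile.mkdtemp(), "users_cn.txt")
with open(path, "wb") as fp:
    fp.write("管理员,admin\r\n游客,guest\r\n".encode("gb18030"))

users, pwds = read_credentials(path)
print(users, pwds)
```

Because ASCII text is encoded identically in GB18030, the same function also reads a purely English file.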

Why use the GB18030 encoding format

A comparison test that reads the same data with GB18030 and with UTF-8 gives different results, for the reasons below.

On Windows, Notepad saves documents as ANSI by default, and on a Simplified Chinese system ANSI means GB2312 encoding.

If you change the format to UTF-8 when saving the TXT file, you can open it with the UTF-8 encoding, but the length of the text read differs, for the following reason.

The BOM (Byte Order Mark) needs mentioning here. A saved file does not record which encoding was used, so whoever opens it has to remember the encoding used at save time and open the file with it, which causes a lot of trouble.

You might object that Notepad does not ask for an encoding when it opens a file. Try starting Notepad first and then opening a UTF-8 TXT document from its File menu, where the dialog offers a choice of encoding.

UTF introduced the BOM to identify its own encoding: if the first bytes of a file are one of the following sequences, the text that follows uses the corresponding encoding:

BOM_UTF8 '\xef\xbb\xbf'
BOM_UTF16_LE '\xff\xfe'
BOM_UTF16_BE '\xfe\xff'

How should the BOM be handled when reading a UTF-8 file? The codecs module provides the constant codecs.BOM_UTF8 for this purpose; it is not covered in detail here.
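As one possible way to deal with the BOM, the sketch below (Python 3, whereas the article's listings are Python 2) shows why the length differs and two ways to remove the mark. The sample line "admin,admin" is an assumption; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the standard library.

```python
import codecs

# Bytes as an editor that writes a BOM would save the line "admin,admin".
data = codecs.BOM_UTF8 + "admin,admin".encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as an extra leading character U+FEFF.
plain = data.decode("utf-8")

# The "utf-8-sig" codec removes a leading BOM while decoding.
clean = data.decode("utf-8-sig")

# Manual alternative: strip the BOM bytes before decoding.
if data.startswith(codecs.BOM_UTF8):
    manual = data[len(codecs.BOM_UTF8):].decode("utf-8")
else:
    manual = data.decode("utf-8")

print(len(plain), len(clean), repr(plain[0]))
```

The one-character difference between plain and clean is the BOM itself, which is what makes the first user name appear longer than expected.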

The differences and relations between GB2312, GBK and GB18030

Reference link: http://www.zhihu.com/question/19677619

That discussion is comprehensive and clear. In summary:

GBK is fully compatible with GB2312.
GB 18030 is fully compatible with GB 2312 and largely compatible with GBK; it supports all the unified Han characters of GB 13000 and Unicode, 70,244 Chinese characters in total.

GB 18030, in full the national standard GB 18030-2005 "Information technology - Chinese coded character set", was the People's Republic of China's latest internal-code character set when this article was written. It is a revision of GB 18030-2000 "Information technology - Chinese ideograms coded character set for information interchange - Extension for the basic set".
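The compatibility claims can be probed with Python's own codecs. This is a Python 3 sketch; can_encode is a hypothetical helper, and the three sample strings are assumptions chosen to fall inside GB2312, inside GBK only, and outside GBK respectively.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


common = "\u4e2d\u6587"      # 中文: simplified characters, present in GB2312
gbk_only = "\u9019"          # 這: a traditional character, added by GBK
beyond_gbk = "\U00020000"    # a CJK Extension B character, outside GBK

# Bytes produced by the GB2312 codec decode identically with the two
# larger encodings, which is what "compatible" means here.
sample = common.encode("gb2312")
same_everywhere = sample.decode("gbk") == sample.decode("gb18030") == common

for enc in ("gb2312", "gbk", "gb18030"):
    print(enc, [can_encode(t, enc) for t in (common, gbk_only, beyond_gbk)])
print(same_everywhere)
```

Since GB18030 can encode every sample while decoding older GB2312 data unchanged, choosing it when reading is the safest of the three.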

Summary of Chinese text processing

The best way to work with Chinese data is as follows:
1. Decode early: convert the file contents to Unicode as soon as they are read, before any further processing.
2. Unicode everywhere: use Unicode for all processing inside the program.
3. Encode late: encode back to the required encoding only at the end, for example when writing the final results to a result file.
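The three steps above can be sketched end to end as follows. The sketch assumes Python 3 (the article's listings are Python 2), and the file names, the sample data and the choice of UTF-8 for the output are all hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "users_gb.txt")       # hypothetical input file
dst = os.path.join(workdir, "results_utf8.txt")   # hypothetical result file

# Prepare a GB18030-encoded input file for the demonstration.
with open(src, "wb") as fp:
    fp.write("管理员,admin\n游客,guest\n".encode("gb18030"))

# 1. Decode early: bytes become text as soon as they are read.
with open(src, "rb") as fp:
    text = fp.read().decode("gb18030")

# 2. Unicode everywhere: all processing works on text, never on raw bytes.
report = []
for line in text.splitlines():
    name = line.split(",")[0]
    report.append("%s has a %d-character name" % (name, len(name)))

# 3. Encode late: pick the output encoding only when writing the result.
with open(dst, "wb") as fp:
    fp.write("\n".join(report).encode("utf-8"))

print(report)
```

Because the character counts are computed on decoded text, they stay correct regardless of how many bytes each character occupies in the input or output file.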

A few points need explaining:
* The "correct" encoding is the one that matches the encoding of the string itself. This is not always easy to determine. Generally, Simplified Chinese text that is entered directly has one of two possible encodings: GB2312 (GBK, GB18030) or UTF-8.
* GB2312, GBK and GB18030 are essentially the same coding standard; each extends the character set of the one before it.
* UTF-8 is not compatible with the GB encodings.
* In Python 2, a str can be converted to unicode in either of two ways. For example, to convert a GB2312-encoded str:

unicode(str, 'gb2312')
str.decode('gb2312')

* When defining a string literal that contains Chinese, use the unicode form, for example str = u'汉字'.
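Since directly entered Simplified Chinese text is usually in either UTF-8 or a GB encoding, one heuristic is to try both in turn. The sketch below assumes Python 3, where str(raw, 'gb2312') and raw.decode('gb2312') are the counterparts of the two Python 2 conversions in the notes above; to_text is a hypothetical helper, not a standard function.

```python
def to_text(raw, encodings=("utf-8", "gb18030")):
    """Decode bytes by trying each candidate encoding in order.

    UTF-8 is tried first because it rejects most non-UTF-8 input,
    whereas GB18030 accepts many byte sequences and may return garbage.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


gb_raw = "汉字".encode("gb2312")
utf8_raw = "汉字".encode("utf-8")

# Two equivalent ways to convert GB2312 bytes to text.
a = str(gb_raw, "gb2312")
b = gb_raw.decode("gb2312")

print(to_text(gb_raw))
print(to_text(utf8_raw))
```

This is only a heuristic: short inputs can be valid in more than one encoding, so knowing how the file was saved remains the reliable approach.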

Resources

In-depth analysis of the Python Chinese garbled-text problem
http://www.jb51.net/article/26543.htm

Python character encoding in detail
http://www.cnblogs.com/huxi/archive/2010/12/05/1897271.html

Python Chinese encoding and processing in detail
http://my.oschina.net/leejun2005/blog/74430

Encode and decode of Python strings: solving the garbled-text problem
http://blog.csdn.net/lxdcyh/article/details/4018054
