Reading files in Python with a list or dict field schema

Source: Internet
Author: User
Tags: processing, text, pprint


Preface

Python is a powerful tool for processing text data. Reading, splitting, filtering, and converting text are all very simple, so developers do not need to deal with complicated file stream handling (unlike in Java). Many complex text processing and computing tasks, including Hadoop Streaming programs, are written in Python.

In text processing, loading a file into memory is the first step. This raises the question of how to map a column in the file to a specific variable. The crudest approach is to reference each field by its index, for example:

    # fields is the list obtained by reading and splitting one row
    user_id = fields[0]
    user_name = fields[1]
    user_type = fields[2]

If you read the file this way, any change to the column order, or any added or removed column, turns code maintenance into a nightmare. Code like this should be eliminated.
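To make the maintenance problem concrete, here is a minimal sketch. The rows and the extra "region" column are made up for illustration; the point is only that hardcoded positions keep running after a layout change and silently return the wrong column.

```python
# Hypothetical rows: the second layout gains a "region" column at position 1.
old_row = "1\tname1\t0".split("\t")
new_row = "1\teu\tname1\t0".split("\t")


def read_user_name_by_index(fields):
    # The position is hardcoded at the point of use.
    return fields[1]


name_from_old = read_user_name_by_index(old_row)
name_from_new = read_user_name_by_index(new_row)

print(name_from_old)  # name1
print(name_from_new)  # eu (wrong column, and no error is raised)
```

Every place that uses an index has to be found and updated by hand, which is why keeping the positions in a single schema is easier to maintain.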

We recommend two more elegant ways to read the data: configure a field schema first, then read the data according to that schema. The schema can be either a dictionary or a list.

Reading a file and splitting each row into a field list by delimiter

First, read the file, split the data of each row according to the delimiter, and return the field list for subsequent processing.

The code is as follows:

    def read_file_data(filepath):
        '''Read a file line by line.
        @param filepath: absolute path of the file to read
        @return: generator yielding the list of fields for each row
        '''
        with open(filepath, 'r') as fin:
            for line in fin:
                # Strip the trailing newline and skip empty lines
                line = line.rstrip('\n')
                if not line:
                    continue
                # Yield the split field list for the current row
                yield line.split('\t')

The yield keyword hands back the split data of one row at a time, so the caller can read each row with for fields in read_file_data(fpath).
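The lazy behavior of such a generator can be checked with a small self-contained sketch. The function read_rows below is a simplified stand-in written for this example, not the article's exact code, and the file content is made up.

```python
import os
import tempfile


def read_rows(filepath, delimiter="\t"):
    """Yield one list of fields per non-empty line (simplified stand-in)."""
    with open(filepath, "r") as fin:
        for line in fin:
            line = line.rstrip("\n")
            if not line:
                continue
            yield line.split(delimiter)


# Write a small sample file, including one empty line that should be skipped.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\tname1\t0\n\n2\tname2\t1\n")
    sample_path = tmp.name

rows = read_rows(sample_path)  # the function body has not started running yet
first = next(rows)             # runs until the first non-empty row is yielded
rest = list(rows)              # consumes the remaining rows and closes the file
os.remove(sample_path)

print(first)
print(rest)
```

Because rows are produced one at a time, the whole file never has to be held in memory as a list.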

Method 1: assemble the field list using a configured dictionary schema

In this method, you configure a {"field name": field position} dictionary as the data schema, assemble the field list read from each row according to that schema, and then access the data by key like any dictionary.

Functions used:

    # Method of the FileUtil class in file_util.py
    @staticmethod
    def map_fields_dict_schema(fields, dict_schema):
        """Return a dict of field name to field value based on the field schema.
        For example, if fields is ['a', 'b', 'c'] and schema is
        {'name': 0, 'age': 1}, then {'name': 'a', 'age': 'b'} is returned.
        @param fields: list of data values, usually obtained by splitting a line on tabs
        @param dict_schema: dict whose key is the field name and whose value is the field position
        @return: dict whose key is the field name and whose value is the field value
        """
        pdict = {}
        for fstr, findex in dict_schema.items():
            pdict[fstr] = str(fields[int(findex)])
        return pdict

Combining this method with the reading function above, you can read the data as follows:

    # coding: utf8
    """
    @author: www.crazyant.net
    Test loading a data list with a dictionary schema.
    Advantage: for files with many columns, you only configure the fields
    you want to read and get the data of those columns.
    Disadvantage: if there are many fields, configuring the position of
    each one is tedious.
    """
    import file_util
    import pprint

    # Dictionary schema to read; you can configure only the columns you care about
    dict_schema = {"userid": 0, "username": 1, "usertype": 2}

    for fields in file_util.FileUtil.read_file_data("userfile.txt"):
        # Map the field list to a dictionary according to the schema
        dict_fields = file_util.FileUtil.map_fields_dict_schema(fields, dict_schema)
        pprint.pprint(dict_fields)

Output result:

    {'userid': '1', 'username': 'name1', 'usertype': '0'}
    {'userid': '2', 'username': 'name2', 'usertype': '1'}
    {'userid': '3', 'username': 'name3', 'usertype': '2'}
    {'userid': '4', 'username': 'name4', 'usertype': '3'}
    {'userid': '5', 'username': 'name5', 'usertype': '4'}
    {'userid': '6', 'username': 'name6', 'usertype': '5'}
    {'userid': '7', 'username': 'name7', 'usertype': '6'}
    {'userid': '8', 'username': 'name8', 'usertype': '7'}
    {'userid': '9', 'username': 'name9', 'usertype': '8'}
    {'userid': '10', 'username': 'name10', 'usertype': '9'}
    {'userid': '11', 'username': 'name11', 'usertype': '10'}
    {'userid': '12', 'username': 'name12', 'usertype': '11'}
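One case the mapping function does not cover: it indexes the field list directly, so a row with fewer columns than the schema expects raises an IndexError. If such rows can occur in your data, a variant along the following lines may help. The name map_fields_dict_schema_safe and the default parameter are hypothetical additions for this sketch, not part of the original file_util.py.

```python
def map_fields_dict_schema_safe(fields, dict_schema, default=None):
    """Hypothetical variant: use `default` when a row lacks a configured column."""
    result = {}
    for name, index in dict_schema.items():
        index = int(index)
        # Only index into the row when the position actually exists.
        result[name] = str(fields[index]) if index < len(fields) else default
    return result


dict_schema = {"userid": 0, "username": 1, "usertype": 2}

complete = map_fields_dict_schema_safe(["1", "name1", "0"], dict_schema)
truncated = map_fields_dict_schema_safe(["2", "name2"], dict_schema)

print(complete)
print(truncated)
```

Whether to fill in a default, skip the row, or fail loudly depends on the data. Silently filling defaults can hide a broken input file.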

Method 2: assemble the field list using a configured list schema

If you want to read all the columns of a file, or only the first few, configuring a dictionary schema is tedious because you have to write the index position of every field. These positions simply count up from 0, which is mechanical work that should be eliminated.

The list schema exists for exactly this case. The configured list schema is first converted to a dictionary schema, and the data is then loaded according to that dictionary.

Code for converting the schema and reading with a list schema:

    # Methods of the FileUtil class in file_util.py
    @staticmethod
    def transform_list_to_dict(para_list):
        """Convert ['a', 'b'] to the form {'a': 0, 'b': 1}.
        @param para_list: list containing the field name of each column, in order
        @return: dict mapping each field name to its position
        """
        res_dict = {}
        for idx, name in enumerate(para_list):
            res_dict[str(name).strip()] = idx
        return res_dict

    @staticmethod
    def map_fields_list_schema(fields, list_schema):
        """Return a dict of field name to field value based on the field schema.
        For example, if fields is ['a', 'b', 'c'] and schema is ['name', 'age'],
        then {'name': 'a', 'age': 'b'} is returned.
        @param fields: list of data values, usually obtained by splitting a line on tabs
        @param list_schema: list of column names
        @return: dict whose key is the field name and whose value is the field value
        """
        dict_schema = FileUtil.transform_list_to_dict(list_schema)
        return FileUtil.map_fields_dict_schema(fields, dict_schema)
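As a quick check of this behavior, the following sketch uses simplified stand-alone versions of the two functions (rewritten here so the example runs on its own) and a made-up row. It suggests that a list schema shorter than the row maps only the leading columns, and that surrounding whitespace in a field name is stripped.

```python
def transform_list_to_dict(names):
    # Position in the list becomes the column index.
    return {str(name).strip(): idx for idx, name in enumerate(names)}


def map_fields_list_schema(fields, list_schema):
    dict_schema = transform_list_to_dict(list_schema)
    return {name: str(fields[idx]) for name, idx in dict_schema.items()}


row = ["1", "name1", "0", "extra"]

# Only the first two columns are named; the rest of the row is ignored.
leading_two = map_fields_list_schema(row, ["userid", " username "])

print(leading_two)
```

Columns beyond the end of the list schema are dropped, so the list form cannot pick out a column in the middle without naming everything before it.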

With this in place, you can configure the schema as a list, which is simpler because no indexes need to be written:

    # coding: utf8
    """
    @author: www.crazyant.net
    Test loading a data list with a list schema.
    Advantage: to read all columns, you only need to write the field name
    of each column in order.
    Disadvantage: you cannot read only the fields you care about; every
    column up to the last one you need must be listed.
    """
    import file_util
    import pprint

    # List schema to read; you can configure only the first few columns, or all of them
    list_schema = ["userid", "username", "usertype"]

    for fields in file_util.FileUtil.read_file_data("userfile.txt"):
        # Map the field list to a dictionary according to the schema
        dict_fields = file_util.FileUtil.map_fields_list_schema(fields, list_schema)
        pprint.pprint(dict_fields)

The output is exactly the same as with the dictionary schema.
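For comparison, the standard library offers a similar list-of-names approach through csv.DictReader. This is a different technique from the article's FileUtil class, shown here only as a sketch with made-up in-memory data.

```python
import csv
import io

# In-memory stand-in for a tab-separated file; the content is made up.
sample = io.StringIO("1\tname1\t0\n2\tname2\t1\n")

# fieldnames plays the role of the list schema; the first row is treated as data.
reader = csv.DictReader(
    sample,
    fieldnames=["userid", "username", "usertype"],
    delimiter="\t",
)
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

Note that the csv module interprets quote characters by default, so its results can differ from a plain split("\t") on data that contains double quotes.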

Complete file_util.py code

The following is the complete code of file_util.py, which you can add to your own shared library.

    # -*- encoding: utf8 -*-
    '''
    @author: www.crazyant.net
    @version: 2014-12-5
    '''


    class FileUtil(object):
        '''Common operations on files and paths.'''

        @staticmethod
        def read_file_data(filepath):
            '''Read a file line by line.
            @param filepath: absolute path of the file to read
            @return: generator yielding, for each row, the list of fields split on tabs
            '''
            with open(filepath, 'r') as fin:
                for line in fin:
                    # Strip the trailing newline and skip empty lines
                    line = line.rstrip('\n')
                    if not line:
                        continue
                    # Yield the split field list for the current row
                    yield line.split('\t')

        @staticmethod
        def transform_list_to_dict(para_list):
            """Convert ['a', 'b'] to the form {'a': 0, 'b': 1}.
            @param para_list: list containing the field name of each column, in order
            @return: dict mapping each field name to its position
            """
            res_dict = {}
            for idx, name in enumerate(para_list):
                res_dict[str(name).strip()] = idx
            return res_dict

        @staticmethod
        def map_fields_list_schema(fields, list_schema):
            """Return a dict of field name to field value based on the field schema.
            For example, if fields is ['a', 'b', 'c'] and schema is ['name', 'age'],
            then {'name': 'a', 'age': 'b'} is returned.
            @param fields: list of data values, usually obtained by splitting a line on tabs
            @param list_schema: list of column names
            @return: dict whose key is the field name and whose value is the field value
            """
            dict_schema = FileUtil.transform_list_to_dict(list_schema)
            return FileUtil.map_fields_dict_schema(fields, dict_schema)

        @staticmethod
        def map_fields_dict_schema(fields, dict_schema):
            """Return a dict of field name to field value based on the field schema.
            For example, if fields is ['a', 'b', 'c'] and schema is
            {'name': 0, 'age': 1}, then {'name': 'a', 'age': 'b'} is returned.
            @param fields: list of data values, usually obtained by splitting a line on tabs
            @param dict_schema: dict whose key is the field name and whose value is the field position
            @return: dict whose key is the field name and whose value is the field value
            """
            pdict = {}
            for fstr, findex in dict_schema.items():
                pdict[fstr] = str(fields[int(findex)])
            return pdict

Summary

That is all for this article. I hope it helps you learn or use Python. If you have any questions, please leave a message.
