Using Python's built-in functions map, reduce, and filter in text processing

A text file is a sequence of lines, which can be read into a list, and Python provides three functions that are useful for processing lists: map, reduce, and filter. In text processing, these three functions can make the code more concise and clearer.

map and reduce are built-in functions of Python (in Python 3, reduce has to be imported from functools). They have nothing to do with the map and reduce functions of Hadoop, but the purpose is similar: map does the preprocessing and reduce does the aggregation.
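As a minimal sketch of that division of labor (written for Python 3; the sample lines are made up for illustration), map turns each line into a number and reduce folds the numbers into one total:

```python
from functools import reduce  # reduce is a built-in only in Python 2

lines = ["3 apples", "5 pears", "2 plums"]  # made-up sample lines

# map: preprocess each line into the number at its start
counts = list(map(lambda line: int(line.split()[0]), lines))

# reduce: aggregate the numbers into a single total
total = reduce(lambda acc, n: acc + n, counts, 0)

print(counts)  # [3, 5, 2]
print(total)   # 10
```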

The use of map, reduce and filter in text processing

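The following is the content of a text file with tab-separated columns: the 1st column is the ID and the 4th column is the weight (the header line only labels the columns and is not part of the file). Our goal is to select the rows whose ID is even, double the weight of each of these rows, and finally return the sum of the weights.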

ID  Key     Value    Weight
1   name1   value1   11
2   name2   value2   12
3   name3   value3   13
4   name4   value4   14
5   name5   value5   15
6   name6   value6   16
7   name7   value7   17
8   name8   value8   18
9   name9   value9   19
10  name10  value10  20
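To try the example, the sample file has to exist first. The sketch below generates the ten rows as tab-separated lines without a header line. The helper name write_sample is hypothetical, and the file is written to a temporary directory here; passing "test_data" as the path would create the file in the current directory instead.

```python
import os
import tempfile

# The ten sample rows: ID, key, value, weight.
ROWS = [(i, "name%d" % i, "value%d" % i, 10 + i) for i in range(1, 11)]


def write_sample(path):
    """Write the sample rows as tab-separated lines (no header line)."""
    with open(path, "w") as fp:
        for row in ROWS:
            fp.write("\t".join(str(field) for field in row) + "\n")


# Hypothetical location; use "test_data" to write next to the script.
sample_path = os.path.join(tempfile.mkdtemp(), "test_data")
write_sample(sample_path)

with open(sample_path) as fp:
    first_line = fp.readline()
print(repr(first_line))  # '1\tname1\tvalue1\t11\n'
```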

The code that uses the filter, map, and reduce functions is as follows:

# coding=utf8

'''
Created on 2013-12-15

@author: www.111cn.net
'''
from __future__ import print_function

import pprint
from functools import reduce  # built-in in Python 2, functools only in Python 3


def read_file(file_path):
    '''
    Read the file line by line, split each line on tabs,
    and yield the resulting list of fields.
    '''
    # The with block closes the file, so no explicit close() is needed.
    with open(file_path, "r") as fp:
        for line in fp:
            fields = line.rstrip("\n").split("\t")
            yield fields


def is_even_lines(fields):
    '''
    Return True if the ID in the first column is an even number.
    '''
    return int(fields[0]) % 2 == 0


def double_weights(fields):
    '''
    Double the weight (the last field) of a row, in place.
    '''
    fields[-1] = int(fields[-1]) * 2
    return fields


def sum_weights(sum_value, fields):
    '''
    Add the weight of the row to the running total sum_value
    and return the new total.
    '''
    sum_value += int(fields[-1])
    return sum_value


if __name__ == "__main__":
    # Read all lines of the file
    file_lines = [x for x in read_file("test_data")]
    print('Original lines in the file:')
    pprint.pprint(file_lines)

    print('----')

    # Keep only the rows whose ID is even
    # (list() makes the result printable under Python 3 as well)
    even_lines = list(filter(is_even_lines, file_lines))
    print('Rows with an even ID:')
    pprint.pprint(even_lines)

    print('----')

    # Double the weight of each row
    double_weights_lines = list(map(double_weights, even_lines))
    print('After doubling the weight of each row:')
    pprint.pprint(double_weights_lines)

    print('----')

    # Compute the sum of all weights.
    # Each element passed to sum_weights is a list, so the initial
    # value of the accumulator must be given explicitly; here it is 0.
    sum_val = reduce(sum_weights, double_weights_lines, 0)
    print('Sum of the weights:')
    print(sum_val)

Run result:

Original lines in the file:
[['1', 'name1', 'value1', '11'],
 ['2', 'name2', 'value2', '12'],
 ['3', 'name3', 'value3', '13'],
 ['4', 'name4', 'value4', '14'],
 ['5', 'name5', 'value5', '15'],
 ['6', 'name6', 'value6', '16'],
 ['7', 'name7', 'value7', '17'],
 ['8', 'name8', 'value8', '18'],
 ['9', 'name9', 'value9', '19'],
 ['10', 'name10', 'value10', '20']]
----
Rows with an even ID:
[['2', 'name2', 'value2', '12'],
 ['4', 'name4', 'value4', '14'],
 ['6', 'name6', 'value6', '16'],
 ['8', 'name8', 'value8', '18'],
 ['10', 'name10', 'value10', '20']]
----
After doubling the weight of each row:
[['2', 'name2', 'value2', 24],
 ['4', 'name4', 'value4', 28],
 ['6', 'name6', 'value6', 32],
 ['8', 'name8', 'value8', 36],
 ['10', 'name10', 'value10', 40]]
----
Sum of the weights:
160
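The total of 160 can be cross-checked without the file. The sketch below (Python 3; the in-memory rows list is a stand-in for the file content) chains the three functions directly and compares the result with an equivalent generator expression:

```python
from functools import reduce

# In-memory stand-in for the file: ID, key, value, weight as strings.
rows = [[str(i), "name%d" % i, "value%d" % i, str(10 + i)]
        for i in range(1, 11)]

# filter -> map -> reduce, chained lazily without intermediate lists
even_rows = filter(lambda f: int(f[0]) % 2 == 0, rows)
doubled = map(lambda f: int(f[-1]) * 2, even_rows)
total = reduce(lambda acc, w: acc + w, doubled, 0)

# The same computation written as a generator expression
check = sum(int(f[-1]) * 2 for f in rows if int(f[0]) % 2 == 0)

print(total, check)  # 160 160
```

Unlike double_weights in the script, this version does not modify the rows in place, so rows can be reused afterwards.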

Features of map, reduce, and filter functions

filter function: takes a list as a parameter and returns a list of the elements that satisfy a condition, similar to WHERE a = 1 in SQL
map function: takes a list as a parameter, processes each element, and returns a list of the processed elements, similar to SELECT a*2 in SQL
reduce function: takes a list as a parameter and aggregates it into a single value such as a total or an average, similar to SELECT SUM(a), AVG(b) in SQL
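The SQL analogies can be made concrete with a small sketch (Python 3; the list a stands in for a hypothetical column named a):

```python
from functools import reduce

a = [1, 2, 1, 3]  # hypothetical values of column a

where_a_eq_1 = list(filter(lambda x: x == 1, a))   # WHERE a = 1
select_a_times_2 = list(map(lambda x: x * 2, a))   # SELECT a*2
sum_a = reduce(lambda acc, x: acc + x, a, 0)       # SELECT SUM(a)
avg_a = sum_a / float(len(a))                      # SELECT AVG(a)

print(where_a_eq_1)      # [1, 1]
print(select_a_times_2)  # [2, 4, 2, 6]
print(sum_a, avg_a)      # 7 1.75
```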

The official documentation of these functions (Python 2)

map(function, iterable, ...)

Apply function to every item of iterable and return a list of the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. If one iterable is shorter than another it is assumed to be extended with None items. If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list.
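The text above describes Python 2. As a hedged sketch of how the multi-iterable form behaves under Python 3, where map returns an iterator, stops at the shortest iterable, and no longer accepts None as the function:

```python
from itertools import zip_longest  # named izip_longest in Python 2

a = [1, 2, 3]
b = [10, 20, 30]

# Two iterables: the function receives one item from each, in parallel.
sums = list(map(lambda x, y: x + y, a, b))
print(sums)  # [11, 22, 33]

# Unequal lengths: Python 3 stops at the shortest iterable ...
short = list(map(lambda x, y: (x, y), a, [10, 20]))
print(short)  # [(1, 10), (2, 20)]

# ... while zip_longest reproduces the Python 2 padding with None.
padded = list(zip_longest(a, [10, 20]))
print(padded)  # [(1, 10), (2, 20), (3, None)]

# The "transpose" that Python 2 wrote as map(None, a, b) is zip(a, b).
transposed = list(zip(a, b))
print(transposed)  # [(1, 10), (2, 20), (3, 30)]
```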

reduce(function, iterable[, initializer])

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to:

def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        try:
            initializer = next(it)
        except StopIteration:
            raise TypeError('reduce() of empty sequence with no initial value')
    accum_value = initializer
    for x in it:
        accum_value = function(accum_value, x)
    return accum_value
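The role of the initializer is easiest to see on the edge cases. A small sketch (Python 3, using functools.reduce; the helper name add is only for illustration):

```python
from functools import reduce


def add(x, y):
    return x + y


full = reduce(add, [1, 2, 3, 4, 5])   # ((((1+2)+3)+4)+5)
single = reduce(add, [7])             # one item, no call to add at all
empty_with_init = reduce(add, [], 0)  # initializer acts as the default

# Without an initializer, an empty iterable raises TypeError.
raised = False
try:
    reduce(add, [])
except TypeError:
    raised = True

print(full, single, empty_with_init, raised)  # 15 7 0 True
```

This is why the weight-summing example passes 0 explicitly: the accumulator there is a number while the items are lists, so the first item cannot serve as the starting value.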

filter(function, iterable)

Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

Note that filter(function, iterable) is equivalent to [item for item in iterable if function(item)] if function is not None and [item for item in iterable if item] if function is None.

See itertools.ifilter() and itertools.ifilterfalse() for iterator versions of this function, including a variation that filters for elements where the function returns false.
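A short sketch of these rules under Python 3, where filter itself already returns an iterator and itertools.ifilterfalse is named itertools.filterfalse (the sample values are made up):

```python
from itertools import filterfalse  # itertools.ifilterfalse in Python 2

values = [0, 1, "", "a", None, [], [0]]  # made-up mix of true and false items


def is_even(n):
    return n % 2 == 0


# function=None: keep only the items that are true
truthy = list(filter(None, values))
print(truthy)  # [1, 'a', [0]]

# Equivalent list comprehension from the note above
assert truthy == [item for item in values if item]

evens = list(filter(is_even, range(10)))
odds = list(filterfalse(is_even, range(10)))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```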
