A file consists of a number of lines, which together form a list, and Python provides three functions that are useful for handling lists: map, reduce and filter. In text processing, these three functions let you write more streamlined and clearer code.
map and reduce are built-in functions in Python 2 (in Python 3, reduce was moved to the functools module). They have nothing to do with Hadoop's map and reduce, although the purpose is somewhat similar: map does the preprocessing and reduce does the aggregation.
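As a minimal sketch of that division of roles (assuming Python 3, where reduce must be imported from functools and map/filter return lazy iterators), the lines of a small in-memory text, standing in for a file, are stripped with map, blank lines are dropped with filter, and the line lengths are summed with reduce. The sample text and the name total_length are illustrative only.

```python
import io
from functools import reduce

# An in-memory "file"; iterating over it yields one line at a time, like open().
text_file = io.StringIO("apple\n\nbanana\ncherry\n")

# map: preprocessing, strip the trailing newline from every line
stripped = map(str.strip, text_file)

# filter: with None as the function, falsy items (empty strings) are dropped
non_empty = filter(None, stripped)

# reduce: aggregate the lengths of the remaining lines into one number
total_length = reduce(lambda acc, line: acc + len(line), non_empty, 0)

print(total_length)  # 17
```

Because map and filter are lazy in Python 3, nothing is read from the "file" until reduce starts pulling items through the chain.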
Using map, reduce and filter in text processing
The following is the content of a tab-separated text file. The first column is the ID and the fourth column is the weight. Our goal is to select all rows whose ID is even, double the weight of each of these rows, and finally return the sum of the weights. (The header line below is shown only for reference; the file itself contains just the ten data rows.)
ID   Key     Value    Weight
1    name1   value1   11
2    name2   value2   12
3    name3   value3   13
4    name4   value4   14
5    name5   value5   15
6    name6   value6   16
7    name7   value7   17
8    name8   value8   18
9    name9   value9   19
10   name10  value10  20
The code using the filter, map and reduce functions is as follows:
```python
# coding=utf8
'''
Created on 2013-12-15
@author: www.111cn.net
'''
import pprint
from functools import reduce  # built in on Python 2; must be imported on Python 3


def read_file(file_path):
    '''Read each line of the file, split it on tabs and yield the list of fields.'''
    with open(file_path, "r") as fp:
        for line in fp:
            fields = line.rstrip("\n").split("\t")
            yield fields


def is_even_line(fields):
    '''Return True if the ID in the first column is an even number.'''
    return int(fields[0]) % 2 == 0


def double_weights(fields):
    '''Return a copy of the row with its weight (last column) doubled.'''
    return fields[:-1] + [int(fields[-1]) * 2]


def sum_weights(sum_value, fields):
    '''Add the weight of this row to sum_value and return the new sum_value.'''
    return sum_value + int(fields[-1])


if __name__ == "__main__":
    # Read all lines of the file
    file_lines = [x for x in read_file("test_data")]
    print('Original lines in the file:')
    pprint.pprint(file_lines)
    print('----')

    # Keep only the rows whose ID is even
    even_lines = list(filter(is_even_line, file_lines))
    print('Rows with an even ID:')
    pprint.pprint(even_lines)
    print('----')

    # Double the weight of each row
    double_weights_lines = list(map(double_weights, even_lines))
    print('Weights doubled:')
    pprint.pprint(double_weights_lines)
    print('----')

    # Sum all the weights. Each element passed to sum_weights is a list,
    # so an initial accumulated value must be supplied; it is set to 0.
    sum_val = reduce(sum_weights, double_weights_lines, 0)
    print('Sum of the weights of all rows:')
    print(sum_val)
```
Run result:

```
Original lines in the file:
[['1', 'name1', 'value1', '11'],
 ['2', 'name2', 'value2', '12'],
 ['3', 'name3', 'value3', '13'],
 ['4', 'name4', 'value4', '14'],
 ['5', 'name5', 'value5', '15'],
 ['6', 'name6', 'value6', '16'],
 ['7', 'name7', 'value7', '17'],
 ['8', 'name8', 'value8', '18'],
 ['9', 'name9', 'value9', '19'],
 ['10', 'name10', 'value10', '20']]
----
Rows with an even ID:
[['2', 'name2', 'value2', '12'],
 ['4', 'name4', 'value4', '14'],
 ['6', 'name6', 'value6', '16'],
 ['8', 'name8', 'value8', '18'],
 ['10', 'name10', 'value10', '20']]
----
Weights doubled:
[['2', 'name2', 'value2', 24],
 ['4', 'name4', 'value4', 28],
 ['6', 'name6', 'value6', 32],
 ['8', 'name8', 'value8', 36],
 ['10', 'name10', 'value10', 40]]
----
Sum of the weights of all rows:
160
```
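The same pipeline can also be written more compactly with lambda expressions. The sketch below is an illustration, not the author's code: it assumes Python 3 and embeds the ten data rows as tab-separated text in an io.StringIO object (the name SAMPLE_DATA is hypothetical) instead of reading the test_data file, so it runs on its own.

```python
import io
from functools import reduce

# Same ten rows as the sample file, tab-separated, no header line.
SAMPLE_DATA = "".join(
    "%d\tname%d\tvalue%d\t%d\n" % (i, i, i, 10 + i) for i in range(1, 11)
)

rows = [line.rstrip("\n").split("\t") for line in io.StringIO(SAMPLE_DATA)]

# filter: keep rows whose ID (first column) is even
even_rows = filter(lambda fields: int(fields[0]) % 2 == 0, rows)

# map: keep only the weight (last column), doubled
doubled = map(lambda fields: int(fields[-1]) * 2, even_rows)

# reduce: sum the doubled weights, starting from 0
total = reduce(lambda acc, weight: acc + weight, doubled, 0)

print(total)  # 160
```

Because map here yields bare weights rather than whole rows, the reducer no longer has to index into a list; for a plain sum, the built-in sum(doubled) would do the same job.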
Features of the map, reduce and filter functions
filter function: takes a list as its argument and returns a list of the elements that meet a condition; similar to WHERE a = 1 in SQL.
map function: takes a list as its argument, processes each element and returns a list of the processed elements; similar to SELECT a * 2 in SQL.
reduce function: takes a list as its argument and aggregates it into a single value, such as a total or an average; similar to SELECT SUM(a), AVG(b) in SQL.
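To make the SQL analogy concrete, here is a small sketch (Python 3; the table and the column names a and b are made up for illustration) that expresses WHERE, SELECT and an aggregate with the three functions.

```python
from functools import reduce

# Hypothetical table: each row is a dict with columns "a" and "b".
table = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 1, "b": 30}]

# filter ~ SELECT * FROM table WHERE a = 1
where_a_is_1 = list(filter(lambda row: row["a"] == 1, table))

# map ~ SELECT a * 2 FROM table
a_times_2 = list(map(lambda row: row["a"] * 2, table))

# reduce ~ SELECT SUM(a), AVG(b) FROM table
sum_a = reduce(lambda acc, row: acc + row["a"], table, 0)
avg_b = reduce(lambda acc, row: acc + row["b"], table, 0) / len(table)

print(where_a_is_1)  # [{'a': 1, 'b': 10}, {'a': 1, 'b': 30}]
print(a_times_2)     # [2, 4, 2]
print(sum_a, avg_b)  # 4 20.0
```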
The official documentation for these functions (from the Python 2 library reference)
map(function, iterable, ...)
Apply function to every item of iterable and return a list of the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. If one iterable is shorter than another it is assumed to be extended with None items. If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list.
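A short sketch of the multi-iterable form. It assumes Python 3, where the behaviour differs from the Python 2 text above: map() returns an iterator rather than a list, it stops at the shortest iterable instead of padding with None, and function may not be None. itertools.zip_longest reproduces the Python 2 padding and the map(None, ...) transpose.

```python
import operator
from itertools import zip_longest

xs = [1, 2, 3]
ys = [10, 20]

# Two iterables: the function receives one item from each, in parallel.
# Python 3 stops at the shorter iterable, so only two results are produced.
sums = list(map(operator.add, xs, ys))
print(sums)  # [11, 22]

# Python 2's map(None, xs, ys) padded the shorter input with None;
# zip_longest gives the same list of tuples in Python 3.
padded = list(zip_longest(xs, ys))
print(padded)  # [(1, 10), (2, 20), (3, None)]
```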
reduce(function, iterable[, initializer])
Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to:
```python
def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        try:
            initializer = next(it)
        except StopIteration:
            raise TypeError('reduce() of empty sequence with no initial value')
    accum_value = initializer
    for x in it:
        accum_value = function(accum_value, x)
    return accum_value
```
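The role of initializer is easiest to see on small inputs. This sketch (Python 3, where reduce lives in functools) shows the left-to-right grouping, the single-item and empty cases, and why the text-processing example above passes 0: its reducer adds int(fields[-1]) to the accumulator, so the accumulator must start as a number rather than as the first row. The rows list is made up for illustration.

```python
from functools import reduce

# Left to right: ((((1 + 2) + 3) + 4) + 5)
total = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
print(total)  # 15

# Make the grouping visible by building a string instead of a number.
grouped = reduce(lambda x, y: "(%s+%s)" % (x, y), [1, 2, 3])
print(grouped)  # ((1+2)+3)

# One item and no initializer: that item is returned unchanged.
single = reduce(lambda x, y: x + y, [42])

# Empty iterable: the initializer is the default result...
empty_default = reduce(lambda x, y: x + y, [], 0)

# ...and without one, reduce raises TypeError.
try:
    reduce(lambda x, y: x + y, [])
    raised = False
except TypeError:
    raised = True

# Rows are lists, so the accumulator must start at 0; without it the first
# row (a list) would become the accumulator and acc + int(...) would fail.
rows = [["2", "name2", "value2", 24], ["4", "name4", "value4", 28]]
row_sum = reduce(lambda acc, fields: acc + int(fields[-1]), rows, 0)
print(row_sum)  # 52
```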
filter(function, iterable)
Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.
Note that filter(function, iterable) is equivalent to [item for item in iterable if function(item)] if function is not None and [item for item in iterable if item] if function is None.
See itertools.ifilter() and itertools.ifilterfalse() for iterator versions of this function, including a variation that filters for elements where the function returns false.
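In Python 3 the picture differs slightly from the Python 2 text above: filter() always returns a lazy iterator (even for strings and tuples), itertools.ifilter() no longer exists, and ifilterfalse() was renamed itertools.filterfalse(). A small sketch with made-up sample values:

```python
from itertools import filterfalse

values = [0, 1, "", "a", None, [], [2], 3]

# function is None: only truthy items are kept
truthy = list(filter(None, values))
print(truthy)  # [1, 'a', [2], 3]

# Equivalent list comprehension from the note above
assert truthy == [item for item in values if item]


def is_even(n):
    return n % 2 == 0


# filter keeps items where the function returns true ...
evens = list(filter(is_even, range(10)))
# ... filterfalse keeps items where it returns false
odds = list(filterfalse(is_even, range(10)))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]

# A string is no longer returned as a string; join the characters back if needed.
letters = "".join(filter(str.isalpha, "a1b2c3"))
print(letters)  # abc
```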