This article covers the following topics:
1. Main function of the program
2. Implementation process
3. Definition of the class
4. Dynamically updating each object with a generator and returning the object
5. Using strip to remove unnecessary characters
6. Matching strings with re.match
7. Using time.strptime to convert strings into time objects
8. Complete code
Main functions of the program
Suppose there is a table-like document that stores user information. The first line lists the properties, separated by commas (,). From the second line on, each line holds the values of those properties, and each line represents one user. How can we read this document and output one user object per line?
There are also 4 small requirements:
Each document is large, and memory would be exhausted if the objects for all rows were generated at once and kept in a list. The program may hold only one row-generated object at a time.
Each comma-separated string may be wrapped in double quotation marks (") or single quotation marks ('), such as "Zhang San"; the quotation marks must be removed. Numbers may appear as +000000001.24; the leading + and zeros must be removed to extract 1.24.
The document contains times, which may look like 2013-10-29 or 2013/10/29 2:23:56. Such strings must be converted to a time type.
There are many documents of this kind, each with different properties: one holds user information, another holds call records. The attributes of the class must therefore be generated dynamically from the first line of the document.
Implementation process
1. Definition of the class
Because the properties are added dynamically, the property-value pairs are added dynamically too. The class therefore has two member functions, updateAttributes() and updatePairs(). The property names are stored in the list __attributes, and the dictionary attrilist stores the property-to-value mapping. __init__() is the constructor. The leading double underscore in __attributes marks the variable as private: Python mangles the name, so it cannot be accessed under that name from outside the class. The class is instantiated without arguments: a = UserInfo().
    class UserInfo(object):
        'Class to restore UserInformation'

        def __init__(self):
            self.attrilist = {}        # property name -> value for the current row
            self.__attributes = []     # property names taken from the first line

        def updateAttributes(self, attributes):
            self.__attributes = attributes

        def updatePairs(self, values):
            for i in range(len(values)):
                self.attrilist[self.__attributes[i]] = values[i]
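The following is a minimal usage sketch of the class, repeated here only so that the snippet runs on its own. The property names "name" and "age" are made-up sample values, not taken from the author's file. It also shows what the double underscore actually does: the attribute is stored under a mangled name rather than being truly inaccessible.

```python
class UserInfo(object):
    """Same shape as the class above, repeated so the snippet is self-contained."""

    def __init__(self):
        self.attrilist = {}
        self.__attributes = []

    def updateAttributes(self, attributes):
        self.__attributes = attributes

    def updatePairs(self, values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]] = values[i]


a = UserInfo()
a.updateAttributes(["name", "age"])   # hypothetical header line
a.updatePairs(["Zhang San", "30"])    # hypothetical data line
print(a.attrilist)  # {'name': 'Zhang San', 'age': '30'}

# Name mangling: the plain name is not visible from outside the class,
# but the mangled name _UserInfo__attributes is.
print(hasattr(a, "__attributes"))           # False
print(hasattr(a, "_UserInfo__attributes"))  # True
```

Calling updatePairs() again with a new row overwrites the values in place, which is what lets a single object be reused for every line.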
2. Dynamically updating each object with a generator and returning the object
A generator is like a function that can be resumed many times, producing one result on each iteration. A function hands back its result with return, whereas a generator hands back its result with yield. Each run stops at yield and returns, and the next run resumes from the line after yield. For example, we can implement the Fibonacci sequence with a function and with a generator, respectively:
    def fib(max):
        n, a, b = 0, 0, 1
        while n < max:
            print(b)
            a, b = b, a + b
            n = n + 1
        return 'done'

We calculate the first 6 numbers of the sequence:

    >>> fib(6)
    1
    1
    2
    3
    5
    8
    'done'
To turn this into a generator, simply change print to yield, as follows:

    def fib(max):
        n, a, b = 0, 0, 1
        while n < max:
            yield b
            a, b = b, a + b
            n = n + 1
How to use:

    >>> f = fib(6)
    >>> f
    <generator object fib at 0x104feaaa0>
    >>> for i in f:
    ...     print(i)
    ...
    1
    1
    2
    3
    5
    8
    >>>
As you can see, calling fib returns a generator object. Each time execution reaches yield, it pauses and hands back one result; the next iteration continues from the line after yield. A generator can also be advanced manually by calling next(f) (generator.next() in Python 2).
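As a small sketch of manual stepping, the generator version of fib can be driven with the built-in next(). Once the loop inside the generator ends, a further call raises StopIteration, which is how a for loop knows when to stop.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1


f = fib(3)
print(next(f))  # 1
print(next(f))  # 1
print(next(f))  # 2

try:
    next(f)  # nothing left to yield
except StopIteration:
    print("generator exhausted")
```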
In my program, the generator section code is as follows:
    import re

    def objectgenerator(maxlinenum):
        filename = '/home/thinkit/documents/usr_info/user.csv'
        attributes = []
        linenum = 1
        a = UserInfo()
        # The document is GB2312-encoded, so decode it while reading.
        file = open(filename, encoding='gb2312')
        while linenum < maxlinenum:
            values = []
            line = file.readline()
            if line == '':
                print('reading fail! Please check filename!')
                break
            str_list = line.split(',')
            for item in str_list:
                item = item.strip()
                item = item.strip('\"')
                item = item.strip('\'')
                # strip() takes a set of characters, not a pattern, so a regular
                # expression removes the leading '+' and its zero padding.
                item = re.sub(r'^\+0*(?=\d)', '', item)
                item = catchtime(item)
                if linenum == 1:
                    attributes.append(item)
                else:
                    values.append(item)
            if linenum == 1:
                a.updateAttributes(attributes)
            else:
                a.updatePairs(values)
                yield a.attrilist  # yield a instead to return the UserInfo object
            linenum = linenum + 1
        file.close()
Here a = UserInfo() creates an instance of the UserInfo class. Because the document is GB2312-encoded, each line is decoded accordingly. The first line holds the properties, so updateAttributes() saves the property list in the UserInfo instance; each following line is read into the dictionary as property-value pairs. P.S. A Python dictionary is the equivalent of a map in other languages.
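One consequence of yielding a.attrilist is worth a short sketch: every iteration hands back the same dictionary, updated in place. That is what keeps memory use flat, but a caller that wants to keep a row has to copy it. The function row_dicts and the sample lines below are hypothetical stand-ins that mimic this behaviour without needing the file.

```python
def row_dicts(lines):
    """Yield the SAME dict object for every data row, updated in place."""
    header = None
    current = {}
    for line in lines:
        fields = line.strip().split(",")
        if header is None:
            header = fields          # first line: property names
            continue
        for key, value in zip(header, fields):
            current[key] = value
        yield current


lines = ["name,age", "Zhang San,30", "Li Si,25"]

aliased = list(row_dicts(lines))                  # two references to one dict
copied = [dict(row) for row in row_dicts(lines)]  # a snapshot of each row

print(aliased[0]["name"])  # Li Si
print(copied[0]["name"])   # Zhang San
```

Processing each row inside the for loop, as the article intends, avoids the issue entirely because the row is used before the next one overwrites it.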
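3. Using strip to remove unnecessary characters
As the code above shows, str.strip(somechar) removes the characters in somechar from the beginning and end of str. somechar is a set of individual characters, not a regular expression: every leading or trailing character that appears in the set is removed. Called without an argument, strip() removes surrounding whitespace such as \t and \n. Because strip('+0*') treats +, 0 and * as three literal characters and also removes trailing zeros that belong to the value, the sign and zero padding are better removed with a regular expression:

    item = item.strip()        # remove surrounding whitespace such as \t, \n
    item = item.strip('\"')    # remove surrounding "
    item = item.strip('\'')    # remove surrounding '
    item = re.sub(r'^\+0*(?=\d)', '', item)  # remove a leading + and the zeros after it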
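A quick sketch of how strip treats its argument may help here. The helper strip_sign_and_zeros is a hypothetical name introduced for illustration; it uses re.sub with a lookahead so that at least one digit is always kept.

```python
import re

# strip() removes any of the listed characters from BOTH ends.
print("+000000001.24".strip("+0*"))  # 1.24
print("+000000100.50".strip("+0*"))  # 100.5  (the trailing zero is removed too)
print("**note**".strip("+0*"))       # note   ('*' is a literal character here)


def strip_sign_and_zeros(text):
    """Hypothetical helper: drop a leading '+' and the zeros that follow it."""
    return re.sub(r"^\+0*(?=\d)", "", text)


print(strip_sign_and_zeros("+000000100.50"))  # 100.50
print(strip_sign_and_zeros("+0.5"))           # 0.5
```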
4. Matching strings with re.match
Function syntax:

    re.match(pattern, string, flags=0)

Function parameters:

    Parameter   Description
    pattern     The regular expression to match
    string      The string to match against
    flags       Flags that control how the regular expression is matched, such as case sensitivity or multiline matching

If the match succeeds, re.match returns a match object; otherwise it returns None.

    >>> import re
    >>> s = '2015-09-18'
    >>> matchobj = re.match(r'\d{4}-\d{2}-\d{2}', s, flags=0)
    >>> print(matchobj)
    <re.Match object; span=(0, 10), match='2015-09-18'>
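One detail that matters for the date check is that re.match only anchors the pattern at the start of the string; anything after the matched part is ignored. The sketch below contrasts it with re.fullmatch and shows how groups can pull out the individual parts. The sample strings are made up for illustration.

```python
import re

pattern = r'\d{4}-\d{2}-\d{2}'

print(bool(re.match(pattern, '2015-09-18')))               # True
print(bool(re.match(pattern, '2015-09-18 trailing')))      # True: only the start must match
print(bool(re.match(pattern, 'on 2015-09-18')))            # False: match() does not search
print(bool(re.fullmatch(pattern, '2015-09-18 trailing')))  # False: whole string must match

m = re.match(r'(\d{4})-(\d{2})-(\d{2})', '2015-09-18')
print(m.group(0))  # 2015-09-18
print(m.groups())  # ('2015', '09', '18')
```

If trailing text should disqualify a value, re.fullmatch or a pattern ending in $ is the safer choice.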
5. Using time.strptime to convert a string into a time object
In the time module, time.strptime(str, format) parses the string str according to the layout given by format and returns a time object (a time.struct_time). Common directives in format are:
%y Two-digit year (00-99)
%Y Four-digit year (for example, 2013)
%m Month (01-12)
%d Day of the month (01-31)
%H Hour, 24-hour clock (00-23)
%I Hour, 12-hour clock (01-12)
%M Minute (00-59)
%S Second (00-59)
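As a short sketch, the two layouts mentioned in the requirements can be parsed with the directives listed above. The variable names t1 and t2 are illustrative only.

```python
import time

t1 = time.strptime('2013-10-29', '%Y-%m-%d')
t2 = time.strptime('2013/10/29 2:23:56', '%Y/%m/%d %H:%M:%S')

# The result is a time.struct_time whose fields can be read by name.
print(t1.tm_year, t1.tm_mon, t1.tm_mday)       # 2013 10 29
print(t2.tm_hour, t2.tm_min, t2.tm_sec)        # 2 23 56
print(time.strftime('%Y-%m-%d %H:%M:%S', t2))  # 2013-10-29 02:23:56

# A string that does not fit the format raises ValueError.
try:
    time.strptime('Zhang San', '%Y-%m-%d')
except ValueError:
    print('not a date')
```

Because a mismatch raises ValueError, checking the layout first, or catching the exception, is needed before calling strptime on arbitrary fields.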
In addition, the re module is needed: a regular expression checks whether the string has one of the usual time layouts, such as YYYY/MM/DD H:M:S or YYYY-MM-DD.
In the code above, the function catchtime decides whether item is a time string and, if so, converts it to a time object.
The code is as follows:
    import time
    import re

    def catchtime(item):
        # check if it's time; the trailing $ rejects strings with extra text
        matchobj = re.match(r'\d{4}-\d{2}-\d{2}$', item, flags=0)
        if matchobj is not None:
            item = time.strptime(item, '%Y-%m-%d')
            # print("returned time: %s" % item)
            return item
        else:
            matchobj = re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+$', item, flags=0)
            if matchobj is not None:
                item = time.strptime(item, '%Y/%m/%d %H:%M:%S')
                # print("returned time: %s" % item)
            return item
Full code:
    import collections
    import time
    import re

    class UserInfo(object):
        'Class to restore UserInformation'

        def __init__(self):
            self.attrilist = collections.OrderedDict()  # ordered
            self.__attributes = []

        def updateAttributes(self, attributes):
            self.__attributes = attributes

        def updatePairs(self, values):
            for i in range(len(values)):
                self.attrilist[self.__attributes[i]] = values[i]

    def catchtime(item):
        # check if it's time; the trailing $ rejects strings with extra text
        matchobj = re.match(r'\d{4}-\d{2}-\d{2}$', item, flags=0)
        if matchobj is not None:
            item = time.strptime(item, '%Y-%m-%d')
            return item
        else:
            matchobj = re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+$', item, flags=0)
            if matchobj is not None:
                item = time.strptime(item, '%Y/%m/%d %H:%M:%S')
            return item

    def objectgenerator(maxlinenum):
        filename = '/home/thinkit/documents/usr_info/user.csv'
        attributes = []
        linenum = 1
        a = UserInfo()
        # The document is GB2312-encoded, so decode it while reading.
        file = open(filename, encoding='gb2312')
        while linenum < maxlinenum:
            values = []
            line = file.readline()
            if line == '':
                print('reading fail! Please check filename!')
                break
            str_list = line.split(',')
            for item in str_list:
                item = item.strip()
                item = item.strip('\"')
                item = item.strip('\'')
                # remove a leading '+' and the zeros after it
                item = re.sub(r'^\+0*(?=\d)', '', item)
                item = catchtime(item)
                if linenum == 1:
                    attributes.append(item)
                else:
                    values.append(item)
            if linenum == 1:
                a.updateAttributes(attributes)
            else:
                a.updatePairs(values)
                yield a.attrilist  # yield a instead to return the UserInfo object
            linenum = linenum + 1
        file.close()

    if __name__ == '__main__':
        # 10 is an example line limit (an assumption); adjust as needed.
        for n in objectgenerator(10):
            print(n)  # print the dictionary to check that it is correct
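To try the pipeline without the file at the hard-coded path, a condensed variant that accepts any iterable of text lines can be sketched as follows. The names clean, to_time and records, and the sample rows, are hypothetical and not part of the original program; this is an illustration under those assumptions rather than a replacement for the code above.

```python
import re
import time


def clean(item):
    """Strip whitespace, quotes, and a leading '+' with its zero padding."""
    item = item.strip().strip('"').strip("'")
    return re.sub(r'^\+0*(?=\d)', '', item)


def to_time(item):
    """Return a struct_time for the two date layouts, otherwise the item itself."""
    layouts = (
        (r'\d{4}-\d{2}-\d{2}$', '%Y-%m-%d'),
        (r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+$', '%Y/%m/%d %H:%M:%S'),
    )
    for pattern, fmt in layouts:
        if re.match(pattern, item):
            return time.strptime(item, fmt)
    return item


def records(lines):
    """Yield one dict per data line; the first line supplies the keys."""
    header = None
    for line in lines:
        fields = [to_time(clean(field)) for field in line.split(',')]
        if header is None:
            header = fields
        else:
            yield dict(zip(header, fields))


sample = [
    'name,balance,registered\n',
    '"Zhang San",+000000001.24,2013-10-29\n',
    "'Li Si',+000000020.50,2013/10/29 2:23:56\n",
]
for record in records(sample):
    print(record['name'], record['balance'], record['registered'].tm_year)
```

Unlike the original generator, this variant builds a fresh dictionary per row, which trades a little allocation for rows that are safe to keep. Passing an open file object instead of the sample list would read the file lazily in the same way.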