Learning Data Processing in Python (II)

Source: Internet
Author: User

This article is based on the book "Head First Python" by Paul Barry; the book's materials can be downloaded from http://python.itcarlow.ie/site. If you find any mistakes in this article, please point them out.

II. Code modules
1. Preparation
(1) Reading data
    with open('james.txt') as jaf:     # open the file
        data = jaf.readline()          # read one line of data
    james = data.strip().split(',')    # convert the data to a list
Description: data.strip().split(',') is called a method chain. strip() is applied to the data line first and removes whitespace from both ends of the string (not whitespace inside it). The result is then passed to the second method, split(','), which splits the string at each comma and returns a list.
(2) Data cleansing
Define the function sanitize() to unify the format of each athlete's times into the mins.secs format:
    def sanitize(time_string):
        if '-' in time_string:
            splitter = '-'
        elif ':' in time_string:       # elif, so a '-' match is not overridden by the else branch
            splitter = ':'
        else:
            return(time_string)        # already in mins.secs format
        (mins, secs) = time_string.split(splitter)
        return(mins + '.' + secs)
Description: split() is a built-in string method that breaks a string into a list of substrings.
(3) Transforming lists: list comprehensions
For example, compare the general way of transforming a list with the list comprehension approach:
    clean_mikey = []                            # create the list
    for each_t in mikey:                        # iterate
        clean_mikey.append(sanitize(each_t))    # transform and append
is equivalent to
    clean_mikey = [sanitize(each_t) for each_t in mikey]
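As a quick check that the two forms build the same list, here is a minimal sketch. The sample values in mikey are assumed for illustration, and sanitize() is repeated so the block runs on its own.

```python
def sanitize(time_string):
    """Normalize '2-34' or '2:34' to '2.34'; leave other strings unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

# Sample times, assumed for illustration
mikey = ['2:22', '3.01', '3:01', '2-31']

# Loop version: create, iterate, transform and append
clean_loop = []
for each_t in mikey:
    clean_loop.append(sanitize(each_t))

# List comprehension version: the same three steps in one expression
clean_comp = [sanitize(each_t) for each_t in mikey]

print(clean_loop == clean_comp)   # both forms give the same list
print(sorted(clean_comp))         # sorted() returns a new, ordered list
```

The comprehension does not change what is computed; it only removes the bookkeeping of creating the empty list and appending to it.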
Description: sanitize() is the custom data cleansing function defined earlier, and the built-in function sorted() returns a sorted copy of the whole list.
(4) Removing duplicate data
Using the "not in" list test:
    unique_james = []
    for each_t in james:
        if each_t not in unique_james:
            unique_james.append(each_t)
Using a set (a Python set is unordered and does not allow duplicate items). For example:
    distances = set(james)
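The two approaches differ in one practical way: the "not in" loop keeps the first occurrence of each item in its original order, while a set has no guaranteed order and is therefore usually passed through sorted() before use. A minimal sketch with assumed sample values:

```python
# Sample data, assumed for illustration (already in mins.secs format)
james = ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01']

# "not in" approach: preserves the order of first occurrences
unique_james = []
for each_t in james:
    if each_t not in unique_james:
        unique_james.append(each_t)

# Set approach: duplicates are dropped, order is not defined
distances = set(james)

print(unique_james)
print(sorted(distances))   # sort to get a predictable order
```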
(5) Slicing: accessing multiple items in a list
    print(sorted(set([sanitize(t) for t in james]))[0:3])
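Slice notation, some_list[start:stop], returns a new list with the items from index start up to, but not including, index stop. A small sketch with assumed values that are already sanitized and sorted:

```python
# Assumed sample values, already sanitized and sorted
times = ['2.01', '2.16', '2.22', '2.34', '2.45']

top3 = times[0:3]    # items at index 0, 1 and 2
same = times[:3]     # omitting start means "from the beginning"

# Slicing past the end of a short list does not raise an error
short = ['2.01', '2.16'][0:3]

print(top3)
print(same)
print(short)
```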
(6) Turning repeated code into a function
    def get_coach_data(filename):
        try:
            with open(filename) as f:
                data = f.readline()
            return(data.strip().split(','))
        except IOError as ioerr:
            print('File error: ' + str(ioerr))
            return(None)
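To try the function without the coach's data files, the sketch below writes a small sample file to a temporary directory and reads it back. The file name and its contents are assumptions made for illustration; the second call shows the None return for a missing file.

```python
import os
import tempfile

def get_coach_data(filename):
    """Return the first line of the file as a list, or None on a file error."""
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file, created only for this demonstration
    path = os.path.join(tmpdir, 'sample.txt')
    with open(path, 'w') as f:
        f.write('2-34,3:21,2.34\n')

    times = get_coach_data(path)
    missing = get_coach_data(os.path.join(tmpdir, 'no_such_file.txt'))

print(times)
print(missing)
```

Because the function returns None on failure, callers should check the result before indexing or iterating over it.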
2. Customizing Data Objects
(1) New data format: the files james2.txt, julie2.txt, mikey2.txt and sarah2.txt have the following contents (full name, date of birth, training times):
James Lee,2002-3-14,2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22,2-01,2.01,2:16
Julie Jones,2002-8-17,2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21,3.01,3.02,2:59
Sarah Sweeney,2002-6-17,2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55,2:22,2-21,2.22
Mikey McManus,2002-2-24,2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38,2:40,2.22,2-31
(2) Data extraction (taking Sarah as an example):
    sarah = get_coach_data('sarah2.txt')
    (sarah_name, sarah_dob) = sarah.pop(0), sarah.pop(0)
Description: each pop(0) call removes and returns the first item in the list; the two returned values are assigned to the name and date-of-birth variables.
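The effect of the two pop(0) calls can be seen on an in-memory list. This is a minimal sketch that uses the first few fields of Sarah's line in place of the file; note that pop() modifies the list, so only the times remain afterwards.

```python
# First few fields of Sarah's data, used in place of reading sarah2.txt
sarah = ['Sarah Sweeney', '2002-6-17', '2:58', '2.58', '2:39']

# The right-hand side is evaluated left to right:
# the first pop(0) returns the name, the second returns the date of birth
(sarah_name, sarah_dob) = sarah.pop(0), sarah.pop(0)

print(sarah_name)
print(sarah_dob)
print(sarah)   # only the times are left in the list
```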
(3) Using a dictionary to associate data
A dictionary is a built-in data structure that lets you associate data with keys rather than with numeric positions, so that the in-memory data matches the structure of the actual data. For example:
    Key       Associated data
    Name  --> Sarah Sweeney
    DOB   --> 2002-6-17
    Times --> 2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55,2:22,2-21,2.22
There are two ways to create a dictionary:
    Braces:            cleese = {}
    Factory function:  palin = dict()
There are two ways to add data:
    cleese['Name'] = 'John Cleese'
    palin = {'Name': 'Michael Palin'}
(4) Application:
    sarah_data = {}
    sarah_data['Name'] = sarah.pop(0)
    sarah_data['DOB'] = sarah.pop(0)
    sarah_data['Times'] = sarah
    print(sarah_data['Name'] + "'s fastest times are: " + str(sorted(set([sanitize(t) for t in sarah_data['Times']]))[0:3]))
(5) Creating the dictionary in one step and returning it
    def get_coach_data(filename):
        try:
            with open(filename) as f:
                data = f.readline()
            templ = data.strip().split(',')
            # 'Times' holds the three fastest times, stored as a string
            return({'Name': templ.pop(0),
                    'DOB': templ.pop(0),
                    'Times': str(sorted(set([sanitize(t) for t in templ]))[0:3])})
        except IOError as ioerr:
            print('File error: ' + str(ioerr))
            return(None)
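The dictionary-building step can be tried on its own with a line held in memory. In the sketch below, build_athlete_dict() is a hypothetical helper introduced only for illustration (it is not from the book); it applies the same expression to Sarah's line from sarah2.txt.

```python
def sanitize(time_string):
    """Normalize '2-34' or '2:34' to '2.34'; leave other strings unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def build_athlete_dict(line):
    """Hypothetical helper: turn one comma-separated line into a dictionary."""
    templ = line.strip().split(',')
    # Key/value pairs are evaluated in order, so the name is popped first,
    # then the date of birth; the remaining items are the times.
    return {'Name': templ.pop(0),
            'DOB': templ.pop(0),
            'Times': str(sorted(set([sanitize(t) for t in templ]))[0:3])}

line = ('Sarah Sweeney,2002-6-17,2:58,2.58,2:39,2-25,2-55,2:54,'
        '2.18,2:55,2:55,2:22,2-21,2.22\n')
sarah = build_athlete_dict(line)

print(sarah['Name'] + "'s fastest times are: " + sarah['Times'])
```

Since 'Times' is wrapped in str(), the value is text rather than a list; a caller that needs to work with the individual times would have to keep the list instead.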
(6) Bundling the code and its data in a class
    class Athlete:
        def __init__(self, a_name, a_dob, a_times=None):
            self.name = a_name
            self.dob = a_dob
            # Use a fresh list per instance instead of a shared [] default
            self.times = a_times if a_times is not None else []

        def top3(self):
            return(sorted(set([sanitize(t) for t in self.times]))[0:3])

    def get_coach_data(filename):
        try:
            with open(filename) as f:
                data = f.readline()
            templ = data.strip().split(',')
            return(Athlete(templ.pop(0), templ.pop(0), templ))
        except IOError as ioerr:
            print('File error: ' + str(ioerr))
            return(None)
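To see that each Athlete object carries its own data, the sketch below creates two instances directly from the values in james2.txt and julie2.txt, without reading any file. It uses None as the default for a_times so that instances never share one list object; treat that as a choice of this sketch.

```python
def sanitize(time_string):
    """Normalize '2-34' or '2:34' to '2.34'; leave other strings unchanged."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

class Athlete:
    def __init__(self, a_name, a_dob, a_times=None):
        self.name = a_name
        self.dob = a_dob
        self.times = a_times if a_times is not None else []

    def top3(self):
        """Return the three fastest unique times in mins.secs format."""
        return sorted(set([sanitize(t) for t in self.times]))[0:3]

james = Athlete('James Lee', '2002-3-14',
                ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01',
                 '2:01', '3:10', '2-22', '2-01', '2.01', '2:16'])
julie = Athlete('Julie Jones', '2002-8-17',
                ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23',
                 '3:10', '3.21', '3-21', '3.01', '3.02', '2:59'])

for athlete in (james, julie):
    print(athlete.name + "'s fastest times are: " + str(athlete.top3()))
```

The same top3() code serves every athlete; only the data held in each instance differs.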
(7) Calling the class and printing the result
    james = get_coach_data('james2.txt')
    print(james.name + "'s fastest times are: " + str(james.top3()))
Result output:
    James Lee's fastest times are: ['2.01', '2.16', '2.22']
The next lesson covers class inheritance.
