Writing a small text-processing tool in Python

Source: Internet
Author: User
Tags: imap, rfc822

A friend ran into a bit of trouble and I volunteered to help. The situation is this:

- Their business system receives its data from a mailbox;

- Each message contains one record;

- The records are plain text, with the fields separated by special characters;

- They need to pull the messages out of the mailbox in bulk and put the records into an Excel file.

This is a piece of cake for Python. (As it turned out, there were a few small pitfalls that gave me a headache for a while.)

Since I am a beginner anyway, there is no need to start from Python 2, so I went straight to Python 3.

The first step is receiving the mail. The mailbox does not support POP3 for retrieval, only IMAP. I checked, and Python 3 has a dedicated library for this (imaplib).

Then the text is processed with regular expressions.

Creating an Excel file requires a third-party library. I looked around a bit but did not manage to download one, so I kept things simple and generated a CSV file instead.
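As a side note, Python's standard csv module takes care of quoting, so commas, double quotes and line breaks inside a field do not break the columns. This is not what my script does (it replaces the characters by hand), just a minimal sketch of the alternative; the sample record and the `rows_to_csv` helper name are made up for illustration:

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows (lists of strings) to CSV text with standard quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical record: the last field contains a comma, quotes and a newline.
sample = [["2016-01-01", "website", 'said "hello", then\nleft']]
csv_text = rows_to_csv(sample)
print(csv_text)
```

When writing to a real file instead of a string, the csv documentation recommends opening it with `newline=''`.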

==============

    def main():
        M = imaplib.IMAP4_SSL("my-host.com", 993)
        try:
            M.login('my-username', 'my-password')
        except Exception as e:
            print('login error: %s' % e)
            M.logout()  # close() is only valid after select(), so log out instead
            return

        try:
            M.select('INBOX', False)

            # result, message = M.select()
            # Tip: to find the mails about essh, use
            # type, data = M.search(None, '(SUBJECT "essh")')
            # The parentheses enclose one search condition; several conditions
            # can be given, such as FROM xxxx SUBJECT "aaa".
            # Note that the command is enclosed in parentheses (found out the painful way)
            typ, data = M.search(None, 'ALL')

            msgList = data[0].split()
            print("total mails:" + str(len(msgList)))
            # first = msgList[0]
            # M.store(first, '-FLAGS', '(\Seen)')
            # M.store("1:*", '+FLAGS', '\\Deleted')  # Flag all Trash as Deleted
            output = os.path.join(PATH, 'output.csv')

            last_id = int(read_config())
            next_id = last_id  # position to resume from on the next run
            count = 0
            with open(output, 'w') as fp2:
                for idx in range(last_id, len(msgList)):
                    print("curr id: " + str(idx))
                    typ, data = M.fetch(msgList[idx], '(RFC822)')
                    deal_mail(data, fp2)
                    next_id = idx + 1
                    count = count + 1
                    if count >= 500:
                        break

            write_config(next_id)
            # print(str(idx))
            print("OK!")
        except Exception as e:
            print('imap error: %s' % e)
        finally:
            M.logout()

This is the main() part. It mainly connects to the IMAP server, logs in, and calls the processing functions.

I found the interface that imaplib provides rather odd. Still, I did not fall into any real pits here, because the information available online is quite complete. The search syntax and the commands for deleting messages and marking them read/unread are kept in the comments.
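The commands kept in those comments can be wrapped in small helpers. This is only a sketch: the helper names are my own, and `conn` is assumed to be an `imaplib.IMAP4_SSL` object that is already logged in and has a mailbox selected.

```python
def search_by_subject(conn, keyword):
    """Return the message numbers (as bytes) whose subject contains keyword."""
    # The whole criterion goes inside one pair of parentheses.
    typ, data = conn.search(None, '(SUBJECT "%s")' % keyword)
    return data[0].split()


def mark_unread(conn, msg_num):
    """Remove the Seen flag so the message shows up as unread again."""
    conn.store(msg_num, '-FLAGS', '\\Seen')


def delete_all(conn):
    """Flag every message as deleted, then remove them permanently."""
    conn.store('1:*', '+FLAGS', '\\Deleted')
    conn.expunge()
```

Note that `delete_all` really empties the selected mailbox, so it is worth trying on a test folder first.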

The logic is: first read last_id, the position saved by the previous run, then process up to 500 messages starting from there. The new position is then written to the configuration file for the next run.
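This resume logic can be tried out without a mail server. The sketch below uses hypothetical helper names (`load_position`, `save_position`, `next_batch`) and keeps the position in a small text file, as the script does with last_id.txt. It assumes the saved value is the index of the next message to process.

```python
import os


def load_position(path):
    """Return the saved position, or 0 when no checkpoint file exists yet."""
    if os.path.isfile(path):
        with open(path) as fp:
            return int(fp.read().strip() or 0)
    return 0


def save_position(path, position):
    """Write the index of the next message to process."""
    with open(path, 'w') as fp:
        fp.write(str(position))


def next_batch(total, start, limit=500):
    """Return the indexes to handle in this run: at most `limit` of them."""
    return range(start, min(start + limit, total))
```

With 1200 messages and no checkpoint file, the first run would cover indexes 0 to 499, and saving `batch[-1] + 1` makes the second run start at 500.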

While writing this program, most of the trouble I ran into was about the str and bytes types. Since much of the code on the web is written for Python 2, I hit a whole series of error messages in Python 3:

cannot use a string pattern on a bytes-like object
write() argument must be str, not bytes
error: a bytes-like object is required, not 'str'
error: string argument without an encoding
error: cannot use a string pattern on a bytes-like object

and so on. The first two were the biggest headache. For example, I wanted to replace half-width commas with full-width ones, about the simplest operation there is, and it still took me half a day of trying:

This gives an error:

    content = content.replace(',', '，')
error: a bytes-like object is required, not 'str'

This is wrong too:

    content = content.replace(',', bytes('，'))
error: string argument without an encoding

In the end, this is what works:

    content = content.replace(bytearray(',', 'GBK'), '，'.encode('GBK'))
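The three attempts can be reproduced on a small GBK-encoded sample (the sample text is made up; in the script the bytes come from the message payload):

```python
raw = 'a,b,c'.encode('GBK')   # bytes, like a decoded mail payload

errors = []
try:
    raw.replace(',', '，')               # str arguments on a bytes object
except TypeError as e:
    errors.append(str(e))

try:
    raw.replace(',', bytes('，'))        # bytes() needs an encoding for a str
except TypeError as e:
    errors.append(str(e))

# Both arguments are bytes-like here, so the call succeeds.
fixed = raw.replace(bytearray(',', 'GBK'), '，'.encode('GBK'))
print(errors)
print(fixed.decode('GBK'))   # a，b，c
```

The rule is simply that a bytes object only accepts bytes-like arguments, so `raw.replace(b',', '，'.encode('GBK'))` is an equivalent spelling.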

But when I went on to turn half-width double quotes into full-width double quotes, the situation was different:

    matchObj = re.match(r'.*<body>(.*)</body>.*', content.decode('GBK'), re.M|re.I|re.S)
    if matchObj:
        found = matchObj.group(1)  # message body
        aa = found.split('#$')     # break it down into fields
        # No need for the earlier trick here: content=content.replace(bytearray(',', 'GBK'), '，'.encode('GBK'))
        aa[9] = aa[9].replace('"', '＂')

I broke into a sweat... In short, there were things I still did not understand, and that led to a lot of detours. I am writing them down in the hope that it helps others.
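Looking back, the quote replacement needed no such trick because by then the body had already been decoded into a str. A small sketch with a made-up message body, using the same `#$` field separator as the script:

```python
import re

# Hypothetical payload: GBK-encoded bytes, fields separated by '#$'.
payload = '<html><BODY>Zhang San#$He said "hi"#$website</BODY></html>'.encode('GBK')

text = payload.decode('GBK')          # bytes -> str, done once
match = re.match(r'.*<body>(.*)</body>.*', text, re.M | re.I | re.S)
fields = match.group(1).split('#$') if match else []

# Plain str arguments are fine now, because fields holds str objects.
fields[1] = fields[1].replace('"', '＂')
print(fields)
```

Decoding once at the start and working with str from then on would probably have avoided most of the errors listed earlier.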

The full source follows:

    # -*- coding: utf-8 -*-
    import email
    import email.utils
    import imaplib
    import os
    import re

    config_file = 'last_id.txt'
    PATH = os.path.split(os.path.realpath(__file__))[0]


    def main():
        M = imaplib.IMAP4_SSL("my-host.com", 993)
        try:
            M.login('my-username', 'my-password')
        except Exception as e:
            print('login error: %s' % e)
            M.logout()  # close() is only valid after select(), so log out instead
            return

        try:
            M.select('INBOX', False)

            # result, message = M.select()
            # Tip: to find the mails about essh, use
            # type, data = M.search(None, '(SUBJECT "essh")')
            # The parentheses enclose one search condition; several conditions
            # can be given, such as FROM xxxx SUBJECT "aaa".
            # Note that the command is enclosed in parentheses (found out the painful way)
            typ, data = M.search(None, 'ALL')

            msgList = data[0].split()
            print("total mails:" + str(len(msgList)))
            # first = msgList[0]
            # M.store(first, '-FLAGS', '(\Seen)')
            # M.store("1:*", '+FLAGS', '\\Deleted')  # Flag all Trash as Deleted
            output = os.path.join(PATH, 'output.csv')

            last_id = int(read_config())
            next_id = last_id  # position to resume from on the next run
            count = 0
            with open(output, 'w') as fp2:
                for idx in range(last_id, len(msgList)):
                    print("curr id: " + str(idx))
                    typ, data = M.fetch(msgList[idx], '(RFC822)')
                    deal_mail(data, fp2)
                    next_id = idx + 1
                    count = count + 1
                    if count >= 500:
                        break

            write_config(next_id)
            # print(str(idx))
            print("OK!")
        except Exception as e:
            print('imap error: %s' % e)
        finally:
            M.logout()


    def main2():
        # Offline test: parse a saved message body instead of fetching mail
        path = os.path.split(os.path.realpath(__file__))[0]
        input_file = os.path.join(path, 'input2.txt')
        output = os.path.join(path, 'output.csv')
        with open(input_file, 'rb') as fp, open(output, 'w') as fp2:
            line = fp.read()
            pharse_content(fp2, line)


    def get_mime_version(msg):
        if msg is not None:
            return email.utils.parseaddr(msg.get('mime-version'))[1]
        return ''


    def get_message_id(msg):
        if msg is not None:
            return email.utils.parseaddr(msg.get('Message-ID'))[1]
        return ''


    # Read the config file to get the position saved by the last run;
    # reading messages starts from this ID
    def read_config():
        config_path = os.path.join(PATH, config_file)
        if os.path.isfile(config_path):
            with open(config_path) as _fp:
                id = _fp.read()
        else:
            id = 0
        return id


    # Write the position reached to the config file for the next run
    def write_config(id):
        with open(os.path.join(PATH, config_file), 'w') as _fp:
            _fp.write(str(id))


    def deal_mail(data, fp2):
        msg = email.message_from_string(data[0][1].decode('GBK'))
        messageID = get_message_id(msg)
        print(messageID)
        content = msg.get_payload(decode=True)
        # print(content)
        pharse_content(fp2, content, messageID)


    def pharse_content(fp2, content, messageID=None):
        # Convert half-width commas into full-width ones
        # content=content.replace(',', '，')  # error: a bytes-like object is required, not 'str'
        # content=content.replace(',', bytes('，'))  # error: string argument without an encoding
        content = content.replace(bytearray(',', 'GBK'), '，'.encode('GBK'))
        # print(content.decode('GBK'))
        # strinfo=re.compile(',')
        # content=strinfo.sub('，', content)  # error: cannot use a string pattern on a bytes-like object
        matchObj = re.match(r'.*<body>(.*)</body>.*', content.decode('GBK'), re.M|re.I|re.S)
        if matchObj:
            found = matchObj.group(1)  # message body
            aa = found.split('#$')     # break it down into fields
            # Get the number the complaint concerns. Pattern to match:
            # "The complaint concerns the number:18790912404;"
            # (the label text must match the wording used in the real messages)
            mobileObj = re.match(r'.*The complaint concerns the number:(.*);', aa[9], re.M|re.I|re.S)
            if mobileObj:
                mobile = mobileObj.group(1)
            else:
                mobile = ''
            # No need for the earlier trick here: content=content.replace(bytearray(',', 'GBK'), '，'.encode('GBK'))
            aa[9] = aa[9].replace('"', '＂')
            # bb is the result array; it corresponds to the columns of the generated CSV file
            bb = [''] * 40     # array of 40 elements, corresponding to 40 columns
            bb[3] = aa[0]      # column D
            bb[4] = aa[4]      # E
            bb[5] = mobile     # F
            bb[6] = aa[5]      # G
            bb[7] = aa[2]      # H
            bb[8] = aa[1]      # I
            bb[9] = aa[3]      # J
            bb[11] = aa[6]     # L
            bb[12] = aa[6]     # M
            bb[22] = 'website'  # the source of the complaint; change it to the required type here
            bb[36] = '"' + aa[9] + '"'  # AK; quoted on both sides so multi-line text stays in one cell
            deli = ','
            # fp2.write("AAAAA," + deli.join(bb) + "\\n")
            fp2.write(deli.join(bb) + "\n")
        else:
            print("No match!!")


    if __name__ == '__main__':
        main()
