ETL Application: A Method for Fetching a Platform's Interface Files in One Pass

Source: Internet
Author: User

In ETL scenarios, if an interface file has not been provided yet, the task usually waits in a loop until the peer delivers it, which consumes a lot of system resources. To avoid this, I came up with a method that fetches all of a platform's files in one pass. The idea is as follows:

1. On the first run, fetch all interface files for the given date from the directory provided by the peer platform, and save the file list;

2. Afterwards, restart the fetch task every n minutes. Each run retrieves the current file list and compares it with the previous one. Files are fetched again when either of the following occurs:

A. a new file has appeared;

B. a file's size has changed.
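
The comparison rule in step 2 can be sketched as a small pure function. This is a minimal illustration rather than the author's implementation: the function name `diff_file_lists`, the name-to-size dictionaries, and the sample file names are assumptions made for the example.

```python
def diff_file_lists(previous, current):
    """Return the names in `current` that are new or whose size changed.

    Both arguments map file name -> file size (a hypothetical snapshot format).
    """
    # Case A: names that were not in the previous snapshot
    new_files = set(current) - set(previous)
    # Case B: names present in both snapshots but with a different size
    resized = {name for name in set(current) & set(previous)
               if current[name] != previous[name]}
    return sorted(new_files | resized)


# Two snapshots taken n minutes apart (made-up file names and sizes)
previous = {"cdr_20150520_01.dat": "1024", "cdr_20150520_02.dat": "2048"}
current = {"cdr_20150520_01.dat": "1024",
           "cdr_20150520_02.dat": "4096",
           "cdr_20150520_03.dat": "512"}

to_fetch = diff_file_lists(previous, current)
print(to_fetch)
```

On the first run the previous snapshot is empty, so every file in the current listing is returned, which matches step 1.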

The implementation method is as follows:

[ftp.properties]
ipaddress = 10.25.xxx.xxx
username = xxxxx
password = xxxxx
# When encryption has a value, the password is decrypted with it
encryption =
# When resolve is 1, the date placeholders in the remote and local directories are replaced
resolve = 1
remoteDir = /bosscdr/tobak/jf_bass
localDir = /interface/cyg/[SDT_YYYYMMDD]
# File list saved by the previous fetch
lastFileList = /interface/cyg/lastfilelist.txt

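
The date placeholder handling implied by the `resolve` option can be sketched as follows. This is only an illustration under assumptions: the function name `resolve_dirs` is hypothetical, the configuration is read from an inline string instead of a file, and the `remoteDir` value with a month placeholder is made up to show both tokens.

```python
import configparser

# Inline sample; the remoteDir value is hypothetical
SAMPLE = """
[ftp.properties]
resolve = 1
remoteDir = /data/out/[SDT_YYYYMM]
localDir = /interface/cyg/[SDT_YYYYMMDD]
"""


def resolve_dirs(config_text, interface_date):
    """Read the two directory options and substitute the date placeholders."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["ftp.properties"]
    # Option names are case-insensitive in configparser
    remote_dir, local_dir = section["remoteDir"], section["localDir"]
    if section["resolve"] == "1":
        for token, value in (("[SDT_YYYYMMDD]", interface_date),
                             ("[SDT_YYYYMM]", interface_date[:6])):
            remote_dir = remote_dir.replace(token, value)
            local_dir = local_dir.replace(token, value)
    return remote_dir, local_dir


print(resolve_dirs(SAMPLE, "20150520"))
```

With any other value of `resolve`, the directories are returned exactly as written in the file.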
# -*- coding: utf-8 -*-
"""
Description: fetch remote files
Written: 2015-5-5
Author: Chenyangang
----------------------------------------
Implementation:
1. Read the FTP server settings from the configuration file (the password may be encrypted).
2. Get the list of files in the remote directory; if a saved list exists, compare the two and extract the files that differ.
3. Download the files that differ.
"""
import configparser
import datetime
import ftplib
import os
import pickle


class GetDataBaseDiff(object):

    def __init__(self, config, interfaceID=None, interfaceDate=None, delay=0):
        self.config = config
        self.interfaceID = interfaceID
        # Default to today's date, shifted back by `delay` days
        if interfaceDate is None:
            interfaceDate = (datetime.date.today()
                             - datetime.timedelta(delay)).strftime("%Y%m%d")
        self.interfaceDate = interfaceDate

    def getConfig(self, interfaceDate):
        section = 'ftp.properties'
        readConfig = configparser.ConfigParser()
        with open(self.config, 'r') as configFile:
            readConfig.read_file(configFile)

        hostaddr = readConfig.get(section, 'ipaddress')
        username = readConfig.get(section, 'username')
        password = readConfig.get(section, 'password')

        # Whether to substitute placeholders and whether to decrypt the password
        resolve = readConfig.get(section, 'resolve')
        encryption = readConfig.get(section, 'encryption')

        # Directory information
        remoteDir = readConfig.get(section, 'remoteDir')
        localDir = readConfig.get(section, 'localDir')

        # File that stores the list from the previous fetch
        lastFileList = readConfig.get(section, 'lastFileList')

        if encryption != '':
            # `encryption` is a command that is run with the stored password
            # as its argument; its output is used as the real password
            command = encryption + ' ' + password
            password = os.popen(command).read().strip()

        if resolve == '1':
            month = interfaceDate[0:6]
            remoteDir = remoteDir.replace("[SDT_YYYYMMDD]", interfaceDate)
            remoteDir = remoteDir.replace("[SDT_YYYYMM]", month)
            localDir = localDir.replace("[SDT_YYYYMMDD]", interfaceDate)
            localDir = localDir.replace("[SDT_YYYYMM]", month)

        return hostaddr, username, password, remoteDir, localDir, lastFileList

    def connect(self, hostaddr, username, password):
        try:
            connftp = ftplib.FTP(hostaddr)
        except ftplib.all_errors:
            print("The ipaddress %(ipaddress)s refused!" % {'ipaddress': hostaddr})
            raise
        try:
            connftp.login(username, password)
        except ftplib.error_perm:
            print("This username %(username)s was refused, please check "
                  "your username or password!" % {'username': username})
            raise
        return connftp

    def getFileList(self, connftp, remoteDir):
        # Get the detailed listing (permissions, owner, size, ...);
        # the 5th field is the file size and the last field is the file name
        connftp.cwd(remoteDir)
        filesDetail = []
        connftp.retrlines('LIST', filesDetail.append)

        # Save file name and size
        fileList = {}
        for fileDetail in filesDetail:
            fileListFromDetail = fileDetail.strip().split()
            if len(fileListFromDetail) < 5:
                continue  # skip lines such as "total 12"
            fileList[fileListFromDetail[-1]] = fileListFromDetail[4]
        return fileList

    def comparisonFileList(self, lastFileList, newFileList):
        # Load the list saved by the previous run, if there is one
        lastFiles = {}
        if os.path.isfile(lastFileList) and os.path.getsize(lastFileList) > 0:
            with open(lastFileList, "rb") as fp:
                try:
                    lastFiles = pickle.load(fp)
                except (EOFError, pickle.UnpicklingError):
                    print("Loading %(filename)s failed" % {'filename': lastFileList})

        lastFileSet = set(lastFiles.keys())
        newFileSet = set(newFileList.keys())

        # Extract the list of new files
        diffFileList = list(newFileSet - lastFileSet)
        sameFileNames = list(newFileSet & lastFileSet)

        # Add files whose size differs from the previous run
        for sameFileName in sameFileNames:
            if newFileList[sameFileName] != lastFiles[sameFileName]:
                diffFileList.append(sameFileName)

        # Save the latest file list
        with open(lastFileList, "wb") as fp:
            pickle.dump(newFileList, fp)

        return diffFileList

    def machedFileList(self, diffFileList, interfaceID, interfaceDate):
        return [flist for flist in diffFileList
                if interfaceID in flist and interfaceDate in flist]

    def download(self, connftp, localDir, getFileList):
        # Go to the local directory, creating it if necessary
        if not os.path.isdir(localDir):
            os.makedirs(localDir)
        try:
            os.chdir(localDir)
        except OSError:
            print("Cannot enter the directory, maybe you do not have permission!")

        # Get the latest files
        for remoteFile in getFileList:
            try:
                with open(remoteFile, "wb") as localFile:
                    connftp.retrbinary("RETR %s" % remoteFile, localFile.write)
            except ftplib.error_perm:
                print('ERROR: cannot read file "%s"' % remoteFile)
        connftp.quit()


if __name__ == '__main__':
    interfaceDate = '20150520'
    interfaceID = None
    getDataBaseDiff = GetDataBaseDiff('./config.properties', interfaceID, interfaceDate, 0)
    hostaddr, username, password, remoteDir, localDir, lastFileList = \
        getDataBaseDiff.getConfig(interfaceDate)
    connectionFtp = getDataBaseDiff.connect(hostaddr, username, password)
    fileList = getDataBaseDiff.getFileList(connectionFtp, remoteDir)
    diffFileList = getDataBaseDiff.comparisonFileList(lastFileList, fileList)
    if interfaceID is not None and len(diffFileList) > 0:
        getFileList = getDataBaseDiff.machedFileList(diffFileList, interfaceID, interfaceDate)
        getDataBaseDiff.download(connectionFtp, localDir, getFileList)
    else:
        getDataBaseDiff.download(connectionFtp, localDir, diffFileList)
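
The script relies on the size being the 5th field of each listing line. That parsing step can be tried without an FTP server, as in the sketch below. The function name `parse_listing` and the sample lines are assumptions; the format of a detailed FTP listing is server-dependent, so this only covers Unix `ls -l` style output and does not handle file names containing spaces.

```python
def parse_listing(lines):
    """Map file name -> size (int) from Unix-style `ls -l` lines.

    Assumes the size is the 5th whitespace-separated field and the name
    is the last one.
    """
    sizes = {}
    for line in lines:
        fields = line.split()
        # Skip "total N" headers, short lines and directory entries
        if len(fields) < 9 or line.startswith("d"):
            continue
        sizes[fields[-1]] = int(fields[4])
    return sizes


# Made-up listing lines in the assumed format
sample = [
    "total 8",
    "drwxr-xr-x 2 etl etl 4096 May 20 08:00 archive",
    "-rw-r--r-- 1 etl etl 1024 May 20 09:15 cdr_20150520_01.dat",
    "-rw-r--r-- 1 etl etl 2048 May 20 09:20 cdr_20150520_02.dat",
]
listing = parse_listing(sample)
print(listing)
```

Where the server supports it, `ftplib.FTP.mlsd()` returns name and size in a standardized form and may be a more robust choice than parsing listing text.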

The above is code I wrote as an exercise after learning Python. By modifying the configuration file, you can configure multiple platforms and fetch interface data from each of them.
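
One possible way to configure several platforms is to give each platform its own section in the configuration file and loop over the sections. The sketch below is an assumption about how that could look, not something the original script does: the section names `platform_a` and `platform_b`, the second address and directory, and the function `load_platforms` are all hypothetical.

```python
import configparser

# Hypothetical multi-platform configuration, one section per platform
SAMPLE = """
[platform_a]
ipaddress = 10.25.xxx.xxx
remoteDir = /bosscdr/tobak/jf_bass

[platform_b]
ipaddress = 10.26.xxx.xxx
remoteDir = /data/export
"""


def load_platforms(config_text):
    """Return {section name: {option: value}} for every platform section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # configparser lowercases option names, so remoteDir becomes "remotedir"
    return {name: dict(parser[name]) for name in parser.sections()}


platforms = load_platforms(SAMPLE)
for name, options in platforms.items():
    print(name, options["ipaddress"], options["remotedir"])
```

Each entry could then be passed to the connect, list, compare, and download steps in turn, with a separate saved file list per platform so the comparisons do not interfere with each other.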
