[Synchronization Script] MySQL-Elasticsearch Synchronization

Source: Internet
Author: User

Our company's project uses Elasticsearch for its search feature, so keeping the data in MySQL and Elasticsearch synchronized became a problem.

I found a few packages online, but each had its own shortcomings, so in the end I decided to write my own script. The general idea is as follows:

1. In an infinite loop, continuously SELECT from the specified tables.

2. Read all rows in each table that were updated later than a given point in time ("1970-01-01 00:00:00" on initialization).

3. Update the required fields in Elasticsearch.
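Step 2 is the core of the approach. The following is a minimal sketch, assuming each table has an `updated_at` column stored as "YYYY-MM-DD HH:MM:SS" strings, the same format as the checkpoint. The `articles` table, its columns and the `fetch_changed_rows` helper are hypothetical. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere; with a MySQL driver the query is the same apart from the placeholder style.

```python
import sqlite3


def fetch_changed_rows(conn, table, last_time):
    """Return (rows, new_checkpoint) for rows updated after last_time.

    Hypothetical helper: assumes an `updated_at` column holding
    'YYYY-MM-DD HH:MM:SS' strings, so string order equals time order.
    """
    # A table name cannot be a bound parameter; it must come from trusted code.
    cur = conn.execute(
        "SELECT id, title, updated_at FROM {} WHERE updated_at > ? "
        "ORDER BY updated_at".format(table),
        (last_time,),
    )
    rows = cur.fetchall()
    # The newest update time becomes the next checkpoint; unchanged if nothing is new.
    new_checkpoint = rows[-1][2] if rows else last_time
    return rows, new_checkpoint


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER, title TEXT, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "a", "2018-01-01 10:00:00"), (2, "b", "2018-01-02 09:30:00")],
    )
    rows, checkpoint = fetch_changed_rows(conn, "articles", "1970-01-01 00:00:00")
    print(len(rows), checkpoint)  # 2 2018-01-02 09:30:00
    rows, checkpoint = fetch_changed_rows(conn, "articles", checkpoint)
    print(len(rows), checkpoint)  # 0 2018-01-02 09:30:00
```

A strict `>` can skip a row that is committed later with exactly the checkpoint timestamp. Using `>=` instead, and indexing documents by primary key so that re-sending a row is harmless, avoids that at the cost of re-sending a few rows per poll.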

Notes: 1. The script may be interrupted or restarted at any time, so the last update time is recorded in a fixed TXT file.

2. To keep the script generic, so that supporting another table does not require drastic changes to it, the per-table variables are generated dynamically with locals() and globals().
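Both notes can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the author's code: `read_checkpoint`, `save_checkpoint`, `resolve_monitor` and `monitor_demo` are hypothetical helpers. Instead of appending to the TXT file and reading its last line, this variant swaps in an atomic rewrite with `os.replace`, so a crash mid-write cannot leave a half-written timestamp. The per-table function is looked up by name through `globals()`, following the script's `monitor_<table>` convention.

```python
import os
import tempfile

INITIAL_TIME = "1970-01-01 00:00:00"


def read_checkpoint(path):
    """Return the saved update time, or the initial time if no file exists yet."""
    if not os.path.isfile(path):
        return INITIAL_TIME
    with open(path, encoding="utf-8") as f:
        return f.read().strip() or INITIAL_TIME


def save_checkpoint(path, update_time):
    """Write to a temp file in the same directory, then atomically replace the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(update_time + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp_path, path)


def resolve_monitor(table):
    """Look up monitor_<table> by name, failing early if it is missing."""
    func = globals().get("monitor_" + table)
    if not callable(func):
        raise LookupError("lack function monitor_{}".format(table))
    return func


def monitor_demo(last_time):
    # Hypothetical monitor: pretends a newer row was found.
    return "2018-01-02 09:30:00"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "monitor_demo.txt")
    last = read_checkpoint(path)
    new = resolve_monitor("demo")(last)
    if new != last:
        save_checkpoint(path, new)
    print(read_checkpoint(path))  # 2018-01-02 09:30:00
```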

The code is as follows:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import os
import sys
import time

sys.path.append('/users/cangyufu/work_jbkj/elabels-flask')
# The author's own project modules; they are used inside the monitor_* functions.
from modules.utils.commons import app, redispool, db_master, db_slave
from sqlalchemy import text
from service.myelasticsearch.index import es
from modules.utils.mysqldb import db_obj_dict

const_sleep = 3      # seconds to wait when no table has changed
work_index = 'test'  # target Elasticsearch index


# https://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-with-python-similar-to-tail
def tail(f, lines=1):
    """Return the last `lines` lines of a file opened in binary mode."""
    total_lines_wanted = lines
    block_size = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []  # blocks of size block_size, in reverse order starting
                 # from the end of the file
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - block_size > 0:
            # read the last block we haven't yet read
            f.seek(block_number * block_size, 2)
            blocks.append(f.read(block_size))
        else:
            # file too small, start from the beginning
            f.seek(0, 0)
            # only read what was not read yet
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= block_size
        block_number -= 1
    all_read_text = b''.join(reversed(blocks)).decode('utf-8')
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])


def is_file_exists(filename):
    """Create the checkpoint file with the initial time if it does not exist yet."""
    if not os.path.isfile(filename):
        with open(filename, 'wb') as f:
            f.write(b"1970-01-01 00:00:00\n")


# Pass in the names of the tables to monitor
def sync_main(*args):
    for table in args:
        if not callable(globals().get('monitor_' + table)):
            raise Exception('lack function monitor_{}'.format(table))

    # Per-table file handles are kept in a dict rather than written into
    # locals(): Python does not guarantee that assignments to locals()
    # inside a function take effect.
    files = {}
    for table in args:
        filename = ''.join(['monitor_', table, '.txt'])
        path = os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)
        is_file_exists(path)
        files[table] = open(path, 'rb+')
    try:
        print("begin")
        while True:
            count = 0
            for table in args:
                print('handling ' + table)
                last_time = tail(files[table], 1)
                update_time = globals()['monitor_' + table](last_time)
                print(update_time)
                if update_time == last_time:
                    count += 1
                    continue
                # tail() may leave the position mid-file, so append at the end explicitly
                files[table].seek(0, 2)
                files[table].write((update_time + '\n').encode('utf-8'))
                files[table].flush()
            if count == len(args):
                # no table changed: wait before polling again
                time.sleep(const_sleep)
    except Exception as e:
        print(e)
        raise
    finally:
        for f in files.values():
            f.close()


##########################################################################
# To monitor a table, you must implement a function named monitor_<table_name>;
# e.g. to monitor table1, implement monitor_table1. It receives the time from
# which to start updating (1970-01-01 00:00:00 on the first run) and returns
# the latest update time.
##########################################################################
def monitor_table1(last_time):
    # Placeholder: query rows updated after last_time, push them to
    # Elasticsearch, and return the newest update time seen.
    return last_time


def monitor_table2(last_time):
    # Placeholder, same contract as monitor_table1.
    return last_time


# (The source is truncated here: a function whose name was lost; its only
#  surviving line returns a value formatted as "%Y-%m-%d %H:%M:%S".)


if __name__ == '__main__':
    sync_main('table1', 'table2')
```
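The script leaves monitor_table1 and monitor_table2 as stubs. One possible shape for the Elasticsearch side of such a function, as a hedged sketch: it takes rows already fetched from MySQL (passed in directly as dicts to keep the example self-contained), turns each row into an action in the format accepted by the official Python client's `elasticsearch.helpers.bulk`, and computes the newest `updated_at` as the next checkpoint. The column names `id`, `title` and `updated_at` and the helpers `build_actions` and `newest_update_time` are hypothetical; `work_index` mirrors the script's constant. MySQL drivers such as PyMySQL typically return DATETIME columns as `datetime.datetime` objects, which is what the sketch assumes.

```python
import datetime

work_index = "test"                # same value as the script's constant
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the checkpoint file format


def build_actions(rows, fields=("title",)):
    """Turn MySQL rows (dicts) into bulk 'index' actions (hypothetical helper).

    Using the primary key as _id means re-sending a row overwrites the same
    document instead of creating a duplicate.
    """
    return [
        {
            "_index": work_index,
            "_id": row["id"],
            "_source": {name: row[name] for name in fields},
        }
        for row in rows
    ]


def newest_update_time(rows, last_time):
    """Return the latest updated_at as a checkpoint string, or last_time if no rows."""
    if not rows:
        return last_time
    return max(row["updated_at"] for row in rows).strftime(TIME_FORMAT)


# With the real client this would be roughly:
#   from elasticsearch import helpers
#   helpers.bulk(es, build_actions(rows))

if __name__ == "__main__":
    rows = [
        {"id": 1, "title": "a", "updated_at": datetime.datetime(2018, 1, 1, 10, 0, 0)},
        {"id": 2, "title": "b", "updated_at": datetime.datetime(2018, 1, 2, 9, 30, 0)},
    ]
    print(build_actions(rows)[1])
    # {'_index': 'test', '_id': 2, '_source': {'title': 'b'}}
    print(newest_update_time(rows, "1970-01-01 00:00:00"))
    # 2018-01-02 09:30:00
```

Keeping only the required fields in `_source` matches step 3 of the approach: the index stores what search needs, not the whole row.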
