Our company's project uses Elasticsearch for its search feature, which makes data synchronization between MySQL and Elasticsearch a problem.
I looked at a few packages online, but each had its own shortcomings, so in the end I decided to write a script myself. The general idea is as follows:
1. In an infinite loop, constantly SELECT from the specified tables
2. Read all rows in each table whose update time is later than a recorded point in time ("1970-01-01 00:00:00" on initialization)
3. Write the required fields to Elasticsearch
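The incremental read in step 2 is just a query on an update-time column. Here is a minimal sketch of that query, using sqlite3 from the standard library instead of MySQL so it runs anywhere; the `items` table, its columns, and the function name are hypothetical, not part of the original script:

```python
import sqlite3

def fetch_updated_rows(conn, last_time):
    # Step 2: read every row whose update time is later than the checkpoint.
    # Ordering by updated_at lets the caller use the last row's timestamp
    # as the next checkpoint.
    cur = conn.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_time,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, name TEXT, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "a", "2020-01-01 00:00:00"), (2, "b", "2020-01-02 00:00:00")],
    )
    # only rows updated strictly after the checkpoint come back
    print(fetch_updated_rows(conn, "2020-01-01 00:00:00"))
```

Because the timestamps are formatted as "%Y-%m-%d %H:%M:%S", string comparison and chronological comparison agree, which is what makes the TXT-file checkpoint below workable.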
Notes: 1. The script may be interrupted or restarted at any point, so the last update time is recorded in a fixed TXT file
2. To keep the script generic and avoid rewriting it for every table, variables are generated dynamically using locals() and globals()
The code is as follows:
#!/usr/bin/env python
# coding=utf-8
import sys
sys.path.append('/users/cangyufu/work_jbkj/elabels-flask')
import os
import time
import datetime

from sqlalchemy import text

from modules.utils.commons import app, redispool, db_master, db_slave
from modules.utils.mysqldb import db_obj_dict
from service.myelasticsearch.index import es

const_sleep = 3
work_index = 'test'


# https://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-with-python-similar-to-tail
def tail(f, lines=1):
    total_lines_wanted = lines
    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    # blocks of size BLOCK_SIZE, in reverse order, starting from the end of the file
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            # read the last block we haven't yet read
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # file too small, start from the beginning
            f.seek(0, 0)
            # only read what has not been read yet
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count('\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = ''.join(reversed(blocks))
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])


def is_file_exists(filename):
    # initialize the checkpoint file with the epoch so the first run syncs everything
    if not os.path.isfile(filename):
        f = open(filename, 'wb')
        f.write("1970-01-01 00:00:00\n")
        f.close()


# (This helper was garbled in the original post; the name is hypothetical.
# It presumably formats a datetime for the checkpoint file.)
def datetime_to_str(dt):
    return dt.strftime("%Y-%m-%d %H:%M:%S")


# pass in the names of the tables to monitor
def sync_main(*args):
    # fail fast if a monitor_<table> function is missing
    for table in args:
        try:
            callable(globals()['monitor_' + table])
        except Exception:
            raise Exception('lack function monitor_{}'.format(table))
    # one checkpoint file per table; note that writing arbitrary keys into
    # locals() inside a function is fragile in CPython, a plain dict would be safer
    for table in args:
        filename = ''.join(['monitor_', table, '.txt'])
        locals()[table + 'path'] = os.path.join(os.path.dirname(__file__), filename)
        is_file_exists(locals()[table + 'path'])
        locals()[table + 'file'] = open(locals()[table + 'path'], 'rb+')
    try:
        print "begin"
        while True:
            count = 0
            for table in args:
                print 'handling ' + table
                # last line of the checkpoint file = last synced update time
                last_time = tail(locals()[table + 'file'], 1)
                update_time = globals()['monitor_' + table](last_time)
                print update_time
                if update_time == last_time:
                    count += 1
                    continue
                locals()[table + 'file'].write(update_time + '\n')
                locals()[table + 'file'].flush()
            # sleep only when no table had new rows
            if count == len(args):
                time.sleep(const_sleep)
    except Exception, e:
        print e
        raise e
    finally:
        for table in args:
            locals()[table + 'file'].close()


#############################################################################
# If you want to monitor a table, you must implement the function
# monitor_<table_name>. For example, to monitor the table1 table, you have
# to implement the monitor_table1 function. The incoming parameter is the
# starting time for the update (initialized to "1970-01-01 00:00:00"), and
# the function returns the latest update time.
#############################################################################

def monitor_table1(last_time):
    pass
    return last_time


def monitor_table2(last_time):
    pass
    return last_time


sync_main('table1', 'table2')
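The monitor_table1/monitor_table2 stubs above are left empty. For reference, here is roughly what the body of such a function has to produce; the Elasticsearch call itself is omitted, and the helper names and the (id, body, updated_at) row shape are assumptions, not the author's code:

```python
def rows_to_actions(rows, index="test"):
    # Shape (id, body, updated_at) tuples into Elasticsearch index actions.
    # A real handler would feed these to the bulk helper or call es.index
    # per document; that client call is deliberately left out here.
    return [
        {"_index": index, "_id": row_id, "_source": body}
        for row_id, body, _updated in rows
    ]

def newest_checkpoint(rows, last_time):
    # The value a monitor_<table> function must return: the largest update
    # time seen, or the old checkpoint unchanged when no rows were updated
    # (which is how sync_main decides it can sleep).
    times = [updated for _row_id, _body, updated in rows]
    return max(times) if times else last_time
```

Returning the old `last_time` untouched when nothing changed is important: sync_main compares the returned value to the checkpoint and only sleeps when every table reports no change.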
[Synchronization Script] MySQL-Elasticsearch synchronization