Scenario:
There is a business database running MySQL 5.5 that takes in a large amount of data every day. From time to time, rows older than a given cutoff have to be deleted from several tables. That is easy enough with a few WHILE loops. MySQL has similar features of its own, but I am not proficient with them, so I implemented it in Python.
Script:
# coding: utf-8
import MySQLdb
import time

# Delete config
DELETE_DATETIME = '2017-08-31 23:59:59'
DELETE_ROWS = 10000
EXEC_DETAIL_FILE = 'exec_detail.txt'
SLEEP_SECOND_PER_BATCH = 0.5
DATETIME_FORMAT = '%Y-%m-%d %X'
# MySQL Connection Config
Default_MySQL_Host = 'localhost'
Default_MySQL_Port = 3358
Default_MySQL_User = "root"
Default_MySQL_Password = 'roo@01239876'
Default_MySQL_Charset = "utf8"
Default_MySQL_Connect_TimeOut = 120
Default_Database_Name = 'testdb001'


def get_time_string(dt_time):
    """
    Format a time value as a string using DATETIME_FORMAT.
    :param dt_time: the time to convert (a struct_time, e.g. from time.localtime())
    :return: the formatted time string
    """
    return time.strftime(DATETIME_FORMAT, dt_time)


def print_info(message):
    """
    Print the message to the console and append it to the log file.
    :param message: the message to output
    :return: None
    """
    print(message)
    # Timestamp on one line, message on the next.
    new_message = get_time_string(time.localtime()) + "\n" + str(message)
    write_file(EXEC_DETAIL_FILE, new_message)


def write_file(file_path, message):
    """
    Append the message to the file at file_path.
    The directory containing the file must already exist.
    :param file_path: path of the file to write
    :param message: text to append
    :return: None
    """
    with open(file_path, 'a') as file_handle:
        file_handle.write(message)
        # Append a newline to make the log easier to read.
        file_handle.write("\n")


def get_mysql_connection():
    """
    Open a database connection using the default configuration.
    :return: database connection
    """
    conn = MySQLdb.connect(
        host=Default_MySQL_Host,
        port=Default_MySQL_Port,
        user=Default_MySQL_User,
        passwd=Default_MySQL_Password,
        connect_timeout=Default_MySQL_Connect_TimeOut,
        charset=Default_MySQL_Charset,
        db=Default_Database_Name
    )
    return conn


def mysql_exec(sql_script, sql_param=None):
    """
    Execute the given script and return the number of affected rows.
    :param sql_script: SQL statement to execute
    :param sql_param: optional parameters for the statement
    :return: rows affected by the last statement, or 0 if execution failed
    """
    try:
        conn = get_mysql_connection()
        print_info("execute script on server {0}: {1}".format(
            conn.get_host_info(), sql_script))
        cursor = conn.cursor()
        if sql_param is not None:
            cursor.execute(sql_script, sql_param)
        else:
            cursor.execute(sql_script)
        row_count = cursor.rowcount
        conn.commit()
        cursor.close()
        conn.close()
    except Exception as e:
        print_info("execute exception: " + str(e))
        row_count = 0
    return row_count


def mysql_query(sql_script, sql_param=None):
    """
    Run the given SQL script and return the query result.
    :param sql_script: SQL statement to execute
    :param sql_param: optional parameters for the statement
    :return: all rows of the query result, or None if the query failed
    """
    try:
        conn = get_mysql_connection()
        print_info("execute script on server {0}: {1}".format(
            conn.get_host_info(), sql_script))
        cursor = conn.cursor()
        if sql_param is not None:
            cursor.execute(sql_script, sql_param)
        else:
            cursor.execute(sql_script)
        exec_result = cursor.fetchall()
        cursor.close()
        conn.close()
        return exec_result
    except Exception as e:
        print_info("execute exception: " + str(e))
        return None


def get_id_range(table_name):
    """
    Get the maximum ID, minimum ID, and number of rows to delete from a table.
    :param table_name: table to delete from
    :return: (max_id, min_id, total_count) of the rows to delete
    """
    sql_script = """
    SELECT
        MAX(ID) AS MAX_ID,
        MIN(ID) AS MIN_ID,
        COUNT(1) AS Total_Count
    FROM {0}
    WHERE create_time <= '{1}';
    """.format(table_name, DELETE_DATETIME)
    query_result = mysql_query(sql_script=sql_script, sql_param=None)
    if not query_result:
        # The query failed and the error was already logged: nothing to delete.
        return 0, 0, 0
    max_id, min_id, total_count = query_result[0]
    # Pitfall: total_count may be non-zero while max_id and min_id are None,
    # so check max_id and min_id for NULL explicitly.
    if (max_id is None) or (min_id is None):
        max_id, min_id, total_count = 0, 0, 0
    return max_id, min_id, total_count


def delete_data(table_name):
    max_id, min_id, total_count = get_id_range(table_name)
    temp_id = min_id
    # Skip the loop entirely when no rows match the deletion condition.
    while total_count > 0 and temp_id <= max_id:
        sql_script = """
        DELETE FROM {0}
        WHERE id <= {1}
        AND id >= {2}
        AND create_time <= '{3}';
        """.format(table_name, temp_id + DELETE_ROWS, temp_id, DELETE_DATETIME)
        temp_id += DELETE_ROWS
        print(sql_script)
        row_count = mysql_exec(sql_script)
        print_info("affected rows: {0}".format(row_count))
        # max(..., 1) avoids dividing by zero when only one ID matches;
        # min(100.0, ...) caps the value when the last batch passes max_id.
        current_percent = min(
            100.0, (temp_id - min_id) * 100.0 / max(max_id - min_id, 1))
        print_info("current progress {0}/{1}, remaining {2}, progress: {3}%".format(
            temp_id, max_id, max(max_id - temp_id, 0), "%.2f" % current_percent))
        time.sleep(SLEEP_SECOND_PER_BATCH)
    print_info("The current table {0} has no data to be deleted".format(table_name))


delete_data('tb001')
delete_data('tb002')
delete_data('tb003')
Execution result:
Implementation principle:
Because each table has an auto-increment ID, we can find the maximum and minimum IDs among the rows that meet the deletion condition, then step through that ID range and delete one small slice at a time (for example, 10000 IDs per batch).
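The range-splitting idea can be sketched on its own, without a database. The helper name `id_batches` is hypothetical and does not appear in the script. As an assumption of this sketch, it yields half-open ranges (start <= id < end), so adjacent batches never share a boundary ID.

```python
def id_batches(min_id, max_id, batch_size):
    """Yield (start, end) pairs that together cover min_id..max_id.

    Each pair is half-open: it stands for start <= id < end.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        # Never run past max_id + 1, so the last batch may be shorter.
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


if __name__ == "__main__":
    # Illustrative values only; a real run would take them from get_id_range.
    for start, end in id_batches(1, 25, 10):
        print("DELETE ... WHERE id >= {0} AND id < {1}".format(start, end))
```

Each yielded pair corresponds to one small DELETE, which is what keeps every transaction short.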
Advantages:
It works like chopping a big log with a small axe: each transaction is small, so the impact on the online service is low. The script prints the ID it is currently processing, so it can be stopped at any time and, with a small change to the code, restarted from that ID.
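One possible shape for the "small change" that lets a run restart from a printed ID is sketched below. The function `run_batches` and its `resume_from` parameter are hypothetical names, and `delete_range` is a stand-in for whatever actually issues the DELETE (for example a wrapper around the script's mysql_exec). The ranges here are inclusive on both ends.

```python
def run_batches(delete_range, min_id, max_id, batch_size, resume_from=None):
    """Call delete_range(start, end) for each inclusive ID range.

    resume_from, if given, is the first ID that still needs processing
    after an interrupted run; ranges below it are skipped.
    Returns the number of batches executed.
    """
    start = min_id if resume_from is None else max(min_id, resume_from)
    batches = 0
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        delete_range(start, end)
        # Printing the next start ID gives the operator a restart point.
        print("done up to id {0}, next start id {1}".format(end, end + 1))
        batches += 1
        start = end + 1
    return batches


if __name__ == "__main__":
    calls = []
    # Pretend an earlier run was stopped after printing "next start id 11".
    run_batches(lambda s, e: calls.append((s, e)), 1, 25, 10, resume_from=11)
    print(calls)
```

Passing the callable in keeps the loop testable without a database. Re-running a batch that was already deleted is harmless, because the DELETE simply matches no rows.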
Shortcomings:
To keep master/slave replication lag from growing too high, the script sleeps for a fixed interval after each batch of deletes (SLEEP_SECOND_PER_BATCH), which is fairly crude. A better approach would be to check the replication link periodically and adjust the sleep interval according to the measured lag. Since everything is already scripted, why not make it a little smarter?
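A minimal sketch of the lag-aware pause might look like the following. It assumes the lag is read elsewhere, for example from the Seconds_Behind_Master column of SHOW SLAVE STATUS run on the replica, and passed in as a number (or None when replication is not running). The function name `next_sleep_seconds` and all threshold values are assumptions chosen for illustration, not tuned recommendations.

```python
def next_sleep_seconds(lag_seconds, base_sleep=0.5, max_sleep=30.0,
                       lag_threshold=5):
    """Choose the pause before the next delete batch from the replica lag.

    lag_seconds: replication lag in seconds, or None if it is unknown.
    """
    if lag_seconds is None:
        # Lag unknown (e.g. replication stopped): back off as far as allowed.
        return max_sleep
    if lag_seconds <= lag_threshold:
        # Replica is keeping up: use the normal short pause.
        return base_sleep
    # Grow the pause linearly with the lag above the threshold, up to a cap.
    return min(max_sleep, base_sleep * (1 + lag_seconds - lag_threshold))


if __name__ == "__main__":
    for lag in (0, 5, 7, 1000, None):
        print(lag, next_sleep_seconds(lag))
```

In the delete loop, the result would replace the constant passed to time.sleep, with the lag re-read every few batches rather than on every iteration.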