Some bug fixes and performance optimizations in Python

Source: Internet
Author: User

During earlier development I ran into many errors and performance problems. After looking up the relevant material I fixed them, and I share the solutions here:

1. MySQL database issues

(1) If MySQL was installed before, a new installation may conflict with the old MySQL service, which is still registered and running. In that case, stop the MySQL service, delete the service's entry from the registry, run sc delete mysql, and then install again.

(2) Installing MySQL (download address:). After downloading, add the bin directory to the PATH variable and edit the my.ini file (its configuration is covered in (4) below; there are a great many options). Then open CMD with administrator privileges, cd to the bin directory of the installation, and install the database service with mysqld --install, which generally succeeds. Start the service with net start MySQL and you are done. To log in, run mysql -u root -p, enter the password, and you can start working.

(3) Garbled Chinese characters in MySQL. This troubled me for several days, partly through my own carelessness: when connecting with MySQLdb I had not set the charset parameter. Note that MySQLdb expects MySQL's own character-set names, i.e. charset='utf8' (or 'utf8mb4'), not 'utf-8'. My recommendation is to set the Python source encoding, the database encoding and so on all to UTF-8, and to use encode(), decode() or unicode() where a conversion is necessary. While searching I also came across some other solutions, which I mention here without having verified them. One is to reset the default encoding: Python 2 decodes with ASCII by default, which is not always what you want, so you can switch to UTF-8 at the start of the program (reload(sys) followed by sys.setdefaultencoding('utf-8')). Another is to change the returned encoding format at line 256 of the MySQL driver source; I tried it but could not tell whether it had any effect. A third is to use SQLAlchemy's create_engine(), whose MySQL documentation describes this in great detail, but that did not work for me either. Most of these problems come down to an encoding mismatch, and the fix is to decode or encode at the right point; those are roughly the available methods.
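The following Python 3 sketch shows why a mismatch produces garbled text rather than an error: UTF-8 bytes decoded with the wrong codec (Latin-1 here, chosen only as an example of a wrong default) still decode, just into the wrong characters. The CONN_KWARGS dictionary only illustrates where the charset setting goes in a MySQLdb.connect() call; the host, user, password and database values are hypothetical placeholders, and no connection is opened.

```python
# Minimal sketch (Python 3): how an encoding mismatch garbles Chinese text.
text = "中文测试"

# Correct round trip: encode and decode with the same codec.
raw = text.encode("utf-8")
assert raw.decode("utf-8") == text

# Mismatch: UTF-8 bytes decoded as Latin-1 give unreadable text, not an error.
garbled = raw.decode("latin-1")
print(garbled == text)  # False

# Latin-1 maps every byte 1:1, so this particular damage can be undone.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == text)  # True

# Hypothetical settings for MySQLdb.connect(**CONN_KWARGS); not executed here.
# MySQL spells the charset "utf8" or "utf8mb4" (5.5.3+), never "utf-8".
CONN_KWARGS = {
    "host": "localhost",
    "user": "root",
    "passwd": "your_password",
    "db": "test",
    "charset": "utf8mb4",
    "use_unicode": True,
}
```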

(4) Next, the configuration of the my.ini file. It mainly sets the paths: the installation path (not the bin directory) and the data path. The key parts are the character-set settings and several memory-allocation settings. The list below was collected from another blog (URL: ). I mainly focus on two points. The first is the character-set settings, which you can use as a reference; a modified character-set option may need the prefix loose- in front of it. This is said to be a bug and I did not dig into it, but without the prefix the service would not start. The second is max_allowed_packet, item 45 in the list below: once, when I ran a batch query with a tuple, I got a connection error, and after some searching it appeared the statement was too large. The MySQL documentation covers this case specifically, so I increased this parameter and then everything worked. Tip: SHOW VARIABLES LIKE '...' displays the current configuration values and is sometimes useful.
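A quick way to sanity-check a my.ini before restarting the service is to parse it with Python's standard configparser module, as in the sketch below. The file content is a hypothetical fragment (paths and sizes are placeholders, not recommendations). The loose- prefix is MySQL's documented way of telling a program to warn instead of exiting when it does not recognise an option, which is why it lets the server start even with an option such as default-character-set that newer mysqld versions no longer accept.

```python
import configparser

# Hypothetical my.ini fragment; adjust paths and sizes to your installation.
MY_INI = """
[client]
default-character-set = utf8mb4

[mysqld]
basedir = C:/mysql
datadir = C:/mysql/data
character-set-server = utf8mb4
loose-default-character-set = utf8mb4
max_allowed_packet = 64M
"""

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def size_to_bytes(value):
    """Convert a MySQL-style size such as '64M' into a number of bytes."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)

def load_mysqld_settings(text):
    """Return the [mysqld] section as a plain dict of option -> value."""
    # allow_no_value accepts bare flags such as skip-name-resolve.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    return dict(parser["mysqld"])

settings = load_mysqld_settings(MY_INI)
print(settings["character-set-server"])                # utf8mb4
print(size_to_bytes(settings["max_allowed_packet"]))   # 67108864
```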


1. back_log

The number of outstanding connection requests MySQL can hold. It comes into play when the main MySQL thread receives a very large number of connection requests in a short time; the main thread needs some time (albeit very short) to check each connection and start a new thread.

The back_log value indicates how many requests can be stacked up during this short time before MySQL temporarily stops answering new requests. If you expect many connections in a short period, increase it. It sets the size of the listen queue for incoming TCP/IP connections, and each operating system has its own limit on this queue size; setting back_log above the operating system's limit has no effect.

If the MySQL process list shows many entries such as 264084 | unauthenticated user | | NULL | Connect | NULL | login | NULL waiting to connect, increase back_log. The default value of back_log is 50 (in older MySQL versions).
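The process-list check above can be scripted. In the sketch below, rows stands for the tuples a cursor returns from SHOW PROCESSLIST (Id, User, Host, db, Command, Time, State, Info); the helper names and the threshold of 10 are hypothetical choices for illustration, not MySQL recommendations.

```python
# Column positions in a SHOW PROCESSLIST row:
# (Id, User, Host, db, Command, Time, State, Info)
USER_COLUMN = 1

def count_unauthenticated(rows):
    """Count connections still waiting to complete the login handshake."""
    return sum(1 for row in rows if row[USER_COLUMN] == "unauthenticated user")

def backlog_looks_short(rows, threshold=10):
    """Hypothetical heuristic: many pending logins may point to a small back_log."""
    return count_unauthenticated(rows) >= threshold

# Sample rows shaped like cursor.fetchall() output (illustrative values).
sample = [
    (264084, "unauthenticated user", "", None, "Connect", None, "login", None),
    (264085, "root", "localhost", "test", "Query", 0, None, "SHOW PROCESSLIST"),
]
print(count_unauthenticated(sample))  # 1
print(backlog_looks_short(sample))    # False
```

Many pending logins can also be caused by slow reverse DNS lookups, so treat the result as a hint rather than proof that back_log is too small.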

2. basedir

The path to the MySQL installation directory, i.e. the value of the --basedir option.

3. bdb_cache_size

The size of the buffer used to cache indexes and rows of BDB tables. If you do not use BDB tables, start MySQL with the --skip-bdb option to avoid wasting memory.

4. bdb_log_buffer_size

The size of the buffer allocated for caching indexes and rows of BDB tables. If you do not use BDB tables, set this parameter to 0 or start MySQL with the --skip-bdb option to avoid wasting memory.

5. bdb_home

See the --bdb-home option.

6. bdb_max_lock

The maximum number of locks that can be active in the BDB lock table (default 10000). It matters only if you use BDB tables. If you get errors such as "bdb: Lock table is out of available locks" or "Got error ..." when running large transactions or queries, increase this value.

7. bdb_logdir

The directory where BDB tables store their log files, i.e. the value of --bdb-logdir.

8. bdb_shared_data

ON if you use the --bdb-shared-data option.

9. bdb_tmpdir

The temporary-file directory for BDB tables, i.e. the value of --bdb-tmpdir.

10. binlog_cache_size

The size of the cache that holds the SQL statements of a transaction for the binary log while the transaction is being processed. If you frequently run large transactions with many statements, increase this value to gain performance.


11. bulk_insert_buffer_size

MyISAM tables use a special tree-like cache to speed up bulk inserts (INSERT ... SELECT, INSERT ... VALUES (...), (...), ..., and LOAD DATA INFILE). This parameter limits the size of that cache per thread; setting it to 0 disables the optimization. Note: this cache is used only when inserting into a non-empty table. The default value is 8MB.
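To benefit from this cache, the data has to arrive as a bulk insert. A minimal sketch of building a multi-row INSERT ... VALUES (...), (...) statement in Python, assuming a MySQLdb or PyMySQL cursor (both use %s placeholders); the table users and its columns are hypothetical:

```python
def build_multirow_insert(table, columns, n_rows):
    """Build 'INSERT INTO t (a, b) VALUES (%s, %s), (%s, %s), ...'.

    table and columns must be trusted identifiers; only the values are
    passed separately as query parameters.
    """
    placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([placeholders] * n_rows)
    return "INSERT INTO {} ({}) VALUES {}".format(table, ", ".join(columns), values)

def flatten(rows):
    """Flatten [(1, 'a'), (2, 'b')] into (1, 'a', 2, 'b') to match the placeholders."""
    return tuple(value for row in rows for value in row)

rows = [(1, "alice"), (2, "bob")]
sql = build_multirow_insert("users", ["id", "name"], len(rows))
params = flatten(rows)
print(sql)  # INSERT INTO users (id, name) VALUES (%s, %s), (%s, %s)
# With an open cursor: cursor.execute(sql, params); connection.commit()
```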

12. character_set

The default character set of MySQL.


13. character_sets

The character sets that MySQL supports.


14. concurrent_insert

If ON (the default), MySQL allows INSERT statements on MyISAM tables to run while SELECT statements are being executed. To turn it off, start mysqld with the --safe option or the --skip-new option.


15. connect_timeout

The number of seconds the MySQL server waits for a connect packet before answering the client with Bad handshake.


16. datadir

The path to the data directory, i.e. the value of the --datadir option.


17. delay_key_write

This parameter applies only to MyISAM tables. Its possible values are:

OFF: the DELAY_KEY_WRITE option in CREATE TABLE ... statements is ignored;

ON: the DELAY_KEY_WRITE option in CREATE TABLE ... statements is honored (default);

ALL: all opened tables are treated as if they had been created with DELAY_KEY_WRITE.

If DELAY_KEY_WRITE is enabled for a table, its key buffer is not flushed on every index update, but only when the table is closed. This greatly increases the speed of writing key values. If you use this option, you should check all tables with myisamchk --fast --force.


18. delayed_insert_limit

After inserting delayed_insert_limit rows, the INSERT DELAYED handler checks whether any SELECT statements are pending. If so, it lets them execute before continuing.


19. delayed_insert_timeout

How long an INSERT DELAYED handler thread waits for INSERT statements before terminating.


20. delayed_queue_size

The size of the queue (in rows) allocated for handling INSERT DELAYED. If the queue is full, any client that issues INSERT DELAYED waits until there is room in the queue again.

21. flush

Start MySQL with the --flush option to turn this feature on (all changes are then flushed to disk after each SQL statement).


22. flush_time

If set to a nonzero value, all open tables are closed every flush_time seconds to free resources and sync data to disk. Note: this is recommended only on Windows 9x/Me or on systems whose resources are critically low.


23. ft_boolean_syntax

The list of operators supported by boolean full-text searches. Search-engine maintainers who want to change the allowed operators can do so through this variable.


24. ft_min_word_len

The minimum length of words to be included in a FULLTEXT index. Note: the index must be rebuilt after changing this value.


25. ft_max_word_len

The maximum length of words to be included in a FULLTEXT index. Note: the index must be rebuilt after changing this value.
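A simplified Python model of the effect of these two limits, assuming the MyISAM defaults ft_min_word_len = 4 and ft_max_word_len = 84. It ignores stopwords and MySQL's real tokenizer rules, so treat it only as an illustration of why a short word such as "sql" is not found with the default settings:

```python
import re

def indexable_words(text, min_len=4, max_len=84):
    """Return the words a FULLTEXT index would keep under these length limits.

    Simplified model: splits on non-word characters and applies only the
    ft_min_word_len / ft_max_word_len bounds (no stopwords).
    """
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if min_len <= len(w) <= max_len]

print(indexable_words("Tuning SQL and MySQL full-text search"))
# ['tuning', 'mysql', 'full', 'text', 'search']
```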


26. ft_max_word_len_for_sort

The maximum length of words that are inserted with the fast index-rebuild method when a FULLTEXT index is rebuilt by REPAIR TABLE, CREATE INDEX, or ALTER TABLE. Longer words are inserted the slow way. Increasing this value makes MySQL create larger temporary files (less CPU load, but the speed then depends on disk I/O) and fit fewer keys into one sort block.


27. ft_stopword_file

The file from which the stopword list is read. After modifying the stopword list, you must rebuild the FULLTEXT indexes.


28. have_innodb

YES: MySQL supports InnoDB tables. DISABLED: --skip-innodb was used to turn off InnoDB support.


29. have_bdb

YES: MySQL supports Berkeley DB (BDB) tables. DISABLED: --skip-bdb was used to turn off BDB support.


30. have_raid

YES: MySQL supports the RAID table option.


31. have_openssl

YES: MySQL supports SSL encryption of the client/server protocol.


32. init_file

A file of SQL statements that is read when MySQL starts; the statements in it are executed at startup.


33. interactive_timeout

The number of seconds the server waits for activity on an interactive connection before closing it. An interactive client is one that passes the CLIENT_INTERACTIVE option to mysql_real_connect(). See also wait_timeout.


34. join_buffer_size

The size of the buffer used for full joins (joins that do not use indexes). One such buffer is allocated for each full join between two tables. Increase it to get faster full joins when adding indexes is not possible. (Normally the best way to get fast joins is to add indexes.)


35. key_buffer_size

The size of the buffer used for index blocks. Increase it to get better index handling (for all reads and multiple writes), as much as you can afford. If you make it too large, the system will start paging and slow down, so leave some memory for the operating system's file-system cache. To get even more speed when writing many rows at the same time, use LOCK TABLES.


36. language

The language used for error messages.

37. large_file_support

Whether large-file support is enabled.


38. locked_in_memory

Whether mysqld was locked in memory with the --memlock option.

39. log

Whether logging of all queries is enabled.


40. log_update

Whether the update log is enabled.


41. log_bin

Whether the binary log is enabled.


42. log_slave_updates

You need to turn this on if you chain replication servers, i.e. if a slave also acts as the master of other slaves.


43. long_query_time

If a query takes longer than this many seconds, MySQL increments the Slow_queries status counter (and writes the query to the slow query log if that log is enabled).


44. lower_case_table_names

1: table names are stored in lowercase on disk and table-name comparisons are case-insensitive;

0: the feature is turned off.

Note: if you use this option, convert all existing table names to lowercase before enabling it.

45. max_allowed_packet

The maximum size of one packet (for example, one query statement). The message buffer is initialized to net_buffer_length bytes but can grow up to max_allowed_packet bytes when needed. If this value is too small, errors occur when large packets are processed. If you use large BLOB columns, you must increase it.
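Raising max_allowed_packet is one fix for the batch error described in (4); another is to keep each statement below the limit. A sketch of splitting rows into batches by estimated size; the repr()-based size estimate and the byte budget are rough assumptions, so leave generous headroom below the server's actual max_allowed_packet:

```python
def chunk_rows(rows, max_bytes, row_size=lambda row: len(repr(row))):
    """Yield lists of rows whose estimated total size stays within max_bytes.

    row_size is a crude estimate of how many bytes a row adds to the SQL
    statement. A single row larger than max_bytes still gets its own batch.
    """
    batch, batch_bytes = [], 0
    for row in rows:
        size = row_size(row)
        # Start a new batch when adding this row would exceed the budget.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += size
    if batch:
        yield batch

rows = [(i, "name%d" % i) for i in range(1000)]
batches = list(chunk_rows(rows, max_bytes=4096))
print(len(batches), "batches")
```

Each batch can then be passed to cursor.executemany() and committed separately.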


46. net_buffer_length

The communication buffer is reset to this size between queries. You normally do not need to change it, but if memory is very tight you can set it to the expected length of the SQL statements sent by clients. If a statement exceeds this length, the buffer automatically grows up to max_allowed_packet bytes.


47. max_binlog_cache_size

The maximum size of the binary log cache. If it is too small, MySQL reports an error when executing complex multi-statement transactions.


48. max_binlog_size

The maximum size of one binary log file; the default is 1GB.


49. max_connections

The number of clients allowed to connect to the MySQL server at the same time. If this value is exceeded, MySQL returns a Too many connections error; normally the condition clears by itself once existing connections are closed.


50. max_connect_errors

If a host has more interrupted connection attempts than this value, the host is blocked from further connections. To unblock it, run FLUSH HOSTS;.

51. max_delayed_threads

Do not start more than this number of threads to handle INSERT DELAYED statements. If you try to insert data into a new table after all INSERT DELAYED threads are in use, the row is inserted as if the DELAYED attribute had not been specified.


52. max_heap_table_size

The maximum size to which a MEMORY (HEAP) table can grow.


53. max_join_size

If a join would need to examine more than max_join_size rows, an error is returned. If you need to run joins without a WHERE clause that take a long time and return millions of rows, increase this value.


54. max_sort_length

The number of bytes used when sorting BLOB or TEXT values (only the first max_sort_length bytes of each value are used; the rest are ignored).


55. max_user_connections

The maximum number of simultaneous connections for a single user account. A value of 0 means no limit.


56. max_tmp_tables

(This parameter currently has no effect.) The maximum number of temporary tables a client can keep open at the same time.


57. max_write_lock_count

After this many write locks, MySQL lets some pending read-lock requests run in between, so that read operations do not wait too long behind a long series of write locks.


58. myisam_recover_options

The value of the --myisam-recover option.


(5) The executemany(sql, args) method is a good choice: it noticeably speeds up writes and fits naturally with Python. You write the SQL string with its placeholders first and then pass the data in as the following sequence. Note that each element of that sequence is itself a tuple, and the number of values in it must match the number of placeholders. Also commit the data promptly with connection.commit(), and finally remember to close the cursor and the connection.
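A minimal sketch of this pattern. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; with MySQLdb or PyMySQL the placeholder is %s instead of ?, but executemany(), commit() and close() are used the same way. The people table is hypothetical:

```python
import sqlite3

def insert_rows(conn, rows):
    """Insert many (name, age) tuples with a single executemany() call."""
    sql = "INSERT INTO people (name, age) VALUES (?, ?)"  # %s with MySQLdb/PyMySQL
    cursor = conn.cursor()
    try:
        cursor.executemany(sql, rows)  # each element of rows is itself a tuple
        conn.commit()                  # commit once the whole batch has been sent
    finally:
        cursor.close()                 # always release the cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
insert_rows(conn, [("alice", 30), ("bob", 25)])
print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # 2
conn.close()
```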

(6) Avoid printing a large list. At first, when inserting a large list into the database, I printed it for convenience and found that most of the program's running time was spent on the printing. You can measure where the time goes with time.time().
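A small timing helper in the spirit of the time.time() approach; timed() and print_all() are hypothetical names. Printing into an in-memory buffer keeps the output off the screen; writing to a real console is usually slower still, so this understates the cost:

```python
import io
import time
from contextlib import redirect_stdout

def timed(label, func, *args):
    """Call func(*args), print the elapsed wall-clock time, and return the result."""
    start = time.time()
    result = func(*args)
    print("{}: {:.3f} s".format(label, time.time() - start))
    return result

def print_all(items):
    """Print a list, sending the text to an in-memory buffer instead of the console."""
    with redirect_stdout(io.StringIO()):
        print(items)

data = list(range(200000))
timed("double every item", lambda: [x * 2 for x in data])
timed("print the whole list", print_all, data)
```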

(7) As I see it, performance comes down to a few points: first, batch operations and batch commits; second, tuning many of the parameters above; third, cutting unnecessary operations and output from the program. Of course, I am still far from the online reports of loading 10 million rows in a few seconds. At present, reading from an online database or a local Excel file and inserting into MySQL, I load 4.5 million rows in 5 to 10 minutes, and the others in no more than 20 minutes, which is enough for me. But if it can be faster, why not? I still have more to learn here.
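Batch operations with batch commits can be sketched as follows: rows are inserted in chunks of batch_size and committed once per chunk rather than once per row. Again sqlite3 stands in for MySQL (use %s placeholders with MySQLdb/PyMySQL), and the table t and the batch size of 10,000 are assumptions to tune, not measured optima:

```python
import sqlite3
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of up to size items (itertools.batched needs 3.12+)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_load(conn, rows, batch_size=10000):
    """Insert rows in batches, committing once per batch; return the row count."""
    cursor = conn.cursor()
    total = 0
    try:
        for chunk in batched(rows, batch_size):
            cursor.executemany("INSERT INTO t (a, b) VALUES (?, ?)", chunk)
            conn.commit()
            total += len(chunk)
    finally:
        cursor.close()
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
print(bulk_load(conn, ((i, str(i)) for i in range(25000))))  # 25000
conn.close()
```

Because rows can be a generator, data read from another database or from Excel can be streamed in without holding everything in memory.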

2. Some problems encountered in Python

(1) Strings are honestly a headache. Always pay attention to how your string is encoded and settle on one consistent format. When necessary, check with type(); for example, type(s) in [type(u'')] tells you whether s is a unicode string (this is Python 2 syntax).
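In Python 3 the distinction is between str (text) and bytes, and isinstance() is the idiomatic check. A hypothetical helper that normalizes either to text, assuming UTF-8 unless told otherwise:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes/bytearray with the given encoding.

    Python 2 equivalent of the str check: type(s) in [type(u'')].
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected str or bytes, got {}".format(type(value).__name__))

print(to_text("中文"))                  # 中文
print(to_text("中文".encode("utf-8")))  # 中文
```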

(2) Tuples. A tuple is an immutable sequence, which can be inconvenient; if you need a changed version, you can add two tuples together to get a new one. Be careful not to try to modify a tuple that is in use. Lists give you more flexibility, but where immutability is required a tuple is still the better choice.
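A short illustration of these points; the row values are made up:

```python
row = (1, "alice")
extra = ("beijing",)  # the trailing comma makes a one-element tuple

# Tuples are immutable: build a new one by concatenation instead of editing.
new_row = row + extra
print(new_row)  # (1, 'alice', 'beijing')

try:
    row[0] = 2  # raises TypeError: tuples do not support item assignment
except TypeError as exc:
    print("cannot modify a tuple:", exc)

# When values must change, edit a list and convert back at the end,
# for example before passing the rows to executemany().
editable = list(row)
editable[0] = 2
print(tuple(editable))  # (2, 'alice')
```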
