[Image Search] Single-server program + database process optimization record

Tags: bulk insert, server, website

DHT capture program Open Source Address: https://github.com/h31h31/H31DHTDEMO

Data processing program Open Source Address: https://github.com/h31h31/H31DHTMgr

The program introduced in my earlier articles was not designed with a large website in mind, and there are many areas that need optimization. I would like to record the process here and discuss it with everyone.
1. What should I do when it takes the server around 200 ms just to check whether a local file exists? (There are 4096 level-1 folders, and each folder holds about 1000 files.)

2. What should I do when a keyword search on the website takes around 5 seconds? (Currently the search is done with SQL LIKE statements.)
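To make question 1 concrete, here is a rough sketch of what the lookup presumably amounts to; the folder-naming rule and the paths are assumptions, not the project's confirmed layout.

            // Hypothetical sketch of the existence check in question 1.  The naming rule
            // is an assumption: first 3 hex characters of the info-hash give 16^3 = 4096
            // level-1 folders, each holding roughly 1000 seed files.
            // needs: using System.IO;
            static string GetSeedPath(string rootDir, string infoHash)
            {
                string folder = infoHash.Substring(0, 3).ToUpper();
                return Path.Combine(rootDir, folder, infoHash + ".torrent");
            }

            // The ~200 ms cost is essentially this call against a folder of ~1000 files:
            // bool exists = File.Exists(GetSeedPath(@"D:\TORRENT", hash));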

Question 1: First, look at the time spent moving files on the server.

Moving the files across the 4096 folders is estimated to take about 34 hours (roughly two folders are processed per minute, so about 120 per hour, and 4096 / 120 ≈ 34 hours).

Clearly the original design is problematic: just moving the files takes this long, and as the data volume grows it will only get worse. If the design has to be modified again later, it will cost even more time.

Move the seed files into the subfolders:

                // Outline of the move step.  NOTE: the root path, loop bounds, folder-name
                // lengths and substring positions here are assumptions.
                // needs: using System; using System.IO;
                string pathname = @"D:\TORRENT\";            // assumed root of the seed store
                long ticktime1 = DateTime.Now.Ticks;
                for (int i = 0; i < 16; i++)                 // 16 x 256 = 4096 level-1 folders
                {
                    string temp1 = string.Format("{0:X}", i);
                    for (int j = 0; j < 256; j++)
                    {
                        string temp2 = string.Format("{0:X2}", j);
                        DirectoryInfo NextFolder = new DirectoryInfo(pathname + temp1 + temp2);
                        if (!NextFolder.Exists || NextFolder.Name.Length != 3)
                            continue;
                        int cnt = 0;
                        foreach (FileInfo fileChild in NextFolder.GetFiles())
                        {
                            cnt++;
                            string fc = fileChild.Name;
                            int finddot = fc.IndexOf('.');
                            if (finddot <= 0) continue;
                            string hashname = fc.Substring(0, finddot);
                            string tempname1 = hashname.Substring(0, 3);                    // level-1 part (assumed length)
                            string tempname2 = hashname.Substring(hashname.Length - 2, 2);  // level-2 part (assumed length)
                            string pathname1 = pathname + tempname1;
                            string pathname2 = pathname1 + "\\" + tempname2;
                            string filename2 = pathname2 + "\\" + fc;
                            if (!Directory.Exists(pathname2))
                                Directory.CreateDirectory(pathname2);
                            if (!File.Exists(filename2))
                                fileChild.MoveTo(filename2);
                        }
                        long ticktime2 = DateTime.Now.Ticks;
                        H31Debug.PrintLn(NextFolder.Name + " " + cnt.ToString() + " " + (ticktime2 - ticktime1).ToString());
                    }
                }

 

        /// <summary>
        /// Batch insert: load a delimited text file into the given table with BULK INSERT.
        /// </summary>
        public static int BulkInsertFile(string tablename, string filepath)
        {
            try
            {
                string strSql = string.Format(
                    "BULK INSERT {0} FROM '{1}' WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = ';\n', BATCHSIZE = 5000)",
                    tablename, filepath);
                return dbsql.ExecuteNonQuery(CommandType.Text, strSql, null);
            }
            catch (System.Exception ex)
            {
                H31Debug.PrintLn("AddNewHashLOGFile" + filepath + ex.StackTrace);
            }
            return -1;
        }
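For context, a call might look like the lines below; the table name and file path are hypothetical examples, not the project's real ones.

            // Hypothetical usage (table name and path are examples only):
            int ret = BulkInsertFile("H31HASHTABLE", @"D:\H31\sqldata\hash_batch.txt");
            if (ret < 0)
            {
                // the BULK INSERT threw; keep the text file so it can be retried later
            }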

Note: make sure that Chinese characters are not garbled when the data is written to the database.

Unicode encoding is used when saving the files, because SQL Server 2005 does not support UTF-8 for this. Some Japanese file names also go into the database, and Unicode encoding handles them as well.

                StreamWriter writer = new StreamWriter(filename, false, Encoding.Unicode);  // append flag assumed; Unicode as noted above
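Expanding on that line, here is a minimal sketch of how a batch file could be written, with hypothetical record fields; the separators must match the FIELDTERMINATOR and ROWTERMINATOR used in BulkInsertFile above.

                // Minimal sketch (hypothetical fields): one row per record, fields
                // separated by '|' and rows ended with ';' + newline so that
                // FIELDTERMINATOR = '|' and ROWTERMINATOR = ';\n' match on the server.
                // needs: using System.IO; using System.Text;
                string filename = @"D:\H31\sqldata\hash_batch.txt";   // assumed path
                var records = new[] {                                 // hypothetical rows
                    new { Id = 1001, Hash = "ABC123", Name = "example name 1" },
                    new { Id = 1002, Hash = "DEF456", Name = "example name 2" },
                };
                using (StreamWriter writer = new StreamWriter(filename, false, Encoding.Unicode))
                {
                    foreach (var r in records)
                    {
                        writer.Write(r.Id);
                        writer.Write('|');
                        writer.Write(r.Hash);
                        writer.Write('|');
                        writer.Write(r.Name);
                        writer.Write(";\n");   // must match ROWTERMINATOR exactly
                    }
                }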

When BULK INSERT is used for batch insertion, the auto-increment (identity) attribute has to be removed from the ID field of the HASH table. Locally this causes basically no problem because the data volume is small, but once the change is attempted on the server it no longer works.

One of the 12 tables holds so much data that the modification failed; the other tables were modified successfully. Quite a cold sweat.

If that table cannot be modified, the whole batch-insert design is in vain, because the ID field of the HASH table is referenced by other tables, so the ID must already be known when the file is generated locally.

 

I found a bunch of methods on the Internet, all saying to change an option in the management tool's settings (screenshot omitted).

The change was still useless, and restarting the database did not help either.

Getting desperate, I stopped the website, detached and shrank the database, and then tried the modification again; the execution still timed out.

With nobody to ask, I could only Google for English articles, and finally found a method that modifies the registry.

I changed this timeout setting value, but the operation still timed out after the same number of seconds.

At first, changing it to 300 seconds did not work; changing it to 1800 seconds basically did not work either. The only way was to restart the database and retry the save several times, and eventually the modification succeeded. It seems the tool itself also needs this configuration.

 

 
Letting the database generate the auto-increment IDs itself turned out to be a poor design. Now the mistake can only be half-corrected, because re-running all of the data from scratch would take at least 15 days.

The current approach is to avoid touching the database in real time. When the program starts, it reads the table's maximum ID number once, then generates the auto-increment IDs locally and writes the data straight into local files.

To keep the locally generated auto-increment IDs unique, the following points have to be considered (a small sketch follows the list):

1. Only one instance of the software may be running at a time;

2. After the software starts, make sure all previously generated SQL batch text files have been inserted into the database before reading the maximum ID number;

3. When generating the SQL batch text files, allow for the case where an insert fails but some of the rows may already exist in the database;

4. When a problem occurs, the batch file has to be read back and inserted record by record.
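The sketch mentioned above: it assumes a hypothetical HASHTABLE table with an ID column, a made-up connection string, and plain ADO.NET; it only illustrates the idea of reading MAX(ID) once at startup and then incrementing locally.

        // Minimal sketch of the local-ID idea; table name, column name and connection
        // string are assumptions, not the project's real ones.
        // needs: using System; using System.Data.SqlClient; using System.Threading;
        public static class LocalIdGenerator
        {
            private static long m_currentId;

            // Call once at startup, AFTER all leftover SQL batch text files from the
            // previous run have been bulk-inserted, so MAX(ID) really is the maximum.
            public static void Init(string connStr)
            {
                using (SqlConnection conn = new SqlConnection(connStr))
                using (SqlCommand cmd = new SqlCommand("SELECT ISNULL(MAX(ID), 0) FROM HASHTABLE", conn))
                {
                    conn.Open();
                    m_currentId = Convert.ToInt64(cmd.ExecuteScalar());
                }
            }

            // Hand out the next ID without touching the database; Interlocked keeps it
            // unique inside this one process (only one instance of the program may run).
            public static long NextId()
            {
                return Interlocked.Increment(ref m_currentId);
            }
        }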

At this point the software's workflow has largely been changed; it touches the database much less, which reduces the pressure on it and helps the website's query speed.

 

Because the website still uses SQL LIKE statements for search, a query takes about 2-5 seconds, especially when the result set is large.
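For reference, the current search presumably looks something like the sketch below (table, column and connection names are assumptions). The leading '%' wildcard keeps SQL Server from using an index on the name column, so every search scans the table, which is where the 2-5 seconds go.

            // Hypothetical example of the current LIKE-based search (names assumed).
            // needs: using System.Data.SqlClient;
            string connStr = "Data Source=.;Initial Catalog=H31DB;Integrated Security=True";  // assumed
            string keyword = "example";
            string sql = "SELECT TOP 100 ID, NAME FROM HASHTABLE WHERE NAME LIKE @kw";
            using (SqlConnection conn = new SqlConnection(connStr))
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@kw", "%" + keyword + "%");  // leading % forces a full scan
                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // collect results...
                    }
                }
            }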

Some readers recommended Hubble.net; for now, the analysis points toward Lucene.net.

Because the server is short on memory and only the CPU can be relied on, Lucene's disk-based text index, which uses little memory, is the only architecture available to test for now.
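As a reference point, here is a minimal sketch of the kind of disk-based Lucene.Net index being considered. It assumes Lucene.Net 3.x with a StandardAnalyzer and hypothetical 'id'/'name' fields, so it is only an illustration, not the project's actual code.

            // Minimal Lucene.Net 3.x sketch (field names and paths are assumed):
            // build a file-based index so memory use stays low, then search it.
            // needs: using Lucene.Net.Analysis.Standard; using Lucene.Net.Documents;
            //        using Lucene.Net.Index; using Lucene.Net.QueryParsers;
            //        using Lucene.Net.Search; using Lucene.Net.Store;
            var dir = FSDirectory.Open(@"D:\H31\luceneindex");           // index kept on disk
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

            using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                doc.Add(new Field("id", "1001", Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("name", "some seed file name", Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }

            // Searching the index would replace the slow LIKE query:
            using (var searcher = new IndexSearcher(dir, true))
            {
                var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "name", analyzer);
                var hits = searcher.Search(parser.Parse("keyword"), 100);
                foreach (var scoreDoc in hits.ScoreDocs)
                {
                    var found = searcher.Doc(scoreDoc.Doc);
                    // found.Get("id"), found.Get("name") ...
                }
            }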

Lucene is still being researched, so there is nothing to show yet; advice from readers is welcome.

1. The problem of moving a large number of small files taking so long is still unsolved.

2. Whether BULK INSERT will cause other problems still needs further observation.

If you want to know more, please leave a message here.

If everyone is tired of reading, head over to the entertainment area at http://h31bt.com for a look and a rest...

 

I hope you will recommend this post more... your suggestions are the motivation for the next article...

 

Happy National Day .........

 
