Seven suggestions for processing large batches of files

Recently, a project has required working with very large numbers of files, and I took plenty of detours along the way. Processing large batches of files takes a long time, and most of that time is spent waiting anxiously. With accumulated experience, I gradually worked out some principles for processing large batches of files. Here they are:

Principle 1: Choose the command line instead of the GUI.

For example, suppose a folder holds millions of files spread across many subfolders, and you want to count all the files in it. Right-clicking the folder and opening Properties to read the file count often leaves Windows unresponsive. The alternative is to count from the command line: use the dir command in the Windows shell, or UnxUtils, a port of common Linux command-line tools to Windows, which lets you run find . -type f | wc -l for a fast count. This is not only several times faster than the GUI; there is also no unfriendly "not responding" window.
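If UnxUtils is not at hand, the same count is easy to script. Below is a minimal Python sketch; the starting path "." is just a placeholder.

```python
import os

def count_files(root):
    """Count regular files under root, recursing into subfolders."""
    total = 0
    for _dirpath, _subdirs, filenames in os.walk(root):
        total += len(filenames)
    return total

print(count_files("."))  # placeholder path
```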
Principle 2: Compress for storage and transmission.

Sometimes massive amounts of data must sit on disk or travel across the network as files. Leaving them scattered over disk blocks not only wastes space but also costs transmission time. For ordinary text files, common compression formats shrink the data to roughly 10% of its original size. For cross-platform use, pick a widely supported format such as zip or tar. Transferring one large archive also saves a lot of time compared with transferring masses of small files; even counting compression and decompression time, it is much faster than moving the scattered files directly (see the sketches after Principle 6).

Principle 3: Cache frequently used information.

If your program often needs to traverse all the files in a folder for processing, and the file set stays stable, the traversal itself costs a lot of time. In that case, maintain a file list: traverse the folder once when the list is first generated, and afterwards read the list directly instead of walking the folder again. The latter costs far less than the former. Of course, frequently used information often does change, which is where Principle 4 of this article comes in.

Principle 4: Modify incrementally.

If the folder from Principle 3 changes frequently, does that mean you must traverse everything again each time to get the latest folder information? Of course not: unless, in the worst case, almost all the files have changed, you can update only the parts that changed and avoid the overhead of recomputing from scratch. For example, if each day the folder gains that day's data and drops the data from seven days earlier, you only need to apply those two changes and keep the six days in the middle untouched (a combined sketch of Principles 3 and 4 follows below).

Principle 5: Process in parallel.

This comes from experience with download tools. If your bandwidth allows at most N KB of data per second but your current task is downloading at only a fraction of that rate, start several more tasks and download in parallel until the total volume approaches the N KB per second limit. That way the bandwidth is used to the fullest (see the sketch below). Of course, once throughput hits its bottleneck, adding more processes or threads no longer increases speed and may instead lead to resource deadlock or exhaustion.

Principle 6: Reduce I/O overhead.

Reads and writes are time-consuming, so skip any I/O that is not strictly necessary. For example, much software offers several log levels. When it is used only by ordinary users, there is no need to write out large amounts of detailed log information; if a fault needs diagnosing, an engineer can switch verbose logging on for debugging, and test engineers can likewise redirect or nohup the logs during routine tests. This not only saves I/O but also keeps enough information for error tracking.
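For Principle 2, here is a minimal sketch using Python's standard tarfile module; data/ and data.tar.gz are placeholder names.

```python
import tarfile

# Pack a whole directory tree into one compressed archive before transfer.
with tarfile.open("data.tar.gz", "w:gz") as tar:
    tar.add("data", arcname="data")

# On the receiving side, unpack it again.
with tarfile.open("data.tar.gz", "r:gz") as tar:
    tar.extractall(".")
```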
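Principles 3 and 4 combine naturally: cache the file list once, then maintain it incrementally. A minimal sketch, assuming the list fits in memory and picking JSON as the cache format (file_list.json is a made-up name):

```python
import json
import os

CACHE = "file_list.json"  # hypothetical cache location

def build_list(root):
    """The one expensive full traversal (Principle 3 pays this only once)."""
    paths = []
    for dirpath, _subdirs, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in filenames)
    return paths

def load_list(root):
    """Read the cached list if present; otherwise build and cache it."""
    if os.path.exists(CACHE):
        with open(CACHE) as f:
            return json.load(f)
    paths = build_list(root)
    with open(CACHE, "w") as f:
        json.dump(paths, f)
    return paths

def apply_changes(paths, added, removed):
    """Principle 4: apply the day's additions and deletions instead of re-walking."""
    removed_set = set(removed)
    return [p for p in paths if p not in removed_set] + list(added)
```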
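For Principle 5, a minimal parallel-download sketch using Python's thread pool; the URLs and the worker count of 4 are placeholders to tune against your actual bandwidth.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def fetch(url):
    """Download one URL into the current directory."""
    name = os.path.basename(url) or "index.html"
    urlretrieve(url, name)
    return name

urls = ["http://example.com/a.txt", "http://example.com/b.txt"]  # placeholders

# A few workers saturate the link; past the bottleneck, more threads only add overhead.
with ThreadPoolExecutor(max_workers=4) as pool:
    for name in pool.map(fetch, urls):
        print("done:", name)
```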
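For Principle 6, Python's standard logging module shows the idea: detailed messages are written only when the level switch is turned on.

```python
import logging

# Ordinary runs log at WARNING; flip this to logging.DEBUG only when diagnosing.
logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("batch")

for i in range(1_000_000):
    log.debug("processing item %d", i)  # not emitted at WARNING level, so no I/O
log.warning("batch finished")
```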
Principle 7: Choose batch processing instead of one-by-one processing.

Much software has a batch-processing function: you hand over an execution list and it works through the whole batch, with no need to drive each item yourself. For example, wget supports downloading from a given list of URLs. If you split the list yourself and pass the URLs to wget one by one, every call pays for a wget start-up and shutdown, which is pure overhead; good software finishes the whole batch within a single start-up and shutdown. So before using any software, read its manual and check whether it has a batch mode; it can get you twice the result with half the effort.
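The contrast, sketched with Python's subprocess module; it assumes wget is on the PATH and urls.txt is a placeholder URL list.

```python
import subprocess

# One by one: each call pays wget's start-up and shutdown cost.
with open("urls.txt") as f:
    for url in f:
        subprocess.run(["wget", url.strip()], check=True)

# As a batch: one wget process reads the whole list (wget's -i option).
subprocess.run(["wget", "-i", "urls.txt"], check=True)
```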

This article is from the "cainiao William" blog; please be sure to keep this source: http://williamwhe.blog.51cto.com/720802/155424
