Ossutil Upload Performance Tuning

Colleagues and external customers often ask about the performance of concurrent uploads with ossutil. This article briefly explains how ossutil handles concurrent uploads and illustrates it with a real case.

ossutil can be obtained from:

Official website: https://help.aliyun.com/document_detail/50452.html
Code: https://github.com/aliyun/ossutil

Parameters
--recursive
When uploading files to OSS, you must specify the --recursive option if file_url is a directory; otherwise the option is not needed.
When downloading from OSS or copying files between OSS locations:

If --recursive is not specified, ossutil treats the operation as a copy of a single object, so src_url must name the object exactly. If the object does not exist, an error is reported.
If --recursive is specified, ossutil treats src_url as a prefix and copies all matching objects in bulk. If a copy fails, objects that were already copied are not rolled back.
When files are uploaded (or downloaded, or copied) in bulk and one of them fails, ossutil does not exit. It continues with the remaining files and records the error for the failed file in the report file. Files that are transferred successfully are not recorded in the report file.
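
The continue-on-error behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ossutil's actual implementation: run_batch, fake_transfer, and the file names are hypothetical, and the returned list only stands in for the report file.

```python
from typing import Callable, Iterable, List, Tuple


def run_batch(files: Iterable[str],
              transfer: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Transfer every file; collect failures instead of stopping at the first."""
    failures = []
    for name in files:
        try:
            transfer(name)
        except Exception as exc:
            # Only failed files are recorded, mirroring the report file.
            failures.append((name, str(exc)))
    return failures


def fake_transfer(name: str) -> None:
    """Hypothetical stand-in for an upload; fails for names ending in .bad."""
    if name.endswith(".bad"):
        raise IOError("simulated failure")


report = run_batch(["a.txt", "b.bad", "c.txt"], fake_transfer)
print(report)
```

The successful files a.txt and c.txt leave no trace in the result, and the failure of b.bad does not prevent c.txt from being processed.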

Cases where a bulk operation terminates on error

If the error occurs before ossutil starts iterating over the files, no report file is generated and ossutil terminates. For example, if the cp command is entered incorrectly, ossutil prints the error to the screen and exits without generating a report file.
If the error that occurs during a bulk operation is that the bucket does not exist, or that permission validation fails because of an incorrect AccessKeyID/AccessKeySecret, ossutil prints the error to the screen and exits.
The report file is named ossutil_report_date_time.report. It is an ossutil output file and is placed in ossutil's output directory, which can be set with the outputDir option in the configuration file or the --output-dir option on the command line. If neither is set, the default output directory is used: the ossutil_output directory under the current directory.

ossutil does not maintain report files. Please review and clean up your report files yourself to avoid accumulating too many of them.

Concurrency control parameters
--jobs controls the number of files processed concurrently when multiple files are uploaded/downloaded/copied.
--parallel controls the number of parts (shards) processed concurrently within a single large file when it is uploaded/downloaded/copied.
By default, ossutil calculates the parallel value from the file size. The option has no effect on small files; the size threshold above which a file is uploaded/downloaded/copied in parts can be set with the --bigfile-threshold option. When files are transferred in bulk, the actual concurrency is jobs multiplied by parallel. Both options can be adjusted by the user: when ossutil's defaults do not meet your performance requirements, adjust them to improve performance.
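
The relationship between the two options can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the threshold value in the usage lines is arbitrary, and treating files at or above the threshold as "large" is an assumption rather than documented ossutil behavior.

```python
def per_file_concurrency(file_size: int, bigfile_threshold: int,
                         parallel: int) -> int:
    """Parts in flight for one file; small files are sent as a single request."""
    # Assumption: a file at or above the threshold is transferred in parts.
    return parallel if file_size >= bigfile_threshold else 1


def peak_concurrency(jobs: int, file_sizes: list, bigfile_threshold: int,
                     parallel: int) -> int:
    """Worst case: the `jobs` most demanding files are active at the same time."""
    per_file = sorted(
        (per_file_concurrency(s, bigfile_threshold, parallel)
         for s in file_sizes),
        reverse=True,
    )
    return sum(per_file[:jobs])


threshold = 100 * 1024 * 1024  # illustrative value, not ossutil's default
print(peak_concurrency(5, [2_000_000_000] * 10, threshold, 12))
print(peak_concurrency(5, [1024] * 10, threshold, 12))
```

For ten large files the peak is jobs multiplied by parallel; for ten small files only the jobs value matters.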

For details on --bigfile-threshold, see "Ossutil Large File Breakpoint continuation" (resumable transfer of large files).

--part-size option
This option sets the size of each part when a large file is uploaded/downloaded/copied in parts.

By default this value does not need to be set; ossutil chooses the part size and the part concurrency based on the file size. Set it when upload/download/copy performance does not meet your needs, or when you have other special requirements.

If this option is set, the number of parts is ceil(file size / part size). Note that if the --parallel value is greater than the number of parts, the extra parallelism has no effect and the actual concurrency equals the number of parts.

A part size that is too small may hurt ossutil's upload/download/copy performance, while one that is too large reduces the number of parts that can actually run concurrently, so choose the part size appropriately.
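
The part-count rule and its interaction with --parallel can be written down directly. The following Python sketch uses hypothetical function names and small made-up sizes; it only restates the arithmetic above and is not taken from ossutil's source.

```python
def part_count(file_size: int, part_size: int) -> int:
    """Number of parts: file size divided by part size, rounded up."""
    return -(-file_size // part_size)  # integer ceiling division


def effective_parallel(file_size: int, part_size: int, parallel: int) -> int:
    """Parallelism beyond the number of parts has no effect."""
    return min(parallel, part_count(file_size, part_size))


print(part_count(250, 100))             # 250 bytes in 100-byte parts
print(effective_parallel(250, 100, 8))  # limited by the part count
print(effective_parallel(1000, 100, 4)) # limited by --parallel
```

A larger part size lowers part_count, which is why an oversized part size can cap the concurrency that --parallel would otherwise allow.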

Performance tuning
If concurrency is too high, ossutil's upload/download/copy performance may degrade because of thread switching and contention for resources, so adjust these two options according to the actual machine. For stress testing, start with low values for both options and raise them gradually to find the best setting.

If the --jobs and --parallel values are too high on a machine with limited resources, network transfers may become slow enough to cause EOF errors; in that case, reduce the --jobs and --parallel values.

If there are many files and their sizes vary widely, start with --jobs=3 --parallel=4 (3 files concurrently, 4 parts concurrently per file) and watch memory, CPU, and network usage. If neither the network nor the CPU is saturated, continue raising --jobs and --parallel.
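
When trying several settings in a row, it can help to generate the command line from the two values. The sketch below only assembles the argument list and does not run anything; build_cp_command is a hypothetical helper, the source and destination are the placeholders used in this article, and only options mentioned in this article (cp, -r, --jobs, --parallel) appear.

```python
def build_cp_command(src: str, dest: str, jobs: int = 3,
                     parallel: int = 4, recursive: bool = True) -> list:
    """Assemble an ossutil cp argument list with explicit concurrency options."""
    cmd = ["ossutil", "cp", src, dest]
    if recursive:
        cmd.append("-r")
    cmd += ["--jobs={}".format(jobs), "--parallel={}".format(parallel)]
    return cmd


# Starting point suggested in the text, then one step up for comparison.
for jobs, parallel in [(3, 4), (4, 6)]:
    print(" ".join(build_cp_command("oss://xxx", "xxx", jobs, parallel)))
```

Running each printed command while watching memory, CPU, and network usage shows whether the next step up is still worthwhile.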

Real case

In the customer scenario at the time, the download speed was about 265M/s.

Case analysis
Because this was a multi-file download, 5 files were downloaded at the same time by default (for versions <= 1.4.0, the default concurrency between files is 5).

Because the average file size was 1.1G, 12 threads were started by default for each file being downloaded (the concurrency within a single file was 12; when neither the parallel parameter nor the part size parameter is set, it is calculated from the file size).

So in the customer's environment ossutil had at least 5 * 12 = 60 threads running. That much concurrency would likely saturate the network card and put the CPU under heavy contention as well. It is recommended to watch CPU, network, and process/thread status in the environment during concurrent downloads.

For this customer, the recommendation is to use parts of 100M to 200M for each file, for example 100M per part, so that the number of concurrent downloads per file is file size / part size.
ossutil cp oss://xxx xxx -r --part-size=102400000
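
To see what part sizes in that range mean for this case, the numbers can be worked through in Python. The sketch assumes that 1.1G means 1.1 GiB, takes 102400000 from the command above as the 100M setting and twice that value as the 200M setting, and treats the part count as an upper bound on per-file concurrency; parts_per_file is a hypothetical helper.

```python
GIB = 1024 ** 3


def parts_per_file(file_size: int, part_size: int) -> int:
    """Number of parts: file size divided by part size, rounded up."""
    return -(-file_size // part_size)


file_size = 11 * GIB // 10  # about 1.1 GiB (assumed interpretation of 1.1G)
jobs = 5                    # default between-file concurrency in this case

for part_size in (102400000, 204800000):
    parts = parts_per_file(file_size, part_size)
    # Upper bound on threads if every part of every active file runs at once.
    print(part_size, parts, jobs * parts)
```

Under these assumptions the larger part size roughly halves the per-file part count, which is the lever the recommendation relies on.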

If there are many files and their sizes vary widely, also set --jobs=3 --parallel=4 (3 files concurrently, 4 parts concurrently per file).

The general advice is to keep jobs * parallel at a ratio of 1:1 to 2:1 relative to the number of CPU cores, and not much higher.
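
This rule of thumb is easy to check before starting a transfer. The following Python sketch is one possible reading of the advice, with hypothetical function names and the 2:1 ratio taken as the upper limit.

```python
import os


def fits_budget(jobs: int, parallel: int, cores: int,
                max_ratio: float = 2.0) -> bool:
    """True if jobs * parallel stays within max_ratio times the core count."""
    return jobs * parallel <= cores * max_ratio


# os.cpu_count() may return None, so fall back to a single core.
cores = os.cpu_count() or 1
print(cores, fits_budget(3, 4, cores))

# Fixed core count for a reproducible comparison.
print(fits_budget(3, 4, 8))   # 12 threads on 8 cores
print(fits_budget(5, 12, 8))  # 60 threads on 8 cores
```

On an 8-core machine the suggested --jobs=3 --parallel=4 stays within the budget, while the 60 threads from the case above exceed it several times over.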

Further explanation
What matters is not how many resources OSS requires, but the CPU, memory, network, and so on that each concurrent task needs (reading the file, splitting it into parts, uploading, etc.).

--jobs is the concurrency between files. The default is 5 for versions <= 1.4.0 and 3 for later versions.
--parallel is the concurrency between parts within a large file. When neither the parallel parameter nor the part size parameter is set, it is calculated from the file size, up to a maximum that depends on the version (12 for versions after 1.4.0).
If there are many files and their sizes vary widely, you can set --jobs=3 --parallel=4 together (3 files concurrently, 4 parts concurrently per file; adjust the exact numbers to the machine).
Summary
cp runs concurrently by default. Large files are transferred in parts concurrently, and small files are transferred with a single put. CRC checking is enabled by default.
Copying between OSS locations currently supports only objects; copying multipart uploads is not supported.
General recommendations

Keep jobs * parallel at a ratio of 1:1 to 2:1 relative to the number of CPU cores, and not much higher.
Too much concurrency will saturate the network card and overload the CPU. Watch CPU, network, and process/thread status in the environment while running concurrent transfers.
Reference
Ossutil Large File Breakpoint continuation


This article is original content of the Yunqi (cloud-Habitat) community and may not be reproduced without permission.
