Cluster bottleneck: a must-read on disk IO

Last Update:2015-12-04 Source: Internet

Author: User


First, you need to know:
What is IO? IO is the input/output interface.
Read this article with the following questions in mind:
1. Why is IO the bottleneck of a cluster?
2. How much do you know about IO?

This is only a personal point of view:
When we work with a cluster, what we want is to read data. With big data, every read has to go through IO, and IO can be thought of as a water pipe: the larger the pipe, the faster we can read terabyte-scale data. So the quality of IO directly affects the cluster's data processing.

Below is a detailed overview of IO

Read/write IO
The disk controller issues a read/write instruction to the disk, giving the address of the start sector and the number of contiguous sectors to read or write. One read/write IO is a single operation, and the sectors it covers must be contiguous. If an IO request from the upper-level file system covers several discontiguous sector ranges, the disk controller splits it into multiple read/write IOs and executes them. (The layered model is the most important idea for understanding a system. Going from the bottom layer up is a process of aggregation: each low-level module encapsulates its complexity and offers a simple interface to the layer above. Going from the top layer down is a process of subdivision and refinement. The layers are logically connected and communicate through protocols, which reduces coupling. A single IO at the file system layer may be split into multiple IOs by the disk storage layer, so "one IO" means different things at different layers.)
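The splitting described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_contiguous_ios` and the sample sector numbers are made up, and sorting the sectors first is an assumption (a real controller may keep the order of the request).

```python
from typing import Iterable, List, Tuple


def split_into_contiguous_ios(sectors: Iterable[int]) -> List[Tuple[int, int]]:
    """Group sector numbers into (start_sector, sector_count) runs.

    Each run stands for one read/write IO, because a single IO
    must cover contiguous sectors.
    """
    ios: List[Tuple[int, int]] = []
    for sector in sorted(set(sectors)):  # assumption: order does not matter
        if ios and sector == ios[-1][0] + ios[-1][1]:
            # The sector directly follows the current run, so extend it.
            start, count = ios[-1]
            ios[-1] = (start, count + 1)
        else:
            # A gap was found, so a new IO has to start here.
            ios.append((sector, 1))
    return ios


# A file-system request that touches three separate sector ranges.
request = [100, 101, 102, 200, 201, 500]
print(split_into_contiguous_ios(request))
# [(100, 3), (200, 2), (500, 1)]
```

One request at the file system layer becomes three IOs at the disk layer here, which is the point of the remark that "one IO" is counted differently at each layer.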
Large/Small Block IO
Small block IO: a read/write IO that covers a small number of contiguous sectors;
Large block IO: a read/write IO that covers a large number of contiguous sectors;
There is no sharp boundary between large and small blocks.
Continuous/Random IO
Continuous IO: for two different read/write IOs, the end address of the first is close to the start address of the second;
Random IO: for two different read/write IOs, the end address of the first is far from the start address of the second;
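As a rough illustration of the distinction, the sketch below labels each IO in a trace by the gap between the previous IO's end address and its own start address. The cutoff `GAP_THRESHOLD_SECTORS` is purely hypothetical, since the article gives no number for "close" or "far"; the trace is invented as well.

```python
from typing import List, Tuple

# Hypothetical cutoff: the article does not say how large a gap still counts as "close".
GAP_THRESHOLD_SECTORS = 8


def classify_ios(ios: List[Tuple[int, int]],
                 threshold: int = GAP_THRESHOLD_SECTORS) -> List[str]:
    """Label every IO after the first as 'continuous' or 'random'.

    Each IO is a (start_sector, sector_count) pair.
    """
    labels = []
    for (prev_start, prev_count), (next_start, _) in zip(ios, ios[1:]):
        prev_end = prev_start + prev_count  # first sector after the previous IO
        gap = abs(next_start - prev_end)
        labels.append("continuous" if gap <= threshold else "random")
    return labels


trace = [(0, 8), (8, 8), (20, 4), (90000, 8)]
print(classify_ios(trace))
# ['continuous', 'continuous', 'random']
```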
Sequential/Concurrent IO
Sequential IO: the disk controller must complete one IO instruction before it issues the next; instructions execute in order and synchronously. A single-disk storage system uses sequential IO;
Concurrent IO: applies to multi-disk storage systems. After issuing one IO instruction, the disk controller checks the next one; if the disk it targets is not busy with an ongoing operation, the controller can issue it right away. Instructions are still taken in order, but they execute asynchronously.
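The difference can be made concrete with a toy timing model. Everything here is an assumption for illustration: IOs are `(disk_id, duration_ms)` pairs with invented durations, and the controller takes instructions strictly in order, waiting whenever the next instruction targets a busy disk.

```python
from typing import Dict, List, Tuple

IO = Tuple[int, float]  # (disk_id, duration_ms), both hypothetical


def sequential_time(ios: List[IO]) -> float:
    """Each IO must finish before the next starts, so durations add up."""
    return sum(duration for _, duration in ios)


def concurrent_time(ios: List[IO]) -> float:
    """Issue IOs in order; an IO starts as soon as its target disk is idle."""
    disk_free_at: Dict[int, float] = {}
    issue_clock = 0.0  # time at which the controller can issue the next IO
    for disk, duration in ios:
        start = max(issue_clock, disk_free_at.get(disk, 0.0))
        issue_clock = start  # later IOs cannot be issued before this one
        disk_free_at[disk] = start + duration
    return max(disk_free_at.values(), default=0.0)


two_disks = [(0, 10.0), (1, 10.0), (0, 10.0), (1, 10.0)]
print(sequential_time(two_disks))   # 40.0
print(concurrent_time(two_disks))   # 20.0

one_disk = [(0, 10.0), (0, 10.0)]
print(concurrent_time(one_disk))    # 20.0
```

With a single disk the concurrent model gives the same time as the sequential one, in line with the statement that a single-disk system uses sequential IO.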
Continuous/Intermittent IO
Stable/Burst IO
Real/Virtual IO
Real IO: the IO request contains the address of actual data and reads/writes the sector data;
Virtual IO: an IO request that involves no actual data and only asks for status information, metadata, and so on.
IO concurrency probability
From the book: for a single disk, the IO concurrency probability is 0, because a disk can only perform one IO at a time. For RAID 0 with 2 disks and a relatively large stripe depth (if the stripe is too small, IOs cannot run concurrently), the probability that 2 IOs run concurrently is 1/2. For other cases, please do the calculation yourself.
Personal understanding: disk IO concurrency refers to whether the disk controller can execute IO requests concurrently, without waiting for the previous IO request to complete before executing the next one. A single-disk storage system cannot process IO concurrently, while a multi-disk storage system can process IO requests concurrently when each request occupies only some of the disks. How the concurrency probability is calculated is not clear to me.
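One simple model that reproduces the two figures quoted from the book is sketched below. It is an assumption, not the book's derivation: each IO is taken to fit on exactly one disk (large stripe depth) and to land on any disk with equal probability, independently of the other IO. Two IOs can then run concurrently only when they land on different disks.

```python
from fractions import Fraction
from itertools import product


def two_io_concurrency_probability(num_disks: int) -> Fraction:
    """Probability that two IOs land on different disks (assumed uniform model)."""
    outcomes = list(product(range(num_disks), repeat=2))
    concurrent = sum(1 for first, second in outcomes if first != second)
    return Fraction(concurrent, len(outcomes))


for n in (1, 2, 4):
    print(n, two_io_concurrency_probability(n))
# 1 0
# 2 1/2
# 4 3/4
```

The values for 1 and 2 disks match the text; the 4-disk value is only what this assumed model predicts.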
IOPS
Let t be the time the disk controller needs to complete one IO. Then t = seek time + rotational delay + data transfer time, and IOPS = IO concurrency factor / t. (The IO concurrency factor has not been explained yet; searching Google for "IO concurrency factor" did not turn up anything either ...)
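A short worked example of the two formulas follows. The timing values are invented, not measurements, and because the IO concurrency factor is left unexplained in the text, it is treated here as a plain multiplier with a placeholder value of 1.0.

```python
def io_time_ms(seek_ms: float, rotational_delay_ms: float, transfer_ms: float) -> float:
    """t = seek time + rotational delay + data transfer time (milliseconds)."""
    return seek_ms + rotational_delay_ms + transfer_ms


def iops(concurrency_factor: float, t_ms: float) -> float:
    """IOPS = IO concurrency factor / t, with t converted from ms to seconds."""
    return concurrency_factor * 1000.0 / t_ms


# Hypothetical figures for one IO on a spinning disk.
t = io_time_ms(seek_ms=5.0, rotational_delay_ms=4.0, transfer_ms=1.0)
print(t)             # 10.0
print(iops(1.0, t))  # 100.0
```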
IO throughput per second
The amount of data processed per second equals IOPS * average IO size. The IO size is related to the read/write speed of the head.
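As a quick arithmetic check of this relation, the sketch below multiplies an IOPS figure by an average IO size. Both input numbers are hypothetical.

```python
def throughput_bytes_per_second(iops: float, average_io_size_bytes: float) -> float:
    """Throughput per second = IOPS * average IO size."""
    return iops * average_io_size_bytes


# Hypothetical workload: 100 IOPS with an average IO size of 4 KiB.
tput = throughput_bytes_per_second(100, 4096)
print(tput)                  # 409600
print(tput / (1024 * 1024))  # 0.390625 (MiB per second)
```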


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
