Cluster bottleneck: disk IO (a must-read)

Source: Internet
Author: User

First, you need to know:
What is IO? IO stands for input/output, the interface through which data moves in and out of a device.
Keep the following questions in mind while reading this article:
1. Why is IO the bottleneck of a cluster?
2. How much do you know about IO?


This is only a personal point of view:
When we operate on a cluster, what we want is to read data. With big data, every read has to go through IO, and IO can be thought of as a water pipe: the larger the pipe, the faster terabyte-scale data can be read. The quality of IO therefore directly affects the cluster's data processing.

Below is a detailed overview of IO concepts.

    • Read/write IO
      The disk controller issues a read/write instruction to the disk, giving the address of the starting sector and the number of contiguous sectors to read or write. One read/write IO is a single such operation, and the sectors it covers must be contiguous. If an IO request from the file system above covers several non-contiguous runs of sectors, the disk controller splits it into multiple read/write IOs and executes them one by one.
      (Layering is the most important idea for understanding a system. Going from the bottom layer up is a process of aggregation: each lower-level module encapsulates its complexity and offers a simple interface to the layer above. Going from the top layer down is a process of subdivision and refinement. The layers are connected logically but communicate through protocols, which reduces coupling. A single IO at the file-system layer may be split into multiple IOs at the disk storage layer, so "one IO" means different things at different layers.)
    • Large-block/small-block IO
      Small-block IO: a read/write IO that covers a small number of contiguous sectors;
      Large-block IO: a read/write IO that covers a large number of contiguous sectors;
      There is no clear boundary between large blocks and small blocks.
    • Continuous/random IO
      Continuous IO: two successive read/write IOs in which the end address of the first differs little from the start address of the second;
      Random IO: two successive read/write IOs in which the end address of the first is far from the start address of the second;
    • Sequential/concurrent IO
      Sequential IO: the disk controller must finish one IO instruction before it can issue the next, so instructions execute in order and synchronously. A single-disk storage system always uses sequential IO;
      Concurrent IO: applies to multi-disk storage systems. After issuing one IO instruction, the disk controller checks the next one; if it targets a disk that is not currently busy, the controller can issue it right away. Instructions are still issued in order, but they execute asynchronously.
    • Continuous/Intermittent IO
    • Stable/Burst IO
    • Real/virtual IO
      Real IO: the IO request carries the address of actual data and reads or writes sector data;
      Virtual IO: an IO request that involves no actual data and only asks for status information, metadata, and so on.
    • IO concurrency probability
      As described in the book: for a single disk, the IO concurrency probability is 0, because a disk can serve only one IO at a time. For RAID 0 with 2 disks and a fairly large stripe depth (if the stripe is too small, IOs cannot run concurrently, as described below), the probability that 2 IOs run concurrently is 1/2. For other cases, please work out the numbers yourself.
      Personal understanding: disk IO concurrency is about whether the disk controller can execute IO requests concurrently, without waiting for the previous request to complete before starting the next. A single-disk storage system cannot process IO concurrently, whereas a multi-disk storage system can when each IO request occupies only some of the disks. How to calculate the probability of concurrency is not clear to me.
    • IOPS
      Let t be the time the disk controller needs to complete one IO. Then t = seek time + rotational delay + data transfer time, and IOPS = IO concurrency factor / t. (The IO concurrency factor has not been explained yet, and a Google search for it turned up nothing either.)
    • IO throughput per second
      The amount of data processed per second equals IOPS × average IO size. The IO size is related to the read/write speed of the head.
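
The IOPS and throughput formulas above can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the function names and the sample numbers (4 ms seek, 7200 rpm, 4 KB IO, 100 MB/s transfer rate) are made up for illustration, and the rotational delay is taken as half a revolution on average, which the article itself does not specify.

```python
def service_time_ms(seek_ms, rpm, io_size_kb, transfer_mb_per_s):
    """t = seek time + rotational delay + data transfer time, in milliseconds."""
    # Assumption: average rotational delay is half of one full revolution.
    rotational_delay_ms = 0.5 * 60_000 / rpm
    # Time to move io_size_kb at the given sustained transfer rate.
    transfer_ms = io_size_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotational_delay_ms + transfer_ms


def iops(t_ms, concurrency_factor=1.0):
    """IOPS = IO concurrency factor / t, with t converted from ms to seconds."""
    return concurrency_factor * 1000 / t_ms


def throughput_mb_per_s(iops_value, avg_io_size_kb):
    """Throughput per second = IOPS * average IO size."""
    return iops_value * avg_io_size_kb / 1024


# Illustrative numbers only, not measurements of any real disk.
t = service_time_ms(seek_ms=4.0, rpm=7200, io_size_kb=4, transfer_mb_per_s=100.0)
single_disk_iops = iops(t)  # concurrency factor 1: one IO at a time
print(f"t = {t:.2f} ms, IOPS = {single_disk_iops:.0f}, "
      f"throughput = {throughput_mb_per_s(single_disk_iops, 4):.2f} MB/s")
```

With small IOs the seek time and rotational delay dominate t, so throughput stays low even though the transfer rate is high. Raising the average IO size or the concurrency factor is what moves the result.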

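The splitting described under "Read/write IO" can also be sketched in Python: a file-system request that covers non-contiguous sectors is broken into read/write IOs, each given as a start sector plus a count of contiguous sectors. The function name split_into_ios is hypothetical, and sorting the sectors first is a simplifying assumption; a real disk controller may order its IOs differently.

```python
def split_into_ios(sectors):
    """Group sector numbers into (start_sector, sector_count) read/write IOs.

    Each returned IO covers only contiguous sectors, as a read/write IO must.
    """
    ios = []
    # Assumption: duplicates are dropped and sectors are handled in ascending order.
    for sector in sorted(set(sectors)):
        if ios and sector == ios[-1][0] + ios[-1][1]:
            # The sector directly follows the current run, so extend that IO.
            start, count = ios[-1]
            ios[-1] = (start, count + 1)
        else:
            # A gap in the sector numbers starts a new read/write IO.
            ios.append((sector, 1))
    return ios


# One file-system request touching three separate runs of sectors.
request = [100, 101, 102, 200, 201, 500]
print(split_into_ios(request))  # [(100, 3), (200, 2), (500, 1)]
```

Here one IO at the file-system layer becomes three IOs at the disk layer, which is why "one IO" has to be counted per layer.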