Someone once asked me how I managed to build unified storage. I smiled and told him loudly: with Ceph in hand, the world is mine.
Ceph is a unified distributed storage system designed for outstanding performance, reliability, and scalability. Since being adopted by big brother OpenStack, it has taken off and attracted widespread attention. That is, of course, because it provides a variety of reliable and stable storage services.
Ceph supports three storage interfaces, which can be used separately or together:
- Object storage: native API, also compatible with the Swift and S3 APIs
- Block storage: supports thin provisioning, snapshots, and clones
- File storage: POSIX-compliant filesystem mount, with snapshot support
Doesn't that stir up a sudden surge of heroism? With Ceph in hand, the world is yours; you could store every AVI under the sun.
Remember the four questions we focused on last time? How does Ceph achieve scalability, high performance, and reliability?
- Raw format or special storage format: which format makes the data easy to manage while ensuring safe data migration?
- Large files or small files: is the file system suited to large-file or small-file storage, and how does it deliver I/O efficiency?
- High availability or space utilization: using replicas to improve data availability inevitably reduces space utilization; how do you choose?
- Metadata service or not: a metadata service holds the metadata of the stored data, and every read and write must contact it to ensure consistency; its existence inevitably brings a single point of failure and a performance bottleneck.
Let's first look at Ceph's architecture diagram:
RADOS: Located at the lowest layer of Ceph, the name stands for Reliable, Autonomic, Distributed Object Store. All of Ceph's storage capabilities are built on RADOS, which organizes and manages the underlying storage in units of objects (typically 2 MB or 4 MB each). So at this layer, Ceph too stores data by splitting files into smaller objects.
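To make the object-splitting concrete, here is a minimal sketch of striping a byte stream into fixed-size objects. This is an illustration of the idea only, not Ceph's actual striping code; the naming scheme and the 4 MB size are assumptions for the example:

```python
# Toy sketch: split a byte stream into fixed-size objects, the way a
# RADOS-like layer might address them. Names and sizes are illustrative.
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB, a typical RADOS object size

def split_into_objects(file_id: str, data: bytes, obj_size: int = OBJECT_SIZE):
    """Return a list of (object_name, chunk) pairs for one file."""
    return [
        (f"{file_id}.{i:08x}", data[off:off + obj_size])
        for i, off in enumerate(range(0, len(data), obj_size))
    ]

# A 9 MB "file" becomes three objects: 4 MB + 4 MB + 1 MB.
objects = split_into_objects("video01", b"x" * (9 * 1024 * 1024))
print([(name, len(chunk)) for name, chunk in objects])
```

Because each object is independently addressable, small files cost one small object while large files spread across many objects and many disks, which is why both cases are served well.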
Librados: This layer abstracts and encapsulates RADOS and exposes APIs to the layers above, so applications can be developed directly against RADOS rather than the whole Ceph stack. Bindings are available for PHP, Ruby, Java, Python, C, and C++; this versatility matters.
RADOS GW (RADOS Gateway): Provides Amazon S3- and Swift-compatible RESTful API gateways for building object storage applications. RADOS GW offers a higher level of abstraction than Librados, but is less powerful.
RBD (RADOS Block Device): Provides a standard block device interface, commonly used to create volumes for virtual machines in virtualization scenarios.
Ceph FS: A POSIX-compliant distributed file system.
These top three modules form Ceph's application interface layer: they provide higher-level abstractions for ease of use, as client-side interfaces built on the Librados library.
With this brief architecture overview, we can answer the format question: Ceph uses a special storage format, splitting files into 2 MB-4 MB objects stored in RADOS, which works well for both small files and large files.
Ceph has two important daemon processes: OSDs and Monitors.
OSD (Object Storage Device): The process responsible for responding to client requests and returning the actual data. A Ceph cluster typically has many OSDs, and they support automatic backup and recovery.
Monitor: A Ceph cluster requires a small cluster of multiple Monitors, which synchronize state through the Paxos protocol (ZooKeeper similarly reaches consensus through its Paxos-like ZAB protocol) and hold the OSD metadata.
This shows that Ceph still needs a metadata service, but implements it in a decentralized way (key point: understand the roles of the OSD and the Monitor; if you know HBase, the relationship is analogous to that between ZooKeeper and HRegionServer).
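Why a small cluster of Monitors rather than one? Paxos-style consensus only needs a strict majority of Monitors alive, which is why odd cluster sizes are typical. The availability arithmetic is simple enough to sketch (this is general quorum math, not Ceph code):

```python
# Majority-quorum arithmetic: a Paxos-style cluster of n Monitors
# stays available as long as a strict majority of them is alive.
def quorum_size(n: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """How many members can fail while a quorum still exists."""
    return n - quorum_size(n)

for n in (1, 3, 5):
    print(f"{n} monitors: quorum={quorum_size(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

Note that 4 Monitors tolerate no more failures than 3 (one in each case), which is why even cluster sizes buy nothing.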
Ceph Features
- The CRUSH (Controlled Replication Under Scalable Hashing) algorithm provides decentralized placement with no single point of failure (to be discussed in the next article)
- A unified storage architecture covering different storage solutions
- Support for two data redundancy methods: replicas and erasure coding (EC)
- Self-management and self-healing
- Designed for cloud infrastructure and emerging workloads
- Scale-out with dynamic expansion, redundancy and disaster recovery, load balancing, and more
So, young man, care to get to know Ceph?
Reference:
Ceph Official documentation
Welcome to follow me: Three Kings Data (updated continuously, if irregularly ~~~)