A study of Ceph


Today I set up Ceph, referring to several documents. The official documentation is at http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/#the-configuration-file

Another expert's blog post: http://my.oschina.net/oscfox/blog/217798

http://www.kissthink.com/archive/c-e-p-h-2.html, among other references.

Overall, the single-node configuration went without a hitch, but the multi-node configuration kept telling me that the mon address was not recognized. I studied it for a whole day with no clue, and the official documentation in particular showed nothing wrong. When I went back and compared against my working single-node configuration file, I found that the lines under the [mon.a] definition had leading whitespace; in other words, every line of ceph.conf (at least every line under the mon sections) has to start at the beginning of the line. It turns out the developers do not filter out these insignificant characters when parsing the configuration file. I don't know whether this counts as a Ceph bug, but at the very least it is carelessly written parsing code. The example below shows the difference.
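
For illustration, here is the shape of the failure as I understand it (the hostname and IP are simply the ones used in my setup below); the first form was silently misparsed, the second worked:

# broken: the lines under [mon.a] are indented
[mon.a]
    host = ceph2
    mon addr = 192.168.14.100:6789

# working: every line starts in the first column
[mon.a]
host = ceph2
mon addr = 192.168.14.100:6789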

1 About Ceph

1.1 Ceph definition

Ceph is a PB-scale distributed file system for Linux.

1.2 Ceph origin

Its name is related to the mascot of UCSC (Ceph's birthplace): "Sammy", a banana-colored slug, a shell-less mollusk of the cephalopod class. These many-tentacled cephalopods are a fitting metaphor for a highly parallel distributed file system.

Ceph was originally a PhD research project on storage systems, carried out by Sage Weil at the University of California, Santa Cruz (UCSC).

1.3 Ceph system architecture

The Ceph ecosystem architecture can be divided into four parts:

1. Clients: the data users.

2. cmds: the metadata server cluster (caches and synchronizes the distributed metadata).

3. cosd: the object storage cluster (stores data and metadata as objects and performs other key functions).

4. cmon: the cluster monitors (perform monitoring functions).

Figure 1 The conceptual architecture of the Ceph ecosystem

Figure 2 The simplified layered view of the Ceph ecosystem

1.4 Ceph components

Once you understand the conceptual architecture of Ceph, you can dig down a level to the main components implemented in Ceph. One important difference between Ceph and traditional file systems is that the intelligence is placed in the ecosystem rather than in the file system itself.

Figure 3 shows a simple Ceph ecosystem. The Ceph client is a user of the Ceph file system. The Ceph metadata daemon provides the metadata service, and the Ceph object storage daemon provides the actual storage (for both data and metadata). Finally, the Ceph monitor provides cluster management. Note that there can be many Ceph clients, many object storage endpoints, and many metadata servers (depending on the capacity of the file system), as well as at least a redundant pair of monitors. So, how is this file system distributed?

Figure 3 Simple Ceph ecosystem

1.4.1 Ceph client

Because Linux exposes a common file system interface (via the virtual file system switch, VFS), Ceph is transparent from the user's perspective. The administrator's perspective is of course different, given that the storage system may encompass a large number of servers. From the user's point of view, they access a large-capacity storage system without knowing about the metadata servers, monitors, and individual object storage devices that are aggregated underneath into one large storage pool. Users simply see a mount point at which standard file I/O can be performed.

The Ceph file system (or at least the client interface) is implemented in the Linux kernel. Note that in most file systems, all of the control and intelligence lives in the kernel's file system code itself. In Ceph, however, the file system's intelligence is distributed across the nodes, which simplifies the client interface and gives Ceph the ability to scale massively (even dynamically).

Instead of relying on allocation lists (metadata that maps the blocks on disk to a given file), Ceph uses an interesting alternative. From the Linux perspective, a file is assigned an inode number (INO) by the metadata server, which uniquely identifies the file. The file is then carved into some number of objects (depending on its size). Using the INO and an object number (ONO), each object is assigned an object ID (OID). A simple hash over the OID assigns each object to a placement group. A placement group (identified by a PGID) is a conceptual container for objects. Finally, the mapping of placement groups onto object storage devices is a pseudo-random mapping using an algorithm called Controlled Replication Under Scalable Hashing (CRUSH). In this way, the mapping of placement groups (and their replicas) to storage devices does not rely on any metadata, only on a pseudo-random mapping function. This is ideal because it minimizes storage overhead and simplifies allocation and data lookup.
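
The chain of mappings is easy to sketch in a few lines of Python. This is only an illustration, not Ceph's actual code: the OID format, the hash, the PG count, and the crush() stand-in below are simplified placeholders.

import hashlib

def object_id(ino, ono):
    # An object is named by the file's inode number plus the object's
    # index within the file (placeholder format, for illustration only).
    return "%x.%08x" % (ino, ono)

def placement_group(oid, pg_num):
    # A simple, stable hash of the OID folded onto the number of PGs.
    # Ceph uses its own hash function; md5 here is only a stand-in.
    return int(hashlib.md5(oid.encode()).hexdigest(), 16) % pg_num

def crush(pgid, osds, replicas=2):
    # Stand-in for CRUSH: deterministically pick `replicas` OSDs for a PG.
    # Real CRUSH walks a weighted, hierarchical cluster map; the point is
    # only that the choice is a pure function of the PGID and the map.
    return [osds[(pgid + i) % len(osds)] for i in range(replicas)]

# A file with inode 0x10000000005 carved into three objects:
for ono in range(3):
    oid = object_id(0x10000000005, ono)
    pgid = placement_group(oid, pg_num=576)
    print(oid, "-> pg", pgid, "-> osds", crush(pgid, osds=[0, 1, 2]))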

The final component of the allocation is the cluster map, an effective representation of the devices that make up the storage cluster. With a PGID and the cluster map, you can locate any object.

1.4.2 Ceph metadata server

The job of the metadata server (cmds) is to manage the file system's namespace. Although both metadata and data are stored in the object storage cluster, they are managed separately to support scalability. In fact, metadata is further split across a cluster of metadata servers, which can adaptively replicate and distribute the namespace to avoid hotspots. As shown in Figure 4, the metadata servers manage portions of the namespace, which can overlap (for redundancy and performance). The mapping of metadata servers to the namespace is performed in Ceph with dynamic subtree partitioning, which allows Ceph to adapt to changing workloads (migrating namespaces between metadata servers) while preserving locality for performance.

Figure 4 Partition of the Ceph namespace for the metadata server

But because each metadata server simply manages the namespace for a population of clients, its primary role is that of an intelligent metadata cache (the actual metadata is ultimately stored in the object storage cluster). Metadata for write operations is cached in a short-term journal, which is eventually pushed to physical storage. This behavior lets the metadata server serve recent metadata back to clients (which is common in metadata operations). The journal is also useful for failure recovery: if a metadata server fails, its journal can be replayed to ensure that the metadata is safely stored on disk.

The metadata server manages the inode space, converting file names into metadata. It converts file names into the inodes, file sizes, and striping data (layout) that the Ceph client uses for file I/O.

1.4.3 Ceph monitor

Ceph includes monitors that implement cluster map management, although some elements of fault management are performed in the object store itself. When object storage devices fail or new devices are added, the monitors detect the change and maintain a valid cluster map. This function is performed in a distributed fashion, in which map updates are communicated alongside existing traffic. Ceph uses Paxos, a family of distributed consensus algorithms.

1.4.4 Ceph object storage

Similar to traditional object storage, Ceph storage nodes include not only storage but also intelligence. A traditional drive is a simple target that only responds to commands from an initiator. An object storage device, by contrast, is an intelligent device that can act as both target and initiator, enabling communication and collaboration with other object storage devices.

From a storage perspective, Ceph object storage devices perform the mapping from objects to blocks (a task traditionally performed in the client's file system layer). This allows the local entity to decide for itself how best to store an object. Early versions of Ceph implemented a custom low-level file system on the local storage called EBOFS. It implemented a non-standard interface to the underlying storage, tuned for object semantics and other features such as asynchronous notification of disk commits. Today, the B-tree file system (btrfs) can be used on storage nodes; it already implements some of the necessary features (such as embedded integrity).

Because Ceph clients implement CRUSH and know nothing about the block mapping of files on disk, the underlying storage devices can safely manage the object-to-block mapping. This allows storage nodes to replicate data (when a device fails). Distributing failure recovery also allows the storage system to scale, since failure detection and recovery are spread across the ecosystem. Ceph calls this RADOS (see Figure 3).

2 Ceph setup

2.1 Ceph single-node installation

2.1.1 Node IP

192.168.14.100 (hostname is ceph2; two partitions, /dev/xvdb1 and /dev/xvdb2, are provided to the OSDs; client/mon/mds installed)

2.1.2 Install the Ceph library

# apt-get install ceph ceph-common ceph-mds

# ceph -v # displays Ceph's version and key information

2.1.3 Create the Ceph configuration file

# vim /etc/ceph/ceph.conf

[global]
max open files = 131072

# For version 0.55 and beyond, you must explicitly enable
# or disable authentication with "auth" entries in [global].
auth cluster required = none
auth service required = none
auth client required = none
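
# The three lines above disable authentication entirely, which is fine for a
# disposable test cluster. To turn authentication on instead, the same keys
# take the value cephx (note: with cephx enabled you also have to distribute
# the generated keyrings to the daemons and clients, which this walkthrough
# does not cover):
# auth cluster required = cephx
# auth service required = cephx
# auth client required = cephx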

[osd]
osd journal size = 1000
# The following assumes an ext4 filesystem.
filestore xattr use omap = true
# For Bobtail (v0.56) and subsequent versions, you may
# add settings for mkcephfs so it will create and mount
# the file system on a particular OSD for you. Remove the comment '#'
# character for the following settings and replace the values
# in braces with appropriate values, or leave the following settings
# commented out to accept the default values. You must specify the
# --mkfs option with mkcephfs in order for the deployment script to
# utilize the following settings, and you must define the 'devs'
# option for each OSD instance; see below.
osd mkfs type = xfs
osd mkfs options xfs = -f # default for xfs is "-f"
osd mount options xfs = rw,noatime # default mount option is "rw,noatime"
# For example, for ext4, the mount option might look like this:
# osd mkfs options ext4 = user_xattr,rw,noatime

# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.
[mon.a]
host = ceph2
mon addr = 192.168.14.100:6789

[osd.0]
host = ceph2
# For Bobtail (v0.56) and subsequent versions, you may
# add settings for mkcephfs so it will create and mount
# the file system on a particular OSD for you. Remove the comment '#'
# character for the following setting for each OSD and specify
# a path to the device if you use mkcephfs with the --mkfs option.
devs = /dev/xvdb1

[osd.1]
host = ceph2
devs = /dev/xvdb2

[mds.a]
host = ceph2

2.1.4 Create the data directories

# mkdir -p /var/lib/ceph/osd/ceph-0 # one of the directories created here is current, which is equivalent to Sheepdog's obj directory; objects you upload or create are stored there, along with metadata

# mkdir -p /var/lib/ceph/osd/ceph-1 # same as above

# mkdir -p /var/lib/ceph/mon/ceph-a

# mkdir -p /var/lib/ceph/mds/ceph-a

2.1.5 Create partitions and mount them for the OSDs

To format the new partitions with XFS or btrfs:

# mkfs.xfs -f /dev/xvdb1

# mkfs.xfs -f /dev/xvdb2
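
If you would rather use btrfs (assuming btrfs-tools is installed, and remembering to change osd mkfs type and the related mkfs/mount options in ceph.conf accordingly), the equivalent formatting step would look like this:

# mkfs.btrfs /dev/xvdb1

# mkfs.btrfs /dev/xvdb2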

The first time, you must mount the partitions so that the initialization data can be written:

# mount /dev/xvdb1 /var/lib/ceph/osd/ceph-0

# mount /dev/xvdb2 /var/lib/ceph/osd/ceph-1

2.1.6 Perform initialization

Note that before each initialization, you need to stop the Ceph service and empty the original data directory:

# /etc/init.d/ceph stop

# rm -rf /var/lib/ceph/*/ceph-*/*

The initialization can then be performed on the node where the mon is located:

# sudo mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/ceph1.keyring

Note that once the configuration file ceph.conf has changed, initialization is best done again.

2.1.7 Start the Ceph service

Execute on the node where the mon is located:

# sudo service ceph -a start

Note that when you perform this step, you may experience the following prompt:

=== osd.0 ===

Mounting xfs on ceph4:/var/lib/ceph/osd/ceph-0

Error ENOENT: osd.0 does not exist. Create it before updating the crush map

Execute the following command, then repeat the start command above, and the service will come up:

# ceph osd create

2.1.8 Perform a health check

# sudo ceph health # you can also use the ceph -s command to view the status

If HEALTH_OK is returned, the cluster is healthy.
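
A few other read-only status commands are useful at this point for poking around (these are standard ceph subcommands rather than anything specific to this setup):

# ceph osd tree # shows the OSDs and their up/down, in/out state

# ceph mon stat # shows the monitors and quorum

# ceph mds stat # shows the metadata server state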

Note that if you encounter the following prompt:

HEALTH_WARN 576 pgs stuck inactive; 576 pgs stuck unclean; no osds

Or you are prompted with the following:

HEALTH_WARN 178 pgs peering; 178 pgs stuck inactive; 429 pgs stuck unclean; recovery 2/24 objects degraded (8.333%)

Execute the following commands to list the stuck placement groups and diagnose the problem:

# ceph pg dump_stuck stale && ceph pg dump_stuck inactive && ceph pg dump_stuck unclean

If you encounter the following prompt:

HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50%)

This indicates that there are not enough OSDs; by default Ceph expects at least two OSDs.

2.1.9 Ceph test

On the client, mount the file system from the node where the mon is located:

# sudo mkdir /mnt/mycephfs

# sudo mount -t ceph 192.168.73.129:6789:/ /mnt/mycephfs

Verify on the client:

# df -h # if you can see the usage of /mnt/mycephfs, the Ceph installation was successful.

2.1.10 Using Ceph

Using Ceph as object storage is similar to using Sheepdog; it mostly comes down to a handful of commands. Note in particular that a pool in Ceph is roughly equivalent to an LU in Sheepdog, and a PG is roughly equivalent to a LUN; the objects we create and upload are stored in PGs, which you can see in the current directory described earlier. Here are a few simple commands (a fuller round trip is sketched after this list):

# rados put/get test.txt test.txt --pool=data # upload or download an object named test.txt to/from the pool data

# ceph osd map data test.txt # view the placement of the object test.txt; you can see which PG it resides in

# rados lspools / rados df # list the pools

# rados mkpool test # create a pool named test

# rados create test-object -p test # create an object named test-object in the pool test

# rados -p test ls # list the objects in the pool test

# rados ls -p test | more # same as above
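
A complete round trip with a throwaway pool looks roughly like this (the pool and object names here are arbitrary examples; get, stat, and rm are standard rados subcommands):

# rados mkpool mypool

# rados -p mypool put hello /etc/hostname

# rados -p mypool stat hello # prints the object's size and mtime

# rados -p mypool get hello /tmp/hello.out

# rados -p mypool rm hello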

2.2 Ceph multi-node installation

For multi-node scenarios, Ceph has the following requirements:

Set a distinct hostname on each node, and make sure the nodes can reach one another by hostname.

Each node must be able to SSH to the others without entering a password (set up via the ssh-keygen command).

2.2.1 Node IPs

192.168.14.96 (hostname is ceph; one partition /dev/sdb1 is provided to the OSD; client/mon/mds installed);

192.168.14.117 (hostname is ceph1; one partition /dev/sdb1 is provided to the OSD);

192.168.14.120 (hostname is ceph2; one partition /dev/sdb1 is provided to the OSD);

2.2.2 Configure host names

Set the appropriate host name on each node, for example:

# vim /etc/hostname

ceph1

Then modify /etc/hosts on each node, adding the following lines:

192.168.14.96 ceph

192.168.73.117 ceph1

192.168.73.120 ceph2

When each node can ping the others by hostname, the configuration has taken effect.

2.2.3 Configure password-free SSH access

Create the RSA key on each node:

# ssh-keygen -t rsa # just press Enter at every prompt

# touch /root/.ssh/authorized_keys

Configure ceph1 first so that it can access ceph and ceph2 without a password:

ceph1# scp /root/.ssh/id_rsa.pub ceph:/root/.ssh/id_rsa.pub_ceph1

ceph1# scp /root/.ssh/id_rsa.pub ceph2:/root/.ssh/id_rsa.pub_ceph1

ceph1# ssh ceph "cat /root/.ssh/id_rsa.pub_ceph1 >> /root/.ssh/authorized_keys"

ceph1# ssh ceph2 "cat /root/.ssh/id_rsa.pub_ceph1 >> /root/.ssh/authorized_keys"
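
If ssh-copy-id is available on these hosts (it ships with most OpenSSH client packages), the same result can be had with less typing:

ceph1# ssh-copy-id root@ceph

ceph1# ssh-copy-id root@ceph2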

The nodes ceph and ceph2 need to be configured in the same way, following the commands above.

2.2.4 Install the Ceph library

Install the Ceph library on each node:

# apt-get install ceph ceph-common ceph-mds

# ceph -v # displays Ceph's version and key information

2.2.5 Create the Ceph configuration file

# vim /etc/ceph/ceph.conf

[global]
max open files = 131072
auth cluster required = none
auth service required = none
auth client required = none

[osd]
osd journal size = 1000
filestore xattr use omap = true
osd mkfs type = xfs
osd mkfs options xfs = -f
osd mount options xfs = rw,noatime

[mon.a]
host = ceph
mon addr = 192.168.14.96:6789

[mon.b]
host = ceph1
mon addr = 192.168.14.117:6789

[mon.c]
host = ceph2
mon addr = 192.168.14.120:6789

[osd.0]
host = ceph
devs = /dev/xvdb1

[osd.1]
host = ceph1
devs = /dev/xvdb1

[osd.2]
host = ceph2
devs = /dev/xvdb1

[mds.a]
host = ceph

[mds.b]
host = ceph1

[mds.c]
host = ceph2

2.2.6 Create the data directories

Create the data directory on each node:

# mkdir -p /var/lib/ceph/osd/ceph-0

# mkdir -p /var/lib/ceph/osd/ceph-1

# mkdir -p /var/lib/ceph/osd/ceph-2

# mkdir -p /var/lib/ceph/mon/ceph-a

# mkdir -p /var/lib/ceph/mon/ceph-b

# mkdir -p /var/lib/ceph/mon/ceph-c

# mkdir -p /var/lib/ceph/mds/ceph-a

# mkdir -p /var/lib/ceph/mds/ceph-b

# mkdir -p /var/lib/ceph/mds/ceph-c

2.2.7 Create partitions and mount them for the OSDs

On ceph:

# fdisk /dev/xvdb # create the xvdb1 partition

# mkfs.xfs -f /dev/xvdb1

# mount /dev/xvdb1 /var/lib/ceph/osd/ceph-0

On ceph1:

# fdisk /dev/xvdb # create the xvdb1 partition

# mkfs.xfs -f /dev/xvdb1

# mount /dev/xvdb1 /var/lib/ceph/osd/ceph-1

On ceph2:

# fdisk /dev/xvdb # create the xvdb1 partition

# mkfs.xfs -f /dev/xvdb1

# mount /dev/xvdb1 /var/lib/ceph/osd/ceph-2
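
These mounts do not persist across a reboot. If you want them to, an /etc/fstab entry along these lines on each node should do it (same device and mount options as above; replace ceph-0 with that node's OSD directory):

/dev/xvdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime 0 0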

The following operations are performed on the node ceph.

2.2.8 Perform initialization

Note that before each initialization, you need to stop the Ceph service on each node and empty the original data directory:

# /etc/init.d/ceph stop

# rm -rf /var/lib/ceph/*/ceph-*/*

Initialization can then be performed on the node ceph, where the mon resides:

# sudo mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.keyring

Note that once the configuration file ceph.conf has changed, initialization is best done again.

2.2.9 Start the Ceph service

Execute on the node ceph, where the mon is located:

# sudo service ceph -a start

Note that when you perform this step, you may experience the following prompt:

=== osd.0 ===

Mounting xfs on ceph4:/var/lib/ceph/osd/ceph-0

Error ENOENT: osd.0 does not exist. Create it before updating the crush map

Execute the following command, then repeat the start command above, and the service will come up:

# ceph osd create

2.2.10 Perform a health check

# sudo ceph health # you can also use the ceph -s command to view the status

If HEALTH_OK is returned, the cluster is healthy.

Note that if you encounter the following prompt:

HEALTH_WARN 576 pgs stuck inactive; 576 pgs stuck unclean; no osds

Or you are prompted with the following:

HEALTH_WARN 178 pgs peering; 178 pgs stuck inactive; 429 pgs stuck unclean; recovery 2/24 objects degraded (8.333%)

Execute the following commands to list the stuck placement groups and diagnose the problem:

# ceph pg dump_stuck stale && ceph pg dump_stuck inactive && ceph pg dump_stuck unclean

If you encounter the following prompt:

HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50%)

This indicates that there are not enough OSDs; by default Ceph expects at least two OSDs.

2.2.11 Ceph test

On the client (the node ceph), mount the file system from the node where the mon resides (also the node ceph):

# sudo mkdir /mnt/mycephfs

# sudo mount -t ceph 192.168.73.131:6789:/ /mnt/mycephfs

Verify on the client:

# df -h # if you can see the usage of /mnt/mycephfs, the Ceph installation was successful.
