Ceph provides three types of storage: object storage, block storage, and a file system. (Figure: architecture of the Ceph storage cluster.)
We are mainly interested in block storage. In the second half of the year we will gradually migrate the virtual machine backend storage from SAN to Ceph. Although it is still at version 0.94, Ceph is already quite mature: a colleague has been running Ceph in production for more than two years, and while he ran into many problems, he solved them all, which shows that Ceph is stable and reliable.
Prepare the hardware environment
Prepare six machines: three physical servers as monitor nodes (mon: ceph-mon1, ceph-mon2, ceph-mon3), two physical servers as storage nodes (osd: ceph-osd1, ceph-osd2), and one virtual machine as the admin node (adm: ceph-adm).
Ceph must have an odd number of monitor nodes, and at least three (a single one is fine for playing around). ceph-adm is optional; you can put it on a monitor node, but keeping it separate makes the architecture clearer. You can also place mon on the osd nodes, but this is not recommended in production.
The hardware configuration of the adm server can be fairly modest; a low-spec virtual machine is enough to operate and manage Ceph.
The two hard disks of each mon server are used as RAID 1 to install the operating system.
The osd servers use ten 4 TB hard disks for Ceph storage. Each osd corresponds to one hard disk and needs one journal, so ten disks require ten journals. We use two large-capacity SSDs for the journals, each divided into five partitions, so that each partition serves as the journal of one osd disk. The remaining two small-capacity SSDs hold the operating system in RAID 1.
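As an illustration, here is a minimal shell sketch (the device letters are an assumption and must match your actual controller ordering) that prints the data-disk to journal-partition pairing used later in the ceph-deploy osd create step:

DATA="sda sdb sdd sde sdg sdh sdi sdj sdk sdl"
JOURNAL="sdc1 sdc2 sdc3 sdc4 sdc5 sdf1 sdf2 sdf3 sdf4 sdf5"
set -- $JOURNAL
for d in $DATA; do
    # one osd data disk maps to exactly one journal partition
    echo "ceph-osd1:${d}:/dev/$1"
    shift
done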
The configuration list is as follows:
| Hostname  | IP Address    | Role | Hardware Info                                               |
| --------- | ------------- | ---- | ----------------------------------------------------------- |
| ceph-adm  | 192.168.2.100 | adm  | 2 cores, 4 GB RAM, 20 GB disk                                |
| ceph-mon1 | 192.168.2.101 | mon  | 24 cores, 64 GB RAM, 2x750 GB SAS                            |
| ceph-mon2 | 192.168.2.102 | mon  | 24 cores, 64 GB RAM, 2x750 GB SAS                            |
| ceph-mon3 | 192.168.2.103 | mon  | 24 cores, 64 GB RAM, 2x750 GB SAS                            |
| ceph-osd1 | 192.168.2.121 | osd  | 12 cores, 64 GB RAM, 10x4 TB SAS, 2x400 GB SSD, 2x80 GB SSD  |
| ceph-osd2 | 192.168.2.122 | osd  | 12 cores, 64 GB RAM, 10x4 TB SAS, 2x400 GB SSD, 2x80 GB SSD  |
Software environment preparation
All Ceph cluster nodes run CentOS 7.1 (CentOS-7-x86_64-Minimal-1503-01.iso). All file systems use xfs, as officially recommended by Ceph. Every node's operating system is installed on RAID 1; the remaining disks are used individually, without RAID.
After installing CentOS, we need to do some basic configuration on every node (including ceph-adm), such as disabling SELinux, opening the firewall ports, and synchronizing time:
Disable SELinux:
# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# setenforce 0
Open the ports required by Ceph:
# firewall-cmd --zone=public --add-port=6789/tcp --permanent
# firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent
# firewall-cmd --reload
Install the EPEL software source:
# rpm -Uvh https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
# yum -y update
# yum -y upgrade
Install NTP and synchronize the time:
# yum -y install ntp ntpdate ntp-doc
# ntpdate 0.us.pool.ntp.org
# hwclock --systohc
# systemctl enable ntpd.service
# systemctl start ntpd.service
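Once ntpd is running, a quick optional sanity check confirms that the node is actually syncing (output will vary with your environment and NTP servers):

# ntpq -p
# timedatectl status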
On each osd server we need to partition the ten SAS disks and create an xfs file system on each. The two SSDs used for journals are split into five partitions each, one partition per osd disk; these partitions do not get a file system and are left for Ceph to handle.
# parted /dev/sda
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary xfs 0% 100%
(parted) quit

# mkfs.xfs /dev/sda1
meta-data=/dev/sda1              isize=256    agcount=4, agsize=244188544 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=976754176, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=476930, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
...
The commands above have to be repeated for ten disks, and more servers will be added later, so it is easier to put them in a script, parted.sh, where /dev/sd{a,b,d,e,g,h,i,j,k,l} are the ten data disks and /dev/sdc and /dev/sdf are the SSDs used for journals:
# vi parted.sh

#!/bin/bash
set -e
if [ ! -x "/sbin/parted" ]; then
    echo "This script requires /sbin/parted to run!" >&2
    exit 1
fi

DISKS="a b d e g h i j k l"
for i in ${DISKS}; do
    echo "Creating partitions on /dev/sd${i} ..."
    parted -a optimal --script /dev/sd${i} -- mktable gpt
    parted -a optimal --script /dev/sd${i} -- mkpart primary xfs 0% 100%
    sleep 1
    #echo "Formatting /dev/sd${i}1 ..."
    mkfs.xfs -f /dev/sd${i}1 &
done

SSDS="c f"
for i in ${SSDS}; do
    parted -s /dev/sd${i} mklabel gpt
    parted -s /dev/sd${i} mkpart primary 0% 20%
    parted -s /dev/sd${i} mkpart primary 21% 40%
    parted -s /dev/sd${i} mkpart primary 41% 60%
    parted -s /dev/sd${i} mkpart primary 61% 80%
    parted -s /dev/sd${i} mkpart primary 81% 100%
done

# sh parted.sh
Run ssh-keygen on ceph-adm to generate an ssh key pair. Note that the passphrase is left empty. Then copy the public key to every Ceph node:
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

# ssh-copy-id root@ceph-mon1
# ssh-copy-id root@ceph-mon2
# ssh-copy-id root@ceph-mon3
# ssh-copy-id root@ceph-osd1
# ssh-copy-id root@ceph-osd2
From ceph-adm, log in to each node to check that passwordless ssh works, and accept each host key once so that the annoying confirmation prompt does not come back:
# ssh root@ceph-mon1
The authenticity of host 'ceph-mon1 (192.168.2.101)' can't be established.
ECDSA key fingerprint is d7:db:d6:70:ef:2e:56:7c:0d:9c:62:75:b2:47:34:df.
Are you sure you want to continue connecting (yes/no)? yes
# ssh root@ceph-mon2
# ssh root@ceph-mon3
# ssh root@ceph-osd1
# ssh root@ceph-osd2
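Alternatively, a small client-side ssh configuration on ceph-adm can suppress the host-key prompt up front. This is a minimal sketch and a convenience-over-security trade-off suitable only for a trusted management network; the host pattern is an assumption based on the naming used here:

# cat >> ~/.ssh/config <<'EOF'
Host ceph-mon* ceph-osd*
    User root
    StrictHostKeyChecking no
EOF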
Ceph deployment
Compared with installing Ceph manually on every node, it is much easier to install it everywhere at once with the ceph-deploy tool:
# rpm -Uvh http://ceph.com/rpm-hammer/el7/noarch/ceph-release-1-1.el7.noarch.rpm
# yum update -y
# yum install ceph-deploy -y
Create a ceph working directory and perform the subsequent operations in it:
# mkdir ~/ceph-cluster
# cd ~/ceph-cluster
Initialize the cluster and tell ceph-deploy which nodes are the monitor nodes. After the command succeeds, ceph.conf, ceph.log, ceph.mon.keyring and other related files are generated in the ceph-cluster directory:
# ceph-deploy new ceph-mon1 ceph-mon2 ceph-mon3
Install Ceph on each Ceph node:
# ceph-deploy install ceph-adm ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
Initialize the monitor nodes:
# ceph-deploy mon create-initial
Check the disk status of the Ceph storage nodes:
# ceph-deploy disk list ceph-osd1
# ceph-deploy disk list ceph-osd2
Initialize the Ceph disks and create the osd storage nodes, pairing each data disk with its journal partition in a strict one-to-one mapping (the format is storage-node:data-disk:journal-partition):
Create the ceph-osd1 storage node:
# ceph-deploy disk zap ceph-osd1:sda ceph-osd1:sdb ceph-osd1:sdd ceph-osd1:sde ceph-osd1:sdg ceph-osd1:sdh ceph-osd1:sdi ceph-osd1:sdj ceph-osd1:sdk ceph-osd1:sdl
# ceph-deploy osd create ceph-osd1:sda:/dev/sdc1 ceph-osd1:sdb:/dev/sdc2 ceph-osd1:sdd:/dev/sdc3 ceph-osd1:sde:/dev/sdc4 ceph-osd1:sdg:/dev/sdc5 ceph-osd1:sdh:/dev/sdf1 ceph-osd1:sdi:/dev/sdf2 ceph-osd1:sdj:/dev/sdf3 ceph-osd1:sdk:/dev/sdf4 ceph-osd1:sdl:/dev/sdf5
Create the ceph-osd2 storage node:
# ceph-deploy disk zap ceph-osd2:sda ceph-osd2:sdb ceph-osd2:sdd ceph-osd2:sde ceph-osd2:sdg ceph-osd2:sdh ceph-osd2:sdi ceph-osd2:sdj ceph-osd2:sdk ceph-osd2:sdl
# ceph-deploy osd create ceph-osd2:sda:/dev/sdc1 ceph-osd2:sdb:/dev/sdc2 ceph-osd2:sdd:/dev/sdc3 ceph-osd2:sde:/dev/sdc4 ceph-osd2:sdg:/dev/sdc5 ceph-osd2:sdh:/dev/sdf1 ceph-osd2:sdi:/dev/sdf2 ceph-osd2:sdj:/dev/sdf3 ceph-osd2:sdk:/dev/sdf4 ceph-osd2:sdl:/dev/sdf5
Finally, push the generated configuration file from ceph-adm to the other nodes so that every node has the same ceph configuration:
# ceph-deploy --overwrite-conf admin ceph-adm ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
Test
Check whether the configuration succeeded:
# ceph health
HEALTH_WARN too few PGs per OSD (10 < min 30)
Increase the number of PGs. Based on the formula Total PGs = (#OSDs * 100) / pool size, determine pg_num (pgp_num should be set to the same value as pg_num): 20 * 100 / 2 = 1000. Ceph officially recommends rounding to the nearest power of 2, so we choose 1024. If it succeeds, ceph health reports HEALTH_OK:
# ceph osd pool set rbd size 2
set pool 0 size to 2
# ceph osd pool set rbd min_size 2
set pool 0 min_size to 2
# ceph osd pool set rbd pg_num 1024
set pool 0 pg_num to 1024
# ceph osd pool set rbd pgp_num 1024
set pool 0 pgp_num to 1024
# ceph health
HEALTH_OK
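For a cluster of a different size, the same arithmetic can be scripted. This is a small sketch (the script name and defaults are made up for illustration) that rounds the raw result up to the next power of 2, as recommended:

# vi pg_calc.sh

#!/bin/bash
# usage: sh pg_calc.sh <num_osds> <pool_size>
OSDS=${1:-20}
SIZE=${2:-2}
RAW=$(( OSDS * 100 / SIZE ))
PG=1
while [ ${PG} -lt ${RAW} ]; do PG=$(( PG * 2 )); done
echo "raw=${RAW} -> pg_num=pgp_num=${PG}"

# sh pg_calc.sh 20 2
raw=1000 -> pg_num=pgp_num=1024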
More details:
# ceph -s
    cluster 6349efff-764a-45ec-bfe9-ed8f5fa25186
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon1=192.168.2.101:6789/0,ceph-mon2=192.168.2.102:6789/0,ceph-mon3=192.168.2.103:6789/0}
            election epoch 6, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
     osdmap e107: 20 osds: 20 up, 20 in
      pgmap v255: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            740 MB used, 74483 GB / 74484 GB avail
                1024 active+clean
If everything works correctly, write the settings above into the ceph.conf file and push it to all the deployed nodes again:
# vi ceph.conf
[global]
fsid = 6349efff-764a-45ec-bfe9-ed8f5fa25186
mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
mon_host = 192.168.2.101,192.168.2.102,192.168.2.103
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024

# ceph-deploy admin ceph-adm ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
If any strange problem comes up during deployment that cannot be solved, you can simply purge everything and start over:
# ceph-deploy purge ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
# ceph-deploy purgedata ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2
# ceph-deploy forgetkeys
Troubleshooting
If you run into network problems, first confirm that the nodes are reachable over passwordless ssh and that the firewall on each node is either disabled or has the required rules added:
# ceph health
14:31:10.545138 7fce64377700  0 -- :/1024052 >> 192.168.2.101:6789/0 pipe(0x7fce60027050 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fce60023e00).fault
HEALTH_OK
# ssh ceph-mon1
# firewall-cmd --zone=public --add-port=6789/tcp --permanent
# firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent
# firewall-cmd --reload
# ceph health
HEALTH_OK
When you install Ceph for the first time you will run into all kinds of problems, but overall troubleshooting goes smoothly. As experience accumulates, we will gradually move Ceph into production in the second half of this year.
Ceph 0.80 installation and use (CentOS 7 / ceph-deploy)
Ceph's main goal is to provide a POSIX-compatible distributed file system with no single point of failure, so that data is fault-tolerant and replicated seamlessly. See: http://www.oschina.net/p/ceph
At present most Ceph deployments are on Ubuntu, because its kernel enables Ceph_fs by default. When you choose CentOS 7 (whose default file system is XFS rather than EXT4) as the deployment platform, there is more to pay attention to, for example how a client mounts the ceph file system.
I read many articles online, but most of them either do not apply to 0.80 or contain steps that can be skipped, such as editing ceph.conf. So after installing it several times I summarized this article. In addition, Inktank, the company behind Ceph (which has since been acquired), sells its own version (about $1,000 per cluster), and Ceph_fs is still not enabled in the stock kernel, so many people simply switch to Ubuntu.
1. Prepare the host environment:
| Hostname | IP          | Role     | OS                       |
| -------- | ----------- | -------- | ------------------------ |
| ceph0    | 10.9.16.96  | MON, MDS | CentOS 7                 |
| ceph1    | 10.9.16.97  | MON, OSD | CentOS 7                 |
| ceph2    | 10.9.16.98  | OSD, MDS | CentOS 7                 |
| ceph3    | 10.9.16.99  | OSD, MDS | CentOS 7                 |
| ceph4    | 10.9.16.100 | MON      | CentOS 7                 |
| client0  | 10.9.16.89  | client   | CentOS 7 (kernel 3.16.2) |
| client1  | 10.9.16.95  | client   | Ubuntu 14.04             |
Deployment suggestions:
We recommend using three MON nodes and keeping the OSD data disks separate from the operating system disk to improve performance. Each node should have at least two gigabit NICs (only the cluster-internal IP addresses are shown here; the client-facing IP addresses are omitted).
2. Preparations (note: ceph-deploy can install ceph directly, or it can be installed separately with yum)
Make sure the hostname of every machine is correct (on CentOS 7 you only need to edit /etc/hostname, which is easier than in older versions);
Add the corresponding IP address/hostname entries to /etc/hosts on every machine;
Use ssh-copy-id on each machine so that these servers can be reached over ssh without a password (ansible turns out to be handy here);
Disable the firewall (systemctl stop firewalld.service) or open ports 6789 and 6800-6900;
Edit /etc/ntp.conf and enable the ntp service to keep time synchronized (crontab + ntpdate is unreliable and is not covered separately);
Confirm that the epel/remi repositories are configured; on client0 also configure the elrepo repository so that yum can upgrade the kernel;
Initialize the data directories on all OSD servers, for example /var/local/osd1 on ceph1 and /var/local/osd2 on ceph2 (a consolidated sketch of these steps follows).
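A minimal consolidated sketch of these preparation steps on one node (hostnames, addresses, and directories follow the table above and are otherwise assumptions; adjust to your environment, and create the osd directory only on the respective OSD server):

# hostnamectl set-hostname ceph1
# cat >> /etc/hosts <<'EOF'
10.9.16.96  ceph0
10.9.16.97  ceph1
10.9.16.98  ceph2
10.9.16.99  ceph3
10.9.16.100 ceph4
EOF
# ssh-copy-id root@ceph0
# systemctl stop firewalld.service
# systemctl disable firewalld.service
# yum -y install ntp && systemctl enable ntpd && systemctl start ntpd
# mkdir -p /var/local/osd1
Create the osd directory only on its own server: /var/local/osd1 on ceph1, /var/local/osd2 on ceph2, /var/local/osd3 on ceph3.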
3. Start installation
(The operations are performed on ceph0)
Generate the MON information: ceph-deploy new ceph0 ceph1 ceph4
Install ceph: ceph-deploy install ceph0 ceph1 ceph2 ceph3 ceph4 (note: if ceph has already been installed on every machine with yum, skip this step)
Generate the keys: ceph-deploy --overwrite-conf mon create-initial
Prepare the OSD servers: ceph-deploy --overwrite-conf osd prepare ceph1:/var/local/osd1 ceph2:/var/local/osd2 ceph3:/var/local/osd3
Activate the OSDs: ceph-deploy osd activate ceph1:/var/local/osd1 ceph2:/var/local/osd2 ceph3:/var/local/osd3
Copy the keys to each node: ceph-deploy admin ceph0 ceph1 ceph2 ceph3 ceph4
Check whether everything is OK: ceph health
Install the MDS nodes: ceph-deploy mds create ceph0 ceph2 ceph3
Check Status:
[root@ceph0 ~]# ceph -s
    cluster 9ddc0226-574d-4e8e-8ff4-bbe9cd838e21
     health HEALTH_OK
     monmap e1: 2 mons at {ceph0=10.9.16.96:6789/0,ceph1=10.9.16.97:6789/0,ceph4=10.9.16.100:6789/0}, election epoch 4, quorum ceph0,ceph1
     mdsmap e5: 1/1/1 up {0=ceph0=up:active}, 1 up:standby
     osdmap e13: 3 osds: 3 up, 3 in
      pgmap v6312: 192 pgs, 3 pools, 1075 MB data, 512 objects
            21671 MB used, 32082 MB / 53754 MB avail
                 192 active+clean
4. Mount problems:
The CentOS 7 kernel on client0 does not enable ceph_fs by default, so the kernel needs to be replaced. Here we update the kernel directly with yum (it can also be compiled manually):
# yum --enablerepo=elrepo-kernel install kernel-ml
# grub2-set-default 0
# mkdir /mnt/cephfs
# mount -t ceph 10.9.16.96:6789,10.9.16.97:6789:/ /mnt/cephfs -o name=admin,secret=AQDnDBhUWGS6GhAARV0CjHB******Y1LQzQ=
# The secret here is the key from ceph.client.admin.keyring.
# The following /etc/fstab entry mounts it automatically at boot:
10.9.16.96:6789,10.9.16.97:6789:/ /mnt/cephfs ceph name=admin,secret=AQDnDBhUWGS6GhAARV0CjHB*******Y1LQzQ=,noatime 0 0
The Ubuntu client mounts it with the same command.
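To keep the admin key out of the command line and /etc/fstab, mount.ceph also accepts a secretfile option. A small sketch, assuming /etc/ceph/admin.secret as a (hypothetical) path containing only the key string from ceph.client.admin.keyring:

# vi /etc/ceph/admin.secret
# chmod 600 /etc/ceph/admin.secret
# mount -t ceph 10.9.16.96:6789,10.9.16.97:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret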
While copying a file you can run ceph -s to watch the read/write rate in real time, for example client io 12515 kB/s wr, 3 op/s, and confirm that everything is working properly. Note, however, that this rate is what Ceph itself sees (including replication between servers), not the rate between the client and the cluster.
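For a continuously updating view instead of rerunning ceph -s by hand, either of the following works (the two-second interval is arbitrary):

# ceph -w
# watch -n 2 ceph -s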
5. Installation conclusions:
It is not necessary to edit the ceph.conf file the way most online tutorials tell you to; change it only when a specific requirement calls for it.
To separate the cluster-internal network from the public-facing network, which improves network efficiency and limits exposure to possible DDoS attacks, add the following options to the [global] section of ceph.conf:
[global]
public network = {public-network-ip-address/netmask}
cluster network = {cluster-network-ip-address/netmask}
The default ceph osd journal size is 0, so it has to be set in ceph.conf. The journal should be at least twice the product of the expected throughput and filestore min sync interval: osd journal size = {2 * (expected throughput * filestore min sync interval)}. For example: osd journal size = 10000 (10 GB).
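As a worked example of that formula, with purely illustrative numbers: assuming an expected throughput of 500 MB/s and filestore min sync interval = 10 s, the journal should be at least 2 * (500 * 10) = 10000 MB, which matches the 10 GB example above. In ceph.conf this would be:

[osd]
osd journal size = 10000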
Meta variables expand to the actual cluster name and process name. For example, if the cluster name is ceph (the default), you can retrieve the configuration of osd.0 with: ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | less
To mount using a different monitor port: mount.ceph monhost1:7000,monhost2:7000,monhost3:7000:/ /mnt/foo