Our Ceph test environment has been in use within the company for some time. Using the block devices provided by RBD to create virtual machines and to attach additional block devices to them has proven stable. However, most of the current configuration still uses the Ceph defaults, except that the journal is split out and written to a separate partition. We now plan to use Ceph cache tiering and SSDs for some optimizations:
1. Write the journal to a separate SSD disk.
2. Use SSDs to build an SSD pool and use this pool as the cache for other pools. This requires Ceph cache tiering (a rough command sketch follows this list).
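As a rough sketch of what the cache-tiering setup will look like (the pool names sas and ssd are only illustrative here, and the exact commands may differ between Ceph releases):
$ ceph osd tier add sas ssd                 # attach the ssd pool as a cache tier in front of the sas pool
$ ceph osd tier cache-mode ssd writeback    # cache reads and writes in the ssd pool
$ ceph osd tier set-overlay sas ssd         # route client I/O for the sas pool through the ssd tier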
I searched the internet and found no articles on this practice, nor on how much performance improves once it is done. Therefore, the following setups will be tested after the solution is implemented (a sketch of the journal configuration follows the list):
1. Ceph installed with the default settings.
2. Journal moved to a separate hard disk partition.
3. Journal moved to a separate SSD disk.
4. SSD pool added.
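For the journal tests, the relevant part of ceph.conf looks roughly like this (a minimal sketch; the device path and journal size are illustrative assumptions, not values from our environment):
[osd]
osd journal size = 10240          # journal size in MB

[osd.0]
osd journal = /dev/sdb1           # dedicated partition on a separate HDD or SSD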
The CRUSH settings are based on this article: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
I. Use Case
Roughly speaking, your infrastructure could be based on several types of servers:
- Storage nodes full of SSD disks
- Storage nodes full of SAS disks
- Storage nodes full of SATA disks
Such a handy mechanism is possible with the help of the CRUSH map.
II. A bit about CRUSH
CRUSH stands for Controlled Replication Under Scalable Hashing:
- Pseudo-random placement algorithm
- Fast calculation, no lookup; repeatable and deterministic
- Ensures even distribution
- Stable mapping
- Limited data migration
- Rule-based configuration; the rules determine data placement
- Infrastructure topology aware; the map knows the structure of your infrastructure (nodes, racks, rows, datacenters)
- Allows weighting; every OSD has a weight
For more details, check the official Ceph documentation.
III. Setup
What are we going to do?
- Retrieve the current CRUSH map
- Decompile the CRUSH map
- Edit it. We will add 2 buckets and 2 rulesets
- Recompile the new CRUSH map
- Re-inject the new CRUSH map
III.1. Begin
Grab your current crush map:
$ ceph osd getcrushmap -o ma-crush-map
$ crushtool -d ma-crush-map -o ma-crush-map.txt
For the sake of simplicity, let's assume that you have 4 OSDs:
- 2 of them are SAS disks
- 2 of them are enterprise SSDs
And here is the OSD tree:
$ ceph osd tree
dumped osdmap tree epoch 621
# id    weight  type name       up/down reweight
-1      12      pool default
-3      12              rack le-rack
-2      3                       host ceph-01
0       1                               osd.0   up      1
1       1                               osd.1   up      1
-4      3                       host ceph-02
2       1                               osd.2   up      1
3       1                               osd.3   up      1
III.2. Default CRUSH map
Edit your CRUSH map:
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host ceph-01 {
    id -2       # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0      # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}
host ceph-02 {
    id -4       # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0      # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}
rack le-rack {
    id -3       # do not change unnecessarily
    # weight 12.000
    alg straw
    hash 0      # rjenkins1
    item ceph-01 weight 2.000
    item ceph-02 weight 2.000
}
pool default {
    id -1       # do not change unnecessarily
    # weight 12.000
    alg straw
    hash 0      # rjenkins1
    item le-rack weight 4.000
}

# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
III.3. Add buckets and rules
Now we have to add 2 new specific rules:
- One for the SSD pool
- One for the SAS pool
III.3.1. SSD pool
Add a bucket for the SSD pool:
pool ssd {
    id -5       # do not change unnecessarily
    alg straw
    hash 0      # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}
Add a rule for the newly created bucket:
rule ssd {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step choose firstn 0 type host
    step emit
}
III.3.2. SAS pool
Add a bucket for the SAS pool:
pool sas {
    id -6       # do not change unnecessarily
    alg straw
    hash 0      # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}
Add a rule for the newly created bucket:
rule sas {
    ruleset 4
    type replicated
    min_size 1
    max_size 10
    step take sas
    step choose firstn 0 type host
    step emit
}
Finally, recompile and inject the new CRUSH map:
$ crushtool -c ma-crush-map.txt -o ma-nouvelle-crush-map
$ ceph osd setcrushmap -i ma-nouvelle-crush-map
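You can also sanity-check the compiled map by dry-running the new rules with crushtool (a minimal sketch; --num-rep 2 matches the replication size shown further below and is otherwise an assumption):
$ crushtool -i ma-nouvelle-crush-map --test --rule 3 --num-rep 2 --show-mappings   # placements produced by the ssd rule
$ crushtool -i ma-nouvelle-crush-map --test --rule 4 --num-rep 2 --show-mappings   # placements produced by the sas rule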
III.4. Create and configure the pools
Create your 2 new pools:
$ rados mkpool ssd
successfully created pool ssd
$ rados mkpool sas
successfully created pool sas
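If you prefer to control the placement-group count at creation time rather than taking the defaults, the pools could also be created with ceph instead (a sketch; 128 matches the pg_num shown in the dump below but is otherwise an arbitrary choice):
$ ceph osd pool create ssd 128 128   # pool name, pg_num, pgp_num
$ ceph osd pool create sas 128 128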
Assign the rulesets to the pools:
$ ceph osd pool set ssd crush_ruleset 3
$ ceph osd pool set sas crush_ruleset 4
Check that the changes have been applied successfully:
$ ceph osd dump | grep -E 'ssd|sas'
pool 3 'ssd' rep size 2 crush_ruleset 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 21 owner 0
pool 4 'sas' rep size 2 crush_ruleset 4 object_hash rjenkins pg_num 128 pgp_num 128 last_change 23 owner 0
Just create some random files and put them into your object store:
$ dd if=/dev/zero of=ssd.pool bs=1M count=512 conv=fsync
$ dd if=/dev/zero of=sas.pool bs=1M count=512 conv=fsync
$ rados -p ssd put ssd.pool.object ssd.pool
$ rados -p sas put sas.pool.object sas.pool
Where are the PGs active?
$ ceph osd map ssd ssd.pool.object
osdmap e260 pool 'ssd' (3) object 'ssd.pool.object' -> pg 3.c5034eb8 (3.0) -> up [1,0] acting [1,0]
$ ceph osd map sas sas.pool.object
osdmap e260 pool 'sas' (4) object 'sas.pool.object' -> pg 4.9202e7ee (4.0) -> up [3,2] acting [3,2]
CRUSH rules! As you can see from this article, CRUSH allows you to do amazing things. The CRUSH map could become very complex, but it brings a lot of flexibility! Happy CRUSH mapping ;-)