Ceph performance tuning: Journal and tcmalloc


I recently ran a simple performance benchmark on Ceph and found that both the journal configuration and the tcmalloc version have a large impact on performance.

Test Results
# rados -p tmppool -b 4096  bench 120 write  -t 32 --run-name test1
Object size   BW (MB/s)   Latency (s)   Pool size   Journal   tcmalloc   Max thread cache
4 KB          2.676       0.0466848     3           SATA      2.0        -
4 KB          3.669       0.0340574     2           SATA      2.0        -
4 KB          10.169      0.0122452     2           SSD       2.0        -
4 KB          5.34        0.0234077     3           SSD       2.0        -
4 KB          7.62        0.0164019     3           SSD       2.1        -
4 KB          8.838       0.0141392     3           SSD       2.1        2048 MB

You can see that:

  • (1) An SSD journal roughly doubles performance compared with a SATA journal;
  • (2) Upgrading to tcmalloc 2.1 and raising the max thread cache parameter likewise brings close to a 2x improvement;
  • (3) The replica count (pool size) has a significant impact on performance; a way to adjust it is shown below.
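
For reference, a pool's replica count can be changed at runtime, which is how the pool size 2 and size 3 runs above differ (tmppool is the benchmark pool from the rados command above):

# ceph osd pool set tmppool size 2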
Tcmalloc Problems

Ceph ships with tcmalloc 2.0 by default. During the test, CPU usage was very high, almost 90%, with most cycles spent inside tcmalloc:

Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+  27.58%  libtcmalloc.so.4.1.0    [.] tcmalloc::CentralFreeList::FetchFromSpans()
+  15.25%  libtcmalloc.so.4.1.0    [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long,
+  12.20%  libtcmalloc.so.4.1.0    [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+   1.63%  perf                    [.] append_chain
+   1.39%  libtcmalloc.so.4.1.0    [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+   1.02%  libtcmalloc.so.4.1.0    [.] tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+   0.85%  libtcmalloc.so.4.1.0    [.] 0x0000000000017e6f
+   0.75%  libtcmalloc.so.4.1.0    [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+   0.67%  libc-2.12.so            [.] memcpy
+   0.53%  libtcmalloc.so.4.1.0    [.] operator delete(void*)
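
A profile like this can be captured with perf while the OSDs are under load (a hypothetical invocation; the original post does not show the exact command used):

# perf top -g -p $(pgrep -o ceph-osd)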

This is because the default value of TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in tcmalloc is too small, so the OSD threads contend for the central free list. The mailing list has discussed this issue many times (one way to raise the limit is shown after the list):

  • Hitting tcmalloc bug even with patch applied
  • Tcmalloc issues
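
A minimal sketch of raising the limit, assuming the value from the 2048 MB row in the table above and that ceph-osd inherits its environment from the shell that starts it (how this plugs into your init scripts varies by distribution):

# export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=2147483648   # 2048 MB
# service ceph start osd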

After this parameter is raised, performance improves significantly and CPU usage drops sharply:

Samples: 280K of event 'cycles', Event count (approx.): 73401082082
  3.92%  libtcmalloc.so.4.1.2    [.] tcmalloc::CentralFreeList::FetchFromSpans()
  3.52%  libtcmalloc.so.4.1.2    [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, i
  2.41%  libtcmalloc.so.4.1.2    [.] 0x0000000000017dcf
  1.78%  libc-2.12.so            [.] memcpy
  1.37%  libtcmalloc.so.4.1.2    [.] operator delete(void*)
  1.32%  libtcmalloc.so.4.1.2    [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
  1.00%  [kernel]                [k] _raw_spin_lock
Journal-related

Journal Size

The journal size should be chosen according to the following rule:

osd journal size = {2 * (expected throughput * filestore max sync interval)}

That is, the osd journal should be at least twice the product of the expected disk throughput and the filestore max sync interval, so the journal cannot fill up before a sync drains it.
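
A quick worked example, assuming (for illustration only) a journal device that sustains 100 MB/s and the default filestore max sync interval of 5 seconds:

[osd]
# 2 * (100 MB/s * 5 s) = 1000 MB; osd journal size is given in MB
osd journal size = 1000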

Journal Storage Medium

Because the OSD writes to the journal first and applies the data to the backing filestore asynchronously afterwards, journal write speed is crucial when choosing a storage medium.

fio results for an SSD journal device (Intel S3500):

# fio --filename=/data/fio.dat --size=5G --direct=1 --sync=1 --bs=4k \
      --iodepth=1 --numjobs=32 --thread --rw=write --runtime=120 \
      --group_reporting --time_based --name=test_write
  write: io=3462.8MB, bw=29547KB/s, iops=7386, runt=120005msec
    clat (usec): min=99, max=51201, avg=4328.97, stdev=382.90
     lat (usec): min=99, max=51201, avg=4329.26, stdev=382.86
Online Journal Adjustment
  • (1) set noout
# ceph osd set noout
set noout
# ceph -s
    cluster 4a680a44-623f-4f5c-83b3-44483ba91872
     health HEALTH_WARN noout flag(s) set
…
  • (2) stop all osd
# service ceph stop osd
  • (3) flush journal
# cat flush.sh
#!/bin/bash
# Flush the journals of osd.12 through osd.23 (run after the daemons stop).
i=12
num=12
end=`expr $i + $num`
while [ $i -lt $end ]
do
        ceph-osd -i $i --flush-journal
        i=$((i+1))
done
  • (4) change ceph.conf

Add the following content:

[osd]
osd journal = /data/ceph/osd$id/journal
osd journal size = 5120

  • (5) create new journal
# cat mkjournal.sh
#!/bin/bash
# Create a fresh journal for osd.12 through osd.23 at the new location.
i=12
num=12
end=`expr $i + $num`
while [ $i -lt $end ]
do
        mkdir -p /data/ceph/osd$i
        ceph-osd -i $i --mkjournal
        i=$((i+1))
done
  • (6) start the ceph-osd daemons
# service ceph start osd
  • (7) clear noout
# ceph osd unset noout
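
Once the daemons are back, it is worth verifying that each OSD picked up the new journal and that the cluster returns to HEALTH_OK (the path follows the ceph.conf above):

# ls -l /data/ceph/osd12/journal
# ceph -s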
Two minor issues
  • Question 1

On the ext3 file system, mkjournal reports the following error:

2015-08-17 14:45:30.588136 7fc865b3a800 -1 journal FileJournal::open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-08-17 14:45:30.588160 7fc865b3a800 -1 journal FileJournal::open_file : unable to preallocation journal to 5368709120 bytes: (22) Invalid argument
2015-08-17 14:45:30.588171 7fc865b3a800 -1 filestore(/var/lib/ceph/osd/ceph-23) mkjournal error creating journal on /data/ceph/osd23/journal: (22) Invalid argument
2015-08-17 14:45:30.588184 7fc865b3a800 -1 ** ERROR: error creating fresh journal /data/ceph/osd23/journal for object store /var/lib/ceph/osd/ceph-23: (22) Invalid argument

This is because ext3 does not support fallocate, so the posix_fallocate() call in _open_file() fails:

int FileJournal::_open_file(int64_t oldsize, blksize_t blksize,
                            bool create)
{
...
  if (create && (oldsize < conf_journal_sz)) {
    uint64_t newsize(g_conf->osd_journal_size);
    newsize <<= 20;
    dout(10) << "_open extending to " << newsize << " bytes" << dendl;
    ret = ::ftruncate(fd, newsize);
    if (ret < 0) {
      int err = errno;
      derr << "FileJournal::_open_file : unable to extend journal to "
           << newsize << " bytes: " << cpp_strerror(err) << dendl;
      return -err;
    }
    ret = ::posix_fallocate(fd, 0, newsize);
    if (ret) {
      derr << "FileJournal::_open_file : unable to preallocation journal to "
           << newsize << " bytes: " << cpp_strerror(ret) << dendl;
      return -ret;
    }
    max_size = newsize;
  }
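
You can check whether a candidate journal filesystem supports preallocation with the fallocate(1) utility before pointing Ceph at it (the mount point and file name here are examples); on ext3 the command fails, while ext4 and xfs succeed:

# fallocate -l 5G /mnt/journal-fs/test.journal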
  • Question 2

When the journal is a regular file, the following warning is printed when the journal is opened:

2015-08-19 17:27:48.900894 7f1302791800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway

That is, Ceph does not use aio in this case. Why? Most likely because Linux native AIO is only reliably asynchronous for O_DIRECT writes to block devices; against a regular file, io_submit() can quietly fall back to blocking behaviour, so FileJournal only enables aio for block-device journals:

int FileJournal::_open(bool forwrite, bool create)
{
...
  if (S_ISBLK(st.st_mode)) {
    ret = _open_block_device();
  } else {
    if (aio && !force_aio) {
      derr << "FileJournal::_open: disabling aio for non-block journal. Use "
           << "journal_force_aio to force use of aio anyway" << dendl;
      aio = false;  // do not use aio
    }
    ret = _open_file(st.st_size, st.st_blksize, create);
  }
...
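
The warning itself names an escape hatch for forcing aio on a file-backed journal. A hedged sketch of the corresponding ceph.conf settings (journal_force_aio is the option named in the log; whether it is actually safe on a given kernel and filesystem is exactly the open question above):

[osd]
journal aio = true
journal force aio = true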

 
