Ceph performance tuning: Journal and tcmalloc
We recently ran a simple performance test on Ceph and found that both the journal configuration and the tcmalloc version have a large impact on performance.
Test Results
# rados -p tmppool -b 4096 bench 120 write -t 32 --run-name test1
| Object size | BW (MB/s) | Latency (s) | Pool size | Journal | tcmalloc version | Max thread cache |
|-------------|-----------|-------------|-----------|---------|------------------|------------------|
| 4 K         | 2.676     | 0.0466848   | 3         | SATA    | 2.0              |                  |
| 4 K         | 3.669     | 0.0340574   | 2         | SATA    | 2.0              |                  |
| 4 K         | 10.169    | 0.0122452   | 2         | SSD     | 2.0              |                  |
| 4 K         | 5.34      | 0.0234077   | 3         | SSD     | 2.0              |                  |
| 4 K         | 7.62      | 0.0164019   | 3         | SSD     | 2.1              |                  |
| 4 K         | 8.838     | 0.0141392   | 3         | SSD     | 2.1              | 2048 MB          |
You can see:
- (1) Putting the journal on SSD roughly doubles performance;
- (2) Using tcmalloc 2.1 and raising the max thread cache parameter also brings close to a 2x improvement;
- (3) The number of replicas has a significant impact on performance.
Tcmalloc Problems
Ceph ships with tcmalloc 2.0. During the test we found that CPU usage was very high, almost 90%:
```
Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+  27.58%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::FetchFromSpans()
+  15.25%  libtcmalloc.so.4.1.0  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long,
+  12.20%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+   1.63%  perf                  [.] append_chain
+   1.39%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+   1.02%  libtcmalloc.so.4.1.0  [.] tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+   0.85%  libtcmalloc.so.4.1.0  [.] 0x0000000000017e6f
+   0.75%  libtcmalloc.so.4.1.0  [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+   0.67%  libc-2.12.so          [.] memcpy
+   0.53%  libtcmalloc.so.4.1.0  [.] operator delete(void*)
```
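A profile like the one above can be collected with perf; a minimal sketch of such an invocation (assuming a single ceph-osd process on the host):

```
# Sketch: sample CPU cycles of a running ceph-osd for 30 seconds, then inspect the report.
# If several ceph-osd processes run on the host, pick a single pid instead of pidof.
perf record -e cycles -g -p $(pidof ceph-osd) -- sleep 30
perf report
```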
This is because the default value of TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in tcmalloc is too small, which causes heavy contention between threads. The mailing list has discussed this issue many times:
- Hitting tcmalloc bug even with patch applied
- Tcmalloc issues
After this parameter is adjusted, the performance is significantly improved and the CPU usage is greatly reduced.
```
Samples: 280K of event 'cycles', Event count (approx.): 73401082082
   3.92%  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::FetchFromSpans()
   3.52%  libtcmalloc.so.4.1.2  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, i
   2.41%  libtcmalloc.so.4.1.2  [.] 0x0000000000017dcf
   1.78%  libc-2.12.so          [.] memcpy
   1.37%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
   1.32%  libtcmalloc.so.4.1.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
   1.00%  [kernel]              [k] _raw_spin_lock
```
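To raise the limit, the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable has to be set before the ceph-osd processes start. A minimal sketch, assuming sysvinit-managed OSDs and that the export is placed in whatever file your init script sources (for example /etc/sysconfig/ceph on RHEL-like systems):

```
# Sketch: raise tcmalloc's total thread-cache limit to 2048 MB (the value used in
# the test above) before (re)starting the OSDs. Where this export belongs depends
# on how your ceph-osd daemons are launched.
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$((2048 * 1024 * 1024))
service ceph restart osd
```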
Journal-related
Journal Size
Journal size selection follows the rule below:
osd journal size = {2 * (expected throughput * filestore max sync interval)}
That is, the osd journal size should be set to at least twice the product of the expected disk bandwidth and the filestore sync interval (see the Ceph documentation).
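As a worked example (illustrative numbers, not measured in this test): with an expected throughput of 100 MB/s and a filestore max sync interval of 5 s, the journal should be at least 2 × 100 MB/s × 5 s = 1000 MB, i.e.:

```
[osd]
# illustrative values only
filestore max sync interval = 5
osd journal size = 1000        # MB, = 2 * 100 MB/s * 5 s
```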
Journal Storage Medium
Because the OSD writes the journal first and only then writes the data asynchronously, journal write speed is crucial. For guidance on choosing the journal storage medium, see the Ceph documentation.
fio result on an SSD (Intel S3500):
```
# fio --filename=/data/fio.dat --size=5G --direct=1 --sync=1 --bs=4k --iodepth=1 --numjobs=32 --thread --rw=write --runtime=120 --group_reporting --time_based --name=test_write
  write: io=3462.8MB, bw=29547KB/s, iops=7386, runt=120005msec
    clat (usec): min=99, max=51201, avg=4328.97, stdev=382.90
     lat (usec): min=99, max=51201, avg=4329.26, stdev=382.86
```
Online journal Adjustment
- (1) Set the noout flag so OSDs are not marked out while they are down:

```
# ceph osd set noout
set noout
# ceph -s
    cluster 4a680a44-623f-4f5c-83b3-44483ba91872
     health HEALTH_WARN noout flag(s) set
…
```
- (2) Stop the ceph-osd daemons:

```
# service ceph stop osd
```
- (3) Flush the existing journals (OSDs 12–23 in this example):

```
# cat flush.sh
#!/bin/bash
i=12
num=12
end=`expr $i + $num`
while [ $i -lt $end ]
do
    ceph-osd -i $i --flush-journal
    i=$((i+1))
done
```
- (4) Add the following to ceph.conf:
```
[osd]
osd journal = /data/ceph/osd$id/journal
osd journal size = 5120
```
- (5) Create the new journals:

```
# cat mkjournal.sh
#!/bin/bash
i=12
num=12
end=`expr $i + $num`
while [ $i -lt $end ]
do
    mkdir -p /data/ceph/osd$i
    ceph-osd -i $i --mkjournal
    i=$((i+1))
done
```
- (6) Start the ceph-osd daemons:

```
# service ceph start osd
```
- (7) Unset the noout flag:

```
# ceph osd unset noout
```
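After the flag is cleared it is worth checking that the cluster returns to HEALTH_OK and that the OSDs are really using the new journal path. A minimal sketch (osd12 is just one of the OSDs handled by the scripts above):

```
# Sketch: confirm cluster health and the new journal location for one OSD.
ceph -s
ls -l /data/ceph/osd12/journal
```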
Two minor issues
On the ext3 file system, mkjournal reports the following error:
```
2015-08-17 14:45:30.588136 7fc865b3a800 -1 journal FileJournal::open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-08-17 14:45:30.588160 7fc865b3a800 -1 journal FileJournal::open_file : unable to preallocation journal to 5368709120 bytes: (22) Invalid argument
2015-08-17 14:45:30.588171 7fc865b3a800 -1 filestore(/var/lib/ceph/osd/ceph-23) mkjournal error creating journal on /data/ceph/osd23/journal: (22) Invalid argument
2015-08-17 14:45:30.588184 7fc865b3a800 -1 ** ERROR: error creating fresh journal /data/ceph/osd23/journal for object store /var/lib/ceph/osd/ceph-23: (22) Invalid argument
```
This is because ext3 does not support fallocate:
```
int FileJournal::_open_file(int64_t oldsize, blksize_t blksize, bool create)
{
  ...
  if (create && (oldsize < conf_journal_sz)) {
    uint64_t newsize(g_conf->osd_journal_size);
    newsize <<= 20;
    dout(10) << "_open extending to " << newsize << " bytes" << dendl;
    ret = ::ftruncate(fd, newsize);
    if (ret < 0) {
      int err = errno;
      derr << "FileJournal::_open_file : unable to extend journal to " << newsize
           << " bytes: " << cpp_strerror(err) << dendl;
      return -err;
    }
    ret = ::posix_fallocate(fd, 0, newsize);
    if (ret) {
      derr << "FileJournal::_open_file : unable to preallocation journal to " << newsize
           << " bytes: " << cpp_strerror(ret) << dendl;
      return -ret;
    }
    max_size = newsize;
  }
```
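A quick way to check whether a given filesystem supports fallocate is the fallocate(1) utility; a sketch, assuming /data is the ext3 mount in question:

```
# Sketch: try to preallocate a small file on the journal filesystem.
# This is expected to fail on ext3 and succeed on ext4/xfs.
fallocate -l 1M /data/fallocate_test && echo "fallocate supported"
rm -f /data/fallocate_test
```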
When the journal is a plain file and the journal file is opened, the following message is printed:
2015-08-19 17:27:48.900894 7f1302791800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
That is, Ceph does not use aio in this case. Why?
```
int FileJournal::_open(bool forwrite, bool create)
{
  ...
  if (S_ISBLK(st.st_mode)) {
    ret = _open_block_device();
  } else {
    if (aio && !force_aio) {
      derr << "FileJournal::_open: disabling aio for non-block journal.  Use "
           << "journal_force_aio to force use of aio anyway" << dendl;
      aio = false;  // do not use aio
    }
    ret = _open_file(st.st_size, st.st_blksize, create);
  }
  ...
```
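The message itself names an escape hatch: journal_force_aio. If you do want aio on a file-backed journal, it can in principle be enabled in ceph.conf (a sketch; whether this is actually safe or beneficial for a file-backed journal is exactly the open question above):

```
[osd]
# force aio even though the journal is a plain file (use with caution)
journal force aio = true
```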