Ceph is a fully open source distributed storage solution, network block device, and file system offering high reliability, high performance, and high scalability, handling data volumes from terabytes to exabytes.
By using an innovative placement algorithm (CRUSH), active storage nodes, and peer-to-peer gossip protocols, Ceph avoids the scalability and reliability problems of traditional centralized controllers and lookup tables.
Ceph is highly regarded throughout the open source community and is widely used in virtualization platforms (Proxmox), cloud computing platforms (OpenStack, CloudStack, OpenNebula), container technology (Docker), and big data analytics systems (Hadoop, as an alternative to HDFS).
I have been trying to run Ceph in Docker for almost two years, and I am still at it today. Recently I have put some renewed effort into deploying Ceph in Docker.
Before getting into the technical details, I would like to thank Sean C McCord for his support of this work; the open source ceph-docker project is based on Sean's early work.
Now let's look specifically at how to run Ceph in Docker!
Principle
Running Ceph in Docker is a controversial topic, and many people question the point of doing so. While the monitors, metadata servers, and RADOS gateway are not much of a problem to containerize, things get tricky for the OSDs (object storage daemons). The Ceph OSD is optimized for the physical machine it runs on and has many ties to the underlying hardware: if the physical disk fails, the OSD fails too, which is a problem for containerized scenarios.
Frankly, at some point in the past, I was thinking:
"I don't know why I do this, I just know someone who needs this feature (and of course they probably don't know why). I just think I want to make a technical attempt, so try it! ”
Of course, that does not sound very optimistic, but it really is what I was thinking at the time. My point of view has changed a bit since then, and I will explain why it is worth it. Hopefully this explanation will change your view as well (and my explanation is not just "Docker is cool, so let's run everything in Docker!").
Many developers have spent a lot of time containerizing their software. In the process, they have also used a variety of tools to build and manage their environments. I would not be surprised to see someone using Kubernetes as the management tool.
Some people like to apply the latest technology in production, otherwise they find their work boring. So when they see their favorite open source storage solution being containerized, they are happy to go the "everything in containers" way.
Unlike traditional yum or apt-get installs, containers make it easy to upgrade and roll back software: we can publish a new version of a daemon with docker stop and docker run. We can even run multiple isolated clusters on a single physical machine. All of this makes development much more convenient.
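For illustration, here is a hedged sketch of what such an upgrade or rollback looks like (the container name ceph-mon and the image tag are assumptions; the ceph/daemon image and its options are introduced below):

$ sudo docker stop ceph-mon
$ sudo docker rm ceph-mon
# Restart the daemon from a newer (or, to roll back, an older) image tag;
# the state survives in the bind-mounted host directories
$ sudo docker run -d --net=host --name ceph-mon \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-e MON_IP=192.168.0.20 \
-e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
ceph/daemon:latest mon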
Project
As mentioned above, all of this work is based on the early contributions of Sean C McCord, and we have built around his work. Today, with ceph-docker you can run every single Ceph daemon on Ubuntu or CentOS.
We have a lot of images on the Docker Hub. We use the Ceph namespace, so our images are prefixed ceph/<daemon>. We use automated builds, so every time we merge a new patch, a new build is triggered and a new container image is created.
Since we are currently refactoring the code, you will see that there are a lot of image tags. We have been building a separate image for each daemon (and we will keep doing so as we merge patches), so the monitor, OSD, MDS, and RADOS gateway each have their own image. This is not the ideal solution, so we are working on consolidating all the components into a single image called daemon.
This image contains all the Ceph components, and you choose which one to activate from the command line when you invoke docker run. If you want to try our images, we recommend the ceph/daemon image. Let me show how to run it.
Containerize Ceph
Monitor
Since the monitor cannot communicate through a NATed network, we must use --net=host to expose the host's network stack to the container:
$ sudo docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-e MON_IP=192.168.0.20 \
-e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
ceph/daemon mon
You can configure the following options:
MON_IP is the IP address of the host running Docker.
MON_NAME is the name of your monitor (default is $(hostname)).
CEPH_PUBLIC_NETWORK is the CIDR of the host running Docker. It must be on the same network as MON_IP.
CEPH_CLUSTER_NETWORK is the CIDR of a secondary interface of the host running Docker, used for the OSDs' replication traffic.
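Once the container is up, a quick sanity check is to query the cluster status from inside the container (the container ID placeholder below is an assumption, following the same convention as the OSD example later):

$ sudo docker ps
$ sudo docker exec <mon-container-id> ceph -s
# The output should show the monitor in quorum; with no OSDs yet, the cluster will not report HEALTH_OK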
Object Storage Daemon
We can now run each OSD process in its own container. Following the microservices philosophy, a container should not run more than one service; running multiple OSD processes in one container breaks that idea and, of course, adds extra complexity to configuring and maintaining the system.
In such a configuration, we must use --privileged=true so that the processes inside the container can access /dev and other kernel features. We also support a mode based on exposing only the OSD directory, where the operator prepares the devices appropriately beforehand.
This lets us simply expose the OSD directory, and populating the OSD (ceph-osd mkfs) is handled by the entry point. The configuration method I describe below is the simplest, because it only requires you to specify a block device and the entry point does the rest.
If you do not want to use --privileged=true, you can use my second example.
$ sudo docker run -d --net=host \
--privileged=true \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
ceph/daemon osd_ceph_disk
If you do not want to use --privileged=true, you can also configure the OSDs manually with your favorite configuration management tool.
The following example assumes that you have already partitioned the disk and created the file system. Run the following command to create your OSD:
$ sudo docker exec <mon-container-id> ceph osd create
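For reference, a minimal sketch of the preparation assumed above (the partition /dev/sdb1, the XFS file system, and the OSD id 1 are all assumptions; ceph osd create prints the id that was actually assigned):

$ sudo mkfs.xfs /dev/sdb1
$ sudo mkdir -p /osds/1
$ sudo mount /dev/sdb1 /osds/1
# /osds/1 is then bind-mounted into the container as /var/lib/ceph/osd/ceph-1 below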
Then run your container. You can mount several OSD directories into one container, for example -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2, or a single one as below:
$ sudo docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-v /osds/1:/var/lib/ceph/osd/ceph-1 \
ceph/daemon osd_disk_directory
The following options can be configured:
OSD_DEVICE is the OSD device, for example: /dev/sdb
OSD_JOURNAL is the device that holds the OSD journal, for example: /dev/sdz
HOSTNAME is the host running the OSD (default is $(hostname))
OSD_FORCE_ZAP forces zapping of the content of an existing device (default is 0, set to 1 to enable)
OSD_JOURNAL_SIZE is the size of the OSD journal (default is 100)
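After the OSD container starts, you can verify that the OSD has registered and come up by asking the cluster for the OSD tree (the monitor container placeholder is an assumption):

$ sudo docker exec <mon-container-id> ceph osd tree
# The new OSD should be listed with status 'up'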
Metadata Server
This component is more straightforward to set up. The only thing to be aware of is that the Ceph administrator key must be accessible inside the container; it is used to create the CephFS pools and the file system.
If you run Ceph version 0.87, you do not need this configuration, but you should really be running the latest version anyway!
$ sudo docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph \
-v /etc/ceph:/etc/ceph \
-e CEPHFS_CREATE=1 \
ceph/daemon mds
The following options can be configured:
MDS_NAME is the name of the metadata server (default is mds-$(hostname)).
CEPHFS_CREATE creates a file system for the metadata server (default is 0, set to 1 to enable).
CEPHFS_NAME is the name of the CephFS file system (default is cephfs).
CEPHFS_DATA_POOL is the name of the data pool (default is cephfs_data).
CEPHFS_DATA_POOL_PG is the number of placement groups for the data pool (default is 8).
CEPHFS_METADATA_POOL is the name of the metadata pool (default is cephfs_metadata).
CEPHFS_METADATA_POOL_PG is the number of placement groups for the metadata pool (default is 8).
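Once the MDS container is running, a quick sanity check (again going through the monitor container, whose placeholder is an assumption) is:

$ sudo docker exec <mon-container-id> ceph mds stat
$ sudo docker exec <mon-container-id> ceph osd lspools
# mds stat should report an active MDS, and the pool listing should include cephfs_data and cephfs_metadata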
RADOS Gateway
When we deploy the RADOS gateway, civetweb is enabled by default. It is also possible to use a different CGI front end by specifying its address and port:
$ sudo docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph \
-v /etc/ceph:/etc/ceph \
ceph/daemon rgw
The following options can be configured:
RGW_REMOTE_CGI specifies whether to use the embedded web server (default is 0, set to 1 to disable it).
RGW_REMOTE_CGI_HOST specifies the remote host running the CGI process.
RGW_REMOTE_CGI_PORT is the port of the remote host running the CGI process.
RGW_CIVETWEB_PORT is the listening port of civetweb (default is 80).
RGW_NAME is the name of the RADOS gateway instance (default is $(hostname)).
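With civetweb listening on the default port, you can check that the gateway answers with a plain HTTP request from the host (port 80 is the default mentioned above):

$ curl http://localhost:80
# An anonymous request should come back with an S3-style XML answer (ListAllMyBucketsResult) from the gateway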
Follow-up work
Back-end Configuration Storage
With the default configuration, ceph.conf and all the Ceph keys are generated when the monitor starts. This process assumes that, to extend the cluster to multiple nodes, you have to copy this configuration to all of them. This is not flexible and we want to improve it. One option I am going to propose is to use Ansible to generate the configuration files and keys and install them on all machines.
Another approach is to store all the configuration information in a key/value backend such as etcd or consul.
Deployment Management
The most straightforward solution is to use the off-the-shelf ceph-ansible playbooks; I still need to make some changes, but the main work is done. Another option is to use Kubernetes, whose preview version has already been released.
Support for other container technologies such as rocket
There is not much to do here, because you can simply ship your Docker images to Rocket and run them.
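As a hedged sketch (the exact flags vary between rkt releases, and the volume and environment wiring is omitted here), rkt can fetch a Docker image directly and convert it on the fly:

$ sudo rkt fetch --insecure-options=image docker://ceph/daemon
# Running it then follows the same pattern as the Docker examples above, using rkt's own options for networking, volumes, and environment variables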