Preface
The previous article covered some common cluster operations in etcd, mainly the problems you are likely to run into in practice; after all, from the operations perspective you usually see a problem first and then recover from it.
For a cluster, the routine work is handling process crashes, physical machine downtime, data migration and backup, scaling down, and so on; the rest is ordinary day-to-day troubleshooting.
Backup and recovery
Strictly speaking, etcd is just a store, but it is a store for a distributed environment that maintains strong consistency: every write goes through the leader, and the leader commits it only after a majority of the nodes have acknowledged it, so each member holds the committed data.
Therefore, a backup can be taken from any node.
1. Configure a scheduled backup task
The scheduled task runs the script at 2 a.m. every day, keeps only the last seven days of backups, and writes the backup to a fixed directory. The script mainly uses etcdctl for the backup, as follows:
[root@docker-ce python]# cat backup.sh
#!/bin/bash
date_time=$(date +%y%m%d)
etcdctl backup --data-dir /etcd/ --backup-dir /python/etcdbak/${date_time}
find /python/etcdbak/ -ctime +7 -exec rm -r {} \;
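The retention expression can be sanity-checked in a scratch directory. Note that ctime cannot be simulated with touch, so this sketch uses -mtime instead of -ctime (and -maxdepth 1, so find does not try to descend into a directory it has just removed); the paths are hypothetical:

```shell
# create two fake dated backup directories (hypothetical scratch path)
mkdir -p /tmp/etcdbak_demo/20180101 /tmp/etcdbak_demo/20180209
# make one of them look 10 days old (mtime, since ctime cannot be set directly)
touch -d "10 days ago" /tmp/etcdbak_demo/20180101
# same retention idea as the backup script, using -mtime so the age can be simulated
find /tmp/etcdbak_demo/ -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -r {} \;
ls /tmp/etcdbak_demo   # only 20180209 remains
```

The directory older than seven days is removed, the recent one survives.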
Set up the scheduled task:
Both the error output and the normal output are redirected; otherwise cron sends mail for every run, which steadily increases the number of inodes in use.
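A crontab entry matching the description above might look like the following; the exact entry is not shown in the original, so treat this as a sketch (the script path /python/backup.sh comes from the listing above):

```shell
# run the backup at 2 a.m. every day; discard both stdout and stderr
# so cron does not generate mail for every run
0 2 * * * /bin/bash /python/backup.sh > /dev/null 2>&1
```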
2. Data Recovery
To recover the data, use the following steps:
Package the backup data and send it to the host to be restored.
Unpack it and run etcd:
When starting etcd, besides specifying the data directory, you must use the --force-new-cluster parameter; otherwise you will see errors such as a cluster ID mismatch.
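Putting the steps together, a minimal restore might look like the following. The archive name and directories are assumptions based on the backup script above, and the commands need a real backup and an etcd binary to run:

```shell
# on the backup host: package one day's backup and copy it to the target
tar czf etcdbak-20180209.tar.gz -C /python/etcdbak 20180209
scp etcdbak-20180209.tar.gz root@192.168.1.222:/tmp/

# on the host being restored: unpack, then start etcd from the backup,
# forcing a new single-member cluster so stale membership data is discarded
mkdir -p /etcd1
tar xzf /tmp/etcdbak-20180209.tar.gz -C /etcd1 --strip-components=1
etcd --name docker-ce --data-dir /etcd1 --force-new-cluster
```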
3. Notes on the data store
In the data directory, the file layout after startup looks like this:
The file layout in the backup looks like this:
As the comparison shows, the db file and the tmp files are dropped from the backup: the tmp files mainly hold uncommitted data records, while the dropped db information is the cluster membership and related metadata.
4. Expanding a single node into a cluster
After taking the backup and unpacking it, start the etcd process, paying attention to the parameters used, as follows:
[root@docker-ce etcd]# etcd --name docker-ce --data-dir /etcd1 \
  --initial-advertise-peer-urls http://192.168.1.222:2380 \
  --listen-peer-urls http://192.168.1.222:2380 \
  --listen-client-urls http://192.168.1.222:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.1.222:2379 \
  --initial-cluster-token etcd-cluster \
  --initial-cluster centos=http://192.168.1.22:2380,docker-ce=http://192.168.1.222:2380 \
  --force-new-cluster
Add the new member information:
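Registering the member is done with etcdctl on the existing node before the new process is started; the member name and peer URL below match the new machine started in the next step (etcd v2 syntax assumed):

```shell
# on the existing member: register the new node before starting it
etcdctl member add docker1 http://192.168.1.32:2380
```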
Start the etcd process on the new machine:
[root@docker1 /]# etcd --name docker1 --data-dir /etcd \
  --initial-advertise-peer-urls http://192.168.1.32:2380 \
  --listen-peer-urls http://192.168.1.32:2380 \
  --listen-client-urls http://192.168.1.32:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.1.32:2379 \
  --initial-cluster-token etcd-cluster \
  --initial-cluster docker-ce=http://192.168.1.222:2380,docker1=http://192.168.1.32:2380 \
  --initial-cluster-state existing
Note that the parameters must be updated accordingly; otherwise you will see errors such as a cluster ID mismatch or peer information mismatch.
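Once both members are up, the cluster state can be verified from any node with the standard etcdctl v2 commands:

```shell
# list the members and check overall health
etcdctl member list
etcdctl cluster-health
```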
Problems that may arise
1. Clocks out of sync
When the clocks are out of sync, the error looks like this:
2018-02-09 05:45:37.636506 W | rafthttp: the clock difference against peer 5d951def1d1ebd99 is too high [8h0m2.595609129s > 1s]
2018-02-09 05:45:37.717527 W | rafthttp: the clock difference against peer f83aa3ff91a96c2f is too high [8h0m2.52274509s > 1s]
Synchronize the time to fix this.
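One way to bring the clocks back in sync, assuming ntpdate is available and the hosts can reach a time server (the server name here is an assumption):

```shell
# one-shot sync against a public NTP pool; a running ntpd/chronyd
# should then keep the clock in sync
ntpdate pool.ntp.org
```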
2. Cluster ID mismatch
The main cause is an old data directory that was not deleted, which leads to the cluster ID mismatch; delete the data directory and then rejoin the cluster.
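Concretely, the rejoin after a cluster ID mismatch might look like this on the failing node; the paths and member names follow the examples above and are assumptions:

```shell
# stop etcd, wipe the stale data directory, then rejoin as a fresh member
rm -rf /etcd/*
etcd --name docker1 --data-dir /etcd \
  --initial-cluster docker-ce=http://192.168.1.222:2380,docker1=http://192.168.1.32:2380 \
  --initial-cluster-state existing
```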
3. Error after deleting the data directory
2018-02-07 22:05:58.539721 I | raft: e0f5fe608dbc732d became follower at term 11
2018-02-07 22:05:58.539833 C | raft: tocommit is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
goroutine [running]:
github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4201730e0, 0x559ecf0e5ebc, 0x5d, 0xc420121400, 0x2, 0x2)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/Godeps/_workspace/src/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x15e
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc42021a380, 0x19)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/log.go:191 +0x15e
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc42022c1e0, 0x8, 0xe0f5fe608dbc732d, 0x5d951def1d1ebd99, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/raft.go:1100 +0x56
github.com/coreos/etcd/raft.stepFollower(0xc42022c1e0, 0x8, 0xe0f5fe608dbc732d, 0x5d951def1d1ebd99, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/raft.go:1046 +0x2b5
github.com/coreos/etcd/raft.(*raft).Step(0xc42022c1e0, 0x8, 0xe0f5fe608dbc732d, 0x5d951def1d1ebd99, 0xb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/raft.go:778 +0x10f9
github.com/coreos/etcd/raft.(*node).run(0xc420354000, 0xc42022c1e0)
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/node.go:323 +0x67d
created by github.com/coreos/etcd/raft.RestartNode
    /builddir/build/BUILD/etcd-1e1dbb23924672c6cd72c62ee0db2b45f778da71/src/github.com/coreos/etcd/raft/node.go:223 +0x340
The node needs to be added to the cluster as a new member; starting it directly fails because the cluster's snapshot and log files cannot be found, which produces the panic above.
To be continued .....