The topic of this talk is how to install CoreOS + Kubernetes automatically on bare metal. I will cover four areas: the background, how it works, the specific process, and the pitfalls we ran into.
First, the background.
As our business grew, both the number of online products and the number of servers we purchased kept increasing. Past a certain scale, conventional maintenance methods can no longer keep up.
Previously, whenever business volume went up, development had to stop so that people could set up environments and debug deployments online, and in the end only colleagues who knew the business and the code particularly well were able to do this work. To solve these problems we started looking at LXC the year before last and used it on a small scale for a while, but LXC itself has a series of problems, such as kernel version limitations and the difficulty of customizing it, so we could not roll it out at scale.
Later, as Docker developed and became popular, we also came into contact with CoreOS, whose A/B partition upgrade feature was particularly attractive. Only after using it did we find that it is not quite what the marketing says: you still have to reboot the server to upgrade the kernel. Overall, though, combined with fleet you can dynamically move workloads to other servers and achieve smooth upgrades, which is still very good.
I moved from development to maintaining the infrastructure platform, so I hoped to solve operations problems the way a developer would, and to speed up business development by changing how operations is done. But once I was doing platform development full time, I realized how hard it is to build a complete operations system. So, to build our platform faster, we chose to start from open-source frameworks and then customize them according to business needs. Given Google's reputation and the relatively mature ecosystem, we finally chose Kubernetes + CoreOS + Docker as the foundation for scheduling across the whole platform.
We usually buy machines hundreds of nodes at a time, so deploying the whole platform is a real headache. CoreOS and Kubernetes in particular can only be installed and updated through a proxy (a "ladder" to get past the firewall), which is very painful. Most of our upfront time was wasted on this!
So I wrote a simple tool, yoo-installer, to solve these problems, and I am sharing it with everyone today.
I wanted to demonstrate the simplest and most convenient installation method, but I ran out of preparation time, so this talk is not as complete as I would like. Please ask plenty of questions, and I will share some of the pitfalls.
The project code is on GitHub and is still being improved; if you run into a problem, please open an issue there. In addition, Zhao Wenlai, a colleague in our group, contributed a Node.js version of Kubernetes-client: https://github.com/Goyoo/node-Kubernetes-client. We hope to work with you to build a great Docker ecosystem.
The following describes how yoo-installer works.
It consists of a DHCP service, a TFTP service, and an HTTP service. The HTTP service serves the scripts for booting CoreOS, installing it to the hard disk, installing Kubernetes, and initializing other services.
The installation flow is:
1. The server PXE boots.
2. It broadcasts a DHCP request and obtains an IP.
3. The CoreOS base image is transferred over TFTP.
4. CoreOS boots in memory.
5. Once the system is up, it downloads the scripts and installs Kubernetes and other related services.
Here are a few points to share:
Server IPs. Determine each server's IP before installation, because the IP is an important variable in the later installation steps; for example, the etcd service IP and the Kubernetes master IP must be written into the configuration.
Our approach: our servers are high-density blade servers, all equipped with management modules. A few simple API calls return the MAC addresses of all NICs, and we keep them in a fixed logical relationship with the KVM IPs, so server IPs stay in order, which makes later management and maintenance easier. (See app/utils/ipmi/dhcpmaclist.js in the yoo-installer project; it outputs the configuration format dhcpd needs, which can be used directly.)
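As a rough illustration of turning an ordered MAC list into dhcpd configuration, here is a minimal Python sketch. It is not the logic of dhcpmaclist.js; it assumes the MACs arrive already sorted by blade slot and that consecutive IPs are assigned from a starting address. The host names node-01, node-02, ... are hypothetical, and 192.168.1.10 (the address in the talk's cloud-config-url example) is only a placeholder TFTP server.

```python
import ipaddress


def dhcpd_hosts(macs, first_ip, tftp_server, boot_file="pxelinux.0"):
    """Render ISC dhcpd statements that pin each MAC to a fixed IP.

    macs must already be in slot order; slot N gets first_ip + N, so the IP
    order mirrors the physical order of the blades.
    """
    start = ipaddress.IPv4Address(first_ip)
    lines = [
        f"next-server {tftp_server};",  # TFTP server that serves the PXE loader
        f'filename "{boot_file}";',      # boot loader fetched over TFTP
        "",
    ]
    for slot, mac in enumerate(macs):
        lines += [
            f"host node-{slot + 1:02d} {{",
            f"  hardware ethernet {mac.lower()};",
            f"  fixed-address {start + slot};",
            "}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(dhcpd_hosts(["AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02"],
                      "192.168.1.101", "192.168.1.10"))
```

The next-server and filename statements would normally sit inside the subnet declaration of dhcpd.conf; the host blocks give every machine a fixed address, which is what lets the later steps treat the IP as a known variable.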
How to install a CoreOS cluster
We built a boot menu to cover the deployment needs of different scenarios.
In yoo-installer I created six menu entries for four use cases, corresponding to the official CoreOS cluster architectures:
- Docker Dev Environment on Laptop
- Small Cluster
- Easy Development/Testing Cluster
- Production Cluster with Central Services
Reference: CoreOS Cluster Architectures
The first is the development environment, used for day-to-day development on a laptop.
The second is the small cluster. Every node runs etcd, so the failure of any single node does not affect the overall service.
The third is the development/testing cluster. Its advantage is that etcd does not have to be deployed on every node: you simply add as many nodes as the tests require, which is very flexible. However, it is not highly available, because if etcd dies the whole cluster stops working. So it is best suited to development and testing.
The fourth is the production cluster architecture, which can basically achieve high availability. Metadata can be used to distinguish between different service resources.
How is this implemented in code?
Mainly by pointing the cloud-config-url boot parameter at a different startup script for each menu entry, for example cloud-config-url=http://192.168.1.10/config/develop-etcd/pxe.yml.
There is a trick here. By default, CoreOS has no password, so when an installation goes wrong you cannot log in to investigate. I add the coreos.autologin flag to the boot parameters, so after booting you land directly in a console shell and can track down the problem.
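A minimal sketch of how such a menu could be generated, in Python. It assumes pxelinux with the menu.c32 module, the CoreOS PXE kernel and initrd file names used in the CoreOS PXE documentation of that era, and the http://SERVER/config/NAME/pxe.yml layout from the example URL above. Only develop-etcd comes from the talk; small-cluster and both menu titles are hypothetical.

```python
# Hypothetical menu entries; only "develop-etcd" appears in the talk.
MENUS = {
    "develop-etcd": "Development/testing cluster: etcd node",
    "small-cluster": "Small cluster: etcd on every node",
}


def pxelinux_config(http_server, menus, autologin=True):
    """Render a pxelinux.cfg/default with one entry per cluster layout."""
    lines = ["default menu.c32", "prompt 0", "timeout 300", ""]  # timeout is in 1/10 s
    for name, title in menus.items():
        args = [
            "initrd=coreos_production_pxe_image.cpio.gz",
            f"cloud-config-url=http://{http_server}/config/{name}/pxe.yml",
        ]
        if autologin:
            # Drop straight into a console shell so a failed install can be inspected.
            args.append("coreos.autologin")
        lines += [
            f"label {name}",
            f"  menu label {title}",
            "  kernel coreos_production_pxe.vmlinuz",
            "  append " + " ".join(args),
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(pxelinux_config("192.168.1.10", MENUS))
```

Each entry differs only in its cloud-config-url, which is what lets one PXE server serve all the cluster layouts; autologin is a switch so it can be turned off once installs are stable.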
See Code: Https://github.com/Goyoo/yoo-i ... fault
This way, we can start the system smoothly.
After booting, we need to install the system to the hard disk, download and install Kubernetes, and initialize parts of the system environment.
How do you do that?
Let's look at the code first.
In the cloud-config referenced by cloud-config-url, we define a custom service called setup.service. When the system starts, it downloads the corresponding script, pxe.sh.
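The talk does not reproduce that cloud-config, so the following is only a guess at its shape: a Python helper that renders a PXE-stage cloud-config whose setup.service, a oneshot unit, fetches pxe.sh over HTTP and runs it. The script URL passed in is hypothetical, and the binary paths assume CoreOS's /usr/bin layout.

```python
def pxe_cloud_config(script_url):
    """Render a cloud-config that defines setup.service to fetch and run pxe.sh."""
    unit = [
        "[Unit]",
        "Description=Fetch and run the yoo-installer setup script",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart=/usr/bin/curl -fsSL -o /tmp/pxe.sh {script_url}",
        "ExecStart=/usr/bin/bash /tmp/pxe.sh",
    ]
    lines = [
        "#cloud-config",
        "coreos:",
        "  units:",
        "    - name: setup.service",
        "      command: start",
        "      content: |",
    ]
    # YAML block scalar: every unit line is indented under "content: |".
    lines += [("        " + line) if line else "" for line in unit]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pxe_cloud_config("http://192.168.1.10/config/develop-etcd/pxe.sh"))
```

A oneshot service with two ExecStart lines runs them in order and stops at the first failure, so a failed download never executes a partial script.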
See the code.
The script is divided into several parts:
- Synchronizing the system time. A new machine's clock may be wrong, which can prevent CoreOS from installing properly.
- Partitioning the hard disk, handled according to each machine's hardware.
- Installing the system.
- Downloading a prepared offline Kubernetes package and placing it in the appropriate system directory.
- Running other scripts.
- Notifying yoo-installer that the installation is complete, then rebooting.
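The steps above can be sketched in Python as an ordered command plan with a dry-run runner. This is not the real pxe.sh. Assumptions, all hypothetical: an ntpdate binary is available in the PXE environment, the disk uses /dev/sdX naming so that the Container Linux ROOT partition is `<disk>9`, the Kubernetes tarball holds the binaries at its top level and belongs in /opt/bin, and the completion URL is an invented yoo-installer endpoint. Partitioning is omitted because it is site-specific. coreos-install downloads the OS image itself, so in a firewalled data center it may need its `-b` mirror option.

```python
import shlex
import subprocess


def install_plan(disk, cloud_config, k8s_tarball_url, notify_url, ntp_server):
    """Return the install steps as argv lists; partitioning is omitted."""
    root_part = f"{disk}9"  # Container Linux ROOT is partition 9 (assumes /dev/sdX naming)
    return [
        ["ntpdate", ntp_server],                                # fix the clock first
        ["coreos-install", "-d", disk, "-c", cloud_config],      # write CoreOS to disk
        ["mount", root_part, "/mnt"],                            # mount the new root
        ["mkdir", "-p", "/mnt/opt/bin"],
        ["curl", "-fsSL", "-o", "/tmp/kubernetes.tar.gz", k8s_tarball_url],
        ["tar", "-xzf", "/tmp/kubernetes.tar.gz", "-C", "/mnt/opt/bin"],
        ["umount", "/mnt"],
        ["curl", "-fsS", "-X", "POST", notify_url],              # report completion
        ["systemctl", "reboot"],
    ]


def run(plan, dry_run=True):
    """Print the commands (default) or execute them, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(install_plan("/dev/sda", "/tmp/cloud-config.yml",
                     "http://192.168.1.10/kubernetes.tar.gz",
                     "http://192.168.1.10/install/done",
                     "192.168.1.10"))
```

Keeping the plan as data makes it easy to review in dry-run mode on a single machine before letting hundreds of nodes execute it.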
So how do we set up Kubernetes automatically?
CoreOS has a cloud-config file that is processed automatically every time the system boots. We use this file to bring up the Kubernetes services automatically.
Take the central YAML file (for the cluster with central services) as an example.
This configuration file has several parts:
hostname: makes each machine easy to identify once configured.
ssh_authorized_keys: lets you install the key of a jump host, which simplifies management.
update: the CoreOS update strategy and release channel settings.
fleet: settings for the fleet service.
units: the systemd unit configuration, covering etcd, fleet, flannel, Docker, kube-apiserver, kube-controller-manager, kube-scheduler, kube-register, and other services.
From this configuration, the corresponding service files are created automatically.
We can also write files directly, for example the DNS configuration ("search localhost"): I could not find a corresponding configuration key in coreos-cloudinit, so the only option was to write the file.
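To make these parts concrete, here is a minimal Python sketch that renders such a cloud-config. It only covers the keys listed above plus write_files for the resolv.conf case. The unit list uses stock CoreOS unit names (etcd2, fleet, flanneld, docker) started as-is; the Kubernetes units would each need their own content block, omitted here. The update group, reboot-strategy, metadata, and nameserver values are example choices, not the talk's actual settings.

```python
def render_cloud_config(hostname, ssh_keys, fleet_metadata, units,
                        search="localhost", nameservers=("192.168.1.2",)):
    """Render a cloud-config with hostname, keys, resolv.conf, update, fleet, units."""
    meta = ",".join(f"{k}={v}" for k, v in sorted(fleet_metadata.items()))
    lines = ["#cloud-config", f"hostname: {hostname}", "ssh_authorized_keys:"]
    lines += [f"  - {key}" for key in ssh_keys]
    lines += [
        "write_files:",  # no dedicated key for resolv.conf, so write the file
        "  - path: /etc/resolv.conf",
        "    permissions: '0644'",
        "    content: |",
        f"      search {search}",
    ]
    lines += [f"      nameserver {ns}" for ns in nameservers]
    lines += [
        "coreos:",
        "  update:",
        "    group: stable",
        "    reboot-strategy: etcd-lock",
        "  fleet:",
        f"    metadata: {meta}",  # fleet metadata is a comma-separated key=value list
        "  units:",
    ]
    for name in units:
        lines += [f"    - name: {name}", "      command: start"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cloud_config(
        "node-01", ["ssh-rsa AAAAB3Nza jump@host"], {"role": "central"},
        ["etcd2.service", "fleet.service", "flanneld.service", "docker.service"]))
```

The etcd-lock reboot strategy makes updated machines take turns rebooting, which fits the smooth-upgrade goal described earlier.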
A few things to watch out for:
For servers, we do not recommend assigning IPs dynamically with DHCP, even with MAC-bound leases. We had an incident where the DHCP service failed and the business network went down as a result.
For the cloud-config file, CoreOS's coreos-cloudinit tool can validate it. If you edit the file yourself, validate it first; otherwise you will only discover problems after a reboot.
Without the validate flag, coreos-cloudinit actually applies the file, so for operations such as writing files you can see the result immediately.
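A small Python helper for that workflow, as a sketch: a cheap local pre-check (the #cloud-config header line, and a heuristic that flags tab indentation, which YAML forbids) followed by running coreos-cloudinit with -validate. It must run on a CoreOS host for the second step. Since I am not certain how the validator's exit status maps to errors and warnings, the helper simply returns the exit code and output for a human to read.

```python
import subprocess


def precheck(text):
    """Cheap local checks before running the real validator."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#cloud-config":
        problems.append("first line must be '#cloud-config'")
    for number, line in enumerate(lines, 1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab in indentation (YAML needs spaces)")
    return problems


def validate_cmd(path):
    """With -validate, coreos-cloudinit checks the file without applying it."""
    return ["coreos-cloudinit", "-validate", f"--from-file={path}"]


def validate(path):
    """Return (local problems, validator exit code, validator output)."""
    with open(path) as f:
        problems = precheck(f.read())
    result = subprocess.run(validate_cmd(path), capture_output=True, text=True)
    return problems, result.returncode, result.stdout + result.stderr
```

Running this before copying a modified file to the HTTP server catches the kind of mistake that would otherwise only show up after hundreds of machines reboot.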
On CoreOS automatic updates: in a self-built data center like ours this is the most frustrating part, because unlike some foreign public clouds you cannot simply download updates directly. How do we handle it? First you need a proxy (a "ladder"); then you configure the update service specifically by adding all_proxy, and updates happen automatically.
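One way to give the update service a proxy is a systemd drop-in for update-engine.service that sets ALL_PROXY. The Python sketch below renders and writes such a drop-in; the proxy address is hypothetical, and after writing it you would run `systemctl daemon-reload` and restart update-engine. The same drop-in pattern applies to any other unit that needs ALL_PROXY.

```python
from pathlib import Path

DROPIN = "etc/systemd/system/update-engine.service.d/proxy.conf"


def proxy_dropin(proxy_url):
    """systemd drop-in that gives update-engine an ALL_PROXY environment variable."""
    return f"[Service]\nEnvironment=ALL_PROXY={proxy_url}\n"


def install_dropin(proxy_url, root="/"):
    """Write the drop-in under root (use a temp dir to try it out safely)."""
    path = Path(root) / DROPIN
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(proxy_dropin(proxy_url))
    return path


if __name__ == "__main__":
    print(proxy_dropin("http://10.0.0.2:3128"))
```

A drop-in only adds to the vendor unit, so it survives CoreOS updates that replace the original update-engine.service file.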
To make yoo-installer easier to use, I plan to release two versions: a VM version that works as soon as you download it, and a Docker version. I did not have enough preparation time to finish them, but since I need them myself, I will keep working on them!
One complaint here: when I packaged dhcpd in Docker, its broadcasts could not get out of the container because of bridging. If you have a good solution, please tell me and I will buy you dinner!
To make yoo-installer easier to use, we also built a dedicated UI. It does not have many features yet, so feedback is very welcome, and so are PRs.
Q&A
Q: When Kubernetes creates a pod, it has to download an image from gcr.io, which is blocked by the firewall. How do you solve this?
A: There are two ways of doing this.
1. Pull the image on a server outside China, save it as a tar package, then copy it over and load it.
2. Retag it: the same Google image also seems to be available on hub.docker.com, so pull it from there and change its tag to the gcr.io one.
I have put a tar package in the project.
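Both approaches come down to a few docker commands. The Python sketch below only builds and prints them; the pause image tag is used purely as an example, and registry.example.com stands in for whatever reachable mirror you use.

```python
import shlex

PAUSE = "gcr.io/google_containers/pause:0.8.0"  # example image a pod might need


def offline_transfer(image, tar_path):
    """Approach 1: pull and save outside the firewall, copy the tar in, then load it."""
    outside = [["docker", "pull", image], ["docker", "save", "-o", tar_path, image]]
    inside = [["docker", "load", "-i", tar_path]]
    return outside, inside


def retag(mirror_image, gcr_image):
    """Approach 2: pull a reachable copy, then give it the gcr.io name Kubernetes asks for."""
    return [["docker", "pull", mirror_image], ["docker", "tag", mirror_image, gcr_image]]


if __name__ == "__main__":
    outside, inside = offline_transfer(PAUSE, "pause.tar")
    mirror = "registry.example.com/google_containers/pause:0.8.0"
    for argv in outside + inside + retag(mirror, PAUSE):
        print(shlex.join(argv))
```

Either way, the node ends up with an image whose name matches exactly what the pod spec references, so the kubelet finds it locally instead of trying gcr.io.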
Q: What provides PXE for CoreOS?
A: PXE is a feature provided by the motherboard.
Q: Is the proxy ("ladder") a node, i.e. a separate server?
A: We have a few management servers that act as proxy nodes. Where needed, we add ALL_PROXY to the environment of the specific program. This does not affect the rest of the network.
Q: How did you choose the network model? Bridging onto one big flat subnet? Open vSwitch?
A: We currently use flannel, which comes from CoreOS.
Q: What did you consider for networking?
A: Same answer as the previous question: we currently use flannel.
Q: Is yoo-installer useful for developers setting up a development environment?
A: If it is only for development, it may not be a good fit, because it is designed for batch installation and deployment. For stress testing after development it is worth considering: to install machines, you just put them on the installation network.
Q: Do you restrict k8s API calls in any way?
A: Do you mean resource limits? Those can be set in the config file. For API questions, you can talk to my colleague @Moon @ Beijing-goyoo.
Q: How many VMs does a minimal cluster normally need?
A: It depends on your business scenario. If you mean the Small Cluster layout, its diagram shows how many nodes you need.
Q: When using k8s, do you assign permissions for RCs and services?
A: Since it is our own private cloud, we do not assign permissions.
Q: How does networking work from Docker container to pod to service, and do you use DNS?
A: We run our own intranet DNS, and Kubernetes also provides DNS.
Q: I run CoreOS in virtual machines. Can three of them form a small cluster?
A: Of course. There are three ways to bootstrap an etcd cluster (static configuration, etcd discovery, and DNS discovery); if you are just experimenting, I recommend the discovery ID method.
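A minimal Python sketch of the discovery method for a three-VM cluster: request one token URL from the public discovery service, then give every VM the same URL in its cloud-config. It passes each VM's IP explicitly rather than relying on `$private_ipv4` substitution, which is not available on every platform; the IPs in the usage note are placeholders.

```python
import urllib.request

DISCOVERY_SERVICE = "https://discovery.etcd.io/new?size={size}"


def new_discovery_url(size=3):
    """Ask the public discovery service for a fresh token URL (needs Internet access)."""
    with urllib.request.urlopen(DISCOVERY_SERVICE.format(size=size)) as resp:
        return resp.read().decode().strip()


def etcd2_discovery_config(discovery_url, ip):
    """cloud-config for one member; every VM must use the same discovery URL."""
    return "\n".join([
        "#cloud-config",
        "coreos:",
        "  etcd2:",
        f"    discovery: {discovery_url}",
        f"    advertise-client-urls: http://{ip}:2379",
        f"    initial-advertise-peer-urls: http://{ip}:2380",
        "    listen-client-urls: http://0.0.0.0:2379",
        f"    listen-peer-urls: http://{ip}:2380",
        "  units:",
        "    - name: etcd2.service",
        "      command: start",
    ]) + "\n"
```

Usage: call `new_discovery_url(3)` once, then render one config per VM (for example for 192.168.56.101 to .103). Request a new token whenever you rebuild the cluster, because a token remembers the members that already joined.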
Q: You said today that dynamic IP allocation through DHCP is not good. So what do you do in production?
A: We write static IPs directly. During installation the IP is already bound to the MAC address; the installer then writes that IP into the network configuration file, turning it into a static IP.
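On CoreOS the network configuration file can be a systemd-networkd `.network` unit. A minimal Python sketch that renders one, matching the NIC by MAC as described; the addresses are placeholders, and the target file name (for example /etc/systemd/network/10-static.network) is a hypothetical choice.

```python
import ipaddress


def static_network_unit(mac, address, gateway, dns):
    """Render a systemd-networkd .network file that pins a static IP to a NIC."""
    iface = ipaddress.IPv4Interface(address)  # validates "IP/prefix"
    if ipaddress.IPv4Address(gateway) not in iface.network:
        raise ValueError(f"gateway {gateway} is outside {iface.network}")
    return (
        "[Match]\n"
        f"MACAddress={mac.lower()}\n"
        "\n"
        "[Network]\n"
        f"Address={iface.with_prefixlen}\n"
        f"Gateway={gateway}\n"
        f"DNS={dns}\n"
    )


if __name__ == "__main__":
    print(static_network_unit("AA:BB:CC:00:00:01", "192.168.1.101/24",
                              "192.168.1.1", "192.168.1.2"))
```

Matching on MACAddress rather than an interface name keeps the file correct even if NIC naming differs between hardware models, and the machine no longer depends on DHCP after installation.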
Q: Is CoreOS itself an operating system? How does it differ from a virtual machine: is installation more convenient, or is the system itself simpler?
A: CoreOS is relatively lightweight and has no package manager by default, so you cannot casually install things on it. The biggest benefit is that applications do not depend on the underlying system, and it can upgrade itself automatically. There are other advantages too; take a look at the official introduction.
Q: Do you detect whether other DHCP servers exist in the environment?
A: Since it is our own data center and we maintain everything, the topology is very clear. We do not currently check for this.
Q: I built a cluster with Docker + etcd + confd + nginx, though not for production. I don't really understand Kubernetes-style clusters. Is CoreOS very popular now? I have never tried it.
A: I am just trying it out; some of its features are very attractive. Our company culture encourages trial and error and is not afraid of mistakes, and you only find out what the problems are by using something. That said, there are indeed many pitfalls.
Q: You mentioned using a DHCP server earlier. After the test environment is deployed, can the IP addresses still be changed?
A: Yes, you only need to change the IP configuration file. Normally the IPs do not change after installation. We have also built a tool for changing them, and can share it if it would be useful.
Q: How is this CoreOS menu implemented?
A: The code is in the project. There is an existing menu library, which I use directly.
Q: You are installing the OS on bare metal and then deploying Kubernetes and related components. Why not use Puppet or another installer for the Kubernetes part?
A: Because we wanted everything integrated into one package: plug in the network cable, power on, and the system installs itself.
Q: You mentioned an incident with DHCP earlier. Can you elaborate?
A: A hardware failure prevented DHCP broadcasts from getting through.
Q: Are you considering support for systems other than CoreOS?
A: I want to polish this first, then we will see. PRs are welcome! Our team is seriously short-handed and does not have the energy to do much, so we open-source code as a by-product of our own work.
Q: What network mode does the whole cluster use, VXLAN? Flannel or pipework?
A: flannel.
Q: What is the HTTP service used for when booting CoreOS?
A: For downloading the automatic installation script and the Kubernetes components, and for collecting status that ties the whole automated installation process together.
Q: If the environment uses IPv6, can the software be used as-is?
A: There should be no problem.
Q: Going forward, will you stay with Kubernetes or consider alternatives? What are your next plans, and which current shortcomings do you plan to improve?
A: We will stay with Kubernetes for now. What is missing is automatic scheduling: learning from historical monitoring data to scale out and in automatically. We also want to keep improving the data platform.
Q: If the DHCP server goes down after deployment, how do you keep the deployed environment from dropping off the network?
A: DHCP can run with a hot standby. Also, deployed machines have already been switched to static IPs, so they will not go offline.
Q: Why assign IPs to individual containers rather than pods? Isn't the pod Kubernetes' smallest scheduling unit? Does it make sense to assign IPs to containers?
A: yoo-installer assigns IPs to physical machines; pods have their own overlay network.
Q: With flannel, are container IPs in the same subnet as host IPs?
A: No, they are on different networks, each managed separately.
Q: A hardware failure might really be a CoreOS driver problem. Do you fix it by rebooting or by redeploying?
A: With many machines, failures happen often. A node going down does not affect normal business operation, which is the advantage of using Docker. When a problem cannot be solved, I simply shut the machine down.
Q: I found a way: first pull the image of the same name from a registry reachable in China, then use the docker tag command to retag it as gcr.io.
A: That is the most convenient approach.
Q: How do you track the entire automated installation process?
A: When installation completes, the machine sends a success request to Yoo-cloud, which is how completed installs are counted.
Q: Once a machine has been deployed with a static IP, how do you change it?
A: We have a set of automation scripts; you only need to supply the IP the machine should be assigned.
Q: Have you used the UI that ships with Kubernetes? How is it?
A: Not very useful; its features are basic.
Q: Can this be deployed on cloud hosts?
A: Most cloud providers support CoreOS, but I have not tested this specifically.
Q: So you did not separate OS installation from service deployment! That probably only works for your own company's products.
A: We are building a private hybrid cloud; service deployment is handled by a separate set of orchestration tools.
Q: When planning your network, do you bind IPs to ports? And do you still need yoo-installer to assign IPs to physical machines?
A: No. That makes the network more complex; for our use case we favor a large layer-2 network.
Q: With flannel, can containers communicate with the host?
A: Yes, they can. What do you need to do?
Q: What applications do you run on this setup?
A: Most of our company's business; we are currently pushing toward microservices.
Q: How do you plan ports in Kubernetes when exposing services externally?
A: We reserve a set of ports. Our business is fairly uniform: mainly HTTP, plus some TCP.
Q: Our applications used to be spread across several cloud hosts, all on the same port. How do you handle that once they move into a Kubernetes cluster?
A: Inside flannel each of them gets a different IP, so ports do not conflict.
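Why ports stop colliding can be shown with a little address arithmetic. In this Python sketch, flannel's network config (the JSON it reads from etcd at /coreos.com/network/config) carves a /16 into per-host /24 subnets, so the same container port on two hosts is still two distinct (IP, port) endpoints. The 10.1.0.0/16 range and host names are placeholders, the in-order allocation is illustrative (flannel leases subnets itself), and using the .2 address for a container is only an example.

```python
import ipaddress
import json


def flannel_config(network="10.1.0.0/16", subnet_len=24):
    """Value flannel reads from etcd at /coreos.com/network/config."""
    return json.dumps({"Network": network, "SubnetLen": subnet_len})


def host_subnets(hosts, network="10.1.0.0/16", subnet_len=24):
    """Give each host its own subnet (illustrative order; flannel picks its own)."""
    pool = ipaddress.ip_network(network).subnets(new_prefix=subnet_len)
    return {host: next(pool) for host in hosts}


subnets = host_subnets(["node-01", "node-02"])
# The same container port on two hosts is still a distinct (IP, port) endpoint.
endpoints = {host: (str(net.network_address + 2), 8080) for host, net in subnets.items()}
print(flannel_config())
print(endpoints)
```

Because every container address is unique across the cluster, applications that all listen on the same port can be scheduled side by side without remapping.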
Q: What about load balancing for the Kubernetes cluster?
A: Kubernetes itself does not provide it here; you need to implement the functionality yourself. They designed this part as a plugin.
Q: Have you run into problems with long-lived TCP connections to mobile clients being interrupted?
A: We do have long-lived connections, but we have not moved them onto CoreOS yet. On mobile, interruptions are very common.
Q: What do you use for application deployment? Chef/Puppet?
A: Apart from some orchestration tools, we now try to do everything with Docker, both because it is inconvenient to install things on CoreOS itself and to reduce dependencies on the underlying system. We also plan to put hot spares on the public cloud, so we will not have to worry about these dependencies in the future. For simple tasks we use fleet global units; running a script across every machine this way is very handy.
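A fleet global unit is an ordinary systemd unit with `Global=true` in its `[X-Fleet]` section, which tells fleet to run it on every machine. A minimal Python sketch that renders one; the description and command in the test are hypothetical examples.

```python
def global_unit(description, command):
    """Render a fleet unit that fleet schedules on every machine in the cluster."""
    return (
        "[Unit]\n"
        f"Description={description}\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={command}\n"
        "\n"
        "[X-Fleet]\n"
        "Global=true\n"
    )


if __name__ == "__main__":
    print(global_unit("Pre-pull tools image",
                      "/usr/bin/docker pull registry.example.com/tools:latest"))
```

Save the output as, say, pull-tools.service and run `fleetctl start pull-tools.service`; fleet then starts it on each machine, including ones that join the cluster later.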
===========================
The above content is compiled from the group talk given on the evening of August 4, 2015. Speaker:
Wang Peng (TAD), senior R&D manager at Optical Sound Network and head of its cloud platform, focusing on Docker research and adoption. He is currently responsible for developing the cloud platform, building Docker-based automatic resource scheduling and automated builds and testing, and moving products toward containers and microservices. DockOne organizes a technical talk every week; if you are interested, add LIYINGJIESX to join the group, and leave us a message with topics you would like to hear about.