kubeadm source code analysis
To be honest, the kubeadm code is fairly mediocre; the quality is not very high.
First, a few key points about the core things kubeadm does:
- kubeadm generates the certificates under /etc/kubernetes/pki
- kubeadm generates the static pod YAML manifests, all under /etc/kubernetes/manifests
- kubeadm generates the kubelet config, kubectl config, etc. under /etc/kubernetes
- kubeadm starts DNS via client-go
kubeadm init
The code entry point is cmd/kubeadm/app/cmd/init.go.
I recommend taking a look at cobra, the CLI framework kubeadm builds its commands on.
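For anyone new to it, here is a minimal sketch of how a cobra CLI hands control to a Run function; the command and flag names below are made up for illustration, not kubeadm's actual wiring:

```go
package main

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

func main() {
    // Root command, analogous to `kubeadm`.
    root := &cobra.Command{Use: "mycli"}

    // Subcommand, analogous to `kubeadm init`: cobra parses the flags
    // and then calls Run, which is the kind of function we analyze below.
    var configPath string
    initCmd := &cobra.Command{
        Use: "init",
        Run: func(cmd *cobra.Command, args []string) {
            fmt.Println("running init with config:", configPath)
        },
    }
    initCmd.Flags().StringVar(&configPath, "config", "", "path to a config file")
    root.AddCommand(initCmd)

    if err := root.Execute(); err != nil {
        os.Exit(1)
    }
}
```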
Locate the Run function to walk through the main flow:
- If the certificates do not exist, create them. So if we have our own certificates we can simply place them under /etc/kubernetes/pki; how certificates are generated is covered in detail below.
```go
if res, _ := certsphase.UsingExternalCA(i.cfg); !res {
    if err := certsphase.CreatePKIAssets(i.cfg); err != nil {
        return err
    }
}
```
- Create the kubeconfig files
```go
if err := kubeconfigphase.CreateInitKubeConfigFiles(kubeConfigDir, i.cfg); err != nil {
    return err
}
```
- Create the manifest files; etcd, the apiserver, the controller-manager, and the scheduler are all created here. Note that if your config file already specifies etcd endpoints, the local etcd is not created, which lets us run our own etcd cluster instead of the default single-node etcd. Very useful.
```go
controlplanephase.CreateInitStaticPodManifestFiles(manifestDir, i.cfg)
if len(i.cfg.Etcd.Endpoints) == 0 {
    if err := etcdphase.CreateLocalEtcdStaticPodManifestFile(manifestDir, i.cfg); err != nil {
        return fmt.Errorf("error creating local etcd static pod manifest file: %v", err)
    }
}
```
- Wait for the API server and kubelet to start successfully. This is where the familiar image-pull failure shows up; in fact the kubelet sometimes reports this error for entirely different reasons, which misleads people into thinking the image cannot be pulled.
```go
if err := waitForAPIAndKubelet(waiter); err != nil {
    ctx := map[string]string{
        "Error":                  fmt.Sprintf("%v", err),
        "APIServerImage":         images.GetCoreImage(kubeadmconstants.KubeAPIServer, i.cfg.GetControlPlaneImageRepository(), i.cfg.KubernetesVersion, i.cfg.UnifiedControlPlaneImage),
        "ControllerManagerImage": images.GetCoreImage(kubeadmconstants.KubeControllerManager, i.cfg.GetControlPlaneImageRepository(), i.cfg.KubernetesVersion, i.cfg.UnifiedControlPlaneImage),
        "SchedulerImage":         images.GetCoreImage(kubeadmconstants.KubeScheduler, i.cfg.GetControlPlaneImageRepository(), i.cfg.KubernetesVersion, i.cfg.UnifiedControlPlaneImage),
    }
    kubeletFailTempl.Execute(out, ctx)
    return fmt.Errorf("couldn't initialize a Kubernetes cluster")
}
```
- Label and taint the master node; so if you want pods scheduled onto the master, just remove the taint.
```go
if err := markmasterphase.MarkMaster(client, i.cfg.NodeName); err != nil {
    return fmt.Errorf("error marking master: %v", err)
}
```
- Generate the token
```go
if err := nodebootstraptokenphase.UpdateOrCreateToken(client, i.cfg.Token, false, i.cfg.TokenTTL.Duration, kubeadmconstants.DefaultTokenUsages, []string{kubeadmconstants.NodeBootstrapTokenAuthGroup}, tokenDescription); err != nil {
    return fmt.Errorf("error updating or creating token: %v", err)
}
```
- Create DNS and kube-proxy via client-go
```go
if err := dnsaddonphase.EnsureDNSAddon(i.cfg, client); err != nil {
    return fmt.Errorf("error ensuring dns addon: %v", err)
}

if err := proxyaddonphase.EnsureProxyAddon(i.cfg, client); err != nil {
    return fmt.Errorf("error ensuring proxy addon: %v", err)
}
```
My criticism: the code brainlessly runs one long procedure from start to finish. If I were writing it, I would abstract it into interfaces such as RenderConf, Save, Run, and Clean, and have DNS, kube-proxy, and the other components implement them (see the sketch below). A related problem is that the DNS and kube-proxy configs are never rendered out to disk, probably because they are not static pods. And then there is the bug at join time, discussed further below.
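A hypothetical sketch of that abstraction (the interface and method names are my own illustration, not actual kubeadm code):

```go
// Component is a hypothetical abstraction: every addon or control-plane
// piece (DNS, kube-proxy, ...) would implement these phases instead of
// kubeadm driving one long hard-coded procedure.
type Component interface {
    // RenderConf renders the component's configuration, e.g. a YAML manifest.
    RenderConf() ([]byte, error)
    // Save persists the rendered configuration to disk so it can be inspected
    // and customized (which kubeadm today skips for DNS and kube-proxy).
    Save(dir string) error
    // Run creates or starts the component, e.g. via client-go.
    Run() error
    // Clean tears the component down and removes its configuration.
    Clean() error
}
```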
Certificate generation
These functions get called in a loop; we only need to look at one or two of them, since the rest are much the same:
```go
certActions := []func(cfg *kubeadmapi.MasterConfiguration) error{
    CreateCACertAndKeyfiles,
    CreateAPIServerCertAndKeyFiles,
    CreateAPIServerKubeletClientCertAndKeyFiles,
    CreateServiceAccountKeyAndPublicKeyFiles,
    CreateFrontProxyCACertAndKeyFiles,
    CreateFrontProxyClientCertAndKeyFiles,
}
```
Root (CA) certificate generation:
```go
// Returns the root CA certificate and private key.
func NewCACertAndKey() (*x509.Certificate, *rsa.PrivateKey, error) {
    caCert, caKey, err := pkiutil.NewCertificateAuthority()
    if err != nil {
        return nil, nil, fmt.Errorf("failure while generating CA certificate and key: %v", err)
    }
    return caCert, caKey, nil
}
```
The k8s.io/client-go/util/cert library provides two functions, one to generate the key and one to generate the cert:
```go
key, err := certutil.NewPrivateKey()

config := certutil.Config{
    CommonName: "kubernetes",
}
cert, err := certutil.NewSelfSignedCACert(config, key)
```
We can also fill other identity information into the config:
```go
type Config struct {
    CommonName   string
    Organization []string
    AltNames     AltNames
    Usages       []x509.ExtKeyUsage
}
```
The private key generation is just a wrapper around the rsa library:
"crypto/rsa" "crypto/x509"func NewPrivateKey() (*rsa.PrivateKey, error) { return rsa.GenerateKey(cryptorand.Reader, rsaKeySize)}
The certificate is self-signed, so the root certificate contains only the CommonName; Organization is effectively not set:
```go
func NewSelfSignedCACert(cfg Config, key *rsa.PrivateKey) (*x509.Certificate, error) {
    now := time.Now()
    tmpl := x509.Certificate{
        SerialNumber: new(big.Int).SetInt64(0),
        Subject: pkix.Name{
            CommonName:   cfg.CommonName,
            Organization: cfg.Organization,
        },
        NotBefore: now.UTC(),
        NotAfter:  now.Add(duration365d * 10).UTC(),
        KeyUsage:  x509.KeyUsageKeyEncipherment | x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
        BasicConstraintsValid: true,
        IsCA: true,
    }

    certDERBytes, err := x509.CreateCertificate(cryptorand.Reader, &tmpl, &tmpl, key.Public(), key)
    if err != nil {
        return nil, err
    }
    return x509.ParseCertificate(certDERBytes)
}
```
Once generated, write them to files:
```go
pkiutil.WriteCertAndKey(pkiDir, baseName, cert, key)
certutil.WriteCert(certificatePath, certutil.EncodeCertPEM(cert))
```
This uses the pem library for the encoding:
```go
import "encoding/pem"

func EncodeCertPEM(cert *x509.Certificate) []byte {
    block := pem.Block{
        Type:  CertificateBlockType,
        Bytes: cert.Raw,
    }
    return pem.EncodeToMemory(&block)
}
```
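Putting the pieces above together: a minimal standalone sketch that generates a self-signed CA and writes it out as PEM files, using only the k8s.io/client-go/util/cert helpers just shown (the file names are illustrative, and the helper signatures are those of the client-go vintage discussed here):

```go
package main

import (
    "io/ioutil"

    certutil "k8s.io/client-go/util/cert"
)

func main() {
    // Generate an RSA private key, then self-sign a CA certificate with it,
    // which is what kubeadm's pkiutil.NewCertificateAuthority boils down to.
    key, err := certutil.NewPrivateKey()
    if err != nil {
        panic(err)
    }
    cert, err := certutil.NewSelfSignedCACert(certutil.Config{CommonName: "kubernetes"}, key)
    if err != nil {
        panic(err)
    }

    // PEM-encode and write out; kubeadm places these under /etc/kubernetes/pki.
    if err := ioutil.WriteFile("ca.crt", certutil.EncodeCertPEM(cert), 0644); err != nil {
        panic(err)
    }
    if err := ioutil.WriteFile("ca.key", certutil.EncodePrivateKeyPEM(key), 0600); err != nil {
        panic(err)
    }
}
```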
Next, let's look at how the apiserver certificate is generated:
```go
caCert, caKey, err := loadCertificateAuthorithy(cfg.CertificatesDir, kubeadmconstants.CACertAndKeyBaseName)
// Generate the apiserver certificate signed by the root CA.
apiCert, apiKey, err := NewAPIServerCertAndKey(cfg, caCert, caKey)
```
Here the AltNames field becomes important: every address and domain name that will be used to reach the master must be added, corresponding to the apiServerCertSANs field in the config file. Everything else is the same as for the root certificate.
```go
config := certutil.Config{
    CommonName: kubeadmconstants.APIServerCertCommonName,
    AltNames:   *altNames,
    Usages:     []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
}
```
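For illustration, populating altNames might look like this (the domain name and IPs below are placeholders for whatever you put in apiServerCertSANs):

```go
// Every address and domain name clients will use to reach the master must
// be listed here, otherwise TLS verification fails for that address.
altNames := &certutil.AltNames{
    DNSNames: []string{"kubernetes", "kubernetes.default", "apiserver.example.com"},
    IPs:      []net.IP{net.ParseIP("10.96.0.1"), net.ParseIP("192.168.0.100")},
}
```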
Creating the k8s config files
We can see that these files get created:
```go
return createKubeConfigFiles(
    outDir, cfg,
    kubeadmconstants.AdminKubeConfigFileName,
    kubeadmconstants.KubeletKubeConfigFileName,
    kubeadmconstants.ControllerManagerKubeConfigFileName,
    kubeadmconstants.SchedulerKubeConfigFileName,
)
```
k8s wraps two functions for rendering these configs:
The difference is whether the resulting kubeconfig file carries a token. For example, logging in to the dashboard requires a token, and so does calling the API directly; in those cases, generate the config with a token.
The generated conf files are basically identical, differing only in things like ClientName, which also makes the encoded certificates differ: the ClientName is encoded into the certificate, and k8s then extracts it and uses it as the user.
So here is the key point: when implementing multi-tenancy we should generate configs the same way, and then bind roles to each tenant.
```go
return kubeconfigutil.CreateWithToken(
    spec.APIServer,
    "kubernetes",
    spec.ClientName,
    certutil.EncodeCertPEM(spec.CACert),
    spec.TokenAuth.Token,
), nil

return kubeconfigutil.CreateWithCerts(
    spec.APIServer,
    "kubernetes",
    spec.ClientName,
    certutil.EncodeCertPEM(spec.CACert),
    certutil.EncodePrivateKeyPEM(clientKey),
    certutil.EncodeCertPEM(clientCert),
), nil
```
Then it is just a matter of filling in the Config struct and finally writing it to a file (omitted here):
"k8s.io/client-go/tools/clientcmd/apireturn &clientcmdapi.Config{ Clusters: map[string]*clientcmdapi.Cluster{ clusterName: { Server: serverURL, CertificateAuthorityData: caCert, }, }, Contexts: map[string]*clientcmdapi.Context{ contextName: { Cluster: clusterName, AuthInfo: userName, }, }, AuthInfos: map[string]*clientcmdapi.AuthInfo{}, CurrentContext: contextName,}
Creating the static pod YAML files
This returns the pod structs for the apiserver, controller-manager, and scheduler:
```go
specs := GetStaticPodSpecs(cfg, k8sVersion)

staticPodSpecs := map[string]v1.Pod{
    kubeadmconstants.KubeAPIServer: staticpodutil.ComponentPod(v1.Container{
        Name:          kubeadmconstants.KubeAPIServer,
        Image:         images.GetCoreImage(kubeadmconstants.KubeAPIServer, cfg.GetControlPlaneImageRepository(), cfg.KubernetesVersion, cfg.UnifiedControlPlaneImage),
        Command:       getAPIServerCommand(cfg, k8sVersion),
        VolumeMounts:  staticpodutil.VolumeMountMapToSlice(mounts.GetVolumeMounts(kubeadmconstants.KubeAPIServer)),
        LivenessProbe: staticpodutil.ComponentProbe(cfg, kubeadmconstants.KubeAPIServer, int(cfg.API.BindPort), "/healthz", v1.URISchemeHTTPS),
        Resources:     staticpodutil.ComponentResources("250m"),
        Env:           getProxyEnvVars(),
    }, mounts.GetVolumes(kubeadmconstants.KubeAPIServer)),
    kubeadmconstants.KubeControllerManager: staticpodutil.ComponentPod(v1.Container{
        Name:          kubeadmconstants.KubeControllerManager,
        Image:         images.GetCoreImage(kubeadmconstants.KubeControllerManager, cfg.GetControlPlaneImageRepository(), cfg.KubernetesVersion, cfg.UnifiedControlPlaneImage),
        Command:       getControllerManagerCommand(cfg, k8sVersion),
        VolumeMounts:  staticpodutil.VolumeMountMapToSlice(mounts.GetVolumeMounts(kubeadmconstants.KubeControllerManager)),
        LivenessProbe: staticpodutil.ComponentProbe(cfg, kubeadmconstants.KubeControllerManager, 10252, "/healthz", v1.URISchemeHTTP),
        Resources:     staticpodutil.ComponentResources("200m"),
        Env:           getProxyEnvVars(),
    }, mounts.GetVolumes(kubeadmconstants.KubeControllerManager)),
    kubeadmconstants.KubeScheduler: staticpodutil.ComponentPod(v1.Container{
        Name:          kubeadmconstants.KubeScheduler,
        Image:         images.GetCoreImage(kubeadmconstants.KubeScheduler, cfg.GetControlPlaneImageRepository(), cfg.KubernetesVersion, cfg.UnifiedControlPlaneImage),
        Command:       getSchedulerCommand(cfg),
        VolumeMounts:  staticpodutil.VolumeMountMapToSlice(mounts.GetVolumeMounts(kubeadmconstants.KubeScheduler)),
        LivenessProbe: staticpodutil.ComponentProbe(cfg, kubeadmconstants.KubeScheduler, 10251, "/healthz", v1.URISchemeHTTP),
        Resources:     staticpodutil.ComponentResources("100m"),
        Env:           getProxyEnvVars(),
    }, mounts.GetVolumes(kubeadmconstants.KubeScheduler)),
}

// Get the image for a specific version.
func GetCoreImage(image, repoPrefix, k8sVersion, overrideImage string) string {
    if overrideImage != "" {
        return overrideImage
    }
    kubernetesImageTag := kubeadmutil.KubernetesVersionToImageTag(k8sVersion)
    etcdImageTag := constants.DefaultEtcdVersion
    etcdImageVersion, err := constants.EtcdSupportedVersion(k8sVersion)
    if err == nil {
        etcdImageTag = etcdImageVersion.String()
    }
    return map[string]string{
        constants.Etcd:                  fmt.Sprintf("%s/%s-%s:%s", repoPrefix, "etcd", runtime.GOARCH, etcdImageTag),
        constants.KubeAPIServer:         fmt.Sprintf("%s/%s-%s:%s", repoPrefix, "kube-apiserver", runtime.GOARCH, kubernetesImageTag),
        constants.KubeControllerManager: fmt.Sprintf("%s/%s-%s:%s", repoPrefix, "kube-controller-manager", runtime.GOARCH, kubernetesImageTag),
        constants.KubeScheduler:         fmt.Sprintf("%s/%s-%s:%s", repoPrefix, "kube-scheduler", runtime.GOARCH, kubernetesImageTag),
    }[image]
}

// Then the pod is simply written to a file:
staticpodutil.WriteStaticPodToDisk(componentName, manifestDir, spec)
```
Creating the etcd manifest works the same way; no need to belabor it.
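For reference, writing a static pod boils down to marshaling the v1.Pod to YAML into the manifests directory. A rough sketch of what staticpodutil.WriteStaticPodToDisk amounts to (not its actual implementation; the YAML library choice here is mine):

```go
import (
    "io/ioutil"
    "path/filepath"

    "k8s.io/api/core/v1"
    "sigs.k8s.io/yaml"
)

// writeStaticPod serializes the pod and drops it where the kubelet's
// --pod-manifest-path (/etc/kubernetes/manifests by default) picks it up.
func writeStaticPod(manifestDir, componentName string, pod *v1.Pod) error {
    data, err := yaml.Marshal(pod)
    if err != nil {
        return err
    }
    return ioutil.WriteFile(filepath.Join(manifestDir, componentName+".yaml"), data, 0600)
}
```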
Waiting for the kubelet to start
This error is extremely common; when you see it, the kubelet has essentially failed to start. We need to check: selinux, swap, and whether the Cgroup driver matches docker's.
```
setenforce 0 && swapoff -a && systemctl restart kubelet
```
If that doesn't help, make sure the kubelet's Cgroup driver matches docker's; check with `docker info | grep Cg`.
```go
go func(errC chan error, waiter apiclient.Waiter) {
    // This goroutine can only make kubeadm init fail. If this check succeeds, it won't do anything special
    if err := waiter.WaitForHealthyKubelet(40*time.Second, "http://localhost:10255/healthz"); err != nil {
        errC <- err
    }
}(errorChan, waiter)

go func(errC chan error, waiter apiclient.Waiter) {
    // This goroutine can only make kubeadm init fail. If this check succeeds, it won't do anything special
    if err := waiter.WaitForHealthyKubelet(60*time.Second, "http://localhost:10255/healthz/syncloop"); err != nil {
        errC <- err
    }
}(errorChan, waiter)
```
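What WaitForHealthyKubelet does is essentially poll the kubelet's healthz endpoint until it answers or the timeout expires; a minimal sketch of that idea:

```go
import (
    "fmt"
    "net/http"
    "time"
)

// waitForHealthz polls url until it returns 200 OK or the timeout expires,
// roughly what waiter.WaitForHealthyKubelet does for the kubelet's port 10255.
func waitForHealthz(url string, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        resp, err := http.Get(url)
        if err == nil {
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                return nil
            }
        }
        time.Sleep(time.Second)
    }
    return fmt.Errorf("endpoint %s not healthy within %v", url, timeout)
}
```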
Creating DNS and kube-proxy
This is where I first discovered CoreDNS:
```go
if features.Enabled(cfg.FeatureGates, features.CoreDNS) {
    return coreDNSAddon(cfg, client, k8sVersion)
}
return kubeDNSAddon(cfg, client, k8sVersion)
```
The CoreDNS YAML template is written directly into the code:
/app/phases/addons/dns/manifests.go
```go
CoreDNSDeployment = `
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: {{ .MasterTaintKey }}
...
```
Then the template gets rendered and finally created through the k8s API. This way of creating resources is worth learning from, clumsy as it is; this part is written far less well than kubectl.
```go
coreDNSConfigMap := &v1.ConfigMap{}
if err := kuberuntime.DecodeInto(legacyscheme.Codecs.UniversalDecoder(), configBytes, coreDNSConfigMap); err != nil {
    return fmt.Errorf("unable to decode CoreDNS configmap %v", err)
}

// Create the ConfigMap for CoreDNS or update it in case it already exists
if err := apiclient.CreateOrUpdateConfigMap(client, coreDNSConfigMap); err != nil {
    return err
}

coreDNSClusterRoles := &rbac.ClusterRole{}
if err := kuberuntime.DecodeInto(legacyscheme.Codecs.UniversalDecoder(), []byte(CoreDNSClusterRole), coreDNSClusterRoles); err != nil {
    return fmt.Errorf("unable to decode CoreDNS clusterroles %v", err)
}
...
```
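The pattern itself (render a YAML template, decode it into a typed object, then create it via client-go) is worth lifting out. A hedged sketch outside kubeadm's internals, since legacyscheme lives inside the Kubernetes repo; this uses client-go's scheme package instead, and the function name is mine:

```go
import (
    "bytes"
    "text/template"

    apps "k8s.io/api/apps/v1beta2"
    kuberuntime "k8s.io/apimachinery/pkg/runtime"
    clientsetscheme "k8s.io/client-go/kubernetes/scheme"
)

// renderDeployment executes a YAML template and decodes the result into a
// typed Deployment, which can then be created with a clientset, e.g.
// client.AppsV1beta2().Deployments("kube-system").Create(dep).
func renderDeployment(tmpl string, data interface{}) (*apps.Deployment, error) {
    var buf bytes.Buffer
    if err := template.Must(template.New("addon").Parse(tmpl)).Execute(&buf, data); err != nil {
        return nil, err
    }
    dep := &apps.Deployment{}
    if err := kuberuntime.DecodeInto(clientsetscheme.Codecs.UniversalDecoder(), buf.Bytes(), dep); err != nil {
        return nil, err
    }
    return dep, nil
}
```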
One thing worth mentioning here: kube-proxy's configmap really should accept the apiserver address as a parameter and allow it to be customized, because high-availability setups need to point it at a virtual IP, and having to modify it by hand is a pain.
kube-proxy is much the same, so I won't go through it; if you want to change it, edit: app/phases/addons/proxy/manifests.go
kubeadm join
kubeadm join is fairly simple and can be summed up in one sentence: fetch the cluster info, create a kubeconfig (how that works was already covered under kubeadm init), and bring the token along so kubeadm has permission to pull it.
```go
return https.RetrieveValidatedClusterInfo(cfg.DiscoveryFile)
```
The cluster info content:
```go
type Cluster struct {
    // LocationOfOrigin indicates where this object came from.  It is used for round tripping config post-merge, but never serialized.
    LocationOfOrigin string
    // Server is the address of the kubernetes cluster (https://hostname:port).
    Server string `json:"server"`
    // InsecureSkipTLSVerify skips the validity check for the server's certificate. This will make your HTTPS connections insecure.
    // +optional
    InsecureSkipTLSVerify bool `json:"insecure-skip-tls-verify,omitempty"`
    // CertificateAuthority is the path to a cert file for the certificate authority.
    // +optional
    CertificateAuthority string `json:"certificate-authority,omitempty"`
    // CertificateAuthorityData contains PEM-encoded certificate authority certificates. Overrides CertificateAuthority
    // +optional
    CertificateAuthorityData []byte `json:"certificate-authority-data,omitempty"`
    // Extensions holds additional information. This is useful for extenders so that reads and writes don't clobber unknown fields
    // +optional
    Extensions map[string]runtime.Object `json:"extensions,omitempty"`
}
```
```go
return kubeconfigutil.CreateWithToken(
    clusterinfo.Server,
    "kubernetes",
    TokenUser,
    clusterinfo.CertificateAuthorityData,
    cfg.TLSBootstrapToken,
), nil
```
CreateWithToken was covered above, so no need to repeat it. With this we can generate the kubelet config file, and then it is just a matter of starting the kubelet.
The problem with kubeadm join is that when rendering the config it does not use the apiserver address passed on the command line, but the address from cluster-info instead. That is bad for high availability: we might pass in a virtual IP, yet the config still contains the original apiserver address.
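A sketch of the fix implied here: override the server address from cluster-info with whatever was passed on the command line before rendering the kubeconfig (apiServerOverride is a hypothetical variable carrying, say, a load-balanced virtual IP):

```go
// apiServerOverride would carry the address given on the `kubeadm join`
// command line, e.g. a virtual IP fronting several apiservers (hypothetical).
if apiServerOverride != "" {
    clusterinfo.Server = "https://" + apiServerOverride
}
return kubeconfigutil.CreateWithToken(
    clusterinfo.Server,
    "kubernetes",
    TokenUser,
    clusterinfo.CertificateAuthorityData,
    cfg.TLSBootstrapToken,
), nil
```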