运维开发网

Kubernetes: installation and troubleshooting

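运维开发网 https://www.qedev.com 2020-11-17 12:09 Source: 51CTO, Author: 芥蔚
A walkthrough of the steps I followed to install Kubernetes while learning it, the problems I ran into, and how I solved them.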

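I've recently been learning Kubernetes and found quite a few pitfalls during installation. This post summarizes them, with the exact error messages and the fixes.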

My environment: Debian 10

Install Docker yourself; I used the 163 mirror.

Other Linux distributions are similar.

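1 Installing from a mirror in China

I recommend installing Kubernetes from a mirror inside China for convenience; I used Aliyun's Kubernetes mirror. Remember to import the GPG key, or configure apt to skip the GPG check.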

~# cat /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
~# curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   653  100   653    0     0   4106      0 --:--:-- --:--:-- --:--:--  4159
OK
# Then run apt-get update and install kubelet, kubeadm and kubectl yourself
~# dpkg -l|grep kub
ii  kubeadm                               1.19.3-00                                    amd64        Kubernetes Cluster Bootstrapping Tool
ii  kubectl                               1.19.3-00                                    amd64        Kubernetes Command Line Tool
ii  kubelet                               1.19.3-00                                    amd64        Kubernetes Node Agent
ii  kubernetes-cni                        0.8.7-00                                     amd64        Kubernetes CNI
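A minimal sketch of the repository setup above as a reusable script, assuming Debian 10 and a root shell. The helper names write_k8s_source and install_k8s_packages are hypothetical; the mirror URL, repository line and 1.19.3-00 version are the ones shown above, and apt-mark hold is an optional extra that keeps apt from upgrading the cluster packages unexpectedly.

```shell
# Sketch: Aliyun Kubernetes apt mirror setup (assumes Debian 10, run as root).
K8S_MIRROR="https://mirrors.aliyun.com/kubernetes/apt"

# Write the repository line used above; the target file is a parameter for testing.
write_k8s_source() {
    local list_file=${1:-/etc/apt/sources.list.d/kubernetes.list}
    printf 'deb %s/ kubernetes-xenial main\n' "$K8S_MIRROR" > "$list_file"
}

# Import the mirror's key, install a pinned version, then hold the packages.
install_k8s_packages() {
    local version=${1:-1.19.3-00}
    curl -fsSL "$K8S_MIRROR/doc/apt-key.gpg" | apt-key add - || return 1
    apt-get update || return 1
    apt-get install -y "kubelet=$version" "kubeadm=$version" "kubectl=$version" || return 1
    # Optional: stop "apt upgrade" from moving the cluster to another version.
    apt-mark hold kubelet kubeadm kubectl
}
```

On the server you would run `write_k8s_source && install_k8s_packages 1.19.3-00`.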

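2 Problems when running init, and how to fix them

You may run into the problems below; to avoid them, apply the corresponding fixes.

By default, kubeadm init fails with errors such as Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting fo..., because k8s.gcr.io is usually unreachable from mainland China.

Pass --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers to init to pull the images from Aliyun instead (the command below uses registry.aliyuncs.com/google_containers).

For a normal init, my current recommendation is:

~# kubeadm init --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --kubernetes-version=v1.19.3 --apiserver-advertise-address=192.168.0.240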
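Since this post ends up re-running init several times, it can help to generate the command from a few values so every run uses the same flags. A sketch; kubeadm_init_cmd is a hypothetical helper whose defaults mirror the command above.

```shell
# Sketch: print the kubeadm init command recommended above from a few values.
# Arguments: advertise address, [version], [pod CIDR], [image repository].
kubeadm_init_cmd() {
    local addr=$1
    local version=${2:-v1.19.3}
    local cidr=${3:-10.244.0.0/16}   # keep equal to flannel's "Network"
    local repo=${4:-registry.aliyuncs.com/google_containers}
    printf 'kubeadm init --image-repository %s --pod-network-cidr=%s --kubernetes-version=%s --apiserver-advertise-address=%s\n' \
        "$repo" "$cidr" "$version" "$addr"
}
```

Print it with `kubeadm_init_cmd 192.168.0.240`, review it, then run it. Running `kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.19.3` beforehand pulls the images up front, so registry problems show up before init starts.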

Once init succeeds, it prints a few follow-up steps you still need to do:

Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.240:6443 --token c4u548.5qa7ujmmm9gtdtb1 \
    --discovery-token-ca-cert-hash sha256:3e1fc4f89c1b7ad896d14b00484e6316a096f7cc18cb87757725233a40de27a3

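The first step creates the ~/.kube directory; just run the commands as printed.

The second step is to create the pod network; otherwise systemctl status kubelet will report errors about missing CNI files. Here I create a flannel network; more on this later.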
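A quick way to tell whether the kubelet will keep complaining about CNI is to check whether /etc/cni/net.d contains any network config yet. This sketch assumes the kubelet's default CNI config directory; cni_config_present is a hypothetical helper that looks for the .conf, .conflist and .json files the kubelet's CNI loader reads.

```shell
# Sketch: succeed if the CNI config directory holds at least one network config.
cni_config_present() {
    local dir=${1:-/etc/cni/net.d}
    local f
    for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
        # An unmatched glob stays literal, so -e filters it out.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For example, `cni_config_present || echo "no CNI config yet; deploy flannel first"`.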

~# kubectl apply -f <kube-flannel.yml URL>   # if this times out, download the file some other way, edit it, and put it on the server
# Below is the copy I downloaded; after that, just run apply -f on it again
~# cat kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.13.0
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.13.0
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg

# Here I ran kubectl apply -f kube-flannel.yml
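The --pod-network-cidr passed to kubeadm init should agree with the "Network" value in flannel's net-conf.json (10.244.0.0/16 in both places here). A sketch of a check, assuming the manifest is saved locally as kube-flannel.yml; flannel_network and check_pod_cidr are hypothetical helpers that do a plain-text match rather than a real YAML/JSON parse.

```shell
# Sketch: print the first "Network": "x.x.x.x/n" value found in the manifest.
flannel_network() {
    sed -n 's/.*"Network": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Compare flannel's Network with the CIDR given to kubeadm init.
check_pod_cidr() {
    local manifest=$1 cidr=$2 net
    net=$(flannel_network "$manifest")
    if [ "$net" = "$cidr" ]; then
        echo "ok: flannel Network $net matches --pod-network-cidr"
    else
        echo "mismatch: flannel Network '$net' vs --pod-network-cidr '$cidr'" >&2
        return 1
    fi
}
```

Usage: `check_pod_cidr kube-flannel.yml 10.244.0.0/16`.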

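The third step, joining nodes, is covered further down.

If systemctl status kubelet shows errors like the one below after startup, the fix is to start over: kubeadm reset, kubeadm init, systemctl daemon-reload, then restart docker and kubelet.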

Nov 04 15:14:41 kub-master0 kubelet[23654]: E1104 15:14:41.766086   23654 reflector.go:127] k8s.io/kubernetes/pkg/kubelet/kubelet.go:438: Failed to watch *v1.Node: failed to list
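The reset-and-retry sequence above, written as a function so the steps always run in the same order. A sketch assuming a root shell on the control-plane node; run and reinit_control_plane are hypothetical helpers, and DRY_RUN=1 makes them print the commands instead of executing them. Note that kubeadm reset wipes the node's cluster state.

```shell
# Sketch: reset and re-initialise the control plane (root shell assumed).
# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pass the same kubeadm init flags you used originally.
reinit_control_plane() {
    run kubeadm reset -f || return 1      # -f: skip the confirmation prompt
    run kubeadm init "$@" || return 1
    run systemctl daemon-reload || return 1
    run systemctl restart docker || return 1
    run systemctl restart kubelet
}
```

Preview with `DRY_RUN=1 reinit_control_plane --pod-network-cidr=10.244.0.0/16 ...`, then drop DRY_RUN to execute.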

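kubectl cannot get nodes, connection to port 8080 refused: The connection to the server localhost:8080 was refused - did you specify the right host or port?

Fix: run the first step printed by init above (the mkdir and copy). kubectl falls back to localhost:8080 when it finds no kubeconfig.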

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
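The same three steps as one function, assuming the kubeadm default paths. install_kubeconfig and the SUDO variable are hypothetical; set SUDO to an empty string when you are already root. The chmod 600 is an extra precaution, since admin.conf carries cluster-admin credentials.

```shell
# Sketch: copy the admin kubeconfig to the current user's ~/.kube/config.
install_kubeconfig() {
    local src=${1:-/etc/kubernetes/admin.conf}
    local dest=${2:-$HOME/.kube/config}
    local sudo=${SUDO-sudo}          # SUDO= (empty) when already root
    mkdir -p "$(dirname "$dest")" || return 1
    $sudo cp "$src" "$dest" || return 1
    # id runs in the caller's shell, so the file ends up owned by the caller.
    $sudo chown "$(id -u):$(id -g)" "$dest" || return 1
    chmod 600 "$dest"
}
```

Run `install_kubeconfig` as a regular user, then `kubectl get nodes` should reach the API server instead of localhost:8080.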

kubectl cannot get nodes: Unable to connect to the server: dial tcp: lookup k8svip on 114.114.114.114:53: no such host

Fix: add a matching entry to /etc/hosts:

192.168.0.240 k8svip
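To make this /etc/hosts change safe to repeat on every node, a sketch that appends the line only when the name is not mapped yet. ensure_hosts_entry is a hypothetical helper; its file argument defaults to /etc/hosts and exists mainly for testing.

```shell
# Sketch: add "IP NAME" to a hosts file unless NAME already appears (comments ignored).
ensure_hosts_entry() {
    local ip=$1 name=$2 file=${3:-/etc/hosts}
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

Usage, as root: `ensure_hosts_entry 192.168.0.240 k8svip`.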

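kubectl cannot get nodes: The connection to the server k8svip:16443 was refused - did you specify the right host or port?

The API server's default port is 6443. Check whether admin.conf points at a different port, and use ss -tnlp to see which ports are actually listening.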
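A sketch of that comparison: read the server URL from the kubeconfig, take its port, and check it against `ss -tnl` output. kubeconfig_server, url_port and port_listening are hypothetical helpers using simple text matching, assuming a kubeconfig with a single cluster entry.

```shell
# Sketch: print the first "server:" URL in a kubeconfig file.
kubeconfig_server() {
    sed -n 's/^ *server: *//p' "${1:-$HOME/.kube/config}" | head -n 1
}

# https://k8svip:16443 -> 16443 (443 when the URL carries no port).
url_port() {
    local hostport=${1#*://}
    hostport=${hostport%%/*}
    case $hostport in
        *:*) echo "${hostport##*:}" ;;
        *)   echo 443 ;;
    esac
}

# Reads `ss -tnl` output on stdin; succeeds if the local address ends in :PORT.
port_listening() {
    awk -v p=":$1" 'NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 } END { exit !found }'
}
```

Usage: `ss -tnl | port_listening "$(url_port "$(kubeconfig_server)")" || echo "nothing listens on the kubeconfig's API server port"`. `kubectl config view -o jsonpath='{.clusters[0].cluster.server}'` is a more robust way to read the URL.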

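One more Docker tip: if docker run fails to publish a port, check where -p is. Put it before the image name (kubia), for example right after --name; anything after the image name is passed to the container as its command and arguments, not to docker run. That one drove me up the wall.

~# docker run --name kubia-container -p 8080:8080 -d kubia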

Trying to join

join fails with: The cluster-info ConfigMap does not yet contain a JWS signature for token ID "w7whde", will try again

In the end it turned out something had gone wrong during kubeadm init, so I re-ran init.

After that, join from the node:

~# rm -rf /etc/kubernetes && systemctl stop kubelet && kubeadm join 192.168.0.240:6443 --token 336xbo.dggeimv5c56ojfe5 --discovery-token-ca-cert-hash sha256:f619ceb139523298fdfdb2ab2a9d039421e82be4a189a0d4fcb6116a8649e5ad --v=5
[kubelet-check] Initial timeout of 40s passed.
timed out waiting for the condition
error uploading crisocket
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runKubeletStartJoinPhase
        /workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/kubelet.go:190
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1

It failed with a timeout.

Following advice found online, I ran the following on the node:

~# kubeadm reset
~# systemctl daemon-reload
~# systemctl restart kubelet
~# rm -rf /etc/kubernetes && systemctl stop kubelet && kubeadm join 192.168.0.240:6443 --token 336xbo.dggeimv5c56ojfe5 --discovery-token-ca-cert-hash sha256:f619ceb139523298fdfdb2ab2a9d039421e82be4a189a0d4fcb6116a8649e5ad --v=5
# **** non-essential output omitted
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
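The node-side steps that finally worked, as one function. A sketch assuming a root shell on the worker; run and rejoin_node are hypothetical helpers, the join arguments are whatever your control plane printed, and DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch: DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mirrors the sequence above: reset, reload, restart, clean up, then join.
rejoin_node() {
    run kubeadm reset -f || return 1
    run systemctl daemon-reload || return 1
    run systemctl restart kubelet || return 1
    run rm -rf /etc/kubernetes || return 1
    run systemctl stop kubelet || return 1
    run kubeadm join "$@" --v=5
}
```

Usage: `DRY_RUN=1 rejoin_node 192.168.0.240:6443 --token ... --discovery-token-ca-cert-hash sha256:...` to preview, then the same without DRY_RUN.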

But on the master, kubectl get nodes still shows NotReady:

~# kubectl get nodes
NAME                         STATUS     ROLES    AGE     VERSION
kub-master0.liuliancao.org   NotReady   master   27m     v1.19.3
kub-master1.liuliancao.org   NotReady   <none>   2m35s   v1.19.3
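To spot NotReady nodes in a larger cluster, a sketch that filters `kubectl get nodes` output. not_ready_nodes is a hypothetical helper; it treats any STATUS other than exactly Ready (including Ready,SchedulingDisabled) as not ready.

```shell
# Sketch: reads `kubectl get nodes` output on stdin, prints names of nodes not Ready.
not_ready_nodes() {
    # Skip the header and blank lines; column 2 is STATUS.
    awk 'NR > 1 && NF > 0 && $2 != "Ready" { print $1 }'
}
```

Usage: `kubectl get nodes | not_ready_nodes`. Alternatively, `kubectl wait --for=condition=Ready node --all --timeout=300s` blocks until every node is Ready.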

systemctl status kubelet shows why:

Nov 07 17:35:40 kub-master0.liuliancao.org kubelet[85092]: E1107 17:35:40.338282   85092 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Nov 07 17:35:41 kub-master0.liuliancao.org kubelet[85092]: W1107 17:35:41.404607   85092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d

Initialize the CNI network:

~# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Unable to connect to the server: net/http: TLS handshake timeout

Fix:

I created the flannel network and it was eventually created successfully, but the nodes were still NotReady, so I ran:
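If the failure is in fetching the manifest from raw.githubusercontent.com, as assumed here, downloading it separately with retries and then applying the local file sidesteps the problem. fetch_manifest is a hypothetical helper; the retry count and timeout are arbitrary choices.

```shell
# Sketch: download a manifest with retries; apply the local copy afterwards.
FLANNEL_URL=https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

fetch_manifest() {
    local url=${1:-$FLANNEL_URL} out=${2:-kube-flannel.yml}
    # -f: fail on HTTP errors; --retry / --connect-timeout: ride out a flaky link.
    curl -fsSL --retry 5 --connect-timeout 10 -o "$out" "$url"
}
```

Usage: `fetch_manifest && kubectl apply -f kube-flannel.yml`.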

~# systemctl daemon-reload && systemctl restart kubelet

One more issue: if the API server on port 6443 on the master does not come up, try restarting docker and then kubelet.

Then check the node and master status again:

~# kubectl get nodes
NAME                         STATUS   ROLES    AGE     VERSION
kub-master0.liuliancao.org   Ready    master   5m59s   v1.19.3
kub-master1.liuliancao.org   Ready    <none>   11s     v1.19.3

At this point the Kubernetes initialization is basically complete.

A few final notes

1 Check the error logs and service status

systemctl status docker 

systemctl status kubelet

journalctl -f -u kubelet

kubectl logs <pod-name> -n kube-system

kubectl get events

kubectl get cs   # ComponentStatus is deprecated as of v1.19 but still works
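A sketch that scans log text for the errors covered in this post and prints the fix used for each. diagnose is a hypothetical helper; its patterns come from the messages quoted above and are not an exhaustive list.

```shell
# Sketch: read kubelet/kubectl messages on stdin, print one hint per known error.
diagnose() {
    awk '
        /cni config uninitialized|no networks found in \/etc\/cni\/net\.d/ { cni = 1 }
        /localhost:8080 was refused/                                       { kubeconfig = 1 }
        /dial tcp: lookup .*no such host/                                  { dns = 1 }
        /JWS signature for token ID/                                       { token = 1 }
        END {
            if (cni)        print "cni: deploy a pod network (flannel) so /etc/cni/net.d gets a config"
            if (kubeconfig) print "kubeconfig: copy /etc/kubernetes/admin.conf to ~/.kube/config"
            if (dns)        print "dns: map the API server name (e.g. k8svip) in /etc/hosts"
            if (token)      print "token: check kubeadm token list, or re-run kubeadm init"
        }
    '
}
```

Usage: `journalctl -u kubelet --no-pager -n 200 | diagnose`.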

2 If you hit an error nobody else seems to have reported

Re-initialize from scratch; remember to run:

kubeadm reset

kubeadm init with your specific flags

systemctl daemon-reload

systemctl restart docker

systemctl restart kubelet
