Kubernetes Resource Monitoring
1. Checking cluster resource status
The master node of a k8s cluster normally does not run application (business) containers.
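kubectl get cs                     # check the status of the master components (componentstatuses)
kubectl get node                   # list the nodes
kubectl cluster-info               # show cluster status
kubectl describe pods [pod-name]   # show detailed status (describe works for every resource type)
kubectl get pods -o wide           # list pods with more detail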
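When the node list is long, it can help to print only the nodes that need attention. A minimal sketch, assuming the default column layout of kubectl get node (NAME, STATUS, ROLES, AGE, VERSION); not_ready_nodes is a hypothetical helper name, not a kubectl feature:

```shell
# not_ready_nodes (hypothetical helper): reads `kubectl get node` output on
# stdin, skips the header row, and prints every node whose STATUS column is
# anything other than exactly "Ready". A cordoned node
# ("Ready,SchedulingDisabled") is printed as well.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get node | not_ready_nodes
```

Reading from stdin keeps the helper independent of kubectl, so it can also be run on saved output.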
2. Monitoring cluster resource utilization (installing and using metrics-server)
# kubectl top shows resource utilization, but it needs a metrics source behind it (formerly "heapster"). Without one, it fails like this:
[root@k8s-master1 ~]# kubectl top node k8s-node1
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
# heapster has been deprecated; metrics are now provided by metrics-server together with cAdvisor.
# cAdvisor is already built into the kubelet, so only metrics-server needs to be installed.
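Before installing anything, you can check whether the cluster already serves the resource metrics API that kubectl top depends on. A minimal sketch, assuming the APIService is named v1beta1.metrics.k8s.io (the name this metrics-server release registers, as the kubectl get apiservice output later in this post shows); metrics_api_available is a hypothetical helper name:

```shell
# metrics_api_available (hypothetical helper): succeeds only if the
# v1beta1.metrics.k8s.io APIService exists and its AVAILABLE column
# (third column of `kubectl get apiservice`) is exactly "True".
# A missing APIService, or a status such as "False (MissingEndpoints)",
# makes it fail.
metrics_api_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io --no-headers 2>/dev/null \
    | awk '{ print $3 }' \
    | grep -qx True
}

# Usage:
#   if metrics_api_available; then kubectl top node; else echo "install metrics-server first"; fi
```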
[Figure: metrics-server architecture diagram]
# Install metrics-server: download the package (or upload it manually)
wget https://www.chenleilei.net/soft/k8s/metrics-server.zip
# or clone it from GitHub:
git clone https://github.com/kubernetes-incubator/metrics-server
cd metrics-server/
vim metrics-server-deployment.yaml
# Change lines 31-32 to:
- name: metrics-server
  image: lizhenliang/metrics-server-amd64:v0.3.1
# Below line 33 (- --secure-port=4443), add the following two lines
# (required if you cloned from GitHub; the package from the blog already contains them):
- --kubelet-insecure-tls                          # do not verify the kubelets' serving certificates
- --kubelet-preferred-address-types=InternalIP    # connect to the kubelets via their internal IP
# Note: metrics-server identifies hosts by hostname, so hostname resolution must be configured, otherwise metrics-server cannot scrape its targets correctly.

# Edit the kube-apiserver manifest:
vim /etc/kubernetes/manifests/kube-apiserver.yaml
# Around line 20, below - --enable-admission-plugins=NodeRestriction, add one line:
- --enable-aggregator-routing=true

[root@k8s-master1 metrics-server]# ll
total 28
-rw-r--r-- 1 root root  397 Mar 15 21:04 aggregated-metrics-reader.yaml
-rw-r--r-- 1 root root  303 Mar 15 21:04 auth-delegator.yaml
-rw-r--r-- 1 root root  324 Mar 15 21:04 auth-reader.yaml
-rw-r--r-- 1 root root  298 Mar 15 21:04 metrics-apiservice.yaml          # registers metrics-server with the k8s API
-rw-r--r-- 1 root root 1277 Mar 27 15:57 metrics-server-deployment.yaml   # deploys metrics-server
-rw-r--r-- 1 root root  297 Mar 15 21:04 metrics-server-service.yaml
-rw-r--r-- 1 root root  532 Mar 15 21:04 resource-reader.yaml

# After the edits, install everything:
[root@k8s-master1 metrics-server]# kubectl apply -f .
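Before running kubectl apply -f ., it is easy to miss one of the edits above. A minimal sketch that checks both files for the flags this post adds; check_metrics_server_edits is a hypothetical helper name, and it only greps for the flag text, so it does not validate YAML indentation or confirm that a flag sits under the right container:

```shell
# check_metrics_server_edits (hypothetical helper)
# Arguments: $1 = metrics-server deployment YAML, $2 = kube-apiserver manifest.
# Prints one line per expected flag that is missing and returns 1 if any
# flag is missing, 0 otherwise.
check_metrics_server_edits() {
  local deploy="$1" apiserver="$2" missing=0 flag
  for flag in --kubelet-insecure-tls --kubelet-preferred-address-types=InternalIP; do
    # "--" stops grep from treating the pattern as an option
    if ! grep -q -- "$flag" "$deploy"; then
      echo "missing in $deploy: $flag"
      missing=1
    fi
  done
  if ! grep -q -- "--enable-aggregator-routing=true" "$apiserver"; then
    echo "missing in $apiserver: --enable-aggregator-routing=true"
    missing=1
  fi
  return "$missing"
}

# Usage (from the metrics-server directory on the master):
#   check_metrics_server_edits metrics-server-deployment.yaml \
#     /etc/kubernetes/manifests/kube-apiserver.yaml
```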
# kubectl apply prints:
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

# Restart the kubelet:
systemctl restart kubelet

# Check the installation:
[root@k8s-master1 metrics-server]# kubectl get pods -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
coredns-7ff77c879f-kvbbh              1/1     Running   0          6d4h
coredns-7ff77c879f-lqk9q              1/1     Running   5          6d4h
etcd-k8s-master1                      1/1     Running   2          8d
kube-apiserver-k8s-master1            1/1     Running   2          8d
kube-controller-manager-k8s-master1   1/1     Running   2          8d
kube-flannel-ds-amd64-8gssm           1/1     Running   5          8d
kube-flannel-ds-amd64-gtpwc           1/1     Running   2          8d
kube-flannel-ds-amd64-mx4jx           1/1     Running   4          8d
kube-proxy-kzwft                      1/1     Running   5          8d
kube-proxy-rgjmf                      1/1     Running   2          8d
kube-proxy-vhdpp                      1/1     Running   4          8d
kube-scheduler-k8s-master1            1/1     Running   2          8d
metrics-server-5667498b7d-lmbtr       1/1     Running   0          56s    <------ metrics-server is installed

# Check the APIService:
[root@k8s-master1 metrics-server]# kubectl get apiservice
# Look for this line:
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        17s    # started successfully

# Verify that kubectl top works:
[root@k8s-master1 metrics-server]# kubectl top pods
NAME                    CPU(cores)   MEMORY(bytes)
nginx-f89759699-6jfdp   0m           2Mi

# This command failed at first; now it works:
[root@k8s-master1 metrics-server]# kubectl top node k8s-node1
NAME        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-node1   42m          2%     383Mi           13%       # <---- success

[root@k8s-master1 metrics-server]# kubectl top node
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master1   94m          4%     809Mi           27%     # <---- success
k8s-node1     48m          2%     385Mi           13%
k8s-node2     26m          1%     386Mi           13%

# Sort by resource utilization:
[root@k8s-master1 metrics-server]# kubectl top pods -l app=nginx --sort-by=memory
NAME                    CPU(cores)   MEMORY(bytes)
nginx-f89759699-6jfdp   0m           2Mi
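The MEMORY% column of kubectl top node can also be filtered, for example to list only nodes above a usage threshold. A minimal sketch, assuming the five-column layout shown above (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); nodes_over_mem and its default threshold of 80 are hypothetical choices:

```shell
# nodes_over_mem (hypothetical helper): reads `kubectl top node` output on
# stdin and prints "NAME MEMORY%" for every node whose MEMORY% (5th column)
# is greater than THRESHOLD (first argument, default 80).
nodes_over_mem() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    NR > 1 {
      pct = $5
      sub(/%$/, "", pct)          # "27%" -> "27"
      if (pct + 0 > t + 0) print $1, $5
    }'
}

# Usage:
#   kubectl top node | nodes_over_mem 20
```

With the kubectl top node output above and a threshold of 20, only k8s-master1 (27%) is printed.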
3. Troubleshooting
[root@k8s-master1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
# If kubectl get cs fails, the apiserver itself may be having problems.

# When the apiserver is healthy, this command shows its status:
[root@k8s-master1 ~]# kubectl cluster-info
Kubernetes master is running at https://10.0.0.63:6443
KubeDNS is running at https://10.0.0.63:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# Use the dump to investigate cluster problems; write the state information to a file:
kubectl cluster-info dump > a.txt

# Show a pod's details:
kubectl describe pod [pod-name]

# Watch pods in real time (creating or deleting pods shows up in this output, so you can follow the whole process):
kubectl get pods -w
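When kubectl get pods lists many pods, a quick filter helps decide which ones to run kubectl describe on. A minimal sketch, assuming the single-namespace column layout (NAME, READY, STATUS, RESTARTS, AGE); it will not work with -A, which adds a NAMESPACE column in front. problem_pods is a hypothetical helper name:

```shell
# problem_pods (hypothetical helper): reads `kubectl get pods` output on stdin
# and prints "NAME STATUS RESTARTS" for every pod whose STATUS is neither
# Running nor Completed, or whose RESTARTS count is above zero.
problem_pods() {
  awk 'NR > 1 && (($3 != "Running" && $3 != "Completed") || $4 + 0 > 0) { print $1, $3, $4 }'
}

# Usage:
#   kubectl get pods -n kube-system | problem_pods
```

Applied to the kube-system listing in section 2, it would list every pod with a non-zero RESTARTS count, for example coredns-7ff77c879f-lqk9q with 5 restarts.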