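Kubernetes Monitoring Handbook 05: Monitoring the Kubelet
The previous article covered how to monitor Kube-Proxy. Kube-Proxy's /metrics endpoint requires no authentication, so it is relatively easy to scrape. This article covers the Kubelet, whose metrics endpoint adds authentication and is therefore somewhat more involved to monitor.
Kubelet ports
If you have multiple Nodes, run ss -tlnp|grep kubelet on each of them to check. In my environment the Kubelet listens on two fixed ports, 10248 and 10250 (yours may differ). The commands below show that 10248 is the health check port: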
[root@tt-fc-dev01.nj ~]# ps aux|grep kubelet
root 163490 0.0 0.0 12136 1064 pts/1 S+ 13:34 0:00 grep --color=auto kubelet
root 166673 3.2 1.0 3517060 81336 ? Ssl Aug16 4176:52 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --hostname-override=10.206.0.16 --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6
[root@tt-fc-dev01.nj ~]# cat /var/lib/kubelet/config.yaml | grep 102
healthzPort: 10248
[root@tt-fc-dev01.nj ~]# curl localhost:10248/healthz
ok
Now for 10250. This is the Kubelet's default port, and it is where the /metrics endpoint is exposed. Let's request it:
[root@tt-fc-dev01.nj ~]# curl localhost:10250/metrics
Client sent an HTTP request to an HTTPS server.
[root@tt-fc-dev01.nj ~]# curl https://localhost:10250/metrics
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
[root@tt-fc-dev01.nj ~]# curl -k https://localhost:10250/metrics
Unauthorized
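The -k flag tells curl not to verify the SSL certificate. The last command returns Unauthorized, which means authentication failed, so we solve authentication first. Authentication is a Kubernetes topic in its own right and is not covered in depth here (look up the basics if it is new to you); we go straight to the hands-on steps.
Authentication setup
Save the following as auth.yaml. It creates a ClusterRole, a ServiceAccount, and a ClusterRoleBinding.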
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: categraf-daemonset
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/stats
  - nodes/proxy
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: categraf-daemonset
  namespace: flashcat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: categraf-daemonset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: categraf-daemonset
subjects:
- kind: ServiceAccount
  name: categraf-daemonset
  namespace: flashcat
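A ClusterRole is a cluster-wide object that does not belong to any namespace. The one above grants only read access (the get verb on nodes/metrics, nodes/stats, and nodes/proxy), which is all that monitoring needs. A ServiceAccount, by contrast, is namespace-scoped. Here we create a ServiceAccount named categraf-daemonset and bind it to the ClusterRole through the ClusterRoleBinding, which gives it these query permissions. Apply the file: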
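To make the effect of the ClusterRole concrete, here is a minimal Python sketch of how a rule list like the one in auth.yaml grants or denies a request. It is a simplified model for illustration only, not the Kubernetes authorizer: the function name is_allowed is hypothetical, and real RBAC also evaluates fields such as resourceNames and nonResourceURLs that this sketch ignores.

```python
# Simplified model of RBAC rule matching (an illustration, not the real
# Kubernetes authorizer). It ignores resourceNames and nonResourceURLs.

# The rules from the categraf-daemonset ClusterRole in auth.yaml.
CLUSTER_ROLE_RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes/metrics", "nodes/stats", "nodes/proxy"],
        "verbs": ["get"],
    }
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    for rule in rules:
        group_ok = "*" in rule["apiGroups"] or api_group in rule["apiGroups"]
        resource_ok = "*" in rule["resources"] or resource in rule["resources"]
        verb_ok = "*" in rule["verbs"] or verb in rule["verbs"]
        if group_ok and resource_ok and verb_ok:
            return True
    return False


if __name__ == "__main__":
    # Reading node metrics is granted; anything else is not.
    print(is_allowed(CLUSTER_ROLE_RULES, "", "nodes/metrics", "get"))
    print(is_allowed(CLUSTER_ROLE_RULES, "", "nodes/metrics", "delete"))
```

The point of the sketch is that the role is read-only and scoped to three node subresources, so a collector bound to it cannot modify anything in the cluster.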
[work@tt-fc-dev01.nj yamls]$ kubectl apply -f auth.yaml
clusterrole.rbac.authorization.k8s.io/categraf-daemonset created
serviceaccount/categraf-daemonset created
clusterrolebinding.rbac.authorization.k8s.io/categraf-daemonset created
[work@tt-fc-dev01.nj yamls]$ kubectl get ClusterRole | grep categraf-daemon
categraf-daemonset   2022-11-14T03:53:54Z
[work@tt-fc-dev01.nj yamls]$ kubectl get sa -n flashcat
NAME                 SECRETS   AGE
categraf-daemonset   1         90m
default              1         4d23h
[work@tt-fc-dev01.nj yamls]$ kubectl get ClusterRoleBinding -n flashcat | grep categraf-daemon
categraf-daemonset   ClusterRole/categraf-daemonset   91m
Testing the permissions
The command output above shows that the ServiceAccount was created successfully. Print the ServiceAccount to see its contents:
[root@tt-fc-dev01.nj qinxiaohui]# kubectl get sa categraf-daemonset -n flashcat -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"categraf-daemonset","namespace":"flashcat"}}
  creationTimestamp: "2022-11-14T03:53:54Z"
  name: categraf-daemonset
  namespace: flashcat
  resourceVersion: "120570510"
  uid: 22f5a785-871c-4454-b82e-12bf104450a0
secrets:
- name: categraf-daemonset-token-7mccq
Note the secrets entry at the end: this ServiceAccount is associated with a Secret. Let's look at the contents of that Secret:
[root@tt-fc-dev01.nj qinxiaohui]# kubectl get secret categraf-daemonset-token-7mccq -n flashcat -o yaml
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1ERXdPVEF4TXpjek9Gb1hEVE15TURFd056QXhNemN6T0Zvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBS2F1Ck9wU3hHdXB0ZlNraW1zbmlONFVLWnp2b1p6akdoTks1eUVlZWFPcmptdXIwdTFVYlFHbTBRWlpMem8xVi9GV1gKVERBOUthcFRNVllyS2hBQjNCVXdqdGhCaFp1NjJVQzg5TmRNSDVzNFdmMGtMNENYZWQ3V2g2R05Md0MyQ2xKRwp3Tmp1UkZRTndxMWhNWjY4MGlaT1hLZk1NbEt6bWY4aDJWZmthREdpVHk0VzZHWE5sRlRJSFFkVFBVMHVMY3dYCmc1cUVsMkd2cklmd05JSXBOV3ZoOEJvaFhyc1pOZVNlNHhGMVFqY0R2QVE4Q0xta2J2T011UGI5bGtwalBCMmsKV055RTVtVEZCZ2NCQ3dzSGhjUHhyN0E3cXJXMmtxbU1MbUJpc2dHZm9ieXFWZy90cTYzS1oxYlRvWjBIbXhicQp6TkpOZUJpbm9jbi8xblJBK3NrQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZLVkxrbVQ5RTNwTmp3aThsck5UdXVtRm1MWHNNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBSm5QR24rR012S1ZadFVtZVc2bQoxanY2SmYvNlBFS2JzSHRkN2dINHdwREI3YW9pQVBPeTE0bVlYL2d5WWgyZHdsRk9hTWllVS9vUFlmRDRUdGxGCkZMT08yVkdLVTJBSmFNYnVBekw4ZTlsTFREM0xLOGFJUm1FWFBhQkR2V3VUYXZuSTZCWDhiNUs4SndraVd0R24KUFh0ejZhOXZDK1BoaWZDR0phMkNxQWtJV0Nrc0lWenNJcWJ0dkEvb1pHK1dhMlduemFlMC9OUFl4QS8waldOMwpVcGtDWllFaUQ4VlUwenRIMmNRTFE4Z2Mrb21uc3ljaHNjaW5KN3JsZS9XbVFES3ZhVUxLL0xKVTU0Vm1DM2grCnZkaWZtQStlaFZVZnJaTWx6SEZRbWdzMVJGMU9VczNWWUd0REt5YW9uRkc0VFlKa1NvM0IvRlZOQ0ZtcnNHUTYKZWV3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  namespace: Zmxhc2hjYXQ=
  token: ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklqRTJZVTlNU2pObFFVbEhlbmhDV1dsVmFIcEVTRlZVWVdoZlZVaDZSbmd6TUZGZlVWUjJUR0pzVUVraWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUptYkdGemFHTmhkQ0lzSW10MVltVnlibVYwWlhNdWFXOHZjMlZ5ZG1salpXRmpZMjkxYm5RdmMyVmpjbVYwTG01aGJXVWlPaUpqWVhSbFozSmhaaTFrWVdWdGIyNXpaWFF0ZEc5clpXNHROMjFqWTNFaUxDSnJkV0psY201bGRHVnpMbWx2TDNObGNuWnBZMlZoWTJOdmRXNTBMM05sY25acFkyVXRZV05qYjNWdWRDNXVZVzFsSWpvaVkyRjBaV2R5WVdZdFpHRmxiVzl1YzJWMElpd2lhM1ZpWlhKdVpYUmxjeTVwYnk5elpYSjJhV05sWVdOamIzVnVkQzl6WlhKMmFXTmxMV0ZqWTI5MWJuUXVkV2xrSWpvaU1qSm1OV0UzT0RVdE9EY3hZeTAwTkRVMExXSTRNbVV0TVRKaVpqRXdORFExTUdFd0lpd2ljM1ZpSWpvaWMzbHpkR1Z0T25ObGNuWnBZMlZoWTJOdmRXNTBPbVpzWVhOb1kyRjBPbU5oZEdWbmNtRm1MV1JoWlcxdmJuTmxkQ0o5Lm03czJ2Z1JuZDJzMDJOUkVwakdpc0JYLVBiQjBiRjdTRUFqb2RjSk9KLWh6YWhzZU5FSDFjNGNDbXotMDN5Z1Rkal9NT1VKaWpCalRmaW9FSWpGZHRCS0hEMnNjNXlkbDIwbjU4VTBSVXVDemRYQl9tY0J1WDlWWFM2bE5zYVAxSXNMSGdscV9Sbm5XcDZaNmlCaWp6SU05QUNuckY3MGYtd1FZTkVLc2MzdGhubmhSX3E5MkdkZnhmdGU2NmhTRGthdGhPVFRuNmJ3ZnZMYVMxV1JCdEZ4WUlwdkJmVXpkQ1FBNVhRYVNPck00RFluTE5uVzAxWDNqUGVZSW5ka3NaQ256cmV6Tnp2OEt5VFRTSlJ2VHVKMlZOU2lHaDhxTEgyZ3IzenhtQm5Qb1d0czdYeFhBTkJadG0yd0E2OE5FXzY0SlVYS0tfTlhfYmxBbFViakwtUQ==
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: categraf-daemonset
    kubernetes.io/service-account.uid: 22f5a785-871c-4454-b82e-12bf104450a0
  creationTimestamp: "2022-11-14T03:53:54Z"
  name: categraf-daemonset-token-7mccq
  namespace: flashcat
  resourceVersion: "120570509"
  uid: 0a228da5-6e60-4b22-beff-65cc56683e41
type: kubernetes.io/service-account-token
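The token value in this Secret is base64-encoded, and once decoded it is a JWT (three base64url segments separated by dots). As a rough illustration of what such a token carries, the Python sketch below decodes the claims segment without verifying the signature. The helper names decode_jwt_payload and make_demo_token are hypothetical, and the demo token is made up for the example; it is not the credential shown above.

```python
import base64
import json


def decode_jwt_payload(jwt: str) -> dict:
    """Decode the claims segment of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature = jwt.split(".")
    # JWT segments are base64url without padding, so restore the padding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def make_demo_token(claims: dict) -> str:
    """Build an unsigned, made-up token for the demo (not a real credential)."""

    def encode(obj) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    return ".".join([encode({"alg": "none"}), encode(claims), "signature"])


if __name__ == "__main__":
    demo = make_demo_token(
        {"sub": "system:serviceaccount:flashcat:categraf-daemonset"}
    )
    # Prints the subject claim, which names the namespace and ServiceAccount.
    print(decode_jwt_payload(demo)["sub"])
```

Inspecting the claims this way is only a debugging aid; the Kubelet and API server are the parties that actually validate the token.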
Take the token field, base64-decode it, and use it as a Bearer Token in a test request:
[root@tt-fc-dev01.nj qinxiaohui]# token=`kubectl get secret categraf-daemonset-token-7mccq -n flashcat -o jsonpath={.data.token} | base64 -d`
[root@tt-fc-dev01.nj qinxiaohui]# curl -s -k -H "Authorization: Bearer $token" https://localhost:10250/metrics > aaaa
[root@tt-fc-dev01.nj qinxiaohui]# head -n 5 aaaa
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
It works!
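The response is in the Prometheus text exposition format: # HELP and # TYPE comment lines followed by sample lines. As a quick sanity check on a saved response, a small parser along the following lines can list what came back. This is a deliberately simplified sketch (the function name parse_metrics is hypothetical, and it does not parse individual labels or timestamps), not a replacement for a real Prometheus client library.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format output into a dict keyed by metric name.

    Each entry holds the HELP text, the TYPE, and a list of
    (series, value) samples. Labels are kept as part of the series string.
    """
    metrics = {}

    def entry_for(name):
        return metrics.setdefault(
            name, {"help": "", "type": "untyped", "samples": []}
        )

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            kind, name = parts[1].lower(), parts[2]
            entry_for(name)[kind] = parts[3] if len(parts) > 3 else ""
        elif line.startswith("#"):
            continue  # any other comment line
        else:
            if "}" in line:
                head, tail = line.rsplit("}", 1)
                series = head + "}"
            else:
                series, tail = line.split(" ", 1)
            name = series.split("{", 1)[0]
            value = float(tail.split()[0])
            entry_for(name)["samples"].append((series, value))
    return metrics


SAMPLE = """\
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
"""

if __name__ == "__main__":
    for name, info in parse_metrics(SAMPLE).items():
        print(name, info["type"], info["samples"])
```

Running it against the file saved by the curl command (for example by passing open("aaaa").read() to parse_metrics) gives a quick count of how many metric families the Kubelet returned.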
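This shows that the ServiceAccount we created works. Next we run Categraf, the collector, as a DaemonSet and set serviceAccountName on it. Kubernetes then automatically mounts the token into the DaemonSet's pods (the configuration below reads it from /var/run/secrets/kubernetes.io/serviceaccount/token). Let's do that now.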
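For readers who want to see what a collector does with the mounted token, here is a minimal Python sketch of the same request the curl test made: read the token file, send it as a Bearer token, and skip certificate verification (the equivalent of curl -k). The function names are hypothetical, and this illustrates the mechanism only; it is not how Categraf is implemented.

```python
import ssl
import urllib.request

# Path where Kubernetes mounts the ServiceAccount token inside a pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_kubelet_request(url, token_path=TOKEN_PATH):
    """Build a request that carries the ServiceAccount token as a Bearer token."""
    with open(token_path, encoding="utf-8") as f:
        token = f.read().strip()
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def insecure_tls_context():
    """TLS context that skips certificate verification, like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_metrics(url="https://127.0.0.1:10250/metrics", token_path=TOKEN_PATH):
    """Fetch the Kubelet metrics page as text (needs a reachable Kubelet)."""
    req = build_kubelet_request(url, token_path)
    with urllib.request.urlopen(
        req, context=insecure_tls_context(), timeout=5
    ) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_metrics() only succeeds from a pod (or host) that can reach the Kubelet and has a token with the permissions granted earlier. Skipping certificate verification mirrors the article's test setup; verifying against the cluster CA would be the stricter choice.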
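Upgrading the DaemonSet
In the previous article we prepared a DaemonSet for collecting Kube-Proxy metrics. We now modify that DaemonSet so that it collects Kubelet metrics as well as Kube-Proxy metrics. First, prepare the Categraf configuration by saving the following as categraf-configmap-v2.yaml: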
---
kind: ConfigMap
metadata:
  name: categraf-config
apiVersion: v1
data:
  config.toml: |
    [global]
    hostname = "$HOSTNAME"
    interval = 15
    providers = ["local"]
    [writer_opt]
    batch = 2000
    chan_size = 10000
    [[writers]]
    url = "http://10.206.0.16:19000/prometheus/v1/write"
    timeout = 5000
    dial_timeout = 2500
    max_idle_conns_per_host = 100
---
kind: ConfigMap
metadata:
  name: categraf-input-prometheus
apiVersion: v1
data:
  prometheus.toml: |
    [[instances]]
    urls = ["http://127.0.0.1:10249/metrics"]
    labels = { job="kube-proxy" }
    [[instances]]
    urls = ["https://127.0.0.1:10250/metrics"]
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    use_tls = true
    insecure_skip_verify = true
    labels = { job="kubelet" }
    [[instances]]
    urls = ["https://127.0.0.1:10250/metrics/cadvisor"]
    bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    use_tls = true
    insecure_skip_verify = true
    labels = { job="cadvisor" }
Apply it so the new configuration takes effect:
[work@tt-fc-dev01.nj yamls]$ kubectl apply -f categraf-configmap-v2.yaml -n flashcat
configmap/categraf-config unchanged
configmap/categraf-input-prometheus configured
The Categraf DaemonSet needs serviceAccountName set. The YAML from the previous article was named categraf-daemonset-v1.yaml; we upgrade it to categraf-daemonset-v2.yaml with the following content:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: categraf-daemonset
  name: categraf-daemonset
spec:
  selector:
    matchLabels:
      app: categraf-daemonset
  template:
    metadata:
      labels:
        app: categraf-daemonset
    spec:
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: HOSTIP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: flashcatcloud/categraf:v0.2.18
        imagePullPolicy: IfNotPresent
        name: categraf
        volumeMounts:
        - mountPath: /etc/categraf/conf
          name: categraf-config
        - mountPath: /etc/categraf/conf/input.prometheus
          name: categraf-input-prometheus
      hostNetwork: true
      serviceAccountName: categraf-daemonset
      restartPolicy: Always
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: categraf-config
        name: categraf-config
      - configMap:
          name: categraf-input-prometheus
        name: categraf-input-prometheus
Compared with v1, the only change is the added serviceAccountName: categraf-daemonset setting. Delete the old DaemonSet and recreate it:
[work@tt-fc-dev01.nj yamls]$ kubectl delete ds categraf-daemonset -n flashcat
daemonset.apps "categraf-daemonset" deleted
[work@tt-fc-dev01.nj yamls]$ kubectl apply -f categraf-daemonset-v2.yaml -n flashcat
daemonset.apps/categraf-daemonset created
# waiting...
[work@tt-fc-dev01.nj yamls]$ kubectl get pods -n flashcat
NAME                       READY   STATUS    RESTARTS   AGE
categraf-daemonset-d8jt8   1/1     Running   0          37s
categraf-daemonset-fpx8v   1/1     Running   0          43s
categraf-daemonset-mp468   1/1     Running   0          32s
categraf-daemonset-s775l   1/1     Running   0          40s
categraf-daemonset-wxkjk   1/1     Running   0          47s
categraf-daemonset-zwscc   1/1     Running   0          35s
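Now check whether the data is being collected. The first metric to look at comes from the Kubelet itself, that is, from the Kubelet's /metrics endpoint. The second is a cAdvisor metric, collected from the /metrics/cadvisor endpoint.
Both sets of data are arriving, so the next step is to import the dashboards and see the result.
Importing the dashboards
There are two dashboards. One covers the Kubelet itself; its JSON configuration was linked from the original post.
The other covers Pods and containers; its JSON configuration was also linked from the original post (thanks to Zhang Jian (张健) for carefully putting it together).
Metric reference
Kong Fei (孔飞) previously compiled explanations of the Kubelet metrics. They are included here for reference: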
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
GC pause duration statistics (a summary metric).

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
Number of goroutines.

# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
Number of threads.

# HELP kubelet_cgroup_manager_duration_seconds [ALPHA] Duration in seconds for cgroup manager operations. Broken down by method.
# TYPE kubelet_cgroup_manager_duration_seconds histogram
Distribution of cgroup operation durations, by operation type.

# HELP kubelet_containers_per_pod_count [ALPHA] The number of containers per pod.
# TYPE kubelet_containers_per_pod_count histogram
Number of containers per pod (the length of spec.containers).

# HELP kubelet_docker_operations_duration_seconds [ALPHA] Latency in seconds of Docker operations. Broken down by operation type.
# TYPE kubelet_docker_operations_duration_seconds histogram
Distribution of Docker operation durations, by operation type.

# HELP kubelet_docker_operations_errors_total [ALPHA] Cumulative number of Docker operation errors by operation type.
# TYPE kubelet_docker_operations_errors_total counter
Cumulative number of Docker operation errors, by operation type.

# HELP kubelet_docker_operations_timeout_total [ALPHA] Cumulative number of Docker operation timeout by operation type.
# TYPE kubelet_docker_operations_timeout_total counter
Cumulative number of Docker operation timeouts, by operation type.

# HELP kubelet_docker_operations_total [ALPHA] Cumulative number of Docker operations by operation type.
# TYPE kubelet_docker_operations_total counter
Cumulative number of Docker operations, by operation type.

# HELP kubelet_eviction_stats_age_seconds [ALPHA] Time between when stats are collected, and when pod is evicted based on those stats by eviction signal
# TYPE kubelet_eviction_stats_age_seconds histogram
Distribution of how old the stats were when a pod was evicted based on them, by eviction signal (reason).

# HELP kubelet_evictions [ALPHA] Cumulative number of pod evictions by eviction signal
# TYPE kubelet_evictions counter
Number of evictions, by eviction signal (reason).

# HELP kubelet_http_inflight_requests [ALPHA] Number of the inflight http requests
# TYPE kubelet_http_inflight_requests gauge
Number of in-flight requests to the kubelet, by method, path, and server_type. Do not confuse this with requests per second.

# HELP kubelet_http_requests_duration_seconds [ALPHA] Duration in seconds to serve http requests
# TYPE kubelet_http_requests_duration_seconds histogram
Duration of requests to the kubelet, by method, path, and server_type.

# HELP kubelet_http_requests_total [ALPHA] Number of the http requests received since the server started
# TYPE kubelet_http_requests_total counter
Number of requests to the kubelet, by method, path, and server_type.

# HELP kubelet_managed_ephemeral_containers [ALPHA] Current number of ephemeral containers in pods managed by this kubelet. Ephemeral containers will be ignored if disabled by the EphemeralContainers feature gate, and this number will be 0.
# TYPE kubelet_managed_ephemeral_containers gauge
Number of ephemeral containers currently managed by the kubelet. Always 0 when --feature-gates=EphemeralContainers=false.

# HELP kubelet_network_plugin_operations_duration_seconds [ALPHA] Latency in seconds of network plugin operations. Broken down by operation type.
# TYPE kubelet_network_plugin_operations_duration_seconds histogram
Distribution of network plugin operation latency, by operation type (operation_type).

# HELP kubelet_network_plugin_operations_errors_total [ALPHA] Cumulative number of network plugin operation errors by operation type.
# TYPE kubelet_network_plugin_operations_errors_total counter
Cumulative number of network plugin operation errors, by operation type (operation_type).

# HELP kubelet_network_plugin_operations_total [ALPHA] Cumulative number of network plugin operations by operation type.
# TYPE kubelet_network_plugin_operations_total counter
Cumulative number of network plugin operations, by operation type (operation_type).

# HELP kubelet_node_name [ALPHA] The node's name. The count is always 1.
# TYPE kubelet_node_name gauge
The node name.

# HELP kubelet_pleg_discard_events [ALPHA] The number of discard events in PLEG.
# TYPE kubelet_pleg_discard_events counter
Number of events discarded by the PLEG (pod lifecycle event generator).

# HELP kubelet_pleg_last_seen_seconds [ALPHA] Timestamp in seconds when PLEG was last seen active.
# TYPE kubelet_pleg_last_seen_seconds gauge
Timestamp of the last time the PLEG was active.

# HELP kubelet_pleg_relist_duration_seconds [ALPHA] Duration in seconds for relisting pods in PLEG.
# TYPE kubelet_pleg_relist_duration_seconds histogram
Distribution of PLEG pod relist durations.

# HELP kubelet_pleg_relist_interval_seconds [ALPHA] Interval in seconds between relisting in PLEG.
# TYPE kubelet_pleg_relist_interval_seconds histogram
Distribution of intervals between PLEG relists.

# HELP kubelet_pod_start_duration_seconds [ALPHA] Duration in seconds for a single pod to go from pending to running.
# TYPE kubelet_pod_start_duration_seconds histogram
Distribution of pod startup time (pending to running), measured from when the kubelet first sees the pod (it watches pod changes on its source channels) until all containers in the pod are running.

# HELP kubelet_pod_worker_duration_seconds [ALPHA] Duration in seconds to sync a single pod. Broken down by operation type: create, update, or sync
# TYPE kubelet_pod_worker_duration_seconds histogram
Distribution of the time taken to sync a pod, by operation type (create, update, sync). A worker is the logical unit in the kubelet that handles one pod.

# HELP kubelet_pod_worker_start_duration_seconds [ALPHA] Duration in seconds from seeing a pod to starting a worker.
# TYPE kubelet_pod_worker_start_duration_seconds histogram
Distribution of the time from the kubelet seeing a pod to its worker starting.

# HELP kubelet_run_podsandbox_duration_seconds [ALPHA] Duration in seconds of the run_podsandbox operations. Broken down by RuntimeClass.Handler.
# TYPE kubelet_run_podsandbox_duration_seconds histogram
Distribution of sandbox startup durations.

# HELP kubelet_run_podsandbox_errors_total [ALPHA] Cumulative number of the run_podsandbox operation errors by RuntimeClass.Handler.
# TYPE kubelet_run_podsandbox_errors_total counter
Total number of errors when starting a sandbox.

# HELP kubelet_running_containers [ALPHA] Number of containers currently running
# TYPE kubelet_running_containers gauge
Number of containers by state (created, running, exited).

# HELP kubelet_running_pods [ALPHA] Number of pods that have a running pod sandbox
# TYPE kubelet_running_pods gauge
Number of pods currently in the running state.

# HELP kubelet_runtime_operations_duration_seconds [ALPHA] Duration in seconds of runtime operations. Broken down by operation type.
# TYPE kubelet_runtime_operations_duration_seconds histogram
Latency of container runtime operations (for example create, list, exec, remove, and stop on containers).

# HELP kubelet_runtime_operations_errors_total [ALPHA] Cumulative number of runtime operation errors by operation type.
# TYPE kubelet_runtime_operations_errors_total counter
Number of container runtime operation errors, by operation type.

# HELP kubelet_runtime_operations_total [ALPHA] Cumulative number of runtime operations by operation type.
# TYPE kubelet_runtime_operations_total counter
Total number of container runtime operations, by operation type.

# HELP kubelet_started_containers_errors_total [ALPHA] Cumulative number of errors when starting containers
# TYPE kubelet_started_containers_errors_total counter
Total number of errors when the kubelet starts containers, by code and container_type. Values of code include ErrImagePull, ErrImageInspect, ErrRegistryUnavailable, and ErrInvalidImageName; container_type is usually "container" or "podsandbox".

# HELP kubelet_started_containers_total [ALPHA] Cumulative number of containers started
# TYPE kubelet_started_containers_total counter
Total number of containers started by the kubelet.

# HELP kubelet_started_pods_errors_total [ALPHA] Cumulative number of errors when starting pods
# TYPE kubelet_started_pods_errors_total counter
Total number of errors when the kubelet starts pods (counted only when creating the sandbox fails).

# HELP kubelet_started_pods_total [ALPHA] Cumulative number of pods started
# TYPE kubelet_started_pods_total counter
Total number of pods started by the kubelet.

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
Cumulative CPU time used by the process; take its rate of change to get CPU utilization.

# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
Maximum number of file descriptors the process may open.

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
Number of file descriptors currently open.

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
Resident memory size of the process.

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
Process start time.

# HELP rest_client_request_duration_seconds [ALPHA] Request latency in seconds. Broken down by verb and URL.
# TYPE rest_client_request_duration_seconds histogram
Latency of requests to the apiserver, by URL and verb.

# HELP rest_client_requests_total [ALPHA] Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
Total number of requests to the apiserver, by status code and method.

# HELP storage_operation_duration_seconds [ALPHA] Storage operation duration
# TYPE storage_operation_duration_seconds histogram
Latency of storage operations, by storage plugin (configmap, emptydir, hostpath, and so on) and operation_name.

# HELP volume_manager_total_volumes [ALPHA] Number of volumes in Volume Manager
# TYPE volume_manager_total_volumes gauge
Number of volumes on this node, by plugin_name and state. Values of plugin_name include "host-path", "empty-dir", "configmap", and "projected"; state is desired_state_of_world (desired state) or actual_state_of_world (actual state).
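Most of the metrics above are counters, which only become meaningful once you look at how fast they grow. The Python sketch below shows that calculation for two scrapes of a counter such as process_cpu_seconds_total, including the usual handling of a counter reset after a restart. The function name counter_rate is hypothetical and the sample values are made up; in practice a query language function such as PromQL's rate() does this work.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second increase of a counter between two scrapes.

    Timestamps are in seconds. If the counter went backwards, the process
    is assumed to have restarted and the counter to have begun again at 0.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("current sample must be newer than previous sample")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # counter reset: everything since restart is new
    return delta / elapsed


if __name__ == "__main__":
    # Made-up samples of process_cpu_seconds_total taken 15 seconds apart,
    # matching the 15-second collection interval used in this article.
    cores_used = counter_rate(100.0, 0, 103.0, 15)
    print(f"average CPU usage: {cores_used:.2f} cores")
```

For process_cpu_seconds_total the result is the average number of CPU cores the Kubelet used over the interval; the same arithmetic applies to the other *_total counters.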
The cAdvisor metrics are summarized below:
# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
# TYPE container_cpu_cfs_periods_total counter
Total number of CFS (Completely Fair Scheduler) enforcement periods that have elapsed for the container.

# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
# TYPE container_cpu_cfs_throttled_periods_total counter
Total number of periods in which the container was throttled.

# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
# TYPE container_cpu_cfs_throttled_seconds_total counter
Total time the container has been throttled.

# HELP container_file_descriptors Number of open file descriptors for the container.
# TYPE container_file_descriptors gauge
Number of file descriptors opened by the container.

# HELP container_memory_usage_bytes Current memory usage in bytes, including all memory regardless of when it was accessed
# TYPE container_memory_usage_bytes gauge
Container memory usage, in bytes.

# HELP container_network_receive_bytes_total Cumulative count of bytes received
# TYPE container_network_receive_bytes_total counter
Inbound traffic of the container.

# HELP container_network_transmit_bytes_total Cumulative count of bytes transmitted
# TYPE container_network_transmit_bytes_total counter
Outbound traffic of the container.

# HELP container_spec_cpu_period CPU period of the container.
# TYPE container_spec_cpu_period gauge
CPU scheduling period of the container.

# HELP container_spec_cpu_quota CPU quota of the container.
# TYPE container_spec_cpu_quota gauge
CPU quota of the container; divide it by container_spec_cpu_period to get the number of cores.

# HELP container_spec_memory_limit_bytes Memory limit for the container.
# TYPE container_spec_memory_limit_bytes gauge
Memory limit of the container, in bytes.

# HELP container_threads Number of threads running inside the container
# TYPE container_threads gauge
Current number of threads in the container.

# HELP container_threads_max Maximum number of threads allowed inside the container, infinity if value is zero
# TYPE container_threads_max gauge
Maximum number of threads the container is allowed to start.
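Two of the relationships in this list are simple arithmetic: the CPU limit in cores is container_spec_cpu_quota divided by container_spec_cpu_period, and the share of throttled periods is the increase in container_cpu_cfs_throttled_periods_total divided by the increase in container_cpu_cfs_periods_total. The Python sketch below works through both with made-up numbers. The function names are hypothetical, and treating a non-positive quota as "no limit" is an assumption of this sketch.

```python
def cpu_limit_cores(quota, period):
    """CPU limit in cores: container_spec_cpu_quota / container_spec_cpu_period.

    Returns None when the quota is not positive, which this sketch assumes
    means that no CPU limit is set.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if quota <= 0:
        return None
    return quota / period


def throttle_ratio(throttled_periods_delta, periods_delta):
    """Fraction of CFS periods in which the container was throttled.

    Both arguments are increases of the corresponding counters over the
    same time window.
    """
    if periods_delta <= 0:
        return 0.0
    return throttled_periods_delta / periods_delta


if __name__ == "__main__":
    # Made-up values: a quota of 50000 with a period of 100000 is half a core.
    print(cpu_limit_cores(50000, 100000))
    # Made-up values: throttled in 30 of the last 120 periods.
    print(throttle_ratio(30, 120))
```

A throttle ratio that stays high usually suggests the CPU limit is too low for the workload, which makes it a useful signal to chart alongside the quota.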
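Related articles
- Kubernetes Monitoring Handbook 01: System Overview
- Kubernetes Monitoring Handbook 02: Host Monitoring Overview
- Kubernetes Monitoring Handbook 03: Host Monitoring in Practice
- Kubernetes Monitoring Handbook 04: Monitoring Kube-Proxy
About the authors
This article was written by Qin Xiaohui (秦晓辉) and Kong Fei (孔飞), monitoring enthusiasts at Flashcat (快猫星云). The content is the collective work of the Flashcat technical team, edited and organized by the authors. We will keep publishing technical articles on monitoring and stability engineering. Reposting is welcome; please credit the source and respect the engineers' work.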