
Kubernetes Monitoring with Prometheus
Published: 2025-04-03
A Complete Container Monitoring Solution for Kubernetes 1.5.2: From Kubernetes to Prometheus and Grafana
This article presents a complete container monitoring solution for Kubernetes 1.5.2, covering node status, memory, CPU, network, storage, and other resources. Prometheus is used as the monitoring tool, and Grafana is used to build intuitive dashboards. The detailed steps follow.
Deploying kube-state-metrics
kube-state-metrics is a key monitoring component for a Kubernetes cluster: it collects and reports the state of Kubernetes objects. Installation steps:
Create the monitoring namespace:
kubectl create namespace monitoring
Create the service account and its role binding (the account and binding names below are illustrative, and the read-only Role is assumed to already exist in the cluster):
kubectl create sa kube-state-metrics -n monitoring
kubectl create rolebinding kube-state-metrics-read-only --role="read-only" \
  --serviceaccount=monitoring:kube-state-metrics -n monitoring
Deploy kube-state-metrics and create its service:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        # Verify the tag against an available kube-state-metrics release
        image: quay.io/coreos/kube-state-metrics:kubestate-1.5.2
        ports:
        - containerPort: 8080
Create the service:
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
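As a quick check, you can port-forward to the kube-state-metrics pod and confirm that it exposes Prometheus-format metrics such as kube_pod_status_phase. This is only a sketch; the pod name is a placeholder you would look up first:
# List the kube-state-metrics pod, then forward its metrics port locally
kubectl -n monitoring get pods -l app=kube-state-metrics
kubectl -n monitoring port-forward <kube-state-metrics-pod> 8080:8080 &
# The /metrics endpoint should list cluster-state series, e.g. kube_pod_status_phase
curl -s http://localhost:8080/metrics | grep kube_pod_status_phase | head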
Deploying node-exporter (node metrics)
To monitor node resource usage, we deploy node-exporter, the node metrics collector. Installation steps:
Deploy node-exporter as a DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  annotations:
    prometheus.io/port: "9102"
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus
        component: node-exporter
    spec:
      containers:
      - image: docker.io/prom/node-exporter:v0.14.0
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          containerPort: 9100
          hostPort: 9100
Create the service:
apiVersion: v1
kind: Service
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"
spec:
  type: ClusterIP
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: prometheus
    component: node-exporter
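Because the DaemonSet publishes node-exporter on hostPort 9100, each node can be checked directly. A minimal sketch, assuming node-exporter v0.14 metric names (node_cpu, node_memory_MemAvailable); newer exporter versions rename these:
# Run on (or against) any node: the exporter should answer on port 9100
curl -s http://localhost:9100/metrics | grep -E '^node_cpu|^node_memory_MemAvailable' | head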
Monitoring node disk usage
To monitor disk usage on each node, we can use a custom DaemonSet; a sketch of how its two containers cooperate follows the service definition below.
Deploy the disk-usage metrics DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
spec:
  template:
    metadata:
      labels:
        app: node-directory-size-metrics
    spec:
      containers:
      - name: read-du
        image: giantswarm/tiny-tools
        volumeMounts:
        - name: host-fs-var
          mountPath: /mnt/var
          readOnly: true
        - name: metrics
          mountPath: /tmp
      - name: caddy
        image: dockermuenster/caddy:latest
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: metrics
          mountPath: /var/www
      # Volumes referenced by the mounts above (reconstructed; the original omits them)
      volumes:
      - name: host-fs-var
        hostPath:
          path: /var
      - name: metrics
        emptyDir: {}
Create the service:
apiVersion: v1
kind: Service
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: metrics
    port: 9102
    protocol: TCP
  # Selector added so the service reaches the DaemonSet pods (not present in the original)
  selector:
    app: node-directory-size-metrics
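The DaemonSet above pairs a metrics producer with a tiny web server: read-du measures directory sizes under the host's /var (mounted read-only at /mnt/var) and writes them in Prometheus text format to the shared metrics volume, which caddy then serves on port 9102 for Prometheus to scrape. The original article does not show the read-du command, so the loop below is only an illustrative sketch of that idea:
#!/bin/sh
# Illustrative read-du loop: publish per-directory disk usage as a Prometheus gauge.
# Written to /tmp here, which caddy serves as /metrics via the shared volume.
while true; do
  {
    echo '# HELP node_directory_size_bytes Disk space used per directory'
    echo '# TYPE node_directory_size_bytes gauge'
    # du prints "<bytes> <path>"; turn each line into a labeled sample
    du -s -b /mnt/var/* 2>/dev/null | \
      awk '{ printf "node_directory_size_bytes{directory=\"%s\"} %d\n", $2, $1 }'
  } > /tmp/metrics.tmp && mv /tmp/metrics.tmp /tmp/metrics
  sleep 300
done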
Prometheus configuration and deployment
Create the Prometheus server configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-core
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
      evaluation_interval: 30s
    rule_files:
      - /etc/prometheus-rules/*.rules
    scrape_configs:
      - job_name: kubernetes-nodes
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__
      - job_name: kubernetes-endpoints
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
Deploy the Prometheus alerting rules:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitoring
data:
  cpu-usage.rules: |
    ALERT NodeCPUUsage
      IF (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
      FOR 2m
      LABELS {
        severity = "page"
      }
      ANNOTATIONS {
        SUMMARY = "{{$labels.instance}}: High CPU usage detected",
        DESCRIPTION = "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
      }
    ...
Deploy the Prometheus server:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      containers:
      - name: prometheus
        image: prom/prometheus:v1.7.1
        args:
          - '-storage.local.retention=12h'
          - '-config.file=/etc/prometheus/prometheus.yaml'
          - '-alertmanager.url=http://alertmanager:9093/'
        ports:
        - name: webui
          containerPort: 9090
        # Mounts for the config and rules ConfigMaps referenced above
        # (reconstructed; the original does not show them)
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus-rules
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
Create the service:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: "true"
spec:
  type: NodePort
  ports:
  - name: webui
    port: 9090
  # Selector added so the service targets the prometheus-core pods (not present in the original)
  selector:
    app: prometheus
    component: core
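With the NodePort service in place, the Prometheus UI is reachable on any node; its Status -> Targets page should show kube-state-metrics, node-exporter, and the directory-size exporter as healthy targets discovered through the service annotations. Queries can also be run over the HTTP API. A minimal sketch (node IP and NodePort are placeholders):
# Find the NodePort assigned to the prometheus service
kubectl -n monitoring get svc prometheus
# Run an ad-hoc query over the Prometheus HTTP API,
# here the per-node CPU usage expression used in the alert rule
curl -s -G http://<node-ip>:<node-port>/api/v1/query \
  --data-urlencode 'query=100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)'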
Grafana deployment and configuration
Deploy Grafana:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
      - name: grafana-core
        image: docker.io/grafana/grafana:latest
        ports:
        - name: grafana
          containerPort: 3000
        env:
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
Create the service:
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  type: NodePort
  ports:
  - name: grafana
    # Service port added; the original only sets the nodePort
    port: 3000
    nodePort: 31000
  # Selector added so the service targets the grafana-core pods (not present in the original)
  selector:
    app: grafana
    component: core
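Before any dashboard will render, Grafana needs Prometheus registered as a data source. This can be done in the UI (Data Sources -> Add data source) or, as sketched below, through Grafana's HTTP API; since anonymous Admin access was enabled above, no credentials are required. The in-cluster URL http://prometheus:9090 is an assumption based on the Prometheus service created earlier:
# Register the in-cluster Prometheus service as the default Grafana data source
curl -s -X POST http://<node-ip>:31000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "prometheus",
        "type": "prometheus",
        "url": "http://prometheus:9090",
        "access": "proxy",
        "isDefault": true
      }'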
Create monitoring dashboards from a Grafana template:
Below is an abbreviated excerpt of the dashboard template:
{ "annotations": { "list": [] }, "editable": true, "graphTooltip": 0, "id": 21, "rows": [ { " Panels": [ { " Targets": [ { " expr": "sum(container_memory_usage_bytes{pod_name=\"$pod\", namespace=\"$namespace\"}) by (namespace, pod_name)" " format": "time_series" " interval": "30s" " legendFormat": "total" }, .... ] }, ... ] } ], "repeat": null}
This template can be imported directly through the Grafana UI.
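As an alternative to the UI import, the same JSON can be pushed through Grafana's dashboard API. A minimal sketch, assuming the full template has been saved locally as dashboard.json:
# Import the dashboard programmatically; the API expects the template wrapped in a "dashboard" field
curl -s -X POST http://<node-ip>:31000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
Note that a dashboard with a fixed "id" (21 in the excerpt above) may need "id" set to null on first import; "overwrite": true replaces an existing dashboard with the same identifier.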
Summary
With the steps above we have built a complete Kubernetes cluster monitoring solution covering nodes, disks, network, and other resources, with Prometheus and Grafana providing an efficient monitoring and visualization platform.