Kubernetes Monitoring with Prometheus
Published: 2025-04-03 14:23:50


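A Complete Container Monitoring Setup on Kubernetes 1.5.2: From Kubernetes to Prometheus and Grafana

This article walks through a complete container monitoring setup for Kubernetes 1.5.2, covering node status, memory, CPU, network, and storage. Prometheus collects and stores the metrics, and Grafana presents them as dashboards. The steps are described below.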


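Introducing kube-state-metrics

kube-state-metrics is a key monitoring component for a Kubernetes cluster: it collects the state of the cluster's objects and reports it as metrics. Install it as follows: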

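  • Create the monitoring namespace:

    kubectl create namespace monitoring
  • Create the service account and bind a role to it. The account name prometheus-k8s matches the serviceAccountName used by the Prometheus Deployment later in this article; the binding name is a placeholder, and the read-only Role must already exist in the namespace:

    kubectl create serviceaccount prometheus-k8s -n monitoring
    kubectl create rolebinding prometheus-k8s-read-only --role=read-only --serviceaccount=monitoring:prometheus-k8s -n monitoring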
  • Deploy kube-state-metrics and create its Service:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: monitoring
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: kube-state-metrics
        spec:
          containers:
          - name: kube-state-metrics
            # Tag kept as given in the source; check that it exists before deploying.
            image: quay.io/coreos/kube-state-metrics:kubestate-1.5.2
            ports:
            - containerPort: 8080

    Create the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: kube-state-metrics
      namespace: monitoring
      annotations:
        # Lets the kubernetes-endpoints scrape job pick up this Service.
        prometheus.io/scrape: "true"
    spec:
      ports:
      - name: kube-state-metrics
        port: 8080
        protocol: TCP
      selector:
        app: kube-state-metrics

Deploying node-exporter (node metrics)

To monitor resource usage on each node, deploy node-exporter, the Prometheus collector for node-level metrics. Install it as follows:

  • Deploy node-exporter as a DaemonSet so that one pod runs on every node:

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: prometheus-node-exporter
      namespace: monitoring
      annotations:
        # Annotation values must be strings; node-exporter listens on 9100.
        prometheus.io/port: "9100"
    spec:
      template:
        metadata:
          name: prometheus-node-exporter
          labels:
            app: prometheus
            component: node-exporter
        spec:
          containers:
          - image: docker.io/prom/node-exporter:v0.14.0
            name: prometheus-node-exporter
            ports:
            - name: prom-node-exp
              containerPort: 9100
              hostPort: 9100

    Create the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus-node-exporter
      namespace: monitoring
      annotations:
        prometheus.io/scrape: "true"
    spec:
      type: ClusterIP
      ports:
      - name: prometheus-node-exporter
        port: 9100
        protocol: TCP
      selector:
        app: prometheus
        component: node-exporter

Monitoring disk space usage on the nodes

To track how much disk space specific directories consume on each node, we can run a custom DaemonSet:

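  • Deploy the disk space monitoring DaemonSet. The read-du container writes its results to the shared metrics volume, and the caddy container serves that volume on port 9102:

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: node-directory-size-metrics
      namespace: monitoring
    spec:
      template:
        metadata:
          labels:
            app: node-directory-size-metrics
        spec:
          containers:
          # The source does not show the command this container runs.
          - name: read-du
            image: giantswarm/tiny-tools
            volumeMounts:
            - name: host-fs-var
              mountPath: /mnt/var
              readOnly: true
            - name: metrics
              mountPath: /tmp
          - name: caddy
            image: dockermuenster/caddy:latest
            ports:
            - containerPort: 9102
            volumeMounts:
            - name: metrics
              mountPath: /var/www
          # Added: the mounts above need matching volumes. The hostPath is
          # inferred from the mount name and should be checked.
          volumes:
          - name: host-fs-var
            hostPath:
              path: /var
          - name: metrics
            emptyDir: {}

    Create the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: node-directory-size-metrics
      namespace: monitoring
    spec:
      type: NodePort
      ports:
      - name: metrics
        port: 9102
        protocol: TCP
      # Added: without a selector the Service has no endpoints.
      selector:
        app: node-directory-size-metrics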
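Since the read-du command is not shown, here is a minimal Python sketch of the idea behind it: measure each subdirectory of a mount point and write one sample per directory in the Prometheus text format, which a static file server can then expose. The metric name node_directory_size_bytes, the label name directory, and both function names are assumptions made for illustration, and the real container may well use du instead.

```python
import os
import tempfile


def directory_size_bytes(path):
    """Sum the apparent sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link cannot be counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def render_metrics(base, metric="node_directory_size_bytes"):
    """Return one text-format sample per immediate subdirectory of base."""
    lines = []
    for entry in sorted(os.listdir(base)):
        full = os.path.join(base, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            size = directory_size_bytes(full)
            lines.append('%s{directory="%s"} %d' % (metric, full, size))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real /mnt/var mount.
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "log"))
        with open(os.path.join(base, "log", "a.log"), "wb") as f:
            f.write(b"x" * 1024)
        print(render_metrics(base), end="")
```

In the DaemonSet, the equivalent output would be written to a file under /tmp so that caddy can serve it from /var/www.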

Configuring and deploying Prometheus

  • Create the Prometheus configuration as a ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-core
      namespace: monitoring
    data:
      prometheus.yaml: |
        global:
          scrape_interval: 30s
          scrape_timeout: 30s
          evaluation_interval: 30s
        rule_files:
        - /etc/prometheus-rules/*.rules
        scrape_configs:
        # Scrape every kubelet, rewriting port 10250 to the read-only port 10255.
        - job_name: kubernetes-nodes
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__
        # Scrape the endpoints of Services annotated with prometheus.io/scrape: "true".
        - job_name: kubernetes-endpoints
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
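The two relabeling behaviours used in this configuration can be traced by hand. The Python sketch below imitates them on plain strings: a replace step that rewrites the kubelet address, and a keep step that drops targets whose Service lacks the scrape annotation. It is an approximation for illustration, not Prometheus code; the function names are made up, and Python's re module stands in for the RE2 engine Prometheus uses.

```python
import re


def relabel_replace(value, regex, replacement):
    """Imitate a `replace` relabel step on a single source value.

    Prometheus anchors the regex at both ends; if it does not match,
    the target label keeps its current value.
    """
    match = re.fullmatch(regex, value)
    if match is None:
        return value
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", replacement)
    return match.expand(template)


def keep_target(labels):
    """Imitate the `keep` step: only targets annotated "true" survive."""
    key = "__meta_kubernetes_service_annotation_prometheus_io_scrape"
    return re.fullmatch("true", labels.get(key, "")) is not None


if __name__ == "__main__":
    # The IP addresses are placeholders.
    print(relabel_replace("10.0.0.5:10250", r"(.*):10250", "${1}:10255"))
    print(relabel_replace("10.0.0.5:9100", r"(.*):10250", "${1}:10255"))
    print(keep_target({}))
```

An address that does not end in :10250 passes through unchanged, and a Service without the annotation is dropped from the kubernetes-endpoints job.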
  • Create the Prometheus alerting rules as a second ConfigMap. The rule below fires when a node's CPU usage stays above 75% for two minutes:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-rules
      namespace: monitoring
    data:
      cpu-usage.rules: |
        ALERT NodeCPUUsage
          IF (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
          FOR 2m
          LABELS {
            severity = "page"
          }
          ANNOTATIONS {
            SUMMARY = "{{$labels.instance}}: High CPU usage detected",
            DESCRIPTION = "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
          }
      # ... (further rules are omitted in the source)
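The arithmetic inside the IF clause is easy to check by hand. The sketch below, with made-up function names and sample values, assumes each input is the per-core idle rate that irate(node_cpu{mode="idle"}[5m]) would return, that is, idle seconds per second in the range 0 to 1. It only evaluates the expression once; the FOR 2m hold time is not modelled.

```python
def cpu_usage_percent(idle_rates):
    """Compute 100 - avg(idle rate) * 100 across the cores of one instance."""
    if not idle_rates:
        raise ValueError("need at least one idle rate")
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100.0 - avg_idle * 100.0


def exceeds_threshold(idle_rates, threshold=75.0):
    """True when the computed usage is above the alert threshold."""
    return cpu_usage_percent(idle_rates) > threshold


if __name__ == "__main__":
    # Four cores that are idle 20%, 10%, 30% and 20% of the time.
    busy_node = [0.2, 0.1, 0.3, 0.2]
    print("usage: %.1f%%" % cpu_usage_percent(busy_node))
    print("above threshold:", exceeds_threshold(busy_node))
```

Averaging the idle rate over all cores before subtracting from 100 is what avg by (instance) does in the rule, so one saturated core on a large machine does not trip the alert by itself.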
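  • Deploy the Prometheus server:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prometheus-core
      namespace: monitoring
    spec:
      replicas: 1
      template:
        metadata:
          name: prometheus-main
          labels:
            app: prometheus
            component: core
        spec:
          serviceAccountName: prometheus-k8s
          containers:
          - name: prometheus
            image: prom/prometheus:v1.7.1
            args:
            - '-storage.local.retention=12h'
            - '-config.file=/etc/prometheus/prometheus.yaml'
            - '-alertmanager.url=http://alertmanager:9093/'
            ports:
            - name: webui
              containerPort: 9090
            # Added: the flags and rule_files above point at these paths,
            # so both ConfigMaps have to be mounted there.
            volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: rules-volume
              mountPath: /etc/prometheus-rules
          volumes:
          - name: config-volume
            configMap:
              name: prometheus-core
          - name: rules-volume
            configMap:
              name: prometheus-rules

    Create the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: monitoring
      labels:
        app: prometheus
        component: core
      annotations:
        prometheus.io/scrape: "true"
    spec:
      type: NodePort
      ports:
      - name: webui
        port: 9090
      # Added: without a selector the Service has no endpoints.
      selector:
        app: prometheus
        component: core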
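Once the Service is reachable, a quick way to confirm that targets are being scraped is to run an instant query against the Prometheus HTTP API at /api/v1/query. The sketch below uses only the Python standard library; the host name and port in the demo are hypothetical, since the source does not state which NodePort is assigned, and the helper names are made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Return the instant-query URL for the given PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def run_query(base_url, promql, timeout=10):
    """Send the query and return the decoded data.result list."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError("query failed: %r" % (body,))
    return body["data"]["result"]


if __name__ == "__main__":
    # Hypothetical address: replace with a node IP and the assigned NodePort.
    print(build_query_url("http://node.example:30900", "up"))
```

Calling run_query with the expression up returns one series per scrape target, with the value 1 for targets that were scraped successfully.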

Deploying and configuring Grafana

  • Deploy Grafana:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: grafana-core
      namespace: monitoring
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: grafana
            component: core
        spec:
          containers:
          - name: grafana-core
            image: docker.io/grafana/grafana:latest
            ports:
            - name: grafana
              containerPort: 3000
            env:
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin

    Create the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: grafana
      namespace: monitoring
      labels:
        app: grafana
        component: core
    spec:
      type: NodePort
      ports:
      - name: grafana
        # Added: port is a required field; 3000 matches the containerPort.
        port: 3000
        nodePort: 31000
      # Added: without a selector the Service has no endpoints.
      selector:
        app: grafana
        component: core
  • Create the monitoring dashboard from a Grafana template:

    The following is an abridged excerpt of the template:

    {
      "annotations": {
        "list": []
      },
      "editable": true,
      "graphTooltip": 0,
      "id": 21,
      "rows": [
        {
          "panels": [
            {
              "targets": [
                {
                  "expr": "sum(container_memory_usage_bytes{pod_name=\"$pod\", namespace=\"$namespace\"}) by (namespace, pod_name)",
                  "format": "time_series",
                  "interval": "30s",
                  "legendFormat": "total"
                },
                ....
              ]
            },
            ...
          ]
        }
      ],
      "repeat": null
    }

    The full template can be imported directly through the Grafana UI.
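Hand-edited dashboard JSON is easy to break with a missing comma or a badly escaped quote inside a PromQL expression. One way to avoid that is to build the structure in code and let a JSON serializer handle the escaping. The sketch below reproduces only the fields visible in the excerpt; the title field and both function names are assumptions, and the result is not a complete importable dashboard (panel types and the $pod and $namespace template variables are left out).

```python
import json


def memory_panel_target(legend="total"):
    """Return the query target shown in the excerpt as a plain dict."""
    return {
        "expr": 'sum(container_memory_usage_bytes{pod_name="$pod", '
                'namespace="$namespace"}) by (namespace, pod_name)',
        "format": "time_series",
        "interval": "30s",
        "legendFormat": legend,
    }


def build_dashboard(title):
    """Assemble the rows/panels/targets nesting used by the excerpt."""
    return {
        "annotations": {"list": []},
        "editable": True,
        "graphTooltip": 0,
        "title": title,  # assumed field, not present in the excerpt
        "rows": [{"panels": [{"targets": [memory_panel_target()]}]}],
        "repeat": None,
    }


if __name__ == "__main__":
    # json.dumps escapes the inner double quotes of the PromQL expression.
    print(json.dumps(build_dashboard("Pod memory"), indent=2))
```

Round-tripping the output through json.loads is a cheap check that the file will at least parse before it is imported.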


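Summary

With the steps above we have built a complete monitoring setup for a Kubernetes cluster. It covers nodes, disks, network, and other resource dimensions, with Prometheus handling collection and alerting and Grafana handling visualization.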
