2024-08-07 18:54:39 +08:00


Author: 丁辉

Integrating N9e with Kube-Prometheus-Stack

Update Kube-Prometheus-Stack

  1. Write values.yaml

    vi kube-prometheus-stack-values.yaml
    
  2. File contents

    prometheusOperator:
      admissionWebhooks:
        patch:
          enabled: true
          image:
            registry: registry.aliyuncs.com # registry mirror for faster pulls in mainland China
            repository: google_containers/kube-webhook-certgen
    grafana:
      enabled: false
    alertmanager:
      enabled: false
    defaultRules:
      create: false
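    # Setting these to false makes Prometheus select rules, ServiceMonitors, PodMonitors, Probes and ScrapeConfigs independently of the Helm chart values instead of only those labelled for this release (otherwise your ServiceMonitor may not be discovered automatically)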
    prometheus:
      prometheusSpec:
        ruleSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        probeSelectorNilUsesHelmValues: false
        scrapeConfigSelectorNilUsesHelmValues: false
        # Enables the --web.enable-remote-write-receiver flag on the server
        enableRemoteWriteReceiver: true
        # Enable Prometheus features that are disabled by default
        enableFeatures:
        - remote-write-receiver
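        # Mount persistent storage
        storageSpec:
          volumeClaimTemplate:
            spec:
              # Leave empty to use the default StorageClass (I have already set up nfs-client in my cluster)
              storageClassName:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi # adjust the PVC size to your needs
        # Mount the local time zone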
        volumes:
          - name: timezone
            hostPath:
              path: /usr/share/zoneinfo/Asia/Shanghai
        volumeMounts:
          - name: timezone
            mountPath: /etc/localtime
            readOnly: true
    
  3. Upgrade

    helm upgrade kube-prometheus-stack -f kube-prometheus-stack-values.yaml --set "kube-state-metrics.image.registry=k8s.dockerproxy.com" prometheus-community/kube-prometheus-stack -n monitor
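After the upgrade, it may be worth confirming that the values actually reached the Prometheus custom resource. The sketch below defines a hypothetical helper named `show_prometheus_settings`. It reads back `spec.enableRemoteWriteReceiver` and `spec.enableFeatures` with `kubectl`'s JSONPath output. It assumes the `monitor` namespace used above. It also assumes the chart maps `prometheusSpec.enableRemoteWriteReceiver` and `prometheusSpec.enableFeatures` onto those CR fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the remote-write related fields of every
# Prometheus custom resource in a namespace (default: monitor).
show_prometheus_settings() {
  local ns="${1:-monitor}"
  # One tab-separated line per Prometheus CR: name, receiver flag, features.
  kubectl get prometheus -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.enableRemoteWriteReceiver}{"\t"}{.spec.enableFeatures}{"\n"}{end}'
}

# Usage against a live cluster:
#   show_prometheus_settings monitor
```

If the second column is not `true`, the upgrade most likely did not pick up kube-prometheus-stack-values.yaml.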
    

Update N9e

  1. Get the ClusterIP of the Prometheus svc (kube-prometheus-stack-prometheus)

    kubectl get svc kube-prometheus-stack-prometheus -n monitor -o jsonpath='{.spec.clusterIP}'
    
  2. Write values.yaml

    vi n9e-values.yaml
    

    File contents

    expose:
      type: clusterIP # use clusterIP
    
    externalURL: https://hello.n9e.info # change to your own external access URL
    
    persistence:
      enabled: true
    
    categraf:
      internal:
        docker_socket: unix:///var/run/docker.sock # clear this value if your Kubernetes runtime is containerd or anything other than Docker
    
    n9e:
      internal:
        image:
          repository: flashcatcloud/nightingale
          tag: latest # use the latest image
    
    prometheus:
      type: external
      external:
        host: "10.43.119.105" # the ClusterIP obtained in step 1
        port: "9090"
        username: ""
        password: ""
      podAnnotations: {}
    
  3. Upgrade

    helm upgrade nightingale ./n9e-helm -n monitor -f n9e-values.yaml
    
  4. Write a ServiceMonitor

    vi n9e-servicemonitor.yaml
    

    File contents

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: n9e-center-monitor
      namespace: monitor
    spec:
      endpoints:
      - path: /metrics
        port: port
      namespaceSelector:
        matchNames:
        - monitor
      selector:
        matchLabels:
          app: n9e
    
  5. Deploy

    kubectl apply -f n9e-servicemonitor.yaml
    
  6. In N9e, add a data source with this URL

    http://kube-prometheus-stack-prometheus:9090/
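Steps 1 to 3 and a few sanity checks can be chained in a small script, so the ClusterIP never has to be copied by hand. This is a sketch under several assumptions and is not part of the original procedure:

- The function names are hypothetical.
- The Prometheus Service is assumed to be `kube-prometheus-stack-prometheus`, the name used in the step 6 URL.
- `--set-string prometheus.external.host=...` overrides the same key as in n9e-values.yaml.
- `app=n9e` is the label the ServiceMonitor above selects on.
- The `curlimages/curl` image used for the in-cluster check is an arbitrary choice.
- `/-/ready` is Prometheus's readiness endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helpers chaining the N9e steps above.

# Step 1: print a Service's ClusterIP without parsing table output.
prometheus_cluster_ip() {
  local svc="${1:-kube-prometheus-stack-prometheus}" ns="${2:-monitor}"
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.clusterIP}'
}

# Steps 2-3: upgrade N9e, injecting the ClusterIP instead of hard-coding it
# in n9e-values.yaml (--set-string takes precedence over -f values).
upgrade_n9e() {
  local ip
  ip="$(prometheus_cluster_ip)" || return 1
  if [ -z "$ip" ]; then
    echo "could not read the Prometheus ClusterIP" >&2
    return 1
  fi
  helm upgrade nightingale ./n9e-helm -n monitor -f n9e-values.yaml \
    --set-string prometheus.external.host="$ip"
}

# Steps 4-5 check: Services the ServiceMonitor selector (app: n9e) would
# match. Empty output means Prometheus has nothing to scrape.
n9e_services() {
  kubectl get svc -n monitor -l app=n9e -o name
}

# Step 6 check: query the data source URL from inside the cluster with a
# throwaway curl pod in the same namespace.
check_datasource() {
  kubectl run n9e-ds-check --rm -i --restart=Never -n monitor \
    --image=curlimages/curl --command -- \
    curl -fsS http://kube-prometheus-stack-prometheus:9090/-/ready
}

# Usage against a live cluster:
#   upgrade_n9e && n9e_services && check_datasource
```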
    

Issue log

WARNING writer/writer.go:129 push data with remote write:http://10.43.119.105:9090/api/v1/write request got status code: 400, response body: out of order sample
WARNING writer/writer.go:79 post to http://10.43.119.105:9090/api/v1/write got error: push data with remote write:http://10.43.119.105:9090/api/v1/write request got status code: 400, response body: out of order sample

Pushing data fails with HTTP 400 (out of order sample); no idea yet how to fix it.
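For context, "out of order sample" is the error Prometheus's TSDB returns when a sample's timestamp is older than the newest sample it already holds for the same series. The remote-write receiver reports it as HTTP 400, as in the logs above. A first diagnostic step could be to measure how often it happens and against which endpoint.

The filter below is a hypothetical sketch; `count_out_of_order` is not part of N9e. It counts matching log lines per remote-write URL and assumes the log format shown above. The counts are per log line: in the excerpt above, a single failed push seems to be logged by both writer.go:129 and writer.go:79.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read N9e log lines on stdin and count
# "out of order sample" rejections per remote-write URL (per log line).
count_out_of_order() {
  grep 'out of order sample' \
    | grep -o 'remote write:http[^ ]*' \
    | sed 's/^remote write://' \
    | sort | uniq -c
}

# Usage (the pod name is a placeholder):
#   kubectl logs -n monitor <n9e-center-pod> | count_out_of_order
```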