2024-08-07 18:54:39 +08:00

> Author: 丁辉

# Integrating N9e with Kube-Prometheus-Stack

## Updating Kube-Prometheus-Stack

1. Write the values.yaml
```bash
vi kube-prometheus-stack-values.yaml
```
2. With the following content
```yaml
prometheusOperator:
  admissionWebhooks:
    patch:
      enabled: true
      image:
        registry: registry.aliyuncs.com # registry mirror for faster pulls inside China
        repository: google_containers/kube-webhook-certgen
grafana:
  enabled: false
alertmanager:
  enabled: false
defaultRules:
  create: false
# These settings make the rule, ServiceMonitor, PodMonitor, Probe, and scrape-config selectors
# independent of the Helm release values (otherwise your ServiceMonitor may not be discovered automatically).
prometheus:
  prometheusSpec:
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    scrapeConfigSelectorNilUsesHelmValues: false
    # Enable the --web.enable-remote-write-receiver flag on the server
    enableRemoteWriteReceiver: true
    # Enable Prometheus features that are disabled by default
    enableFeatures:
      - remote-write-receiver
    # Mount persistent storage
    storageSpec:
      volumeClaimTemplate:
        spec:
          # Left empty to use the default StorageClass (I have prepared nfs-client in the cluster)
          storageClassName:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi # size the PVC according to your needs
    # Mount the local time zone
    volumes:
      - name: timezone
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
    volumeMounts:
      - name: timezone
        mountPath: /etc/localtime
        readOnly: true
```
3. Upgrade the release
```bash
helm upgrade kube-prometheus-stack -f kube-prometheus-stack-values.yaml --set "kube-state-metrics.image.registry=k8s.dockerproxy.com" prometheus-community/kube-prometheus-stack -n monitor
```
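
The `*SelectorNilUsesHelmValues: false` settings above are easier to reason about when looking at what ends up in the Prometheus custom resource. The excerpt below is a sketch of the two outcomes for ServiceMonitors, based on my understanding of the chart's templates rather than on output captured from this cluster; the release name `kube-prometheus-stack` is taken from the `helm upgrade` command above.

```yaml
# Case 1: serviceMonitorSelectorNilUsesHelmValues: true (the chart default).
# Only ServiceMonitors carrying the Helm release label are selected.
serviceMonitorSelector:
  matchLabels:
    release: kube-prometheus-stack
---
# Case 2: serviceMonitorSelectorNilUsesHelmValues: false (as configured above).
# The selector is empty, which matches every ServiceMonitor.
serviceMonitorSelector: {}
```

The other selectors (rules, PodMonitors, Probes, scrape configs) should follow the same pattern. This is why a hand-written ServiceMonitor without a `release` label can still be picked up after the upgrade.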

## Updating N9e

1. Get the ClusterIP of the nightingale-center svc
```bash
kubectl get svc nightingale-center -n monitor -o jsonpath='{.spec.clusterIP}'
```
2. Write the values.yaml
```bash
vi n9e-values.yaml
```
With the following content
```yaml
expose:
  type: clusterIP # expose via ClusterIP
externalURL: https://hello.n9e.info # change to your own externally reachable address
persistence:
  enabled: true
categraf:
  internal:
    docker_socket: unix:///var/run/docker.sock # leave empty if your Kubernetes runtime is containerd or anything other than Docker
n9e:
  internal:
    image:
      repository: flashcatcloud/nightingale
      tag: latest # use the latest image
prometheus:
  type: external
  external:
    host: "10.43.119.105" # put the nightingale-center svc IP from step 1 here
    port: "9090"
    username: ""
    password: ""
  podAnnotations: {}
```
3. Upgrade the release
```bash
helm upgrade nightingale ./n9e-helm -n monitor -f n9e-values.yaml
```
4. Write the ServiceMonitor
```bash
vi n9e-servicemonitor.yaml
```
With the following content
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: n9e-center-monitor
  namespace: monitor
spec:
  endpoints:
    - path: /metrics
      port: port
  namespaceSelector:
    matchNames:
      - monitor
  selector:
    matchLabels:
      app: n9e
```
5. Apply it
```bash
kubectl apply -f n9e-servicemonitor.yaml
```
6. Add the data source in N9e
```bash
http://kube-prometheus-stack-prometheus:9090/
```
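
For the ServiceMonitor from step 4 to produce scrape targets, a Service in the `monitor` namespace has to carry the label `app: n9e` and expose a port that is *named* `port` (the `endpoints[].port` field refers to the port name, not the number). The manifest below is a hypothetical sketch of such a Service, not a copy of what the n9e chart renders; the port number in particular is a placeholder. The real labels can be checked with `kubectl get svc -n monitor --show-labels`.

```yaml
# Hypothetical Service that n9e-center-monitor would select.
apiVersion: v1
kind: Service
metadata:
  name: nightingale-center # Service name from step 1
  namespace: monitor       # must appear in namespaceSelector.matchNames
  labels:
    app: n9e               # must match spec.selector.matchLabels
spec:
  ports:
    - name: port           # matched by endpoints[].port in the ServiceMonitor
      port: 17000          # placeholder; use the port the chart actually exposes
```

If the labels or the port name differ on the real Service, the ServiceMonitor is accepted by the API server but yields no targets.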

## Issue Log
> ```
> WARNING writer/writer.go:129 push data with remote write:http://10.43.119.105:9090/api/v1/write request got status code: 400, response body: out of order sample
> WARNING writer/writer.go:79 post to http://10.43.119.105:9090/api/v1/write got error: push data with remote write:http://10.43.119.105:9090/api/v1/write request got status code: 400, response body: out of order sample
> ```
>
> Pushing data via remote write returns HTTP 400 (`out of order sample`). I have no idea how to fix this yet.
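
One thing that may be worth trying, and which I have not verified against this setup: Prometheus 2.39 and later can accept out-of-order samples within a configurable time window, and recent versions of the kube-prometheus-stack chart expose this as `prometheusSpec.tsdb.outOfOrderTimeWindow`. The fragment below is a sketch to merge into kube-prometheus-stack-values.yaml; the `30m` value is arbitrary, and whether the key is available depends on the installed chart version.

```yaml
prometheus:
  prometheusSpec:
    tsdb:
      # Accept out-of-order samples up to 30 minutes older than the newest data in the TSDB.
      # 30m is an illustrative value, not a recommendation.
      outOfOrderTimeWindow: 30m
```

If the 400 responses come from two writers pushing the same series with conflicting timestamps, widening the window may only hide the symptom, so it is also worth checking whether more than one component writes identical series.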