> Author: 丁辉
[RKE2 documentation](https://docs.rke2.io/)
[Rancher Chinese documentation](https://docs.rancher.cn/)
# Deploying a Kubernetes Cluster with RKE2
| Node name | IP | Kubernetes roles |
| :----------: | :----------: | :----------------------------------------: |
| k8s-master-1 | 192.168.1.10 | controlplane,etcd,worker,keepalived-master |
| k8s-master-2 | 192.168.1.20 | controlplane,etcd,worker,keepalived-backup |
| k8s-master-3 | 192.168.1.30 | controlplane,etcd,worker,keepalived-backup |
| k8s-node-1 | 192.168.1.40 | worker |
> Master node VIP: 192.168.1.100
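## Environment preparation
> Important: the base setup is the same for every deployment, so it is kept in a single document: [Kubernetes base environment preparation](https://gitee.com/offends/Kubernetes/blob/main/%E9%83%A8%E7%BD%B2%E6%96%87%E6%A1%A3/Kubernetes%E5%9F%BA%E7%A1%80%E7%8E%AF%E5%A2%83%E5%87%86%E5%A4%87.md). Initialize every node by following that document first.
### Run on all nodes
1. Set the hostname (run the matching command on each node)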
- 192.168.1.10
```bash
hostnamectl set-hostname k8s-master-1 && bash
```
- 192.168.1.20
```bash
hostnamectl set-hostname k8s-master-2 && bash
```
- 192.168.1.30
```bash
hostnamectl set-hostname k8s-master-3 && bash
```
- 192.168.1.40
```bash
hostnamectl set-hostname k8s-node-1 && bash
```
2. Edit the /etc/hosts file
```bash
vi /etc/hosts
```
Add the following entries
```bash
192.168.1.10 k8s-master-1
192.168.1.20 k8s-master-2
192.168.1.30 k8s-master-3
192.168.1.40 k8s-node-1
```
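Editing `/etc/hosts` by hand is easy to repeat by accident. The sketch below shows one way to append an entry only when the hostname is not already listed. The function name `add_host_entry` is hypothetical, and the demo writes to a scratch file rather than the real `/etc/hosts`.

```bash
# Append "IP hostname" to a hosts file unless the hostname is already present.
# add_host_entry is an illustrative helper, not part of RKE2 or any distribution.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Look for the hostname as a whole field on any non-comment line.
  if ! awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Demo against a scratch file; on a real node pass /etc/hosts instead.
demo_hosts="$(mktemp)"
add_host_entry "$demo_hosts" 192.168.1.10 k8s-master-1
add_host_entry "$demo_hosts" 192.168.1.20 k8s-master-2
add_host_entry "$demo_hosts" 192.168.1.10 k8s-master-1   # second call is a no-op
cat "$demo_hosts"
```

Matching on the hostname field rather than the whole line means an entry is not duplicated even if its spacing differs.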
3. Configure NetworkManager on all nodes
- Prevent NetworkManager from managing the cali and flannel interfaces
```bash
mkdir -p /etc/NetworkManager/conf.d
```
Write the following configuration
```bash
cat <<EOF > /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
```
- Restart NetworkManager
```bash
systemctl daemon-reload
systemctl restart NetworkManager
```
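To make clear which interfaces the two `unmanaged-devices` globs cover, here is a small sketch that applies the same patterns to a few interface names. The function name and the sample interface names are illustrative assumptions.

```bash
# Return 0 if an interface name matches the cali* or flannel* globs
# used in rke2-canal.conf. is_cni_interface is a hypothetical helper.
is_cni_interface() {
  case "$1" in
    cali*|flannel*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample names are made up for illustration.
for ifname in cali1a2b3c4d5e6 flannel.1 eth0; do
  if is_cni_interface "$ifname"; then
    echo "$ifname: ignored by NetworkManager"
  else
    echo "$ifname: still managed by NetworkManager"
  fi
done
```

On a live node, `nmcli device status` should report the matching interfaces as `unmanaged` once the CNI has created them.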
## Install the first master node
[RKE2 GitHub releases](https://github.com/rancher/rke2/releases)
1. Install RKE2
```bash
curl -sfL https://get.rke2.io | sh -
```
> - Use the China mirror
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" sh -
> ```
>
> - Pin a specific version
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" INSTALL_RKE2_VERSION="v1.29.3+rke2r1" sh -
> ```
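A mistyped version string makes the installer fail partway through the download. A quick format check before running it can catch that. The sketch below only covers stable release names of the form `vX.Y.Z+rke2rN`. The function name is hypothetical, and the pattern is an assumption based on the version shown above.

```bash
# Return 0 if the argument looks like a stable RKE2 release tag, e.g. v1.29.3+rke2r1.
# is_rke2_version is an illustrative helper; pre-release tags are deliberately rejected.
is_rke2_version() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+\+rke2r[0-9]+$'
}

for candidate in "v1.29.3+rke2r1" "v1.29.3" "1.29.3+rke2r1"; do
  if is_rke2_version "$candidate"; then
    echo "$candidate: looks valid"
  else
    echo "$candidate: not a full RKE2 release tag"
  fi
done
```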
2. Create the RKE2 configuration file
[Server configuration reference](https://docs.rke2.io/zh/reference/server_config)
[Advanced options and configuration](https://docs.rke2.io/zh/advanced)
```bash
mkdir -p /etc/rancher/rke2/
vi /etc/rancher/rke2/config.yaml
```
With the following content
```yaml
#server: "https://192.168.1.100:9345" # Uncomment once all masters are running, then restart the service
# Shared cluster token that other nodes use to join
token: rke2-create-token
# Unified load balancer entry point (IP or domain name)
tls-san:
- "192.168.1.100"
# Alibaba Cloud registry mirror; such mirrors are usually community-maintained and image syncing tends to lag
#system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
# Node name
node-name: k8s-master-1 # Must match the current hostname
# Node taint: keeps ordinary workloads off the master nodes
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
disable: # Charts that RKE2 installs by default and that should be skipped
- "rke2-ingress-nginx"
- "rke2-metrics-server"
#### Network configuration
# kube-proxy mode, one of [ ipvs , iptables ]; default: iptables
kube-proxy-arg:
- "proxy-mode=iptables"
# Kubernetes cluster domain
cluster-domain: "cluster.local"
# CNI (Container Network Interface) plugin to deploy, one of [ none , calico , flannel , canal , cilium ]; default: canal
cni: "canal"
# IPv4/IPv6 network CIDR for Pod IPs
cluster-cidr: "10.42.0.0/16"
# IPv4/IPv6 network CIDR for Service IPs
service-cidr: "10.43.0.0/16"
# Port range reserved for Services of type NodePort
service-node-port-range: "30000-32767"
#### etcd snapshot configuration
# Snapshot schedule (cron expression); here every 12 hours
etcd-snapshot-schedule-cron: "0 */12 * * *"
# Number of snapshot files to keep
etcd-snapshot-retention: "10"
# Snapshot directory; this is the default location, written out in full because config.yaml does not expand variables
etcd-snapshot-dir: "/var/lib/rancher/rke2/server/db/snapshots"
#### Storage paths
# Location of the kubeconfig file
write-kubeconfig: "/root/.kube/config"
# Permissions of the kubeconfig file
write-kubeconfig-mode: "0644"
# RKE2 data directory
data-dir: "/var/lib/rancher/rke2"
```
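The `token` value is a shared secret: any machine that knows it and can reach port 9345 can try to join the cluster, so a guessable sample value is best replaced. A minimal sketch for generating a random one, assuming `/dev/urandom`, `od` and `tr` are available (the function name is hypothetical):

```bash
# Print a 48-character hexadecimal token built from 24 random bytes.
# generate_token is an illustrative helper, not an RKE2 command.
generate_token() {
  head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

token="$(generate_token)"
echo "token: ${token}"
```

The printed line can be pasted over the `token:` entry in `config.yaml`. Every server and agent in the cluster needs the same value.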
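> Other optional parameters
>
> ```yaml
> # Custom kubelet eviction (resource reclamation) settings
> kubelet-arg:
> # Hard eviction thresholds: when free space on the node filesystem drops below 10% or available memory drops below 2048Mi, kubelet evicts Pods immediately to reclaim resources
> - "eviction-hard=nodefs.available<10%,memory.available<2048Mi"
> # Soft eviction grace period: a soft threshold must stay exceeded for 30 seconds before kubelet starts evicting Pods
> - "eviction-soft-grace-period=nodefs.available=30s,imagefs.available=30s"
> # Soft eviction thresholds: when free space on the node filesystem or the image filesystem drops below 10%, kubelet starts a soft eviction once the grace period has passed
> - "eviction-soft=nodefs.available<10%,imagefs.available<10%"
>
> # These are command-line flags, so they belong under kube-controller-manager-arg rather than an extra-env key
> kube-controller-manager-arg:
> # Path to the cluster signing certificate
> - "cluster-signing-cert-file=/etc/kubernetes/ssl/kube-ca.pem"
> # Path to the cluster signing key
> - "cluster-signing-key-file=/etc/kubernetes/ssl/kube-ca-key.pem"
>
> kube-apiserver-arg:
> # Re-enable the metadata.selfLink field; only valid on Kubernetes releases earlier than v1.24, where the RemoveSelfLink gate can still be turned off
> - "feature-gates=RemoveSelfLink=false"
> ```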
3. Start the first master node
```bash
systemctl enable rke2-server.service
systemctl start rke2-server.service
```
> If startup fails, run the server in the foreground with debug logging to see the error
>
> ```bash
> rke2 server --config /etc/rancher/rke2/config.yaml --debug
> ```
4. Add the RKE2 binaries to the system PATH
```bash
# Single quotes keep $PATH unexpanded, so /etc/profile extends PATH at login time
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> /etc/profile && source /etc/profile
```
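Running the step above more than once appends the same export line to `/etc/profile` each time. One alternative is a guard that adds the directory only when it is missing. This is a sketch, and the idea of placing it in a file such as `/etc/profile.d/rke2.sh` is an assumption about the distribution's layout.

```bash
# Append a directory to PATH only if it is not already there.
# path_append is an illustrative helper name.
path_append() {
  case ":${PATH}:" in
    *":$1:"*) ;;                    # already present, nothing to do
    *) PATH="${PATH}:$1" ;;
  esac
}

path_append /var/lib/rancher/rke2/bin
path_append /var/lib/rancher/rke2/bin   # no-op the second time
export PATH
```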
5. Verify that the node is registered
```bash
kubectl get node
```
6. Symlink the crictl configuration
```bash
ln -s /var/lib/rancher/rke2/agent/etc/crictl.yaml /etc/crictl.yaml
```
7. Verify that crictl can list containers
```bash
crictl ps
```
8. View the cluster token
```bash
cat /var/lib/rancher/rke2/server/node-token
```
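The first master can take a few minutes to become Ready, and `kubectl` fails outright until the API server answers. Before joining further nodes it can help to poll instead of re-running commands by hand. The following is a generic retry sketch. The function name is hypothetical, and the `kubectl wait` line is left commented out because it needs a running cluster.

```bash
# Run a command until it succeeds: up to $1 attempts, sleeping $2 seconds between tries.
# retry is an illustrative helper, not part of RKE2.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example for a real node (assumes kubectl is on PATH and the kubeconfig is in place):
# retry 60 5 kubectl wait --for=condition=Ready node/k8s-master-1 --timeout=10s
```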
## Add the remaining master nodes (same steps for nodes 2 and 3)
1. Install RKE2
```bash
curl -sfL https://get.rke2.io | sh -
```
> - Use the China mirror
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" sh -
> ```
>
> - Pin a specific version (use the same version as the first master)
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" INSTALL_RKE2_VERSION="v1.29.3+rke2r1" sh -
> ```
2. Create the RKE2 configuration file
```bash
mkdir -p /etc/rancher/rke2/
vi /etc/rancher/rke2/config.yaml
```
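With the following content
```yaml
# Address of the cluster server to join (registration port 9345 on the VIP)
server: https://192.168.1.100:9345
# Cluster token
token: <token> # Token value obtained from the first master
# Unified load balancer entry point (IP or domain name)
tls-san:
- "192.168.1.100"
# Alibaba Cloud registry mirror; such mirrors are usually community-maintained and image syncing tends to lag
#system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
# Node name
node-name: k8s-master-2 # Must match the current hostname (k8s-master-3 on the third node)
# Node taint: keeps ordinary workloads off the master nodes
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
disable: # Charts that RKE2 installs by default and that should be skipped
- "rke2-ingress-nginx"
- "rke2-metrics-server"
#### Network configuration
# kube-proxy mode, one of [ ipvs , iptables ]; default: iptables
kube-proxy-arg:
- "proxy-mode=iptables"
# Kubernetes cluster domain
cluster-domain: "cluster.local"
# CNI (Container Network Interface) plugin to deploy, one of [ none , calico , flannel , canal , cilium ]; default: canal
cni: "canal"
# IPv4/IPv6 network CIDR for Pod IPs
cluster-cidr: "10.42.0.0/16"
# IPv4/IPv6 network CIDR for Service IPs
service-cidr: "10.43.0.0/16"
# Port range reserved for Services of type NodePort
service-node-port-range: "30000-32767"
#### etcd snapshot configuration
# Snapshot schedule (cron expression); here every 12 hours
etcd-snapshot-schedule-cron: "0 */12 * * *"
# Number of snapshot files to keep
etcd-snapshot-retention: "10"
# Snapshot directory; this is the default location, written out in full because config.yaml does not expand variables
etcd-snapshot-dir: "/var/lib/rancher/rke2/server/db/snapshots"
#### Storage paths
# Location of the kubeconfig file
write-kubeconfig: "/root/.kube/config"
# Permissions of the kubeconfig file
write-kubeconfig-mode: "0644"
# RKE2 data directory
data-dir: "/var/lib/rancher/rke2"
```
3. Start the service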
```bash
systemctl enable rke2-server.service
systemctl start rke2-server.service
```
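The three master configurations differ only in `node-name` and in whether a `server:` line is present. The per-node part can therefore be generated rather than edited by hand. The sketch below prints only those node-specific keys. The function name is hypothetical, and the remaining shared settings from the full configuration would still need to be appended.

```bash
# Print the node-specific part of a server config.
# $1 = node name, $2 = shared token, $3 = optional registration URL (omit on the first master).
# render_server_config is an illustrative helper, not an RKE2 command.
render_server_config() {
  local node_name="$1" token="$2" server_url="${3:-}"
  if [ -n "$server_url" ]; then
    printf 'server: "%s"\n' "$server_url"
  fi
  cat <<EOF
token: "${token}"
tls-san:
- "192.168.1.100"
node-name: "${node_name}"
EOF
}

# Joining master: includes the server line.
render_server_config k8s-master-2 "<token>" https://192.168.1.100:9345
```

Calling it with only a node name and a token, for example `render_server_config k8s-master-1 "<token>"`, leaves out the `server:` line, which matches the first master's initial configuration.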
## Add worker nodes
[Agent configuration reference](https://docs.rke2.io/zh/reference/linux_agent_config)
1. Install RKE2
```bash
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
```
> - Use the China mirror
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="agent" sh -
> ```
>
> - Pin a specific version (use the same version as the masters)
>
> ```bash
> curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION="v1.29.3+rke2r1" sh -
> ```
2. Create the RKE2 configuration file
```bash
mkdir -p /etc/rancher/rke2/
vi /etc/rancher/rke2/config.yaml
```
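With the following content
```yaml
# Address of the cluster server to join (registration port 9345 on the VIP)
server: https://192.168.1.100:9345
# Cluster token
token: <token> # Token value obtained from the first master
# Node name
node-name: k8s-node-1 # Must match the current hostname
# Alibaba Cloud registry mirror; such mirrors are usually community-maintained and image syncing tends to lag
#system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
#### Network configuration
# kube-proxy mode, one of [ ipvs , iptables ]; default: iptables
kube-proxy-arg:
- "proxy-mode=iptables"
```
3. Start the worker node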
```bash
systemctl enable rke2-agent.service
systemctl start rke2-agent.service
```
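4. Label the worker node with its role (run on a master node)
```bash
# ${node} is the worker's node name, for example k8s-node-1
kubectl label node "${node}" node-role.kubernetes.io/worker=true --overwrite
```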
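If a worker never shows up in `kubectl get node`, a common cause is that it cannot reach the registration port (9345) or the API port (6443) on the VIP. The sketch below tests TCP reachability using bash's `/dev/tcp` redirection and the coreutils `timeout` command. The function name and the `VIP` variable are illustrative assumptions.

```bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# port_open is an illustrative helper; it relies on bash's /dev/tcp feature.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# VIP defaults to the address used in this document; override it for other environments.
VIP="${VIP:-192.168.1.100}"
for port in 9345 6443; do
  if port_open "$VIP" "$port"; then
    echo "${VIP}:${port} reachable"
  else
    echo "${VIP}:${port} NOT reachable"
  fi
done
```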
## Remove a node
1. Delete the node from the cluster (run on a master node)
```bash
kubectl delete node ${node}
```
2. Stop RKE2 (run on the node being removed)
```bash
rke2-killall.sh
```
3. Uninstall RKE2
```bash
rke2-uninstall.sh
```
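Deleting a Node object does not first move its Pods elsewhere, so a common precaution is to drain the node before deleting it. The sketch below wraps both commands. The function name and the `KUBECTL` override are hypothetical and exist only so the sequence can be previewed without touching a cluster.

```bash
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Drain a node, then delete it from the cluster.
# remove_node is an illustrative helper, not an RKE2 command.
remove_node() {
  local node="$1"
  $KUBECTL drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    $KUBECTL delete node "$node"
}

# Real run (on a master node):   remove_node k8s-node-1
# Preview only:                  KUBECTL=echo remove_node k8s-node-1
```

`--ignore-daemonsets` is needed because DaemonSet Pods such as the CNI agents cannot be evicted, and `--delete-emptydir-data` acknowledges that emptyDir contents on the node will be lost.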
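## High-availability RKE2 deployment
1. Create the Nginx configuration file
```bash
vi ~/nginx.conf
```
With the following content (replace host1, host2 and host3 with the master node addresses)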
```nginx
events {
    worker_connections 1024;
}
stream {
    upstream kube-apiserver {
        server host1:6443 max_fails=3 fail_timeout=30s;
        server host2:6443 max_fails=3 fail_timeout=30s;
        server host3:6443 max_fails=3 fail_timeout=30s;
    }
    upstream rke2 {
        server host1:9345 max_fails=3 fail_timeout=30s;
        server host2:9345 max_fails=3 fail_timeout=30s;
        server host3:9345 max_fails=3 fail_timeout=30s;
    }
    server {
        listen 6443;
        proxy_connect_timeout 2s;
        proxy_timeout 900s;
        proxy_pass kube-apiserver;
    }
    server {
        listen 9345;
        proxy_connect_timeout 2s;
        proxy_timeout 900s;
        proxy_pass rke2;
    }
}
```
2. Start Nginx
```bash
docker run -itd -p 9345:9345 -p 6443:6443 -v ~/nginx.conf:/etc/nginx/nginx.conf nginx
```
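3. Update the existing config.yaml on each master
```bash
vi /etc/rancher/rke2/config.yaml
```
Set `tls-san` to the address of the Nginx load balancer
```yaml
tls-san:
- xxx.xxx.xxx.xxx # IP or domain name of the load balancer
```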