[!TIP]
Deploy Prometheus and Grafana on K8S and expose them externally through Ingress.
The installation tutorial for the Ingress Nginx Controller is here: (https://janrs.com/2023/02/k8s%e9%83%a8%e7%bd%b2ingress-controller/)
Please credit the source when reposting: https://janrs.com
Deploying Prometheus and Grafana on k8s
Deploy Prometheus and Grafana on k8s, mounting NFS for persistent storage.
1. Create the NFS service
See the tutorial here: (https://janrs.com/2023/02/k8s%e9%83%a8%e7%bd%b2nfs/)
2. Create the namespace
Create it:
kubectl create ns monitoring
Check the result:
kubectl get ns
Output:
NAME STATUS AGE
default Active 23h
ingress-nginx Active 175m
kube-node-lease Active 23h
kube-public Active 23h
kube-system Active 23h
kuboard Active 23h
monitoring Active 4s
nfs Active 23h
web-nginx Active 144m
2-1. Generate the image pull secret
Generate the secret.
[!NOTE]
Change the --docker-password parameter to your own password.
kubectl --namespace monitoring create secret docker-registry aliimagesecret --docker-server=registry.cn-shenzhen.aliyuncs.com --docker-username=yjy86868@163.com --docker-password=${PASSWORD} --docker-email=yjy86868@163.com
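Before moving on, you can verify the secret was created with the expected type:
# The TYPE column should read kubernetes.io/dockerconfigjson
kubectl get secret aliimagesecret -n monitoring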
3. Create the PVC
Create monitoring-grafana-pvc.yaml:
vim monitoring-grafana-pvc.yaml
Add the following YAML:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-storage
Apply it:
kubectl apply -f monitoring-grafana-pvc.yaml
Check the result:
kubectl get pvc -n monitoring
A STATUS of Bound means it succeeded:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Bound pvc-2d0399ca-b20b-4df0-b8f9-a4674410627e 10Gi RWX nfs-storage 27s
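Since nfs-storage provisions volumes dynamically, the matching PV should exist as well; a quick check:
# The VOLUME shown for grafana-pvc above should appear in this list
kubectl get pv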
4. Deploy Prometheus
[!NOTE]
Some of the images in the official manifests cannot be pulled directly, so they have to be replaced manually.
The official manifests also do not mount NFS, so a few files need modifying before deploying.
4-1. Download kube-prometheus
Clone the repository:
git clone https://github.com/coreos/kube-prometheus.git
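The image versions used later in this article (prometheus v2.38.0, operator v0.58.0, grafana 9.1.1) correspond to a mid-2022 snapshot of kube-prometheus. Assuming that mapping, it is safer to pin a matching release branch than to track the default branch:
cd kube-prometheus
# release-0.11 is an assumption based on the versions above; pick the
# branch that matches your cluster version (see the project README)
git checkout release-0.11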
4-1-1. Modify grafana-deployment.yaml
Modify the grafana Deployment to mount the PVC created above.
Open grafana-deployment.yaml:
cd kube-prometheus/manifests/ &&
vim grafana-deployment.yaml
Find the grafana-storage volume configured with emptyDir, replace the emptyDir with a persistentVolumeClaim pointing at the PVC created above (grafana-pvc), then save.
[!NOTE]
grafana-pvc is the name given to the PVC created earlier.
...
      securityContext:
        fsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: grafana
      volumes:
      # Change here: swap the emptyDir for the PVC created above
      #- emptyDir: {}
      - name: nfs-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
      - name: grafana-datasources
...
Then update the corresponding volumeMount:
...
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /var/lib/grafana
          # Here: the name must match the volume renamed above
          name: nfs-storage
          readOnly: false
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
          readOnly: false
        - mountPath: /tmp
          name: tmp-plugins
...
4-1-2. Persist prometheus-k8s
[!NOTE]
Warnings during this step can be ignored.
prometheus-server scrapes metrics from the monitored endpoints and stores them locally. It is created through the prometheus custom resource (CRD).
Once that custom resource exists, the operator starts a StatefulSet -- the prometheus-server. By default it has no persistent storage configured.
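Once the install in section 4-2 has run, both objects can be inspected (the names below follow the kube-prometheus defaults):
# The custom resource, and the StatefulSet the operator creates from it
kubectl get prometheus -n monitoring
kubectl get statefulset prometheus-k8s -n monitoring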
Modify prometheus-prometheus.yaml:
cd kube-prometheus/manifests/ &&
vim prometheus-prometheus.yaml
The part to modify is shown below.
[!NOTE]
The value given to the storageClassName parameter is the name of the NFS Storage Class created earlier.
  enableFeatures: []
  externalLabels: {}
  imagePullSecrets:
    - name: aliimagesecret
  image: registry.cn-shenzhen.aliyuncs.com/yjy_k8s/k8s-prometheus-prometheus:v2.38.0
  #image: quay.io/prometheus/prometheus:v2.38.0
  # Add the storage section here
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-storage
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.38.0
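After the install in section 4-2, the volumeClaimTemplate above should have produced one PVC per prometheus replica, bound through nfs-storage; expect names along the lines of prometheus-k8s-db-prometheus-k8s-0:
# Both prometheus PVCs should show STATUS Bound
kubectl get pvc -n monitoring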
4-1-3. Change the retention period
Set how long the metric data is retained.
cd manifests &&
vim prometheusOperator-deployment.yaml
The part to modify:
spec:
  automountServiceAccountToken: true
  imagePullSecrets:
    - name: aliimagesecret
  containers:
  - args:
    - --kubelet-service=kube-system/kubelet
    - --prometheus-config-reloader=registry.cn-shenzhen.aliyuncs.com/yjy_k8s/k8s-prometheus-prometheus-config-reloader:v0.58.0
    # Add the following line here
    - storage.tsdb.retention.time=180d  ## retention period
    #- --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.58.0
    image: registry.cn-shenzhen.aliyuncs.com/yjy_k8s/k8s-prometheus-prometheus-operator:v0.58.0
    #image: quay.io/prometheus-operator/prometheus-operator:v0.58.0
    name: prometheus-operator
    ports:
    - containerPort: 8080
      name: http
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
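An alternative worth knowing: the Prometheus custom resource exposes a retention field, which is the documented prometheus-operator way to set the TSDB retention. A minimal sketch of the relevant part of prometheus-prometheus.yaml:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # spec.retention maps to prometheus's --storage.tsdb.retention.time flag
  retention: 180d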
4-1-4. List the required images
a) Replace the image addresses
Some of the images cannot be downloaded from inside China and need replacing with domestic mirrors.
Enter the directory:
cd kube-prometheus/manifests
List every image that will be used:
grep -riE 'quay.io|k8s.gcr|grafana/|image:' *
Output:
alertmanager-alertmanager.yaml: image: quay.io/prometheus/alertmanager:v0.24.0
blackboxExporter-deployment.yaml: image: quay.io/prometheus/blackbox-exporter:v0.22.0
blackboxExporter-deployment.yaml: image: jimmidyson/configmap-reload:v0.5.0
blackboxExporter-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.13.0
grafana-deployment.yaml: image: grafana/grafana:9.1.1
grafana-prometheusRule.yaml: runbook_url: https://runbooks.prometheus-operator.dev/runbooks/grafana/grafanarequestsfailing
kubeStateMetrics-deployment.yaml: image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.6.0
kubeStateMetrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.13.0
kubeStateMetrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.13.0
nodeExporter-daemonset.yaml: image: quay.io/prometheus/node-exporter:v1.3.1
nodeExporter-daemonset.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.13.0
prometheusAdapter-deployment.yaml: image: k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1
prometheusOperator-deployment.yaml: - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.58.0
prometheusOperator-deployment.yaml: image: quay.io/prometheus-operator/prometheus-operator:v0.58.0
prometheusOperator-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.13.0
prometheus-prometheus.yaml: image: quay.io/prometheus/prometheus:v2.38.0
b) Download the images
Build every image used above in the Aliyun image registry, one by one.
[!NOTE]
Some of the images found above are duplicates, but they are used in different yaml files.
I use a private Aliyun image registry: build all of the required images there first, then replace the references.
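If you mirror the images yourself, the per-image workflow is pull, re-tag, push. A minimal sketch for one image, assuming docker is available and you are logged in to the Aliyun registry (the repository path follows this article's naming):
# Mirror one upstream image into the private registry; repeat for each image
SRC=quay.io/prometheus/prometheus:v2.38.0
DST=registry.cn-shenzhen.aliyuncs.com/yjy_k8s/k8s-prometheus-prometheus:v2.38.0
docker pull "$SRC"
docker tag "$SRC" "$DST"
docker push "$DST"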
c) Replace the images
Based on the files found above, replace every image reference with the image built on Aliyun.
[!NOTE]
If you use a private image registry like I do, remember to configure the pull secret as well (see section 2-1).
4-2. Install
Install directly, following the official instructions:
kubectl create -f manifests/setup
After that step, run the following to check whether the CRDs are ready:
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
The following output means everything is ready for the install:
No resources found
Run the install:
kubectl create -f manifests/
After installing, check the pods:
kubectl get pods -n monitoring
The following output means the installation succeeded:
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 35m
alertmanager-main-1 2/2 Running 0 35m
alertmanager-main-2 2/2 Running 0 35m
blackbox-exporter-67976f746b-x45vl 3/3 Running 0 35m
grafana-98b487d5f-mtd8q 1/1 Running 0 35m
kube-state-metrics-7fbb67d4df-cwhf7 3/3 Running 0 35m
node-exporter-7zckp 2/2 Running 0 35m
node-exporter-fz6k8 2/2 Running 0 35m
node-exporter-lls7g 2/2 Running 0 35m
prometheus-adapter-7f5d756f48-pm4nb 1/1 Running 0 35m
prometheus-adapter-7f5d756f48-tlhrg 1/1 Running 0 35m
prometheus-k8s-0 2/2 Running 0 35m
prometheus-k8s-1 2/2 Running 0 35m
prometheus-operator-84576d8b79-2r4ss 2/2 Running 0 35m
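It is also worth listing the Services now, since the Ingress objects created in section 4-4 point at two of them (grafana on port 3000, prometheus-k8s on port 9090):
kubectl get svc -n monitoring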
4-3. Configure NetworkPolicy
[!NOTE]
k8s network policies are implemented at the pod level; this is worth understanding.
The rules are applied to pods through a podSelector.
This tutorial uses Calico.
The default official manifests ship with network isolation, so ingress nginx cannot reach the services.
Rules must be added manually to let the ingress nginx controller through.
4-3-1. Check the Ingress Nginx pod labels
List the ingress-nginx pods:
kubectl get pods -n ingress-nginx
Output:
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-hxh9h 0/1 Completed 0 17h
ingress-nginx-admission-patch-5hh89 0/1 Completed 0 17h
ingress-nginx-controller-g8wdx 1/1 Running 0 17h
nginx-errors-5c6dd76c59-xnb4b 1/1 Running 0 17h
Check the labels:
kubectl get pod ingress-nginx-controller-g8wdx --show-labels -n ingress-nginx
Output:
NAME READY STATUS RESTARTS AGE LABELS
ingress-nginx-controller-g8wdx 1/1 Running 0 17h app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,controller-revision-hash=6b75b78ffb,pod-template-generation=1
[!NOTE]
There are quite a few labels. For network-related selection, app.kubernetes.io/name=ingress-nginx is the one typically used.
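The same label set can be pulled directly with jsonpath, which is easier to read than --show-labels:
# Print only the labels of the controller pod
kubectl get pod ingress-nginx-controller-g8wdx -n ingress-nginx -o jsonpath='{.metadata.labels}'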
4-3-2. Configure the Prometheus NetworkPolicy
Open prometheus-networkPolicy.yaml:
vim prometheus-networkPolicy.yaml
Add the network rule
Add a peer carrying the ingress-nginx label obtained above to the ingress from list (not to the policy's own podSelector, which only determines which pods the policy applies to):
...
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: grafana
    # Add the ingress-nginx controller as an allowed peer here. The
    # controller runs in the ingress-nginx namespace, so a
    # namespaceSelector is required in addition to the podSelector.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - port: 9090
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
...
Reload the configuration:
kubectl apply -f prometheus-networkPolicy.yaml
Output:
networkpolicy.networking.k8s.io/prometheus-k8s configured
4-3-3. Configure the Grafana NetworkPolicy
Open grafana-networkPolicy.yaml:
vim grafana-networkPolicy.yaml
Add the network rule
Add the same ingress-nginx peer to the from list; note that grafana's policy guards port 3000 and selects the grafana pods:
...
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    # Add the ingress-nginx controller here, the same way as above
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - port: 3000
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: grafana
      app.kubernetes.io/instance: grafana
      app.kubernetes.io/name: grafana
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
...
Reload the configuration:
kubectl apply -f grafana-networkPolicy.yaml
Output:
networkpolicy.networking.k8s.io/grafana configured
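To confirm both policies now allow the controller, describe them and check the entries under "Allowing ingress traffic":
kubectl describe networkpolicy prometheus-k8s grafana -n monitoring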
4-4. Create the Ingress
With the network policies in place, an Ingress can be created for access.
Create monitoring-ingress.yaml:
vim monitoring-ingress.yaml
Add the following YAML.
[!NOTE]
The domains used below are my own. For domain access, change the - host parameters to your own domains.
How to set up domain access for Ingress Nginx is covered in the tutorial linked at the top of this article.
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    k8s.eip.work/workload: grafana
    k8s.kuboard.cn/workload: grafana
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: 'nginx'
  rules:
  - host: k8s-grafana.janrs.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    k8s.kuboard.cn/workload: prometheus-k8s
  labels:
    app: prometheus
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ingressClassName: 'nginx'
  rules:
  - host: k8s-prom.janrs.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
Apply it:
kubectl apply -f monitoring-ingress.yaml
Check the result:
kubectl get ingress -n monitoring
Output:
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana nginx k8s-grafana.janrs.com 172.31.235.118 80 3h31m
prometheus-k8s nginx k8s-prom.janrs.com 172.31.235.118 80 3h31m
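Before DNS is in place, the Ingress can be tested by pinning the Host header against the ADDRESS shown above (the values here are from my cluster; substitute your own):
# Expect a 200 or a redirect to /login from grafana
curl -I -H 'Host: k8s-grafana.janrs.com' http://172.31.235.118/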
Domain access
Open the domain you configured in a browser.
4-5. Add a Dashboard
In the left sidebar go to Dashboards -> Import, enter the commonly used Dashboard ID 8919, and import it.
4-6. Deploy Node Exporter on every node
[!NOTE]
As deployed by default here, Node Exporter only monitors the nodes it gets scheduled onto.
To monitor every node, node affinity is used, as shown below.
Node affinity also relies on label selectors; look it up if you need more background, it is straightforward.
With this in place, the Dashboard with ID 8919 imported above can show the resource usage of every node.
Open nodeExporter-daemonset.yaml:
vim nodeExporter-daemonset.yaml
Switch to affinity-based scheduling
Comment out the hard nodeSelector scheduling rule and replace it with node-affinity scheduling, as follows:
...
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65532
      hostNetwork: true
      hostPID: true
      # Comment out or delete this hard scheduling rule
      #nodeSelector:
      #  node-label-prometheus: 'true'
      #  kubernetes.io/os: linux
      # Add node-affinity scheduling instead
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-label-node-exporter
                operator: In
                values:
                - 'true'
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: node-exporter
...
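The affinity rule above only matches nodes carrying the node-label-node-exporter=true label, so label every node that should run the exporter first (the node names are my workers; substitute your own):
# Label the worker nodes so the DaemonSet's affinity rule matches them
kubectl label nodes k8s-node01 k8s-node02 k8s-node03 node-label-node-exporter=true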
Apply it:
kubectl apply -f nodeExporter-daemonset.yaml
Check the result:
kubectl get pods -n monitoring -o wide | grep node-exporter
Output:
[!NOTE]
I have 3 worker nodes here, so only three pods are shown.
node-exporter-2q2rw 2/2 Running 0 15m 172.16.222.231 k8s-node02 <none> <none>
node-exporter-jprr2 2/2 Running 0 14m 172.16.222.233 k8s-node03 <none> <none>
node-exporter-pj7gn 2/2 Running 0 15m 172.16.222.230 k8s-node01 <none> <none>
Deleting Prometheus
To tear the whole stack down:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup