
Running into the legendary "container runtime is down PLEG is not healthy"


After an unexpected power outage, a small kubernetes cluster in our development environment unfortunately ran into the "PLEG is not healthy" problem: pods got stuck in Unknown or ContainerCreating status, and the nodes went NotReady:

# kubectl get nodes
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-dev-master Ready master 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://17.3.0
k8s-dev-node1 NotReady node 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://Unknown
k8s-dev-node2 NotReady node 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://Unknown
k8s-dev-node3 NotReady node 289d v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://Unknown
k8s-dev-node4 Ready node 289d v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://17.3.0

The kubelet log kept printing: skipping pod synchronization, container runtime is down PLEG is not healthy:

Sep 25 11:05:06 k8s-dev-node1 kubelet[546]: I0925 11:05:06.003645 546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m18.877402888s ago; threshold is 3m0s]
Sep 25 11:05:11 k8s-dev-node1 kubelet[546]: I0925 11:05:11.004116 546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m23.877803484s ago; threshold is 3m0s]
Sep 25 11:05:16 k8s-dev-node1 kubelet[546]: I0925 11:05:16.004382 546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m28.878169681s ago; threshold is 3m0s]
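The same messages can be tailed live on an affected node (assuming kubelet runs as a systemd unit, as the journal output above suggests):

```shell
# Follow the kubelet journal and watch for PLEG health messages
journalctl -u kubelet -f | grep --line-buffered 'PLEG'
```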

Restarting docker and kubelet on the node brought it back, but before long it went NotReady again. A round of googling turned up related issues on stackoverflow and github/kubernetes:

#45419 was only fixed in v1.16, and upgrading from 1.10 to 1.16 is too much hassle. A comment in #61117 suggested that clearing the /var/lib/kubelet/pods directory on the node can fix it. The first attempt failed: mounted volumes kept the directory busy, it could not be deleted, and the problem remained. In the end I upgraded docker from 17.3.0 to 19.3.2 and wiped everything under /var/lib/kubelet/pods/ and /var/lib/docker on every node, which solved the problem.

Rough procedure:

# First disable automatic startup of docker and kubelet, then clear the files after a reboot:
systemctl disable docker && systemctl disable kubelet
reboot
rm -rf /var/lib/kubelet/pods/
rm -rf /var/lib/docker
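If the rm -rf fails with "Device or resource busy" (the mounted-volume problem mentioned above), the pod volume mounts are still held and need to be unmounted first. A sketch, assuming the default kubelet layout under /var/lib/kubelet/pods:

```shell
# Unmount any lingering pod volume mounts, then remove the directory
awk '$2 ~ "^/var/lib/kubelet/pods/" {print $2}' /proc/mounts | \
  xargs -r -n1 umount
rm -rf /var/lib/kubelet/pods/
```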

# Along the way, docker-ce was upgraded from 17.3.0 to 19.3.2

# After upgrading docker, edit docker.service to keep storage-driver set to overlay, the default in 17.3.0. I also tried overlay2, devicemapper, and vfs, but kubelet reported errors with each of them; unclear whether that is a kubernetes v1.10 support issue or leftover data that wasn't fully cleaned
vi /etc/systemd/system/docker.service

ExecStart=/usr/bin/dockerd ... --storage-driver=overlay

# Reload the configuration and start docker
systemctl daemon-reload
systemctl start docker && systemctl enable docker
systemctl status docker
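An alternative to patching ExecStart in the unit file is to set the storage driver in /etc/docker/daemon.json instead. A sketch, assuming there is no existing daemon.json to merge with:

```shell
# Configure the storage driver via daemon.json rather than the unit file
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay"
}
EOF
systemctl restart docker
```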

# Since /var/lib/docker was deleted wholesale, if the node cannot reach the k8s image registry directly, the base images the node needs have to be imported manually:
docker load -i kubernetes-v10.0-node.tar

# Start kubelet
systemctl start kubelet && systemctl enable kubelet
systemctl status kubelet

Problem solved:

# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-dev-master Ready master 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://17.3.0
k8s-dev-node1 Ready node 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://19.3.2
k8s-dev-node2 Ready node 1y v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://19.3.2
k8s-dev-node3 Ready node 289d v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://19.3.2
k8s-dev-node4 Ready node 289d v1.10.0 <none> CentOS Linux 7 (Core) 3.10.0-957.21.3.el7.x86_64 docker://19.3.2

Unfortunately, this power outage also cost us 3 months of configuration data on the kong gateway :( Back up! Back up! Back up!

Kubernetes CronJob failed to schedule: Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew


On Kubernetes v1.13.3, a cronjob was scheduled to run every 5 minutes, but no new pod had been created for 3 days:

# kubectl get cronjob/dingtalk-atndsyncer
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
dingtalk-atndsyncer */5 * * * * False 0 3d1h 4d21h

The cronjob's .spec.concurrencyPolicy is Forbid, so concurrent jobs are not allowed. Describing the cronjob shows FailedNeedsStart events with the message "Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.":

# kubectl describe cronjob/dingtalk-atndsyncer
Name: dingtalk-atndsyncer
Namespace: default
Labels: app=dingtalk-atndsyncer
Annotations: <none>
Schedule: */5 * * * *
Concurrency Policy: Forbid
Suspend: False
Starting Deadline Seconds: <unset>
Selector: <unset>
Parallelism: <unset>
Completions: <unset>
Pod Template:
Labels: <none>
Containers:
dingtalk-atndsyncer:
Image: dingtalk-atndsyncer:v1.0
Port: <none>
Host Port: <none>
Environment:
ASPNETCORE_ENVIRONMENT: Production
Mounts: <none>
Volumes: <none>
Last Schedule Time: Fri, 06 Sep 2019 08:15:00 +0800
Active Jobs: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedNeedsStart 43m (x790 over 178m) cronjob-controller Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
Warning FailedNeedsStart 25m (x89 over 40m) cronjob-controller Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
Warning FailedNeedsStart 119s (x117 over 22m) cronjob-controller Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
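The numbers in the output are consistent with the 100-miss cutoff: with a */5 schedule and roughly 3 days since "Last Schedule Time", far more than 100 runs were missed. A quick check:

```shell
# Missed runs since the last schedule, at one run every 5 minutes
minutes=$((3 * 24 * 60))   # ~3 days since "Last Schedule Time"
echo $((minutes / 5))      # prints 864, well over the 100 limit
```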

After googling and carefully reading the official docs (https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline): if .spec.startingDeadlineSeconds is not set, missed schedules are counted from the last schedule time, and once more than 100 have been missed the controller stops scheduling the job. I set .spec.startingDeadlineSeconds to 300 seconds, which means the job is only abandoned if more than 100 schedules are missed within a 5-minute window; since the schedule period itself is 5 minutes, that condition is practically impossible to hit. After the change, the job scheduled normally again.
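One way to apply the change in place, without editing the full manifest (assuming the cronjob is not managed by a tool that would revert it):

```shell
# Patch startingDeadlineSeconds on the live cronjob object
kubectl patch cronjob dingtalk-atndsyncer \
  --type merge -p '{"spec":{"startingDeadlineSeconds":300}}'
```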
