On Kubernetes v1.13.3, I scheduled a CronJob to run every 5 minutes, but found that no new pod had been created for 3 days:
# kubectl get cronjob/dingtalk-atndsyncer
NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dingtalk-atndsyncer   */5 * * * *   False     0        3d1h            4d21h
The CronJob's .spec.concurrencyPolicy is Forbid, which disallows concurrent runs. Describing the CronJob shows a FailedNeedsStart warning with the message "Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."
# kubectl describe cronjob/dingtalk-atndsyncer
Name: dingtalk-atndsyncer
Namespace: default
Labels: app=dingtalk-atndsyncer
Annotations: <none>
Schedule: */5 * * * *
Concurrency Policy: Forbid
Suspend: False
Starting Deadline Seconds: <unset>
Selector: <unset>
Parallelism: <unset>
Completions: <unset>
Pod Template:
  Labels:  <none>
  Containers:
   dingtalk-atndsyncer:
    Image:      dingtalk-atndsyncer:v1.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      ASPNETCORE_ENVIRONMENT:  Production
    Mounts:                    <none>
  Volumes:                     <none>
Last Schedule Time: Fri, 06 Sep 2019 08:15:00 +0800
Active Jobs: <none>
Events:
  Type     Reason            Age                   From                Message
  ----     ------            ----                  ----                -------
  Warning  FailedNeedsStart  43m (x790 over 178m)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  25m (x89 over 40m)    cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  119s (x117 over 22m)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
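After some Googling, I carefully read the official documentation (https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline). It says that if .spec.startingDeadlineSeconds is not set, the controller counts the schedules missed since the last schedule time, and if more than 100 have been missed it no longer schedules the job.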
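I tried setting .spec.startingDeadlineSeconds to 300 seconds, which means scheduling stops only if more than 100 schedules are missed within the last 5 minutes (the schedule period is itself 5 minutes, so this condition is very unlikely to be met). After this change, the job was scheduled normally again.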
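To make the counting rule concrete, here is a rough Python model of it. This is a sketch under stated assumptions, not the controller's actual code: the function name count_missed_starts is hypothetical, the cron expression is simplified to a fixed 5-minute interval, and "now" is assumed to be exactly 3d1h after the Last Schedule Time shown in the describe output.

```python
from datetime import datetime, timedelta, timezone

MAX_MISSED = 100  # limit quoted in the FailedNeedsStart message


def count_missed_starts(last_schedule, now, interval, starting_deadline_seconds=None):
    """Count start times missed between the start of the window and now.

    Simplified model: the schedule is a fixed interval rather than a
    parsed cron expression (an assumption made for illustration).
    """
    earliest = last_schedule
    if starting_deadline_seconds is not None:
        # With a deadline, only the last N seconds are considered.
        deadline_start = now - timedelta(seconds=starting_deadline_seconds)
        if deadline_start > earliest:
            earliest = deadline_start
    missed = 0
    t = earliest + interval
    while t <= now:
        missed += 1
        t += interval
    return missed


tz = timezone(timedelta(hours=8))
last_schedule = datetime(2019, 9, 6, 8, 15, tzinfo=tz)  # from the describe output
now = last_schedule + timedelta(days=3, hours=1)        # assumed: "3d1h" later
every_5_min = timedelta(minutes=5)

unset = count_missed_starts(last_schedule, now, every_5_min)
with_deadline = count_missed_starts(last_schedule, now, every_5_min, 300)

print(unset, unset > MAX_MISSED)                  # 876 True
print(with_deadline, with_deadline > MAX_MISSED)  # 1 False
```

Under these assumptions, the unset case counts 876 missed starts, far above the limit of 100, while a 300-second deadline shrinks the window so that only a single start time falls inside it.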