On Kubernetes v1.13.3, I scheduled a CronJob to run every 5 minutes, but found that no new pod had been created for 3 days:
# kubectl get cronjob/dingtalk-atndsyncer
NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dingtalk-atndsyncer   */5 * * * *   False     0        3d1h            4d21h
The CronJob's .spec.concurrencyPolicy is Forbid, so concurrent runs are not allowed. Describing the CronJob shows a FailedNeedsStart warning with the message "Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."
# kubectl describe cronjob/dingtalk-atndsyncer
Name:                       dingtalk-atndsyncer
Namespace:                  default
Labels:                     app=dingtalk-atndsyncer
Annotations:                <none>
Schedule:                   */5 * * * *
Concurrency Policy:         Forbid
Suspend:                    False
Starting Deadline Seconds:  <unset>
Selector:                   <unset>
Parallelism:                <unset>
Completions:                <unset>
Pod Template:
  Labels:  <none>
  Containers:
   dingtalk-atndsyncer:
    Image:      dingtalk-atndsyncer:v1.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      ASPNETCORE_ENVIRONMENT:  Production
    Mounts:                    <none>
  Volumes:                     <none>
Last Schedule Time:            Fri, 06 Sep 2019 08:15:00 +0800
Active Jobs:                   <none>
Events:
  Type     Reason            Age                    From                Message
  ----     ------            ----                   ----                -------
  Warning  FailedNeedsStart  43m (x790 over 178m)   cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  25m (x89 over 40m)     cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  119s (x117 over 22m)   cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
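After some Googling I read the official documentation carefully (https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline). It says that if .spec.startingDeadlineSeconds is not set, the controller counts the schedules missed since the last schedule time, and once more than 100 have been missed it stops scheduling the job.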
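To see why this CronJob hit the limit, here is a rough back-of-the-envelope count. It is only an approximation: it assumes the "3d1h" shown under LAST SCHEDULE is exact and simply divides that span by the 5-minute interval, whereas the real controller walks the cron schedule itself. The variable names are made up for illustration.

```shell
# Rough estimate only; the real controller iterates over the cron schedule.
since_last_schedule=$(( (3 * 24 + 1) * 3600 ))  # "3d1h" expressed in seconds
interval=300                                    # */5 * * * * fires every 300 s
limit=100                                       # threshold quoted in the warning

missed=$(( since_last_schedule / interval ))
echo "missed start times: $missed"
if [ "$missed" -gt "$limit" ]; then
  echo "over the limit of $limit, so the controller refuses to start a job"
fi
```

With these numbers the estimate is 876 missed start times, far above 100, which matches the FailedNeedsStart warning.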
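I tried setting .spec.startingDeadlineSeconds to 300 seconds, which means the job is only skipped if more than 100 schedules were missed within the last 5 minutes (since the schedule period itself is 5 minutes, that condition is very unlikely to be met). After this change the job was scheduled normally again.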
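The post does not say how the field was changed. One possible way, sketched here as an assumption rather than the exact steps used, is a merge patch with kubectl against the same CronJob name shown earlier, followed by a read-back of the field:

```shell
# Assumed approach: set startingDeadlineSeconds with a JSON merge patch.
patch='{"spec":{"startingDeadlineSeconds":300}}'
kubectl patch cronjob dingtalk-atndsyncer --type=merge -p "$patch"

# Read the field back to confirm the new value was stored.
kubectl get cronjob dingtalk-atndsyncer -o jsonpath='{.spec.startingDeadlineSeconds}'
```

Editing the manifest and re-applying it would work just as well. Either way, with a 300-second window and a 300-second interval, only about one start time can fall inside the window, so the count stays far below 100.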