IT漫步

Notes on technology and life © Yaohui

How to install specific hotfix on Windows Server

A quirk of Windows container environments is that the OS Build Number of the Host and the Container must match, and some scenarios even require the Revision Number to match, so you frequently have to install a hotfix for a specific Revision on a K8s Node. Installing online with PowerShell means a slow, unpredictable download, so the smoothest path is to locate the msu package for the exact Revision Number and install it directly:

1. Find the KB that corresponds to the target version on the Windows Update History site. For example, Windows Server 1809 OS Build 10.0.17763.1158: https://support.microsoft.com/en-us/help/4549949
2. Search the Windows Update Catalog by KB number (https://www.catalog.update.microsoft.com/) and locate the download package. For example, KB4549949 for 17763.1158: https://www.catalog.update.microsoft.com/Search.aspx?q=KB4549949
3. Download the msu package and install it with the wusa command:
wusa windows10.0-kb4549949-x64_90e8805e69944530b8d4d4877c7609b9a9e68d81.msu

Appendix: to keep the Windows Node from drifting to another version, also turn off Windows Auto Update so the Node OS cannot update itself:
a). Check the Auto Update status:
%systemroot%\system32\Cscript %systemroot%\system32\scregedit.wsf /AU /v
b). Disable Windows …
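For reference, a quick way to confirm which build and revision a node actually ended up on after installing the msu is to read the CurrentBuildNumber and UBR values from the registry (a minimal sketch; run from an elevated command prompt, and note that UBR is reported in hexadecimal):

C:\> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v CurrentBuildNumber
C:\> reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion" /v UBR

For 10.0.17763.1158 the expected values are 17763 and 0x486 (1158 in decimal).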


For Windows Container, you need to set --image-pull-progress-deadline for kubelet

Windows images easily run to several GB, and images based on Windows Server Core are 5~10 GB, so the kubelet on a Windows node often cancels image pulls: Failed to pull image "XXX": rpc error: code = Unknown desc = context canceled. The cause is that the default image pulling progress deadline is 1 minute; if the pull makes no progress within that minute, the download is cancelled, so larger images can never be pulled successfully. From the official documentation: If no pulling progress is made before this deadline, the image pulling will be cancelled. This docker-specific flag only works when container-runtime is set to docker. (default …
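On a Windows node where kubelet is registered as a service through nssm, one way to raise the deadline is to append the flag to the kubelet command line and restart the service. A rough sketch, assuming nssm launches kubelet.exe directly and using <existing parameters> as a placeholder for whatever arguments the service already carries; on nodes where nssm wraps a PowerShell start script instead (as in the next post), the flag belongs on the kubelet invocation inside that script:

C:\k> nssm get kubelet AppParameters
C:\k> nssm set kubelet AppParameters "<existing parameters> --image-pull-progress-deadline=30m"
C:\k> nssm restart kubelet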


Describe Kubelet Service Parameters on Azure Windows node

Query the Kubelet service managed by nssm:

C:\k>sc qc kubelet
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: kubelet
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : C:\k\nssm.exe
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : Kubelet
        DEPENDENCIES       : docker
        SERVICE_START_NAME : LocalSystem

Query kubelet AppParameters by nssm:

C:\k>nssm get kubelet Application
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

C:\k>nssm get …
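Since nssm here only points at powershell.exe (which in turn runs the kubelet start script), another way to see which flags kubelet was actually started with is to inspect the command line of the running kubelet.exe process. A minimal sketch, assuming the process is named kubelet.exe:

C:\k> wmic process where "name='kubelet.exe'" get commandline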


Run Windows container with Hyper-V isolation mode in Kubernetes

Windows Containers have two isolation modes, Hyper-V and Process (see: Isolation Modes), and the version compatibility between the host OS and the container OS differs between the two (see: Windows container version compatibility). Hyper-V mode clearly has better compatibility than Process mode: it is backward compatible, meaning a newer host OS can run an older container OS but not the other way around, whereas in Process mode Windows Server requires the host OS and container OS versions to match exactly, and Windows 10 does not support Process mode at all. One day I wanted to run a Container in Hyper-V mode on a Kubernetes Windows node, only to find this in the 1.17 documentation: Note: In this document, when we talk about Windows containers we mean Windows containers with process isolation. Windows containers with Hyper-V isolation is planned for a future release. Not ready to give up, I googled a bit more and found: 1. Someone had filed a bug, and it had already been fixed: https://github.com/kubernetes/kubernetes/issues/58750 2. The code had also been merged: …
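As a point of comparison outside Kubernetes, docker on Windows exposes the same two modes through the --isolation flag, which makes the compatibility difference easy to verify by hand. A rough sketch on a Windows Server 2019 (1809) host, where the ltsc2016 image is simply an older container OS chosen for illustration:

C:\> docker run --rm --isolation=hyperv mcr.microsoft.com/windows/servercore:ltsc2016 cmd /c ver
C:\> docker run --rm --isolation=process mcr.microsoft.com/windows/servercore:ltsc2016 cmd /c ver

The first command should run under Hyper-V isolation, while the second is expected to fail because process isolation requires the container OS build to match the host.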


kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1

[Copied from]: https://access.redhat.com/solutions/3659011

RHEL7 and kubernetes: kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1
SOLUTION VERIFIED - Updated October 13 2019 at 9:17 PM

Issue
We are trying to prototype kubernetes on top of RHEL and encounter the situation that the device seems to be frozen. There are repeated messages similar to: Raw …


[Kubernetes] Create deployment, service by Python client

Install Kubernetes Python Client and PyYaml:

# pip install kubernetes pyyaml

1. Get Namespaces or Pods by CoreV1Api:

# -*- coding: utf-8 -*-
from kubernetes import client, config, utils

config.kube_config.load_kube_config(config_file="../kubecfg.yaml")
coreV1Api = client.CoreV1Api()

print("\nListing all namespaces")
for ns in coreV1Api.list_namespace().items:
    print(ns.metadata.name)

print("\nListing pods with their IP, namespace, names:")
for pod in coreV1Api.list_pod_for_all_namespaces(watch=False).items:
    print("%s\t\t%s\t%s" % (pod.status.pod_ip, …


Customize hosts record on docker and kubernetes

Docker:

docker run -it --rm --add-host=host1:172.17.0.2 --add-host=host2:192.168.1.3 busybox

Use "--add-host" to add entries to /etc/hosts.

Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "10.1.2.3"
    hostnames:
    - "foo.remote"
    - "bar.remote"
  containers:
  - name: cat-hosts
    image: busybox
    command:
    - cat
    args:
    - "/etc/hosts"

Use …
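To see the effect end to end, the pod above simply cats /etc/hosts, so one way to check the injected entries is to apply the manifest and read the pod logs (a small sketch, assuming the spec is saved as hostaliases-pod.yaml):

# kubectl apply -f hostaliases-pod.yaml
# kubectl logs hostaliases-pod

The output should contain the foo.local/bar.local and foo.remote/bar.remote lines alongside the default entries.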


Running into the legendary "container runtime is down PLEG is not healthy"

After an unexpected power outage, a small kubernetes cluster in our development environment ran into the infamous PLEG is not healthy problem. The symptoms: pods in k8s went to Unknown or ContainerCreating, and the k8s nodes went NotReady:

# kubectl get nodes
NAME             STATUS     ROLES   AGE  VERSION  EXTERNAL-IP  OS-IMAGE               KERNEL-VERSION               CONTAINER-RUNTIME
k8s-dev-master   Ready      master  1y   v1.10.0  <none>       CentOS Linux 7 (Core)  3.10.0-957.21.3.el7.x86_64   docker://17.3.0
k8s-dev-node1    NotReady   node    1y   v1.10.0  <none>       CentOS Linux 7 (Core)  3.10.0-957.21.3.el7.x86_64   docker://Unknown
k8s-dev-node2    NotReady   node    1y   v1.10.0  <none>       CentOS Linux 7 (Core)  3.10.0-957.21.3.el7.x86_64   …
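The excerpt cuts off here; as a general first check on a node in this state (not necessarily the fix this post goes on to describe), it is worth confirming whether the docker daemon itself still responds, since docker://Unknown in the node list suggests kubelet cannot talk to it:

# docker ps
# systemctl status docker kubelet
# journalctl -u kubelet --since "10 min ago" | grep -i pleg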


Kubernetes CronJob failed to schedule: Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew

On Kubernetes v1.13.3, I scheduled a cronjob to run every 5 minutes, but noticed that no new pod had been created for 3 days:

# kubectl get cronjob/dingtalk-atndsyncer
NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dingtalk-atndsyncer   */5 * * * *   False     0        3d1h            4d21h

The cronjob's .spec.concurrencyPolicy is Forbid, so concurrent runs are not allowed. Describing the cronjob shows FailedNeedsStart, with the message "Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."

# kubectl describe …
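The excerpt stops before the resolution, but the error message itself points at .spec.startingDeadlineSeconds; one way to set it (a sketch, with 200 seconds as an arbitrary value) is a strategic merge patch on the cronjob, after which the controller only counts missed runs within that window:

# kubectl patch cronjob dingtalk-atndsyncer -p '{"spec":{"startingDeadlineSeconds":200}}'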


Kubernetes 1.13.3 external etcd clean up | Clearing data from an external Kubernetes etcd cluster

If something goes wrong while setting up Kubernetes, kubeadm reset can be used to reset the Kubernetes cluster state. But if an external etcd cluster is used, kubeadm reset does not clear the data in that external etcd cluster, which means that if you run kubeadm init again you will see the data from the previous kubernetes cluster. To query and manually clear the external etcd cluster (using Kubernetes 1.13.3 as an example):

1. Query all data:

docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes -e ETCDCTL_API=3 k8s.gcr.io/etcd:3.2.24 etcdctl --cert="/etc/kubernetes/pki/etcd/healthcheck-client.crt" --key="/etc/kubernetes/pki/etcd/healthcheck-client.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --endpoints https://etcd1.cloud.k8s:2379 get "" --prefix

2. Delete all data:

docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes -e ETCDCTL_API=3 k8s.gcr.io/etcd:3.2.24 etcdctl --cert="/etc/kubernetes/pki/etcd/healthcheck-client.crt" --key="/etc/kubernetes/pki/etcd/healthcheck-client.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --endpoints https://etcd1.cloud.k8s:2379 del "" --prefix …
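After wiping the keyspace, and before re-running kubeadm init, it can be worth confirming that the etcd endpoint is still healthy. A small sketch reusing the same client certificates, where the endpoint list should be whatever your external cluster actually uses:

docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes -e ETCDCTL_API=3 k8s.gcr.io/etcd:3.2.24 etcdctl --cert="/etc/kubernetes/pki/etcd/healthcheck-client.crt" --key="/etc/kubernetes/pki/etcd/healthcheck-client.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --endpoints https://etcd1.cloud.k8s:2379 endpoint health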
