yum is broken after upgrading Python from 2.x to 3.x on CentOS 7

yum breaks after the default Python is upgraded from 2.x to 3.x on CentOS 7:

# yum
  File "/usr/bin/yum", line 30
    except KeyboardInterrupt, e:
                            ^
SyntaxError: invalid syntax

Resolution: edit /usr/bin/yum and /usr/libexec/urlgrabber-ext-down, changing the first line (the shebang) from "#!/usr/bin/python" to "#!/usr/bin/python2":

#!/usr/bin/python2
import sys
try:
    import yum
except ImportError:
    print >> sys.stderr, """\
...
#! /usr/bin/python2
#  A very simple external downloader
#  Copyright 2011-2012 Zdenek Pavlas

#   This library is free software; you can redistribute it and/or
#   modify it under the terms of the GNU Lesser General Public
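
If you prefer not to edit the files by hand, a one-line sed can apply the same change (a sketch; it assumes the first line of each file is still the stock "#!/usr/bin/python" or "#! /usr/bin/python"):

# patch the shebang of both scripts to point at the python2 interpreter
sed -i '1s|^#!\s*/usr/bin/python$|#!/usr/bin/python2|' /usr/bin/yum /usr/libexec/urlgrabber-ext-down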

 

Set up a Windows development environment for Tencent BlueKing Standard OPS (bk-sops)

1. Install Python 2.7 and the npm tool (Node.js)

2. Clone the source code from: https://github.com/Tencent/bk-sops

3. Open a cmd shell and change the current directory to the root of the bk-sops repository

4. Install the required packages by running:

pip install --proxy={proxy if any} -r requirements.txt

5. Open "http://{BK_PAAS_HOST}/admin/app/app/" on your BlueKing PaaS platform and copy the app code and token of the "标准运维" (bk_sops) app

6. Create a bat file, replace the variables marked with {} with the real values for your environment, then run it to set the environment variables:

env.bat

set APP_ID=bk_sops
set APP_TOKEN={APP_TOKEN}
set BK_PAAS_HOST={BK_PAAS_HOST}
set BK_CC_HOST={BK_CC_HOST}
set BK_JOB_HOST={BK_JOB_HOST}

7. For the remaining steps, refer to: https://github.com/Tencent/bk-sops/blob/master/docs/install/dev_deploy.md

Finally, you can create a bat file to start the local development environment quickly:

run-local-sops.bat

start cmd.exe /c "env.bat & python manage.py celery worker -l info"
start cmd.exe /c "env.bat & python manage.py celery beat -l info"
env.bat & python manage.py runserver 8000

Create a standard component (plugin) app by running:

python manage.py create_atoms_app docker_image_tags

Issues

1. Download and install Microsoft Visual C++ Compiler for Python 2.7 (http://aka.ms/vcpython27) if you hit the error below:

    Running setup.py install for ujson ... error
    ERROR: Command errored out with exit status 1:
     command: 'd:\python27\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'c:\\users\\admini~1\\appdata\\local\\temp\\pip-install-jnby9g\\ujson\\setup.py'"'"'; __file__='"'"'c:\\users\\admini~1\\appdata\\local\\temp\\pip-install-jnby9g\\ujson\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'c:\users\admini~1\appdata\local\temp\pip-record-4zcz9a\install-record.txt' --single-version-externally-managed --compile
         cwd: c:\users\admini~1\appdata\local\temp\pip-install-jnby9g\ujson\
    Complete output (5 lines):
    running install
    running build
    running build_ext
    building 'ujson' extension
    error: Microsoft Visual C++ 9.0 is required. Get it from http://aka.ms/vcpython27
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'd:\python27\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'c:\\users\\admini~1\\appdata\\local\\temp\\pip-install-jnby9g\\ujson\\setup.py'"'"'; __file__='"'"'c:\\users\\admini~1\\appdata\\local\\temp\\pip-install-jnby9g\\ujson\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'c:\users\admini~1\appdata\local\temp\pip-record-4zcz9a\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.

2. Remove the double quotes from environment variable values if you hit an error like the one below:

  File "D:\Projects\BlueKing\bk-sops\blueapps\conf\log.py", line 47, in get_logging_config_dict
    os.makedirs(log_dir)
  File "D:\Python27\lib\os.py", line 157, in makedirs
    mkdir(name, mode)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'D:\\Projects\\BlueKing\\logs\\"bk_sops"'
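
In other words, the values in env.bat must be set without quotes; the quotes become part of the value and end up inside the generated paths. Illustrative example:

:: wrong - the quotes become part of the value, e.g. logs\"bk_sops"
set APP_ID="bk_sops"

:: right
set APP_ID=bk_sops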

3. Install python-magic-bin if you hit the error below:

  File "D:\Projects\BlueKing\bk-sops\gcloud\contrib\appmaker\api.py", line 16, in 
    import magic
  File "D:\Python27\lib\site-packages\magic.py", line 181, in 
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

install python-magic-bin

pip install python-magic-bin==0.4.14

4. Run `npm rebuild node-sass` to download the node-sass binding if you hit the error below while running `npm run build -- --STATIC_ENV=dev`:

This usually happens because your environment has changed since running `npm install`.
Run `npm rebuild node-sass` to download the binding for your current environment.

 

Enable hibernation on Windows 10

After upgrading to the official release of Windows 10, I found there was no hibernate option anywhere, not even in the power options. Sometimes there is important work in progress and I want the machine to come back the next day exactly where I left it, without reopening all the files, yet there is no need to keep it running all night, so hibernation had to be enabled.

Open a command prompt as administrator and run:

powercfg /a

This lists the sleep states the machine supports and shows whether hibernation is disabled.
If it is disabled, run:

powercfg /h on

Then run powercfg /a again to confirm hibernation is now available.
Once it is, finish the setup in the power options:

Right-click the Start menu and choose Power Options,
  click "Choose what the power buttons do",
  click "Change settings that are currently unavailable",
  tick "Hibernate",
  and finally click "Save changes".
Now open the Start menu, choose Power, and Hibernate is available.
Refer: https://blog.csdn.net/saindy5828/article/details/72857332

Logging timestamps in .NET Core 3.0

The much-maligned .NET Core can finally print timestamps in its log output as of 3.0; a feature everyone considered super simple sat open on GitHub for 3 years (https://github.com/aspnet/Logging/issues/483).

So far the only place I have found a TimestampFormat option is Microsoft.Extensions.Logging.Console, which presumably means timestamps can only be printed by the console logger?

Usage 1: specify the format when calling AddConsole:

    public class Startup
    {
        public Startup(IConfiguration configuration)
        {
            Configuration = configuration;
        }

        public IConfiguration Configuration { get; }

        // This method gets called by the runtime. Use this method to add services to the container.
        public void ConfigureServices(IServiceCollection services)
        {
            services.AddControllers();
            services.AddLogging(opt =>
            {
                opt.AddConsole(cfg => { cfg.TimestampFormat = "[yyyy-MM-dd HH:mm:ss]"; });
            });
        }

        //...
    }

 

Usage 2: configure the format in appsettings.json:

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information"
    },
    "Console": {
      "TimestampFormat": "[yyyy-MM-dd HH:mm:ss]"
    }
  }
  //...
}
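
With either approach the console output is prefixed with the timestamp, roughly like this (illustrative sample):

[2019-10-01 09:30:00] info: Microsoft.Hosting.Lifetime[0]
      Now listening on: http://localhost:5000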

Customize hosts records on Docker and Kubernetes

Docker:

docker run -it --rm --add-host=host1:172.17.0.2 --add-host=host2:192.168.1.3 busybox

Use "--add-host" to add entries to the container's /etc/hosts.
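
Inside the container the entries show up at the end of /etc/hosts, roughly like this (illustrative; the default entries are abbreviated):

/ # cat /etc/hosts
127.0.0.1       localhost
...
172.17.0.2      host1
192.168.1.3     host2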

 

Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  - ip: "10.1.2.3"
    hostnames:
    - "foo.remote"
    - "bar.remote"
  containers:
  - name: cat-hosts
    image: busybox
    command:
    - cat
    args:
    - "/etc/hosts"

Use "spec.hostAliases" to configure /etc/hosts entries for a pod (or a deployment's pod template).
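
Since the container above just runs `cat /etc/hosts`, the injected entries can be verified from its log once the pod has run (assuming the pod name above):

kubectl logs hostaliases-pod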

 

https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/

Running into the legendary "container runtime is down PLEG is not healthy"

After an unexpected power outage, a small Kubernetes cluster in our development environment ran into the "PLEG is not healthy" problem: pods went into Unknown or ContainerCreating state, and the nodes turned NotReady:

# kubectl get nodes -o wide
NAME             STATUS     ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-dev-master   Ready      master    1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://17.3.0
k8s-dev-node1    NotReady   node      1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://Unknown
k8s-dev-node2    NotReady   node      1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://Unknown
k8s-dev-node3    NotReady   node      289d      v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://Unknown
k8s-dev-node4    Ready      node      289d      v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://17.3.0

The kubelet log keeps reporting: skipping pod synchronization - container runtime is down PLEG is not healthy:

9月 25 11:05:06 k8s-dev-node1 kubelet[546]: I0925 11:05:06.003645     546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m18.877402888s ago; threshold is 3m0s]
9月 25 11:05:11 k8s-dev-node1 kubelet[546]: I0925 11:05:11.004116     546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m23.877803484s ago; threshold is 3m0s]
9月 25 11:05:16 k8s-dev-node1 kubelet[546]: I0925 11:05:16.004382     546 kubelet.go:1794] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 21m28.878169681s ago; threshold is 3m0s]

Restarting docker and kubelet on the node brought it back to Ready, but it soon went NotReady again. Some googling turned up related issues on Stack Overflow and github.com/kubernetes.

Issue #45419 was only fixed in v1.16, and upgrading from 1.10 to 1.16 would be too much work. A comment in #61117 suggested the problem can be solved by cleaning out /var/lib/kubelet/pods on the node. The first attempt failed to delete the directory completely because volumes were still mounted inside it, and the problem remained, so I went further: upgraded docker from 17.3.0 to 19.3.2 and wiped everything under /var/lib/kubelet/pods/ and /var/lib/docker on every node, after which the problem was gone.

The rough procedure:

# First disable auto-start of docker and kubelet, reboot, then remove the files:
systemctl disable docker && systemctl disable kubelet
reboot
rm -rf /var/lib/kubelet/pods/
rm -rf /var/lib/docker

# While at it, docker-ce was upgraded from 17.3.0 to 19.3.2

# After upgrading docker, edit docker.service to keep the storage driver at overlay (the 17.3.0 default).
# overlay2, devicemapper and vfs were also tried and kubelet reported errors with each of them;
# not sure whether that is a kubernetes v1.10 support issue or leftover data that wasn't fully cleaned.
vi /etc/systemd/system/docker.service

ExecStart=/usr/bin/dockerd ... --storage-driver=overlay

# Reload the unit files and start docker
systemctl daemon-reload
systemctl start docker && systemctl enable docker
systemctl status docker

# Since /var/lib/docker was wiped entirely, if the node cannot reach the k8s image registry directly,
# manually load the base images the node needs:
docker load -i kubernetes-v10.0-node.tar

# Start kubelet
systemctl start kubelet && systemctl enable kubelet
systemctl status kubelet

Problem solved:

# kubectl get nodes -o wide
NAME             STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-dev-master   Ready     master    1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://17.3.0
k8s-dev-node1    Ready     node      1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://19.3.2
k8s-dev-node2    Ready     node      1y        v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://19.3.2
k8s-dev-node3    Ready     node      289d      v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://19.3.2
k8s-dev-node4    Ready     node      289d      v1.10.0   <none>        CentOS Linux 7 (Core)   3.10.0-957.21.3.el7.x86_64   docker://19.3.2

Unfortunately this outage also cost us 3 months of configuration data on the Kong gateway :( Backups! Backups! Backups!

Kubernetes CronJob failed to schedule: Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew

On Kubernetes v1.13.3, a CronJob was scheduled to run every 5 minutes, but no new pod had been created for 3 days:

# kubectl get cronjob/dingtalk-atndsyncer
NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
dingtalk-atndsyncer   */5 * * * *   False     0        3d1h            4d21h

The CronJob's .spec.concurrencyPolicy is Forbid, so concurrent jobs are not allowed. Describing the CronJob shows FailedNeedsStart events with the message "Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."

# kubectl describe cronjob/dingtalk-atndsyncer
Name:                       dingtalk-atndsyncer
Namespace:                  default
Labels:                     app=dingtalk-atndsyncer
Annotations:                <none>
Schedule:                   */5 * * * *
Concurrency Policy:         Forbid
Suspend:                    False
Starting Deadline Seconds:  <unset>
Selector:                   <unset>
Parallelism:                <unset>
Completions:                <unset>
Pod Template:
  Labels:  <none>
  Containers:
   dingtalk-atndsyncer:
    Image:      dingtalk-atndsyncer:v1.0
    Port:       <none>
    Host Port:  <none>
    Environment:
      ASPNETCORE_ENVIRONMENT:               Production
    Mounts:                                 <none>
  Volumes:                                  <none>
Last Schedule Time:                         Fri, 06 Sep 2019 08:15:00 +0800
Active Jobs:                                <none>
Events:
  Type     Reason            Age                   From                Message
  ----     ------            ----                  ----                -------
  Warning  FailedNeedsStart  43m (x790 over 178m)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  25m (x89 over 40m)    cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
  Warning  FailedNeedsStart  119s (x117 over 22m)  cronjob-controller  Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.

After some googling and a careful read of the official docs (https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline): if .spec.startingDeadlineSeconds is not set, the controller counts missed schedules from the last schedule time, and once more than 100 have been missed it stops scheduling the job altogether.
Setting .spec.startingDeadlineSeconds to 300 means the job is only skipped if more than 100 schedules are missed within the last 300 seconds, which is practically impossible with a 5-minute schedule interval. After this change the CronJob started scheduling normally again.
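
For reference, a minimal sketch of the CronJob spec with the deadline set (batch/v1beta1 is the CronJob API on v1.13; image and env are taken from the describe output above, restartPolicy is an assumption):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: dingtalk-atndsyncer
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  # a run only counts as missed if it cannot start within 300s of its scheduled time
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      template:
        spec:
          # restartPolicy must be Never or OnFailure for Jobs; OnFailure assumed here
          restartPolicy: OnFailure
          containers:
          - name: dingtalk-atndsyncer
            image: dingtalk-atndsyncer:v1.0
            env:
            - name: ASPNETCORE_ENVIRONMENT
              value: Production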

Access control in RabbitMQ

AMQP 0-9-1 Operation              configure   write                   read
exchange.declare (passive=false)  exchange
exchange.declare (passive=true)
exchange.declare (with AE)        exchange    exchange (AE)           exchange
exchange.delete                   exchange
queue.declare (passive=false)     queue
queue.declare (passive=true)
queue.declare (with DLX)          queue       exchange (DLX)          queue
queue.delete                      queue
exchange.bind                                 exchange (destination)  exchange (source)
exchange.unbind                               exchange (destination)  exchange (source)
queue.bind                                    queue                   exchange
queue.unbind                                  queue                   exchange
basic.publish                                 exchange
basic.get                                                             queue
basic.consume                                                         queue
queue.purge                                                           queue
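
Permissions themselves are granted per vhost as three regular expressions (configure, write, read) matched against resource names. For example, with rabbitmqctl (the user, vhost and patterns below are illustrative):

# allow myuser to configure/write/read only resources named myuser-* in vhost /dev
rabbitmqctl set_permissions -p /dev myuser "^myuser-.*" "^myuser-.*" "^myuser-.*"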

 

Refer:

https://www.rabbitmq.com/access-control.html

https://www.rabbitmq.com/dlx.html

https://www.rabbitmq.com/ae.html

Kong 1.2.0 – WebSocket connection to ‘ws://xxx’ failed: Error during WebSocket handshake: ‘Upgrade’ header is missing

After upgrading Kong from 1.1.1 to 1.2.0, WebSocket connections that used to work fine stopped connecting. The frontend reports: 'Upgrade' header is missing.
The server side reports:

System.Net.WebSockets.WebSocketException (2): The remote party closed the WebSocket connection without completing the close handshake.

Some googling suggests it is a bug in Kong 1.2.0:

Websocket Upgrade header missing after upgrade to 1.2.0

kong 1.2.1 websocket fail

fix(proxy) do not clear upgrade header (case-insensitive), fix #4779 #4780

But the fix is milestoned for 1.2.2, which has not been released yet. A quick and effective workaround is to take the fixed handler.lua from GitHub and create a ConfigMap from it:

apiVersion: v1
data:
  handler.lua: |-
    -- Kong runloop
    --
    -- This consists of local_events that need to
    -- be ran at the very beginning and very end of the lua-nginx-module contexts.
    -- It mainly carries information related to a request from one context to the next one,
    -- through the `ngx.ctx` table.
    --
    -- In the `access_by_lua` phase, it is responsible for retrieving the route being proxied by
    -- a consumer. Then it is responsible for loading the plugins to execute on this request.
    local ck           = require "resty.cookie"
    local meta         = require "kong.meta"
    ...
kind: ConfigMap
metadata:
  name: kong-1.2.0-0-runloop-handler.lua
  namespace: kong
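
Instead of pasting the file into YAML, the same ConfigMap can be created directly from the downloaded file (the key must remain handler.lua so the subPath mount below still matches):

kubectl -n kong create configmap kong-1.2.0-0-runloop-handler.lua --from-file=handler.lua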

Then, in the Kong DaemonSet/Deployment, mount the ConfigMap at /usr/local/share/lua/5.1/kong/runloop/handler.lua:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: kong
    name: kong
  name: kong
  namespace: kong
spec:
  selector:
    matchLabels:
      name: kong
  template:
    metadata:
      labels:
        app: kong
        name: kong
    spec:
      containers:
      - env:
        - name: KONG_PLUGINS
          value: bundled,q1-api-auth,q1-user-auth,q1-user-check-permission
        - name: KONG_PROXY_ACCESS_LOG
          value: /dev/stdout
        - name: KONG_PROXY_ERROR_LOG
          value: /dev/stderr
        - name: KONG_ADMIN_LISTEN
          value: 0.0.0.0:8001, 0.0.0.0:8444 ssl
        - name: KONG_PROXY_LISTEN
          value: 0.0.0.0:8000, 0.0.0.0:8443 ssl
        - name: KONG_STREAM_LISTEN
          value: 0.0.0.0:9000 transparent
        - name: KONG_DATABASE
          value: postgres
        - name: KONG_PG_DATABASE
          value: kong
        - name: KONG_PG_USER
          value: kong
        - name: KONG_PG_PASSWORD
          value: PASSWORD
        - name: KONG_PG_HOST
          value: 192.168.130.246
        - name: KONG_PG_PORT
          value: "5432"
        image: kong:1.2.0-centos
        imagePullPolicy: IfNotPresent
        name: kong
        ports:
        - containerPort: 8000
          name: kong-proxy
          protocol: TCP
        - containerPort: 8443
          name: kong-proxy-ssl
          protocol: TCP
        - containerPort: 8001
          name: kong-admin
          protocol: TCP
        - containerPort: 8444
          name: kong-admin-ssl
          protocol: TCP
        - containerPort: 9000
          name: kong-stream
          protocol: TCP
        volumeMounts:
        - mountPath: /usr/local/share/lua/5.1/kong/utils
          name: kong-utils
        - mountPath: /usr/local/share/lua/5.1/kong/plugins/q1-api-auth
          name: q1-api-auth
        - mountPath: /usr/local/share/lua/5.1/kong/plugins/q1-user-auth
          name: q1-user-auth
        - mountPath: /usr/local/share/lua/5.1/kong/plugins/q1-user-check-permission
          name: q1-user-check-permission
        - mountPath: /usr/local/lib/luarocks/rocks/kong/1.2.0-0/kong-1.2.0-0.rockspec
          name: kong-rockspec
          subPath: kong-1.2.0-0.rockspec
        - mountPath: /usr/local/share/lua/5.1/kong/runloop/handler.lua
          name: runloop-handler-lua
          subPath: handler.lua
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: docker-secret
      nodeSelector:
        beta.kubernetes.io/os: linux
        kubernetes.io/role: node
      restartPolicy: Always
      volumes:
      - configMap:
          name: kong-utils
        name: kong-utils
      - configMap:
          name: q1-api-auth
        name: q1-api-auth
      - configMap:
          name: q1-user-auth
        name: q1-user-auth
      - configMap:
          name: q1-user-check-permission
        name: q1-user-check-permission
      - configMap:
          name: kong-1.2.0-0.rockspec
        name: kong-rockspec
      - configMap:
          name: kong-1.2.0-0-runloop-handler.lua
        name: runloop-handler-lua

Problem solved.

NTP synchronized cannot be set to yes

On CentOS 7.5, with ntpd configured to sync against a time server, I noticed that one node's NTP synchronized stayed no:

# timedatectl
      Local time: 二 2019-07-30 09:41:08 CST
  Universal time: 二 2019-07-30 01:41:08 UTC
        RTC time: 二 2019-07-30 01:08:13
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: no
 RTC in local TZ: no
      DST active: n/a

Stop ntpd, run ntpd -gq to step the clock, then start ntpd again:

# systemctl stop ntpd
# ntpd -gq
ntpd: time slew +0.000041s
# systemctl start ntpd
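
While waiting, ntpq -p can be used to watch the peers; the current sync source is marked with an asterisk once ntpd has locked on:

# ntpq -p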

After waiting a while, NTP synchronized goes back to yes:

# timedatectl
      Local time: 二 2019-07-30 09:44:28 CST
  Universal time: 二 2019-07-30 01:44:28 UTC
        RTC time: 二 2019-07-30 01:44:28
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a

Ref: https://askubuntu.com/questions/929805/timedatectl-ntp-sync-cannot-set-to-yes