Install Python 3.7.5 on CentOS 7 from a source tarball

The CentOS 7 yum repo currently only carries Python 3.6.8; the project needs 3.7.5, so the only option is to install it from source.

1) Install build dependencies

# yum install gcc make openssl-devel bzip2-devel libffi-devel

2) Download and extract the Python 3.7.5 source tarball
From https://www.python.org/downloads/release/python-375/

# cd /usr/src
# curl https://www.python.org/ftp/python/3.7.5/Python-3.7.5.tgz -O
# tar zxvf Python-3.7.5.tgz

3) Configure and install

# cd /usr/src/Python-3.7.5
# ./configure --enable-optimizations
# make altinstall

4) Create python symlinks

make altinstall places the new interpreter at /usr/local/bin/python3.7 without overwriting the distro's default python:

# ln -s /usr/local/bin/python3.7 /usr/bin/python3
# ln -s /usr/bin/python3 /usr/bin/python

Note that yum on CentOS 7 expects /usr/bin/python to be Python 2; if yum starts failing after this, remove the second link and invoke python3 explicitly.

# python -V
Python 3.7.5
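
Before cleaning up, it is worth confirming that the optional C-extension modules were actually built; if one of the -devel packages from step 1 was missing, configure skips the corresponding module silently and pip breaks later. A minimal sanity check to run in the new interpreter (ssl needs openssl-devel, bz2 needs bzip2-devel, ctypes needs libffi-devel):

import ssl, bz2, ctypes
print(ssl.OPENSSL_VERSION)

If any of these imports fail, install the missing -devel package and rebuild.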

5) Clean up

# rm -f /usr/src/Python-3.7.5.tgz

[Airflow] Change the default SQLite database to MySQL and manage services with systemd

The previous post covered how to quickly install Airflow on CentOS 7: /2019/10/29/setup-apache-airflow-on-centos-7

I. Manage Airflow services with systemd

1. Create a user and group for airflow:

# useradd -U airflow

2. Create the pid and log directories (note that /run is a tmpfs, so /run/airflow vanishes on reboot; the systemd templates in the Airflow repo ship a tmpfiles.d snippet to recreate it at boot):

# mkdir -p /run/airflow
# chown airflow:airflow /run/airflow
# chmod 755 /run/airflow

# mkdir -p /var/log/airflow
# chown airflow:airflow /var/log/airflow
# chmod 755 /var/log/airflow

3. Create the environment variable file:

# cat <<EOF > /etc/sysconfig/airflow
AIRFLOW_CONFIG=/etc/airflow/airflow.cfg
AIRFLOW_HOME=/etc/airflow
EOF

4. Move the airflow directory previously installed under ~/airflow to /etc:

# mv ~/airflow /etc/

5. Edit /etc/airflow/airflow.cfg:

a. Point dags_folder and plugins_folder at the new home:

dags_folder = $AIRFLOW_HOME/dags
plugins_folder = $AIRFLOW_HOME/plugins

b. Point the log paths at /var/log/airflow:

base_log_folder = /var/log/airflow
dag_processor_manager_log_location = /var/log/airflow/dag_processor_manager/dag_processor_manager.log
child_process_log_directory = /var/log/airflow/scheduler

6. Create a systemd unit file for each service. Templates can be found in the Airflow GitHub repo (https://github.com/apache/airflow/tree/master/scripts/systemd); remember to adjust the paths in each file:

a. airflow webserver:

# cat <<EOF > /usr/lib/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow webserver daemon
After=network.target
Wants=

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOF

b. airflow scheduler:

# cat <<EOF > /usr/lib/systemd/system/airflow-scheduler.service
[Unit]
Description=Airflow scheduler daemon
After=network.target
Wants=

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

c. The other services follow the same pattern…

II. Use a MySQL database

1. Create a MySQL database for airflow with charset "utf8mb4" and collation "utf8mb4_general_ci".
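
For reference, a minimal sketch of the DDL using the pymysql driver installed in the next step; the credentials are placeholders for your own server:

# -*- coding: utf-8 -*-
import pymysql

# Placeholder credentials -- point these at your MySQL server
conn = pymysql.connect(host="localhost", user="root", password="secret")
try:
    with conn.cursor() as cur:
        # utf8mb4 is full 4-byte UTF-8; MySQL's legacy "utf8" stores at most 3 bytes
        cur.execute(
            "CREATE DATABASE airflow "
            "CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci"
        )
finally:
    conn.close()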

2. Install pymysql, the MySQL driver for Python:

# pip3 install pymysql

3. Edit /etc/airflow/airflow.cfg:

a. Change sql_alchemy_conn from the default SQLite database to MySQL:

sql_alchemy_conn = mysql+pymysql://{username}:{password}@{hostname}:3306/airflow

Format: {database type}+{driver}://{username}:{password}@{MySQL host}:{port}/{database name}. For details see the SQLAlchemy docs: https://docs.sqlalchemy.org/
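
A quick way to verify the URL before handing it to Airflow is to open a connection with SQLAlchemy directly (the same library Airflow uses underneath; hostname and credentials below are placeholders):

from sqlalchemy import create_engine, text

# The exact string that goes into sql_alchemy_conn
engine = create_engine("mysql+pymysql://airflow:secret@localhost:3306/airflow")
with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())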

b. Change executor to LocalExecutor (SQLite cannot handle the concurrent connections LocalExecutor needs, which is one reason for moving to MySQL):

executor = LocalExecutor

4. Initialize the MySQL database:

# airflow initdb

III. Start the webserver, scheduler, and other services

# systemctl enable airflow-webserver && systemctl start airflow-webserver
# systemctl enable airflow-scheduler && systemctl start airflow-scheduler

IV. Miscellaneous

Checking /var/log/messages for the status of each service turns up a strange error from the scheduler:

Oct 31 05:56:35 build-node airflow: Traceback (most recent call last):
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
Oct 31 05:56:35 build-node airflow: self.run()
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
Oct 31 05:56:35 build-node airflow: self._target(*self._args, **self._kwargs)
Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 128, in _run_file_processor
Oct 31 05:56:35 build-node airflow: set_context(log, file_path)
Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/logging_mixin.py", line 170, in set_context
Oct 31 05:56:35 build-node airflow: handler.set_context(value)
Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 65, in set_context
Oct 31 05:56:35 build-node airflow: local_loc = self._init_file(filename)
Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 141, in _init_file
Oct 31 05:56:35 build-node airflow: os.makedirs(directory)
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs
Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok)
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs
Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok)
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs
Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok)
Oct 31 05:56:35 build-node airflow: [Previous line repeated 3 more times]
Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 220, in makedirs
Oct 31 05:56:35 build-node airflow: mkdir(name, mode)
Oct 31 05:56:35 build-node airflow: PermissionError: [Errno 13] Permission denied: '/var/log/airflow/scheduler/2019-10-31/../../../usr'

The airflow scheduler is trying to create directories under /var/log/, where the airflow user has no write permission, hence the PermissionError. If you create a usr directory under /var/log/ and hand its ownership to airflow, a pile of log files appears under "/var/log/airflow/scheduler/2019-10-31/../../../usr/local/lib/python3.6/site-packages/airflow/example_dags/":

# ls -la /var/log/airflow/scheduler/2019-10-31/../../../usr/local/lib/python3.6/site-packages/airflow/example_dags/
total 2212
drwxr-xr-x. 3 airflow airflow   4096 Oct 31 06:04 .
drwxr-xr-x. 3 airflow airflow     26 Oct 31 06:04 ..
-rw-r--r--. 1 airflow airflow  90610 Oct 31 06:18 docker_copy_data.py.log
-rw-r--r--. 1 airflow airflow  93636 Oct 31 06:18 example_bash_operator.py.log
-rw-r--r--. 1 airflow airflow  95777 Oct 31 06:18 example_branch_operator.py.log
-rw-r--r--. 1 airflow airflow  50840 Oct 31 06:18 example_branch_python_dop_operator_3.py.log
-rw-r--r--. 1 airflow airflow  93480 Oct 31 06:18 example_docker_operator.py.log
-rw-r--r--. 1 airflow airflow  94792 Oct 31 06:18 example_http_operator.py.log
-rw-r--r--. 1 airflow airflow  93152 Oct 31 06:18 example_latest_only.py.log
-rw-r--r--. 1 airflow airflow  98334 Oct 31 06:18 example_latest_only_with_trigger.py.log
-rw-r--r--. 1 airflow airflow 103648 Oct 31 06:18 example_passing_params_via_test_command.py.log
-rw-r--r--. 1 airflow airflow  93150 Oct 31 06:18 example_pig_operator.py.log
-rw-r--r--. 1 airflow airflow  67744 Oct 31 06:18 example_python_operator.py.log
-rw-r--r--. 1 airflow airflow  49610 Oct 31 06:18 example_short_circuit_operator.py.log
-rw-r--r--. 1 airflow airflow  92332 Oct 31 06:18 example_skip_dag.py.log
-rw-r--r--. 1 airflow airflow 101844 Oct 31 06:18 example_subdag_operator.py.log
-rw-r--r--. 1 airflow airflow  99220 Oct 31 06:18 example_trigger_controller_dag.py.log
-rw-r--r--. 1 airflow airflow  97252 Oct 31 06:18 example_trigger_target_dag.py.log
-rw-r--r--. 1 airflow airflow  90364 Oct 31 06:18 example_xcom.py.log
drwxr-xr-x. 2 airflow airflow     27 Oct 31 06:04 subdags
-rw-r--r--. 1 airflow airflow  55590 Oct 31 06:18 test_utils.py.log
-rw-r--r--. 1 airflow airflow  86240 Oct 31 06:18 tutorial.py.log

The absolute path is "/var/log/usr/local/lib/python3.6/site-packages/airflow/example_dags/". Why the airflow scheduler builds a relative path instead of honoring the log directory configured in airflow.cfg is puzzling; it looks like a scheduler bug. Someone has already filed it in the Airflow JIRA, and I added what I ran into as a comment there: https://issues.apache.org/jira/browse/AIRFLOW-4719
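
For what it's worth, the stray path looks like plain os.path.relpath behavior: the bundled example DAGs live in site-packages, outside the configured dags folder, so a log filename computed relative to the dags folder climbs out of the log directory. A small reconstruction (paths taken from the traceback and listing above; this is my guess at the mechanism, not the actual handler code):

import os

dag_file = "/usr/local/lib/python3.6/site-packages/airflow/example_dags/example_xcom.py"
dags_folder = "/etc/airflow/dags"

# Relative to a dags folder three directories deep, the example DAG's
# path begins with ../../.., the same prefix seen in the PermissionError
rel = os.path.relpath(dag_file, start=dags_folder)

# Joined onto the scheduler's dated log directory, it escapes to /var/log/usr/...
print(os.path.normpath(os.path.join("/var/log/airflow/scheduler/2019-10-31", rel)))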

[Kubernetes] Create a Deployment and Service with the Python client

Install the Kubernetes Python client and PyYAML:

# pip install kubernetes pyyaml

1. Get Namespaces and Pods with CoreV1Api:

# -*- coding: utf-8 -*-
from kubernetes import client, config

# Load cluster credentials from a kubeconfig file
config.kube_config.load_kube_config(config_file="../kubecfg.yaml")
coreV1Api = client.CoreV1Api()

print("\nListing all namespaces")
for ns in coreV1Api.list_namespace().items:
    print(ns.metadata.name)

print("\nListing pods with their IP, namespace, names:")
for pod in coreV1Api.list_pod_for_all_namespaces(watch=False).items:
    print("%s\t\t%s\t%s" % (pod.status.pod_ip, pod.metadata.namespace, pod.metadata.name))

2. Create a Deployment with AppsV1Api and a Service with CoreV1Api:

# -*- coding: utf-8 -*-
from kubernetes import client, config
import yaml

config.kube_config.load_kube_config(config_file="../kubecfg.yaml")

# Parse the resource manifests into plain dicts
with open("deploy.yaml") as yamlDeploy:
    jsonDeploy = yaml.safe_load(yamlDeploy)

with open("service.yaml") as yamlService:
    jsonService = yaml.safe_load(yamlService)

appsV1Api = client.AppsV1Api()
coreV1Api = client.CoreV1Api()  # Services belong to the core API group

if jsonDeploy['kind'] == 'Deployment':
    appsV1Api.create_namespaced_deployment(
        namespace="default", body=jsonDeploy
    )

if jsonService['kind'] == 'Service':
    coreV1Api.create_namespaced_service(
        namespace="default",
        body=jsonService
    )
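
One caveat: create_namespaced_deployment raises an ApiException with HTTP status 409 if a Deployment of the same name already exists. A small wrapper that falls back to replacing the object (the function and its fallback behavior are my own sketch, not part of the client):

from kubernetes.client.rest import ApiException

def create_or_replace_deployment(apps_api, body, namespace="default"):
    # Try to create; on 409 Conflict (AlreadyExists) replace the existing object
    name = body["metadata"]["name"]
    try:
        apps_api.create_namespaced_deployment(namespace=namespace, body=body)
    except ApiException as e:
        if e.status != 409:
            raise
        apps_api.replace_namespaced_deployment(name=name, namespace=namespace, body=body)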

3. Create any type of object from a yaml file with utils.create_from_yaml; you can put multiple resources in one yaml file:

# -*- coding: utf-8 -*-
from kubernetes import client, config, utils

config.kube_config.load_kube_config(config_file="../kubecfg.yaml")

k8sClient = client.ApiClient()
utils.create_from_yaml(k8sClient, "deploy-service.yaml")

Reference:
https://github.com/kubernetes-client/python/blob/6709b753b4ad2e09aa472b6452bbad9f96e264e3/examples/create_deployment_from_yaml.py
https://stackoverflow.com/questions/56673919/kubernetes-python-api-client-execute-full-yaml-file

Install Python 3.6 on CentOS 7

1) Install the IUS repository

# Install the EPEL dependency
sudo yum install epel-release

# Install the IUS repository
sudo yum install https://centos7.iuscommunity.org/ius-release.rpm

2) Install Python 3.6

sudo yum install python36u

# Create a symlink (optional)
sudo ln -s /bin/python3.6 /bin/python3

3) Install pip3 (optional)

sudo yum install python36u-pip

# Create a symlink to pip3 (optional)
sudo ln -s /bin/pip3.6 /bin/pip3