上一篇文章介绍了怎样在CentOS7上快速安装airflow: /2019/10/29/setup-apache-airflow-on-centos-7
一、使用systemd管理airflow服务
1、为airflow创建user和group:
# useradd -U airflow
2、创建pid和log目录:
# mkdir -p /run/airflow # chown airflow:airflow /run/airflow # chmod 755 /run/airflow # mkdir -p /var/log/airflow # chown airflow:airflow /var/log/airflow # chmod 755 /var/log/airflow
3、生成环境变量文件:
# cat <<EOF > /etc/sysconfig/airflow AIRFLOW_CONFIG=/etc/airflow/airflow.cfg AIRFLOW_HOME=/etc/airflow EOF
4、把之前安装在~/airflow目录下的airflow移动到/etc:
# mv ~/airflow /etc/
5、修改/etc/airflow/airflow.cfg
a. 修改dags_folder, plugins_folder:
dags_folder = $AIRFLOW_HOME/dags plugins_folder = $AIRFLOW_HOME/plugins
b. 修改各个log目录的路径:
base_log_folder = /var/log/airflow dag_processor_manager_log_location = /var/log/airflow/dag_processor_manager/dag_processor_manager.log child_process_log_directory = /var/log/airflow/scheduler
6、创建各个服务的systemd文件, 从github airflow代码库(https://github.com/apache/airflow/tree/master/scripts/systemd)找systemd文件模板, 创建各个服务的systemd文件, 注意修改各个文件的路径:
a. airflow webserver:
# cat <<EOF > /usr/lib/systemd/system/airflow-webserver.service [Unit] Description=Airflow webserver daemon After=network.target Wants= [Service] EnvironmentFile=/etc/sysconfig/airflow User=airflow Group=airflow Type=simple ExecStart=/usr/local/bin/airflow webserver --pid /run/airflow/webserver.pid Restart=on-failure RestartSec=5s PrivateTmp=true [Install] WantedBy=multi-user.target EOF
b. airflow scheduler:
cat <<EOF > /usr/lib/systemd/system/airflow-scheduler.service [Unit] Description=Airflow scheduler daemon After=network.target Wants= [Service] EnvironmentFile=/etc/sysconfig/airflow User=airflow Group=airflow Type=simple ExecStart=/usr/local/bin/airflow scheduler Restart=always RestartSec=5s [Install] WantedBy=multi-user.target EOF
c. 其他…
二、使用MySql数据库
1、使用charset “utf8mb4″和collation “utf8mb4_general_ci”为airflow创建MySql数据库
2、安装MySql for Python的驱动pymysql
# pip3 install pymysql
3、修改/etc/airflow/airflow.cfg
a. 修改sql_alchemy_conn, 把默认的sqlite数据库修改为MySql:
sql_alchemy_conn = mysql+pymysql://{username}:{password}@{hostname}:3306/airflow
格式:{数据库类型}+{驱动}://{用户名}:{密码}@{MySql服务器地址}:{端口}/{数据库名}, 更多信息参见SqlAlchemy文档: https://docs.sqlalchemy.org/
b. 修改executor为LocalExecutor:
executor = LocalExecutor
c. 初始化MySql数据库:
# airflow initdb
三、启动webserver, scheduler等服务
# systemctl enable airflow-webserver && systemctl start airflow-webserver # systemctl enable airflow-scheduler && systemctl start airflow-scheduler
四、其他
检查/var/log/messages查看各服务的状态,发现scheduler有奇怪错误:
Oct 31 05:56:35 build-node airflow: Traceback (most recent call last): Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap Oct 31 05:56:35 build-node airflow: self.run() Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run Oct 31 05:56:35 build-node airflow: self._target(*self._args, **self._kwargs) Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 128, in _run_file_processor Oct 31 05:56:35 build-node airflow: set_context(log, file_path) Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/logging_mixin.py", line 170, in set_context Oct 31 05:56:35 build-node airflow: handler.set_context(value) Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 65, in set_context Oct 31 05:56:35 build-node airflow: local_loc = self._init_file(filename) Oct 31 05:56:35 build-node airflow: File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 141, in _init_file Oct 31 05:56:35 build-node airflow: os.makedirs(directory) Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok) Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok) Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 210, in makedirs Oct 31 05:56:35 build-node airflow: makedirs(head, mode, exist_ok) Oct 31 05:56:35 build-node airflow: [Previous line repeated 3 more times] Oct 31 05:56:35 build-node airflow: File "/usr/lib64/python3.6/os.py", line 220, in makedirs Oct 31 05:56:35 build-node airflow: mkdir(name, mode) Oct 31 05:56:35 build-node airflow: PermissionError: [Errno 13] Permission denied: '/var/log/airflow/scheduler/2019-10-31/../../../usr'
airflow scheduler尝试在/var/log/目录下创建目录, 用户airflow没有权限, 所以出现PermissionError, 如果在/var/log/目录下创建usr目录并把owner分配给airflow, 在目录 “/var/log/airflow/scheduler/2019-10-31/../../../usr/local/lib/python3.6/site-packages/airflow/example_dags/”会产生很多log文件:
# ls -la /var/log/airflow/scheduler/2019-10-31/../../../usr/local/lib/python3.6/site-packages/airflow/example_dags/ total 2212 drwxr-xr-x. 3 airflow airflow 4096 Oct 31 06:04 . drwxr-xr-x. 3 airflow airflow 26 Oct 31 06:04 .. -rw-r--r--. 1 airflow airflow 90610 Oct 31 06:18 docker_copy_data.py.log -rw-r--r--. 1 airflow airflow 93636 Oct 31 06:18 example_bash_operator.py.log -rw-r--r--. 1 airflow airflow 95777 Oct 31 06:18 example_branch_operator.py.log -rw-r--r--. 1 airflow airflow 50840 Oct 31 06:18 example_branch_python_dop_operator_3.py.log -rw-r--r--. 1 airflow airflow 93480 Oct 31 06:18 example_docker_operator.py.log -rw-r--r--. 1 airflow airflow 94792 Oct 31 06:18 example_http_operator.py.log -rw-r--r--. 1 airflow airflow 93152 Oct 31 06:18 example_latest_only.py.log -rw-r--r--. 1 airflow airflow 98334 Oct 31 06:18 example_latest_only_with_trigger.py.log -rw-r--r--. 1 airflow airflow 103648 Oct 31 06:18 example_passing_params_via_test_command.py.log -rw-r--r--. 1 airflow airflow 93150 Oct 31 06:18 example_pig_operator.py.log -rw-r--r--. 1 airflow airflow 67744 Oct 31 06:18 example_python_operator.py.log -rw-r--r--. 1 airflow airflow 49610 Oct 31 06:18 example_short_circuit_operator.py.log -rw-r--r--. 1 airflow airflow 92332 Oct 31 06:18 example_skip_dag.py.log -rw-r--r--. 1 airflow airflow 101844 Oct 31 06:18 example_subdag_operator.py.log -rw-r--r--. 1 airflow airflow 99220 Oct 31 06:18 example_trigger_controller_dag.py.log -rw-r--r--. 1 airflow airflow 97252 Oct 31 06:18 example_trigger_target_dag.py.log -rw-r--r--. 1 airflow airflow 90364 Oct 31 06:18 example_xcom.py.log drwxr-xr-x. 2 airflow airflow 27 Oct 31 06:04 subdags -rw-r--r--. 1 airflow airflow 55590 Oct 31 06:18 test_utils.py.log -rw-r--r--. 1 airflow airflow 86240 Oct 31 06:18 tutorial.py.log
绝对路径是: “/var/log/usr/local/lib/python3.6/site-packages/airflow/example_dags/”, 搞不懂airflow scheduler为什么没有使用airflow.cfg中的log目录配置,而是使用一个相对路径. 应该是scheduler的bug, 有人已经在Airflow JIRA中提了issue, 我也把遇到的问题写在了Comment里, 见: https://issues.apache.org/jira/browse/AIRFLOW-4719