Created by: duyet
What is this Python project?
Apache Airflow: Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
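To illustrate what "following the specified dependencies" means, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API; the task names and the `execution_order` helper are hypothetical and only show how a DAG of tasks constrains the order in which a scheduler may run them.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because `load` and `report` both depend only on `transform`, either may come first in the result, and a real scheduler could run them in parallel. A cyclic dependency raises `graphlib.CycleError`, which is why the graph must be acyclic.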
What's the difference between this Python project and similar ones?
Airflow vs. Luigi:
Airflow
- Easy-to-use UI (+)
- Built in scheduler (+)
- Easy testing of DAGs (+)
- Separates output data and task state (+)
- Strong and active community (+)
Luigi
- Creating and testing tasks is difficult (-)
- The UI is challenging to navigate (-)
- Not scalable due to tight coupling with cron jobs; the number of worker processes is bounded by number of cron workers assigned to a job (-)
- Re-running pipelines is not possible (-)
Airflow vs. Oozie:
Airflow
- Python Code for DAGs (+)
- Has connectors for every major service/cloud provider (+)
- More versatile (+)
- Advanced metrics (+)
- Better UI and API (+)
- Capable of creating extremely complex workflows (+)
- Jinja Templating (+)
- Can be parallelized (=)
- Native connections to HDFS, Hive, Pig, etc. (=)
- Graph as DAG (=)
Oozie
- Java or XML for DAGs (---)
- Hard to build complex pipelines (-)
- Smaller, less active community (-)
- Worse web GUI (-)
- Java API (-)
- Can be parallelized (=)
- Native connections to HDFS, Hive, Pig, etc. (=)
- Graph as DAG (=)
--
Anyone who agrees with this pull request could vote for it by adding a