Metadata-Version: 2.1
Name: acryl-datahub-airflow-plugin
Version: 0.8.43.2rc0
Summary: Datahub Airflow plugin to capture executions and send to Datahub
Home-page: https://datahubproject.io/
License: Apache License 2.0
Project-URL: Documentation, https://datahubproject.io/docs/
Project-URL: Source, https://github.com/datahub-project/datahub
Project-URL: Changelog, https://github.com/datahub-project/datahub/releases
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Topic :: Software Development
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: acryl-datahub[airflow] (>=0.8.36)
Requires-Dist: apache-airflow (>=1.10.2)
Requires-Dist: typing-extensions (>=3.10.0.2)
Requires-Dist: pydantic (>=1.5.1)
Requires-Dist: mypy-extensions (>=0.4.3)
Requires-Dist: typing-inspect
Requires-Dist: dataclasses (>=0.6) ; python_version < "3.7"
Provides-Extra: dev
Requires-Dist: types-cachetools ; extra == 'dev'
Requires-Dist: types-toml ; extra == 'dev'
Requires-Dist: types-tabulate ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: types-python-dateutil ; extra == 'dev'
Requires-Dist: apache-airflow (>=1.10.2) ; extra == 'dev'
Requires-Dist: types-pkg-resources ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.8.1) ; extra == 'dev'
Requires-Dist: freezegun ; extra == 'dev'
Requires-Dist: requests-mock ; extra == 'dev'
Requires-Dist: isort (>=5.7.0) ; extra == 'dev'
Requires-Dist: black (>=21.12b0) ; extra == 'dev'
Requires-Dist: build ; extra == 'dev'
Requires-Dist: types-freezegun ; extra == 'dev'
Requires-Dist: types-pytz ; extra == 'dev'
Requires-Dist: mypy (>=0.920) ; extra == 'dev'
Requires-Dist: mypy-extensions (>=0.4.3) ; extra == 'dev'
Requires-Dist: jsonpickle ; extra == 'dev'
Requires-Dist: types-click (==0.1.12) ; extra == 'dev'
Requires-Dist: types-PyYAML ; extra == 'dev'
Requires-Dist: flake8-tidy-imports (>=4.3.0) ; extra == 'dev'
Requires-Dist: acryl-datahub[airflow] (>=0.8.36) ; extra == 'dev'
Requires-Dist: pydantic (>=1.5.1) ; extra == 'dev'
Requires-Dist: deepdiff ; extra == 'dev'
Requires-Dist: tox ; extra == 'dev'
Requires-Dist: types-requests ; extra == 'dev'
Requires-Dist: pytest-docker (<0.12,>=0.10.3) ; extra == 'dev'
Requires-Dist: pytest (>=6.2.2) ; extra == 'dev'
Requires-Dist: sqlalchemy-stubs ; extra == 'dev'
Requires-Dist: typing-inspect ; extra == 'dev'
Requires-Dist: types-dataclasses ; extra == 'dev'
Requires-Dist: typing-extensions (>=3.10.0.2) ; extra == 'dev'
Requires-Dist: pytest-asyncio (>=0.16.0) ; extra == 'dev'
Requires-Dist: flake8 (>=3.8.3) ; extra == 'dev'
Requires-Dist: types-six ; extra == 'dev'
Requires-Dist: packaging ; extra == 'dev'
Requires-Dist: pydantic (>=1.9.0) ; extra == 'dev'
Requires-Dist: coverage (>=5.1) ; extra == 'dev'
Provides-Extra: dev-airflow1
Requires-Dist: types-cachetools ; extra == 'dev-airflow1'
Requires-Dist: types-toml ; extra == 'dev-airflow1'
Requires-Dist: types-tabulate ; extra == 'dev-airflow1'
Requires-Dist: twine ; extra == 'dev-airflow1'
Requires-Dist: types-python-dateutil ; extra == 'dev-airflow1'
Requires-Dist: apache-airflow (>=1.10.2) ; extra == 'dev-airflow1'
Requires-Dist: types-pkg-resources ; extra == 'dev-airflow1'
Requires-Dist: pytest-cov (>=2.8.1) ; extra == 'dev-airflow1'
Requires-Dist: freezegun ; extra == 'dev-airflow1'
Requires-Dist: apache-airflow (==1.10.15) ; extra == 'dev-airflow1'
Requires-Dist: requests-mock ; extra == 'dev-airflow1'
Requires-Dist: isort (>=5.7.0) ; extra == 'dev-airflow1'
Requires-Dist: black (>=21.12b0) ; extra == 'dev-airflow1'
Requires-Dist: build ; extra == 'dev-airflow1'
Requires-Dist: types-freezegun ; extra == 'dev-airflow1'
Requires-Dist: types-pytz ; extra == 'dev-airflow1'
Requires-Dist: mypy (>=0.920) ; extra == 'dev-airflow1'
Requires-Dist: mypy-extensions (>=0.4.3) ; extra == 'dev-airflow1'
Requires-Dist: jsonpickle ; extra == 'dev-airflow1'
Requires-Dist: types-click (==0.1.12) ; extra == 'dev-airflow1'
Requires-Dist: types-PyYAML ; extra == 'dev-airflow1'
Requires-Dist: flake8-tidy-imports (>=4.3.0) ; extra == 'dev-airflow1'
Requires-Dist: acryl-datahub[airflow] (>=0.8.36) ; extra == 'dev-airflow1'
Requires-Dist: pydantic (>=1.5.1) ; extra == 'dev-airflow1'
Requires-Dist: deepdiff ; extra == 'dev-airflow1'
Requires-Dist: tox ; extra == 'dev-airflow1'
Requires-Dist: types-requests ; extra == 'dev-airflow1'
Requires-Dist: pytest-docker (<0.12,>=0.10.3) ; extra == 'dev-airflow1'
Requires-Dist: pytest (>=6.2.2) ; extra == 'dev-airflow1'
Requires-Dist: sqlalchemy-stubs ; extra == 'dev-airflow1'
Requires-Dist: apache-airflow-backport-providers-snowflake ; extra == 'dev-airflow1'
Requires-Dist: typing-inspect ; extra == 'dev-airflow1'
Requires-Dist: types-dataclasses ; extra == 'dev-airflow1'
Requires-Dist: typing-extensions (>=3.10.0.2) ; extra == 'dev-airflow1'
Requires-Dist: pytest-asyncio (>=0.16.0) ; extra == 'dev-airflow1'
Requires-Dist: flake8 (>=3.8.3) ; extra == 'dev-airflow1'
Requires-Dist: types-six ; extra == 'dev-airflow1'
Requires-Dist: packaging ; extra == 'dev-airflow1'
Requires-Dist: pydantic (>=1.9.0) ; extra == 'dev-airflow1'
Requires-Dist: coverage (>=5.1) ; extra == 'dev-airflow1'
Provides-Extra: dev-airflow1-base
Requires-Dist: apache-airflow (==1.10.15) ; extra == 'dev-airflow1-base'
Requires-Dist: apache-airflow-backport-providers-snowflake ; extra == 'dev-airflow1-base'
Requires-Dist: dataclasses (>=0.6) ; (python_version < "3.7") and extra == 'dev-airflow1'
Requires-Dist: dataclasses (>=0.6) ; (python_version < "3.7") and extra == 'dev'

# DataHub Airflow Plugin

## Capabilities

DataHub supports integration of

- Airflow Pipeline (DAG) metadata
- DAG and Task run information
- Lineage information when present

## Installation

1. Install the plugin package in your Airflow environment.

  ```shell
    pip install acryl-datahub-airflow-plugin
  ```

::: note

We recommend using the lineage plugin if you are on Airflow version >= 2.0.2, or on MWAA running Airflow version >= 2.0.2.
:::

2. Disable lazy plugin loading in your `airflow.cfg`:

  ```ini
  [core]
  lazy_load_plugins = False
  ```

3. Configure an Airflow hook for DataHub. Both a DataHub REST hook and a Kafka-based hook are supported, but you only need one of them.

   ```shell
   # For REST-based:
   airflow connections add  --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'
   # For Kafka-based (standard Kafka sink config can be passed via extras):
   airflow connections add  --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'
   ```

4. Set `datahub_conn_id` and/or `cluster` in your `airflow.cfg` file if the default values do not match your setup. See the configuration parameters below.

    **Configuration options:**

    |Name   | Default value   | Description   |
    |---|---|---|
    | datahub.datahub_conn_id | datahub_rest_default | The name of the DataHub connection you set up in step 3. |
    | datahub.cluster | prod | Name of the Airflow cluster. |
    | datahub.capture_ownership_info | true | If true, the owners field of the DAG will be captured as a DataHub corpuser. |
    | datahub.capture_tags_info | true | If true, the tags field of the DAG will be captured as DataHub tags. |
    | datahub.graceful_exceptions | true | If true, most runtime errors in the lineage backend are suppressed and will not cause the overall task to fail. Note that configuration issues will still throw exceptions. |
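
    The option names above use a `datahub.` prefix, which corresponds to a `[datahub]` section in `airflow.cfg`. For example (the values shown are the defaults; adjust them to your setup):

    ```ini
    [datahub]
    datahub_conn_id = datahub_rest_default
    cluster = prod
    capture_ownership_info = true
    capture_tags_info = true
    graceful_exceptions = true
    ```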

5. Configure `inlets` and `outlets` for your Airflow operators. For reference, look at the sample DAG in [`lineage_backend_demo.py`](../../metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_demo.py), or reference [`lineage_backend_taskflow_demo.py`](../../metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_taskflow_demo.py) if you're using the [TaskFlow API](https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html).
6. [optional] Learn more about [Airflow lineage](https://airflow.apache.org/docs/apache-airflow/stable/lineage.html), including shorthand notation and some automation.
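
Each inlet or outlet entity you configure in step 5 ultimately resolves to a DataHub dataset URN when lineage is emitted. A minimal sketch of that mapping (the `dataset_urn` helper below is illustrative only, not part of the plugin's API):

```python
# Illustrative helper (not part of the plugin) showing the DataHub dataset
# URN format that an inlet/outlet such as ("snowflake", "db.schema.table")
# resolves to when lineage is emitted.
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # DataHub dataset URNs follow the pattern:
    # urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

print(dataset_urn("snowflake", "mydb.schema.my_table"))
```

This is why the platform and fully qualified table name on your inlets/outlets must match the names used by your DataHub ingestion sources, or lineage edges will point at different dataset entities.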

## How to validate installation

  1. In the Airflow UI, go to the Admin -> Plugins menu and check that the DataHub plugin is listed.
  2. Run an Airflow DAG. In the task logs, you should see DataHub-related log messages like:

  ```
  Emitting Datahub ...
  ```

## Additional references

Related DataHub videos:

- [Airflow Lineage](https://www.youtube.com/watch?v=3wiaqhb8UR0)
- [Airflow Run History in DataHub](https://www.youtube.com/watch?v=YpUOqDU5ZYg)
