is there a dagster connector/integration for Hive ...
# ask-community
c
is there a dagster connector/integration for Hive clusters?
o
hi @clay! there's no native dagster-hive integration at the moment
c
Any bypasses or workarounds? I guess I could use airflow to run the Hive jobs and then trigger dagster afterwards
o
ah if you already have a working airflow setup, have you seen Migrating Airflow to Dagster?
c
There's nothing about the flow/script that's complicated -- just that it pulls a large amount of raw data from Hive, so I need to figure out how to make the Hive data accessible to Dagster. I don't have access to adjust the Airflow setup at all, unfortunately.
o
and by "make hive data accessible to Dagster", what would that look like ideally? I guess I'm trying to find the matrix of stuff that is in the following categories:
• currently handled by Airflow, should stay handled by Airflow
• currently handled by Airflow, should be handled by Dagster
• currently not handled by Airflow, should be handled by Dagster
c
It's "Currently handled by Airflow, would like it to be completely handled by Dagster."
o
Gotcha -- and it's currently handled by some custom Airflow operator?
c
Yes
Essentially, I could solve it easily by having a hive resource in Dagster. Maybe I'll look into hacking that together.
o
yeah that sounds totally reasonable (at the end of the day, either way you're just invoking python code, so some level of copy/paste will get the job done). there's also a utility to take an existing Airflow dag and turn it into a Dagster job: https://docs.dagster.io/integrations/airflow/reference#ingesting-dags-from-airflow
but depending on your level of "not having access to the airflow implementation" that might not work
c
I would have to invade a building in Shanghai and convince 20 people to help me. 🙂
So... "no access to the airflow implementation" is mostly correct
The company has written a wrapper around Airflow
With the clever name of Dataflow
o
well I guess my official recommendation would be to not invade any buildings
c
The DAG itself is simple so recreating that in Dagster is < 30 min of work. It's just the Hive part that I'll have to figure out. Anyhow, thanks for the help!
o
yep no problem -- your idea of creating a HiveResource makes the most sense to me with that extra context, hopefully that's not too painful!
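For anyone landing here with the same question, below is a minimal sketch of what the "hive resource" idea from this thread might look like. It is not an official Dagster integration. The class name `HiveResource`, the `fetch_batches` method, and the `connect` override are all hypothetical names invented for illustration. The sketch assumes the cluster exposes HiveServer2 and that the PyHive package is installed; the default port 10000 is also an assumption. Rows are fetched in batches because the thread mentions pulling a large amount of raw data.

```python
from contextlib import closing
from typing import Any, Callable, Iterator, List, Optional


class HiveResource:
    """Hypothetical thin wrapper around a DB-API 2.0 connection to HiveServer2."""

    def __init__(
        self,
        host: str,
        port: int = 10000,  # assumed HiveServer2 port; adjust for your cluster
        username: Optional[str] = None,
        database: str = "default",
        connect: Optional[Callable[[], Any]] = None,
    ) -> None:
        self.host = host
        self.port = port
        self.username = username
        self.database = database
        # `connect` lets callers swap in any DB-API connection factory,
        # which is handy for local testing without a Hive cluster.
        self._connect = connect or self._pyhive_connect

    def _pyhive_connect(self) -> Any:
        # Imported lazily so this module loads even without PyHive installed.
        from pyhive import hive

        return hive.Connection(
            host=self.host,
            port=self.port,
            username=self.username,
            database=self.database,
        )

    def fetch_batches(self, sql: str, batch_size: int = 10_000) -> Iterator[List[Any]]:
        """Yield query results in batches instead of loading them all at once."""
        with closing(self._connect()) as conn, closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                yield rows
```

Dagster accepts plain Python objects as resources, so an instance such as `HiveResource(host="hive.example.internal")` (a made-up hostname) could be registered under a resource key and requested by the assets or ops that need Hive data. Subclassing Dagster's `ConfigurableResource` instead would be the more conventional route if the connection settings should be configurable from Dagster itself.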