#ask-community

clay

05/26/2023, 2:26 PM
is there a dagster connector/integration for Hive clusters?

owen

05/26/2023, 4:38 PM
hi @clay! there's no native dagster-hive integration at the moment

clay

05/26/2023, 4:55 PM
Any bypasses or workarounds? I guess I could use airflow to run the Hive jobs and then trigger dagster afterwards

owen

05/26/2023, 4:56 PM
ah if you already have a working airflow setup, have you seen Migrating Airflow to Dagster?

clay

05/26/2023, 5:29 PM
There's nothing about the flow/script that's complicated -- just that it pulls a large amount of raw data from Hive, so I need to figure out how to make the Hive data accessible to Dagster. I don't have access to adjust the Airflow setup at all, unfortunately.

owen

05/26/2023, 5:39 PM
and by "make hive data accessible to Dagster", what would that look like ideally? I guess I'm trying to find the matrix of stuff that is in the following categories:
• currently handled by Airflow, should stay handled by Airflow
• currently handled by Airflow, should be handled by Dagster
• currently not handled by Airflow, should be handled by Dagster

clay

05/26/2023, 5:40 PM
It's "Currently handled by Airflow, would like it to be completely handled by Dagster."

owen

05/26/2023, 5:40 PM
Gotcha -- and it's currently handled by some custom Airflow operator?

clay

05/26/2023, 5:40 PM
Yes
Essentially, I could solve it easily by having a hive resource in Dagster. Maybe I'll look into hacking that together.

owen

05/26/2023, 5:42 PM
yeah that sounds totally reasonable (at the end of the day, either way you're just invoking python code, so some level of copy/paste will get the job done). there's also a utility to take an existing Airflow dag and turn it into a Dagster job: https://docs.dagster.io/integrations/airflow/reference#ingesting-dags-from-airflow
but depending on your level of "not having access to the airflow implementation" that might not work

clay

05/26/2023, 5:43 PM
I would have to invade a building in Shanghai and convince 20 people to help me. 🙂
😂 1
So... "no access to the airflow implementation" is mostly correct
The company has written a wrapper around Airflow
With the clever name of Dataflow

owen

05/26/2023, 5:45 PM
well I guess my official recommendation would be to not invade any buildings
🎉 1

clay

05/26/2023, 5:46 PM
The DAG itself is simple so recreating that in Dagster is < 30 min of work. It's just the Hive part that I'll have to figure out. Anyhow, thanks for the help!
🌈 1

owen

05/26/2023, 5:46 PM
yep no problem -- your idea of creating a HiveResource makes the most sense to me with that extra context, hopefully that's not too painful!
👍 1