# announcements
alexandre
Hello Everyone! I am trying Dagster and thinking about maybe replacing our Azkaban (a pipeline orchestrator similar to Airflow). Our scheduled jobs are usually Spark jobs. I was wondering if anyone has Spark job flows built on Dagster that they can share; I couldn't find any while doing some research. Thanks a lot!
simon
Hi Alexandre, there is a basic example here: https://docs.dagster.io/examples/basic_pyspark But I'm also using Spark and delta.io. You can see some of my solids here: https://github.com/sspaeti-com/dagster-data-pipelines/blob/main/src/pipelines/real-estate/realestate/common/solids_spark_delta.py This is not finalised or finished, but it might help you get started. I'm also planning to put that into an awesome-dagster-code collection at some point.
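For context, a minimal sketch of what a pyspark-backed solid looks like in that setup (simplified, with a hypothetical S3 path; not the exact code from the repo):

```python
from dagster import ModeDefinition, pipeline, solid
from dagster_pyspark import pyspark_resource


@solid(required_resource_keys={"pyspark"})
def read_events(context):
    # The pyspark resource exposes a ready-made SparkSession,
    # so the solid body stays free of connection boilerplate.
    spark = context.resources.pyspark.spark_session
    df = spark.read.json("s3a://my-bucket/raw/events/")  # hypothetical path
    context.log.info(f"Read {df.count()} rows")
    return df


@pipeline(mode_defs=[ModeDefinition(resource_defs={"pyspark": pyspark_resource})])
def event_pipeline():
    read_events()
```

The nice part is that the solid body only talks to `context.resources.pyspark`, so pointing it at a different cluster is purely a config change.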
👍 1
alexandre
Hey Simon, thanks for your reply! I was actually looking for Spark via spark-submit or something like that, since the majority of our jobs are written in Scala. I will share in this thread the repo I'm building with some examples once I have something running; maybe it could help someone else who's also starting with Spark on Dagster. Thanks again! 😁
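For now the rough shape I have in mind is just a solid that shells out to spark-submit, so the Scala jobs run unchanged (a hedged sketch; the config keys, jar path, and class name are placeholders I made up):

```python
import subprocess

from dagster import Field, solid


@solid(
    config_schema={
        "jar": Field(str, description="Path to the assembled Scala job jar"),
        "main_class": Field(str, description="Fully qualified main class"),
        "spark_args": Field([str], default_value=[], is_required=False),
    }
)
def run_spark_job(context):
    # Build the spark-submit command from solid config.
    cmd = [
        "spark-submit",
        "--class", context.solid_config["main_class"],
        *context.solid_config["spark_args"],
        context.solid_config["jar"],
    ]
    context.log.info("Running: " + " ".join(cmd))
    # check=True makes the solid fail (and the pipeline halt) on a non-zero exit.
    subprocess.run(cmd, check=True)
```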
👍 1
simon
Ah ok, Scala is a different topic; I'm not sure how best to approach that, and I'm curious to hear the Dagster team's thoughts on it. But just to be sure: with my examples, or with Dagster in general, you can define your resource (e.g. I'm using pyspark here) and then use Spark in all your solids simply via e.g. `context.resources.pyspark.spark_session.read.json(s3_path)`, which is a pretty powerful thing, rather than doing a spark-submit every time. You define the connection details once as part of your per-environment (dev, test, prod) YAML configs. But yeah, if spark-submit is a necessity, then that's a whole other thing.
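To make the environment part concrete, here is roughly how I select per-environment config (the file name and Spark settings are just examples, and this reuses the `event_pipeline` sketch from above):

```python
import yaml

from dagster import execute_pipeline

# Same pipeline, different YAML per environment, e.g.:
#
#   # prod.yaml (example file)
#   resources:
#     pyspark:
#       config:
#         spark_conf:
#           spark.master: yarn
#           spark.executor.memory: 4g
#
with open("prod.yaml") as f:
    run_config = yaml.safe_load(f)

result = execute_pipeline(event_pipeline, run_config=run_config)
```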
👍 1
alexandre
This is for sure a very nice feature of the Python integration! I will look for a way to improve this parametrization using resources (and since we have QA/PROD environments, that makes sense). I am still working on getting Dagster to communicate with our QA cluster; as soon as I have some results, I'll share them with you 🙂.
🎉 1