Ivan Rivera
08/24/2020, 7:14 AM

alex
08/24/2020, 10:38 PM

sandy
08/24/2020, 10:39 PM

Binh Pham
08/24/2020, 11:10 PM

from pyspark.sql import SparkSession

from dagster import resource

class PySparkResource(object):
    def __init__(self):
        # getOrCreate() picks up an already-active SparkSession
        # (e.g. the one databricks-connect provides) rather than always building a new one
        self.spark_session = SparkSession.builder.getOrCreate()

@resource
def pyspark_resource(_):
    return PySparkResource()
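For completeness, here's a rough sketch of how that resource could be wired into a pipeline with the 0.9-era ModeDefinition / solid API (count_rows and my_pipeline are just placeholder names for illustration):

from dagster import ModeDefinition, pipeline, solid

@solid(required_resource_keys={"pyspark"})
def count_rows(context):
    # Pull the shared SparkSession off the resource
    spark = context.resources.pyspark.spark_session
    context.log.info(f"row count: {spark.range(100).count()}")

@pipeline(mode_defs=[ModeDefinition(resource_defs={"pyspark": pyspark_resource})])
def my_pipeline():
    count_rows()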
databricks-connect requires that you not have any other version of pyspark installed, though: https://docs.databricks.com/dev-tools/databricks-connect.html#step-1-install-the-client
So, to be able to use intermediate storage and provide a serialization plan for PySpark DataFrames, I just copied dagster_pyspark/types.py into my project: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-pyspark/dagster_pyspark/types.py
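With that copied module in place, its DataFrame dagster type can annotate solid outputs so intermediate storage knows how to serialize them. A minimal sketch, assuming the file was copied to my_project/pyspark_types.py and exports DataFrame as the upstream module does (the path and load_numbers are just placeholders):

from dagster import OutputDefinition, solid

from my_project.pyspark_types import DataFrame  # copied dagster_pyspark/types.py

@solid(
    required_resource_keys={"pyspark"},
    output_defs=[OutputDefinition(DataFrame)],
)
def load_numbers(context):
    # Because the output is typed as DataFrame, the intermediate store
    # uses the type's serialization strategy to persist it between solids.
    return context.resources.pyspark.spark_session.range(10)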
So far this has worked for me. But happy to hear if there is a better way to get the best of both worlds. 🙂

Ivan Rivera
08/25/2020, 6:03 AM