# announcements
i
Hi all 🙂 Is anyone here using Dagster with Databricks? I’ve seen a dagster-databricks module, but it’s not very well documented, so I’d be keen to hear about your setup
a
Yeah, it's very new and community-contributed, so we’re still working on getting it documented. I think referencing the tests is one thing you can do for now: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-databricks/dagster_databricks_tests/test_pyspark.py
cc @sandy
s
hey @Ivan Rivera - happy to answer any questions you have on how it works. I believe @Binh Pham has used it and may have some takeaways
b
I was able to set up dagster-databricks and got it to run successfully following the simple_pyspark example: https://github.com/dagster-io/dagster/tree/master/examples/legacy_examples/dagster_examples/simple_pyspark But errors happening on the Databricks cluster were not being sent back to Dagster, which is unfortunate because I wanted to use Dagster for monitoring purposes. Unsure if this is a limitation of dagster-databricks or of the Databricks run-now API. I ended up using databricks-connect and creating a simple resource for it:
from dagster import resource
from pyspark.sql import SparkSession

class PySparkResource(object):
    def __init__(self):
        self.spark_session = SparkSession.builder.getOrCreate()

@resource
def pyspark_resource(_):
    return PySparkResource()
databricks-connect requires that you don't have any other version of pyspark installed, though: https://docs.databricks.com/dev-tools/databricks-connect.html#step-1-install-the-client So, to be able to use intermediate storage and provide a serialization plan for PySpark DataFrames, I just copied dagster_pyspark/types.py into my project: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-pyspark/dagster_pyspark/types.py So far this has worked for me, but I'm happy to hear if there is a better way to get the best of both worlds. 🙂
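On the errors-not-coming-back problem: one possible workaround, if you submit runs yourself, is to poll the Databricks Jobs API `runs/get` endpoint and raise inside the Dagster step when the run ends badly. The snippet below is only a rough sketch of that check, not how dagster-databricks handles it. The names `check_run_state` and `DatabricksRunFailed` are made up for illustration, and it assumes you already have the parsed JSON response as a dict.

```python
from typing import Any, Mapping


class DatabricksRunFailed(Exception):
    """Hypothetical error type: the Databricks run finished without success."""


# life_cycle_state values after which the run will not change any more.
FINISHED_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def check_run_state(run: Mapping[str, Any]) -> bool:
    """Return True if the run succeeded, False if it is still in progress.

    Raises DatabricksRunFailed if the run finished in any other state.
    `run` is assumed to be the parsed JSON body of a `runs/get` response.
    """
    state = run.get("state", {})
    life_cycle = state.get("life_cycle_state")
    if life_cycle not in FINISHED_STATES:
        return False  # e.g. PENDING, RUNNING, TERMINATING: keep polling
    result = state.get("result_state")
    if result == "SUCCESS":
        return True
    message = state.get("state_message", "")
    raise DatabricksRunFailed(
        f"run {run.get('run_id')} ended as {life_cycle}/{result}: {message}"
    )
```

Calling this in a polling loop from a solid would make the step fail in Dagster whenever the cluster-side run fails, since an exception raised in the step body is what Dagster reports as a step failure.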
i
Awesome, thank you @alex, @sandy and @Binh Pham! That’s plenty of info for me to get started with