# announcements
Hi all 🙂 Is anyone here using Dagster with Databricks? I’ve seen a dagster-databricks module, but it’s not very well documented, so I’d be keen to hear about your setup
yeah, it's very new and community-contributed, so we're still working on getting it documented. For now, one thing you can do is reference the tests: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-databricks/dagster_databricks_tests/test_pyspark.py
cc @sandy
hey @Ivan Rivera - happy to answer any questions you have on how it works. I believe @Binh Pham has used it and may have some takeaways
I was able to set up dagster-databricks and got it to run successfully following the simple_pyspark example: https://github.com/dagster-io/dagster/tree/master/examples/legacy_examples/dagster_examples/simple_pyspark But errors happening on the Databricks cluster were not being sent back to Dagster, which is unfortunate because I wanted to use Dagster for monitoring purposes. Unsure if this is a limitation of dagster-databricks or of the Databricks run-now API. I ended up using databricks-connect and creating a simple resource for it:
```python
from dagster import resource
from pyspark.sql import SparkSession

class PySparkResource(object):
    def __init__(self):
        # with databricks-connect installed, this session is routed to the remote cluster
        self.spark_session = SparkSession.builder.getOrCreate()

@resource
def pyspark_resource(_):
    return PySparkResource()
```
databricks-connect requires that you don't have any other version of pyspark installed, though: https://docs.databricks.com/dev-tools/databricks-connect.html#step-1-install-the-client So, to be able to use intermediate storage and provide a serialization plan for PySpark DataFrames, I just copied dagster_pyspark/types.py into my project: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-pyspark/dagster_pyspark/types.py So far this has worked for me, but happy to hear if there is a better way to get the best of both worlds. 🙂
Awesome, thank you @alex, @sandy and @Binh Pham! That’s plenty of info for me to get started with