Ben Andersen-Waine

10/18/2022, 11:39 AM
Are there any concrete examples of using `dagster_databricks` and the `databricks_pyspark_step_launcher`? I can see one for EMR: https://github.com/dagster-io/dagster/blob/1.0.13/examples/with_pyspark_emr/with_pyspark_emr/repository.py
It would be nice to see an example of the config required to pass to:
"pyspark_step_launcher": databricks_pyspark_step_launcher.configured(
    {
        # ???
    }
),
This feels a bit like hard work 😄

chris

10/18/2022, 6:55 PM
Hey Ben - apologies, this is definitely one of the less-documented areas of the codebase. Are there specific questions you have about the config schema here?

Ben Andersen-Waine

10/19/2022, 8:33 AM
I was actually able to get it working in the end, and to be fair, the hints on which API calls to use to get valid values to plug into some of the Databricks-specific config params were really helpful. One small piece of feedback: the param `secrets_to_env_variables` is listed as optional but causes an error unless you pass at least an empty list. A request for the future would be a minimal worked example of producing a run on a new job cluster and on an existing persistent cluster.
All that aside, I'm now running Databricks jobs in AWS from my workstation. Moving between a testing loop and then actually running a job on sample data is a workflow game changer. Thanks Dagster team 😄
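For anyone landing here later, a rough sketch of the two config shapes mentioned above (new job cluster vs. existing cluster). This is not taken from the thread or the docs: the key names are recalled from the `dagster_databricks` config schema around 1.0.x and should be checked against the installed version, and the helper name, Spark version, node type, secret scope and secret key names are all hypothetical placeholders. It only builds a plain dict, which would then be passed to `databricks_pyspark_step_launcher.configured(...)`.

```python
from typing import Any, Dict, Optional


def build_step_launcher_config(
    local_package_path: str,
    existing_cluster_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: returns a config dict for the step launcher.

    All key names are assumptions recalled from the dagster_databricks
    schema; verify them against the version you have installed.
    """
    if existing_cluster_id is not None:
        # Run on an existing persistent cluster, identified by its ID.
        cluster: Dict[str, Any] = {"existing": existing_cluster_id}
    else:
        # Run on a new job cluster; values below are placeholders.
        cluster = {
            "new": {
                "size": {"num_workers": 1},
                "spark_version": "10.4.x-scala2.12",  # placeholder
                "nodes": {"node_types": {"node_type_id": "i3.xlarge"}},  # placeholder
            }
        }

    return {
        "run_config": {"run_name": "dagster-step", "cluster": cluster},
        # Host and token are read from environment variables at launch time.
        "databricks_host": {"env": "DATABRICKS_HOST"},
        "databricks_token": {"env": "DATABRICKS_TOKEN"},
        # Path to the local package that gets shipped to the cluster
        # (assumed key name; it may differ between versions).
        "local_pipeline_package_path": local_package_path,
        # Listed as optional, but per the feedback above an explicit
        # empty list avoids an error.
        "secrets_to_env_variables": [],
        # Staging storage credentials live in a Databricks secret scope;
        # scope and key names here are hypothetical.
        "storage": {
            "s3": {
                "secret_scope": "my-scope",
                "access_key_key": "aws-access-key",
                "secret_key_key": "aws-secret-key",
            }
        },
    }


if __name__ == "__main__":
    new_cluster_config = build_step_launcher_config("/path/to/my_package")
    existing_cluster_config = build_step_launcher_config(
        "/path/to/my_package", existing_cluster_id="my-cluster-id"
    )
    print(new_cluster_config["run_config"]["cluster"])
    print(existing_cluster_config["run_config"]["cluster"])
```

Either dict would then go where the `# ???` sits in the original question, i.e. `databricks_pyspark_step_launcher.configured(build_step_launcher_config(...))`.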

chris

10/19/2022, 4:53 PM
This is excellent feedback, and I really appreciate you linking all this information. Glad it all worked out 🙂