# announcements
b
hi everyone! i'm going to be developing a Spark-based data processing system soon and would love to choose dagster over airflow or other alternatives. we're fully Azure-based at my company, so we'll be using Azure Databricks, and i'm not quite sure how that fits into the current dagster-pyspark model. as far as i can tell, Databricks jobs need to be submitted through their API (effectively spark-submit), rather than run in an interactive Spark session that dagster can use. is there something i'm missing about how dagster could orchestrate these jobs while still making proper use of dagster's abstractions and testing functionality?
one option could be to use Databricks Connect, which (i think) provides similar functionality to the EMR pyspark resource, but it's limited and is billed at the more expensive 'data analytics' pricing tier
s
Hey Ben, I'm finishing up some improvements to Dagster's PySpark EMR integration: https://dagster.phacility.com/D2578 That integration essentially spark-submits to EMR, i.e. it doesn't rely on an interactive shell. I believe using the Databricks APIs to spark-submit to Azure would fit the same model well
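A minimal sketch of the "spark-submit through the Databricks API" model Sandy describes, assuming the Databricks Jobs API 2.0 endpoints `runs/submit` (with a `spark_submit_task`) and `runs/get`. This is not Dagster's API or the EMR integration's code. `build_spark_submit_payload`, `submit_and_wait` and `is_finished` are hypothetical helpers, and the workspace URL, token, Spark version and node type are placeholders. A dagster solid could call `submit_and_wait` and fail the step when the run does not succeed.

```python
import json
import time
import urllib.parse
import urllib.request

# life_cycle_state values after which a Databricks run will not change further
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def build_spark_submit_payload(run_name, main_py_path, args,
                               spark_version, node_type_id, num_workers):
    """Hypothetical helper: JSON body for POST /api/2.0/jobs/runs/submit."""
    return {
        "run_name": run_name,
        # Spin up a fresh job cluster for this run (placeholder sizing)
        "new_cluster": {
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": num_workers,
        },
        # spark-submit style parameters: entry-point script, then its arguments
        "spark_submit_task": {"parameters": [main_py_path, *args]},
    }


def is_finished(run_state):
    """True once the run's life_cycle_state is terminal."""
    return run_state.get("life_cycle_state") in TERMINAL_STATES


def _call(host, token, endpoint, body=None, params=None):
    """POST a JSON body, or GET with query params, against the Jobs API 2.0."""
    url = f"{host}/api/2.0/jobs/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers,
                                 method="POST" if body is not None else "GET")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def submit_and_wait(host, token, payload, poll_seconds=30):
    """Submit a one-off run, poll until it finishes, raise unless it succeeded."""
    run_id = _call(host, token, "runs/submit", body=payload)["run_id"]
    while True:
        state = _call(host, token, "runs/get", params={"run_id": run_id})["state"]
        if is_finished(state):
            if state.get("result_state") != "SUCCESS":
                raise RuntimeError(f"Databricks run {run_id} did not succeed: {state}")
            return run_id
        time.sleep(poll_seconds)
```

Usage, with placeholder values: `submit_and_wait("https://<workspace-url>", token, build_spark_submit_payload("etl", "dbfs:/jobs/main.py", ["--date", "2020-05-01"], "6.4.x-scala2.11", "Standard_DS3_v2", 2))`. Keeping the payload builder pure means it can be unit-tested without a Databricks workspace, which fits the testing concern raised above.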
b
ah interesting! that is a large change 🙂 i'll take a look and keep an eye out for the merge! thanks Sandy