#ask-community

Joseph McCartin

07/14/2022, 6:36 PM
Hi there, I'm having some issues using the `databricks_pyspark_step_launcher`. I am able to submit an op to the databricks cluster on Azure, but after some initialisation the run quickly crashes before any logs can be produced, so I get a `RESOURCE_DOES_NOT_EXIST` error from the databricks API due to the missing stdout file in dbfs. I've posted a picture of the logs in dagster, but they don't provide any meaningful info. Does anyone have an idea about where I can look to debug the problem a little better?

owen

07/14/2022, 6:50 PM
hey @Joseph McCartin! do you have the permissions to access the databricks console? If so, I'd recommend checking to see if there's anything in those logs. otherwise, I can help debug from there

Joseph McCartin

07/15/2022, 4:13 PM
I have full access to the databricks console, but nowhere I've looked has yielded any meaningful logs. There's no mention of any dagster jobs. I've also tried looking around in the cluster driver (I can get shell access), but it's hard to know where to look - I only saw traces of the dagster libraries being installed. Unless you know where to look for better info on why it crashed before the stdout file could be created, I suspect I'll need to make a local copy of the dagster_databricks pypi package and add extra logging everywhere. I might be able to redirect those logs to a known path in the driver so I can view what is happening.
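For reference, the "add extra logging and redirect it to a known path on the driver" plan could look roughly like this minimal sketch. It uses only Python's standard `logging` module; the helper `make_debug_logger`, the logger name `dagster_debug`, and the path `/tmp/dagster_debug.log` are hypothetical choices, not anything `dagster_databricks` defines.

```python
import logging
import os


def make_debug_logger(path: str = "/tmp/dagster_debug.log") -> logging.Logger:
    """Return a logger that appends DEBUG-level messages to a fixed file.

    Hypothetical helper for a forked copy of the library: the logger name
    "dagster_debug" and the default path are illustrative choices.
    """
    logger = logging.getLogger("dagster_debug")
    logger.setLevel(logging.DEBUG)
    target = os.path.abspath(path)
    # Only attach one FileHandler per path, even if this is called repeatedly.
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == target
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(target)  # opens in append mode by default
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_debug_logger()
    # Calls like this would be sprinkled through the forked code.
    log.debug("step main: starting")
```

With shell access to the driver, the file can then be followed with something like `tail -f /tmp/dagster_debug.log` while the step runs.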
It’s worth mentioning that this is a prototype, and I’ve never connected dagster to databricks before. So possibly the code isn’t idiot-proof enough yet and I’ve slipped in between the cracks 🙂

owen

07/15/2022, 4:33 PM
step launchers can definitely be tricky to set up, no worries! do you mind sharing how you're configuring the step launcher? no need to share the full config block if there's sensitive stuff in there, but for example are you launching a new cluster per step, or latching onto an existing one?
(my guess, from the logs, is that you're using an existing cluster?)
also, what version of dagster/dagster-databricks are you using?
there was an issue in one of the early 0.15.x versions, but that should be gone in the newest version, so the quickest thing to try would just be upgrading to 0.15.5+ or so

Joseph McCartin

07/15/2022, 4:36 PM
yes, existing cluster - just to help debug
oh, well actually, i am on 0.15.0

owen

07/15/2022, 4:36 PM
ah that would do it

Joseph McCartin

07/15/2022, 4:36 PM
so maybe i’ll quickly try and bump the version

owen

07/15/2022, 4:40 PM
nice, let me know how that goes -- I would expect the error that was happening in the older versions to show up in the driver logs for the cluster (that's really the only place I ever go in the DB console), but it's possible that it might not show up

Joseph McCartin

07/15/2022, 4:44 PM
It takes me a while to push and build an image to the container registry I'm using (Azure). But I'll have a quick go once it's done to check that theory. Otherwise it'll have to be Monday - it's almost 6pm where I am 🙂

owen

07/15/2022, 4:45 PM
no worries! happy to pick back up the debugging then if need be (although hopefully the upgrade will just work 😅 🤞)

Joseph McCartin

07/15/2022, 9:38 PM
well, I just got back to my laptop and checked on the results after upgrading - it looks like the same error again. This time I am running 0.15.5 for all my dagster libraries, but my helm chart is at 0.15.3 (the latest version). I assume that's not going to be a problem.
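To double-check which dagster packages are actually installed in a given environment (for example, inside the image pushed to the registry, or on the cluster), one option is to list them with the standard library's `importlib.metadata`. This is a minimal sketch; the helper names are hypothetical, and the mismatch check assumes, as in the 0.15.x releases discussed here, that the core package and its integration libraries share a version number. It says nothing about the helm chart version.

```python
from importlib.metadata import distributions


def installed_dagster_versions(prefix: str = "dagster") -> dict[str, str]:
    """Map each installed distribution whose name starts with `prefix` to its version."""
    found: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefix):
            found[name.lower()] = dist.version
    return found


def distinct_versions(versions: dict[str, str]) -> set[str]:
    """Return the distinct versions; more than one element suggests a mismatch."""
    return set(versions.values())


if __name__ == "__main__":
    versions = installed_dagster_versions()
    for name, version in sorted(versions.items()):
        print(f"{name}=={version}")
    if len(distinct_versions(versions)) > 1:
        print("warning: dagster packages are not all on the same version")
```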
so on Monday I'll go with my original plan - fork the code and add a lot of debugging. Hopefully I can get a bug report or a meaningful GitHub issue out of this; I'd love to be able to contribute to such an awesome project after all.

owen

07/15/2022, 9:41 PM
that would be great! appreciate you digging into it, and i'm happy to help debug or review any change you make

Joseph McCartin

07/19/2022, 6:16 PM
Think I found the issue. I observed a lot of strange behaviour, probably as a result of mixing Python versions between the repository and the databricks runtime. But on top of that, there are some traps that can easily be encountered in the dagster-databricks library. I'll write an issue once I've finished with my code, but the gist of it is that two pickle files are loaded before the stdout and stderr files are created: https://github.com/dagster-io/dagster/blob/6365996d04f0671f4615ee11b966095302f6b2d[…]s/dagster-databricks/dagster_databricks/databricks_step_main.py If something goes wrong in those lines, the code will break before logging can be captured.
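To illustrate the ordering trap: if an entry point deserialises its inputs before it creates its stdout/stderr files, any exception during unpickling (possibly from a Python version mismatch, as suspected here) ends the process before there is a log file to fetch. The sketch below is not the dagster-databricks code; `run_step`, its parameters, and the file names are hypothetical. It only shows the defensive ordering: create the log files first, then load the pickle inside the redirected block so the traceback lands in the captured stderr.

```python
import pickle
import traceback
from contextlib import redirect_stderr, redirect_stdout
from pathlib import Path


def run_step(input_path: str, log_dir: str) -> int:
    """Hypothetical step entry point that sets up logging before unpickling.

    Returns 0 on success and 1 if loading (or running) the step failed.
    """
    log_root = Path(log_dir)
    log_root.mkdir(parents=True, exist_ok=True)
    # 1. Create the stdout/stderr files first, so they exist even if
    #    everything after this point fails.
    with open(log_root / "stdout", "w") as out, open(log_root / "stderr", "w") as err:
        with redirect_stdout(out), redirect_stderr(err):
            try:
                # 2. Only now deserialise the input; a failure here is captured.
                with open(input_path, "rb") as f:
                    step_input = pickle.load(f)
                print(f"loaded step input of type {type(step_input).__name__}")
                # 3. ...the actual step would run here...
                return 0
            except Exception:
                # print_exc writes to sys.stderr, which is redirected to the file.
                traceback.print_exc()
                return 1
```

The design choice is purely about ordering: anything that can fail goes after the point where the log destinations already exist.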
o

owen

07/20/2022, 9:13 PM
got it, thanks for digging into this, and feel free to tag me on the issue once you write it up 🙂
j

Joseph McCartin

07/26/2022, 5:36 PM
Sorry for the delay on this - it was a little difficult to debug and I was prioritising getting this MVP working in our system. In case you didn't see it already, I've created the issue here: https://github.com/dagster-io/dagster/issues/9039