
Bernardo Cortez

09/20/2022, 10:49 AM
Hi hi! We are using Dagster as an orchestrator to launch PySpark jobs on a Databricks jobs cluster via the Databricks REST API. Every once in a while, we get failure messages like the following one. Can you help us understand what is happening? It used to happen about once every two weeks, but it is now happening almost daily. Here is the error message:
dagster.core.errors.DagsterSubprocessError: During multiprocess execution errors occurred in child processes:
In process 1966950: requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://adb-4333425462262739.19.azuredatabricks.net/api/2.0/dbfs/read?path=%2Fdatafordeals%2Fstaging%2F0c6b1015-b98a-4bc6-9c96-8b10e116504e%2Fcreate_subpayments.create_taxes_subpayments%2Fstdout&length=1048576
Response from server:
{ 'error_code': 'RESOURCE_DOES_NOT_EXIST',
'message': 'No file or directory exists on path '
'/datafordeals/staging/0c6b1015-b98a-4bc6-9c96-8b10e116504e/create_subpayments.create_taxes_subpayments/stdout.'}
Stack Trace:
File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/child_process_executor.py", line 70, in _execute_command_in_child_process
for step_event in command.execute():
File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 88, in execute
instance=instance,
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 881, in iter
pipeline_context=self.pipeline_context,
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 87, in inner_plan_execution_iterator
for step_event in check.generator(dagster_event_sequence_for_step(step_context)):
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 336, in dagster_event_sequence_for_step
raise unexpected_exception
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 232, in dagster_event_sequence_for_step
for step_event in check.generator(step_events):
File "/opt/dagster/dagster_home/cashback_spark/resources/databricks_pyspark_step_launcher_sp.py", line 202, in launch_step
self.log_compute_logs(log, run_id, step_key)
File "/opt/dagster/dagster_home/cashback_spark/resources/databricks_pyspark_step_launcher_sp.py", line 209, in log_compute_logs
self._dbfs_path(run_id, step_key, "stdout")
File "/usr/local/lib/python3.7/site-packages/dagster_databricks/databricks.py", line 43, in read_file
jdoc = self.client.dbfs.read(path=dbfs_path, length=block_size) # pylint: disable=no-member
File "/usr/local/lib/python3.7/site-packages/databricks_cli/sdk/service.py", line 504, in read
return self.client.perform_query('GET', '/dbfs/read', data=_data, headers=headers)
File "/usr/local/lib/python3.7/site-packages/databricks_cli/sdk/api_client.py", line 146, in perform_query
raise requests.exceptions.HTTPError(message, response=e.response)
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 785, in pipeline_execution_iterator
for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 283, in execute
subprocess_error_infos=list(errs.values()),
EDIT: I forgot to mention that a simple re-execution solves the problem. It is just inconvenient because it defeats the purpose of Dagster's automatic job queue.

owen

09/20/2022, 4:09 PM
hi @Bernardo Cortez! sorry you're running into this -- at a high level, when the external step is launched, dagster needs some place to store the stdout / stderr that's produced in the external step, so it can be sent back to the process that launched the databricks step. Before a somewhat recent change (maybe around a month ago), these files would be created later in the process of setting up the external step, so if anything went wrong during that setup phase, you'd end up with that (unhelpful) error you're seeing. I think in recent releases, you should not get this error anymore (instead getting a more informative error, or no error at all). Can you confirm what version of dagster-databricks you're using? If it's the most recent one, I can look into this a bit more to see why that might still be happening
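In the meantime, one possible stopgap on the launcher side is to treat a missing stdout / stderr file as "no logs yet" rather than a fatal error. This is only a rough sketch, not how dagster-databricks does it: `read_log_best_effort`, `read_file`, and `is_missing` are hypothetical names, and it assumes your custom step launcher can wrap whatever call it uses to read the file from DBFS.

```python
import time
from typing import Callable, Optional


def read_log_best_effort(
    read_file: Callable[[str], bytes],
    path: str,
    is_missing: Callable[[BaseException], bool],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Optional[bytes]:
    """Read a log file, tolerating the case where it does not exist (yet).

    read_file  -- hypothetical callable that reads the file and raises on failure
    is_missing -- hypothetical predicate: True if the exception means "file not found"
    Returns the file contents, or None if the file is still missing after all attempts.
    """
    for attempt in range(attempts):
        try:
            return read_file(path)
        except Exception as exc:
            # Anything other than "file not found" is a real error: re-raise it.
            if not is_missing(exc):
                raise
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    return None
```

In a setup like the one in the stack trace, `is_missing` could plausibly check for a `requests.exceptions.HTTPError` whose `response.status_code` is 404, and the caller would skip logging when the result is `None`. That part is an assumption about your launcher code, so adjust it to whatever your `log_compute_logs` actually does.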

Bernardo Cortez

09/22/2022, 11:00 AM
0.14.6
Which version should we update to?
Thanks for the help btw!