#ask-community

Sergio Pintaldi

03/14/2023, 5:33 AM
Hi guys, I am trying to generate a graph asset using two ops: 1. the first op uploads a given folder to Databricks; 2. the second op reads that folder as CSV using Spark and generates the table. I get an error in the second op: it cannot find the passed path. My interpretation is that the first op returns a Databricks path that gets saved by the local filesystem IO manager, so the file is not present on the Databricks cluster where the second op runs. Please see the code in this thread
This code is in the `test_graph_asset_upload_and_run_on_databricks` folder, and this file is located at the same level as that folder.
Error in `dagit`:
dagster._core.errors.DagsterExecutionLoadInputError: Error occurred while loading input "path" of step "dev_srg_test__test_table.spark_read_csv":

  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 269, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 346, in core_dagster_event_sequence_for_step
    for event_or_input_value in step_input.source.load_input_object(step_context, input_def):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/inputs.py", line 520, in load_input_object
    yield from _load_input_with_input_manager(input_manager, load_input_context)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/inputs.py", line 835, in _load_input_with_input_manager
    value = input_manager.load_input(context)
  File "/usr/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 85, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp0odzuhmj/storage/f763280b-17b8-4732-9a98-173cfa2c5c64/dev_srg_cmn_test__test_table.upload_folder_to_databricks/result'

  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 55, in op_execution_error_boundary
    yield
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/execution/plan/inputs.py", line 835, in _load_input_with_input_manager
    value = input_manager.load_input(context)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/storage/upath_io_manager.py", line 195, in load_input
    return self._load_single_input(path, context)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/storage/upath_io_manager.py", line 150, in _load_single_input
    raise e
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/storage/upath_io_manager.py", line 138, in _load_single_input
    obj = self.load_from_path(context=context, path=path)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dagster/_core/storage/fs_io_manager.py", line 172, in load_from_path
    with path.open("rb") as file:
  File "/usr/lib/python3.9/pathlib.py", line 1242, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.9/pathlib.py", line 1110, in _opener
    return self._accessor.open(self, flags, mode)
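The `FileNotFoundError` above matches that interpretation: the default fs IO manager pickles each op's output under a local temp directory, and a step executing on the Databricks cluster has a different filesystem, so the upstream result file simply isn't there. A minimal stdlib-only sketch of the mechanism (class and paths are hypothetical, not Dagster's actual implementation):

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a filesystem-based IO manager: it pickles each
# op's output to <base_dir>/<run_id>/<step_key>/result and reads it back
# from the same path when a downstream op asks for its input.
class LocalPickleIOManager:
    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, run_id, step_key):
        return os.path.join(self.base_dir, run_id, step_key, "result")

    def handle_output(self, run_id, step_key, obj):
        path = self._path(run_id, step_key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, run_id, step_key):
        with open(self._path(run_id, step_key), "rb") as f:
            return pickle.load(f)

# On the machine that ran the first op, storing and re-loading works fine...
base = tempfile.mkdtemp()
mgr = LocalPickleIOManager(base)
mgr.handle_output("run-1", "upload_folder_to_databricks", "dbfs:/tmp/my_folder")
assert mgr.load_input("run-1", "upload_folder_to_databricks") == "dbfs:/tmp/my_folder"

# ...but the second op runs on the Databricks cluster, whose filesystem does
# not contain that local temp directory, so loading the input fails exactly
# as in the traceback above.
remote_mgr = LocalPickleIOManager("/no/such/dir/on/the/cluster")  # hypothetical remote view
try:
    remote_mgr.load_input("run-1", "upload_folder_to_databricks")
except FileNotFoundError:
    print("input missing on remote cluster")
```

The fix is therefore to use an IO manager backed by storage both environments can reach, as suggested below in the thread.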

yuhan

03/14/2023, 5:47 AM
Where are you running dagster/dagit? Is it in Dagster Cloud?

Sergio Pintaldi

03/14/2023, 6:02 AM
No, locally. But in prod it runs in a container in AWS.

yuhan

03/14/2023, 6:51 AM
Ah, then you can use the S3 IO manager to pass data between ops: https://docs.dagster.io/deployment/guides/aws#using-s3-for-io-management
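For reference, wiring this up looks roughly like the sketch below (a configuration fragment, not runnable as-is: the bucket name, prefix, and asset list are placeholders, and `dagster-aws` must be installed). With outputs pickled to S3, any step, local or on Databricks, that has bucket access can load its upstream inputs.

```python
from dagster import Definitions
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource

defs = Definitions(
    assets=[...],  # your graph asset here
    resources={
        # Replace "my-bucket" / "dagster-io" with your own bucket and prefix.
        "io_manager": s3_pickle_io_manager.configured(
            {"s3_bucket": "my-bucket", "s3_prefix": "dagster-io"}
        ),
        "s3": s3_resource,
    },
)
```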

Sergio Pintaldi

03/14/2023, 10:24 PM
Thank you! 🙏