Kulanjith Deelaka
12/29/2022, 10:28 AM
With UPathIOManager, the method context.add_output_metadata() does not work; instead, we have to implement UPathIOManager.get_metadata(). It would be helpful if the docs explicitly stated that calling context.add_output_metadata() from UPathIOManager.dump_to_path() does not work.
Daniel Gafni
12/29/2022, 1:46 PM
context.add_output_metadata() inside dump_to_path?
Also, .get_metadata had a bug (it crashed in some circumstances) in the last release. It is fixed in master, so the next release will no longer have it. Keep this in mind if you are encountering it.
Kulanjith Deelaka
12/29/2022, 3:38 PM
On dagster 1.1.7, running context.add_output_metadata() doesn't do anything. Maybe I'm doing something terribly wrong. I will try to send you a minimal reproducible example when I have time; in the meantime, I'll post my custom IO manager in case anyone can spot any mistakes in it:
Kulanjith Deelaka
12/29/2022, 3:44 PM
In the above XML IO Manager, get_metadata() works, but then I run into the other issue where the UPathIOManager class tries to call get_metadata() on input and fails with TypeError: Type 'InputContext' cannot be serialized.
I assume this is the bug that you mentioned; I noticed you filed a PR for it here: https://github.com/dagster-io/dagster/pull/11110/files
Daniel Gafni
12/29/2022, 3:44 PM
Kulanjith Deelaka
12/29/2022, 4:34 PM
1.1.7 codebase that has now been deleted on master with @Daniel Gafni's latest PR.
On 1.1.7, in UPathIOManager._load_single_input, the method self.get_metadata() is called incorrectly (Linked here), as shown below:
custom_metadata = self.get_metadata(obj, context)
The function definition of get_metadata is def get_metadata(self, context, obj). In the line above, however, the object is passed in the first position in place of the context, and the context is passed in the second position in place of the object. This causes the TypeErrors I was experiencing on file loading.
Fixing the line as follows got rid of the bug:
custom_metadata = self.get_metadata(context=context, obj=obj)
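The argument swap described above can be reproduced without Dagster. The following is a minimal sketch using hypothetical stand-ins (FakeInputContext and SketchIOManager are not Dagster classes), and the TypeError it triggers is a different one from the serialization error in this thread. It only illustrates how positional arguments land in the wrong parameters and why keyword arguments avoid that.

```python
from typing import Any, Dict


class FakeInputContext:
    """Hypothetical stand-in for Dagster's InputContext (illustration only)."""


class SketchIOManager:
    """Hypothetical stand-in for an IO manager; not Dagster's UPathIOManager."""

    def get_metadata(self, context: Any, obj: Any) -> Dict[str, Any]:
        # Same parameter order as quoted in the thread: context first, obj second.
        return {"length": len(obj)}


manager = SketchIOManager()
context = FakeInputContext()
obj = [1, 2, 3]

# Swapped positional call: obj lands in `context`, and context lands in `obj`.
swapped_error = None
try:
    manager.get_metadata(obj, context)
except TypeError as exc:
    swapped_error = str(exc)

# Keyword call: each value reaches the parameter it was meant for.
fixed = manager.get_metadata(context=context, obj=obj)

print(swapped_error)
print(fixed)
```

Passing the arguments by keyword makes the call robust to the parameter order, which is why the one-line fix above works.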
Anyway, this should be fixed in the next release, but in case anyone else gets TypeErrors on file loading with UPathIOManager, this could be it.
Daniel Gafni
12/29/2022, 7:30 PM
from pathlib import Path
from typing import Any

from dagster import (
    Field,
    InitResourceContext,
    InputContext,
    MetadataValue,
    OutputContext,
    UPathIOManager,
    asset,
    build_init_resource_context,
    io_manager,
    materialize,
)
from upath import UPath


def test_upath_io_manager_medatada_from_dump_to_path(tmp_path: Path):
    def get_length(obj: Any) -> int:
        try:
            return len(obj)
        except TypeError:
            return 0

    class MetadataIOManager(UPathIOManager):
        def dump_to_path(self, context: OutputContext, obj: Any, path: UPath):
            # Attach metadata from inside dump_to_path; this is the behavior under test.
            context.add_output_metadata({"test": MetadataValue.text("test")})
            return

        def load_from_path(self, context: InputContext, path: UPath) -> Any:
            return

    @io_manager(config_schema={"base_path": Field(str, is_required=False)})
    def metadata_io_manager(init_context: InitResourceContext):
        assert init_context.instance is not None
        base_path = UPath(
            init_context.resource_config.get("base_path", init_context.instance.storage_directory())
        )
        return MetadataIOManager(base_path=base_path)

    manager = metadata_io_manager(build_init_resource_context(config={"base_path": str(tmp_path)}))

    @asset
    def my_asset() -> Any:
        return 0

    result = materialize(
        [my_asset],
        resources={"io_manager": manager},
    )

    # The first metadata entry of the first handled output should be the one added above.
    handled_output_events = list(filter(lambda evt: evt.is_handled_output, result.all_node_events))
    assert handled_output_events[0].event_specific_data.metadata_entries[  # type: ignore[index,union-attr]
        0
    ].entry_data.value == "test"
Kulanjith Deelaka
12/30/2022, 5:42 PM
test_assets.py::test_upath_io_manager_medatada_from_dump_to_path FAILED [100%]
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_START - Started execution of run for "in_process_materialization_job".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Executing steps in process (pid: 8804)
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - LOGS_CAPTURED - Started capturing logs in process (pid: 8804).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_START - Started execution of step "my_asset".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - my_asset - Writing file at: C:\Users\Dee\AppData\Local\Temp\pytest-of-Dee\pytest-2\test_upath_io_manager_medatada0\my_asset
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - ASSET_MATERIALIZATION - Materialized value my_asset.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_SUCCESS - Finished execution of step "my_asset" in 11ms.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Finished steps in process (pid: 8804) in 23ms
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_SUCCESS - Finished execution of run for "in_process_materialization_job".
tk_dagster_tests\test_assets.py:6 (test_upath_io_manager_medatada_from_dump_to_path)
'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset' != 'test'
Expected :'test'
Actual :'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset'
I'll let you know whether this is a Windows-specific issue by running it on Linux (openSUSE) as well.
Daniel Gafni
12/30/2022, 11:06 PM
Kulanjith Deelaka
01/04/2023, 1:07 PM
E AssertionError: assert '/tmp/pytest-...ada0/my_asset' == 'test'
E - test
E + /tmp/pytest-of-dee/pytest-0/test_upath_io_manager_medatada0/my_asset
Maybe I'm doing something wrong? I just ran it with the pytest command, using Python 3.10 and dagster==1.1.7, btw.
Daniel Gafni
01/04/2023, 8:56 PM
sandy
01/07/2023, 1:52 AM
add_output_metadata that might make this work now
Kulanjith Deelaka
01/10/2023, 3:05 PM
1.1.9, great work! Also, maybe that test can be added to the test suite?
sandy
01/10/2023, 4:44 PM
sandy
01/10/2023, 4:44 PM