Hello everyone, When using the class `UPathIOManag...
# dagster-feedback
k
Hello everyone, When using the class
UPathIOManager
the method
context.add_output_metadata()
does not work, rather we have to implement
UPathIOManager.get_metadata()
. It would be helpful if the docs explicitly stated that calling
context.add_output_metadata()
from
UPathIOManager.dump_to_path()
does not work.
d
hey, what do you mean by "does not work"? What happens if you do
context.add_output_metadata()
inside
dump_to_path
? also,
.get_metadata
had a bug (it crashed in some circumstances) in the last release which is fixed in master and the next release will not have it anymore. Keep this in mind if you are encountering it.
k
Hi @Daniel Gafni, Thanks for the prompt response. On
dagster 1.1.7
, running
context.add_output_metadata()
doesn't do anything. Maybe I might be doing something terribly wrong. I will try to send you a Minimal Reproducible Example when I have time however I'll post my custom IO manager in case anyone can notice any mistakes in it: In the above XML IO Manager, the
Moving the code to
get_metadata()
works but then I run into the other issue where the
UPathIOManager
class tries to call
get_metadata()
on input and it fails with
TypeError: Type 'InputContext' cannot be serialized.
, I assume this is the bug that you had mentioned and I noticed you had filed a PR for it here: https://github.com/dagster-io/dagster/pull/11110/files
1
d
I think you should be using MetaDataValue.text(“your-text”)
k
In addition, I think I narrowed down a bug in the
1.1.7
codebase that has now been deleted on
master
with @Daniel Gafni's latest PR. On
1.1.7
in
UPathIOManager._load_single_input
the method
self.get_metadata()
is called incorrectly (Linked here) as depicted below:
Copy code
custom_metadata = self.get_metadata(obj, context)
The function definition of
get_metadata
is as follows
def get_metadata(self, context, obj)
however in the above line, however in the above depicted line, the object is passed onto
get_metadata
in the first position in place of the context, And the context is passed in the second position in place of the object. This causes the bug where I was experiencing TypeErrors on file loading. Fixing the line as follows, got rid of the bug:
Copy code
custom_metadata = self.get_metadata(context=context, obj=obj)
Anyways this should be fixed in the next release but in case anyone else gets TypeErrors on file loading with
UPathIOManager
this could be it.
1
d
So I made this simple test and it's passing for me
Copy code
def test_upath_io_manager_medatada_from_dump_to_path(tmp_path: Path):
    def get_length(obj: Any) -> int:
        try:
            return len(obj)
        except TypeError:
            return 0

    class MetadataIOManager(UPathIOManager):
        def dump_to_path(self, context: OutputContext, obj: Any, path: UPath):
            context.add_output_metadata({"test": MetadataValue.text("test")})
            return

        def load_from_path(self, context: InputContext, path: UPath) -> Any:
            return

    @io_manager(config_schema={"base_path": Field(str, is_required=False)})
    def metadata_io_manager(init_context: InitResourceContext):
        assert init_context.instance is not None
        base_path = UPath(
            init_context.resource_config.get("base_path", init_context.instance.storage_directory())
        )
        return MetadataIOManager(base_path=base_path)

    manager = metadata_io_manager(build_init_resource_context(config={"base_path": str(tmp_path)}))

    @asset
    def my_asset() -> Any:
        return 0

    result = materialize(
        [my_asset],
        resources={"io_manager": manager},
    )
    handled_output_events = list(filter(lambda evt: evt.is_handled_output, result.all_node_events))

    assert handled_output_events[0].event_specific_data.metadata_entries[  # type: ignore[index,union-attr]
        0
    ].entry_data.value == "test"
k
Not sure if I'm doing something wrong since I'm somewhat unexperienced with pytest however running the test on my Windows 10 machine, it fails:
Copy code
test_assets.py::test_upath_io_manager_medatada_from_dump_to_path FAILED  [100%]2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_START - Started execution of run for "in_process_materialization_job".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Executing steps in process (pid: 8804)
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - LOGS_CAPTURED - Started capturing logs in process (pid: 8804).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_START - Started execution of step "my_asset".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - my_asset - Writing file at: C:\Users\Dee\AppData\Local\Temp\pytest-of-Dee\pytest-2\test_upath_io_manager_medatada0\my_asset
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - ASSET_MATERIALIZATION - Materialized value my_asset.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_SUCCESS - Finished execution of step "my_asset" in 11ms.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Finished steps in process (pid: 8804) in 23ms
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_SUCCESS - Finished execution of run for "in_process_materialization_job".

tk_dagster_tests\test_assets.py:6 (test_upath_io_manager_medatada_from_dump_to_path)
'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset' != 'test'

Expected :'test'
Actual   :'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset'
I'll let you know if this is a Windows specific issue by running on Linux (OpenSuse) as well.
d
Yeah I’m rushing it from Linux. You can try running my test from docker if you have experience with it. I’m wondering if this bug can be platform-specific. @sandy seems like add_output_metadata may have some wrong behavior on windows (check the thread for more info)
👀 1
k
Small update on this, I ran on Linux and I still get a similar error:
Copy code
E       AssertionError: assert '/tmp/pytest-...ada0/my_asset' == 'test'
E         - test
E         + /tmp/pytest-of-dee/pytest-0/test_upath_io_manager_medatada0/my_asset
Maybe I'm doing something wrong? I just ran with the
pytest
command. Python 3.10 and Dagster version
dagster==1.1.7
btw.
d
Unfortunately I can’t investigate deeper since I’m on vacation. I’ll be able to take a closer look next week. Perhaps someone from the Dagster team can help you ( @sandy )
s
THis week's release included a change to
add_output_metadata
that might make this work now
k
The test now passes on both Linux and Windows with the most recent dagster update
1.1.9
, Great work! Also maybe that test can be added to the test suite?
👍 1
s
awesome
I will look in to adding it
👏 1