Hello everyone When using the class `UPathIOManager` the met dagster #dagster-feedback

Hello everyone, When using the class `UPathIOManag...

Kulanjith Deelaka

12/29/2022, 10:28 AM

Hello everyone, When using the class

UPathIOManager

the method

context.add_output_metadata()

does not work, rather we have to implement

UPathIOManager.get_metadata()

. It would be helpful if the docs explicitly stated that calling

context.add_output_metadata()

from

UPathIOManager.dump_to_path()

does not work.

Daniel Gafni

12/29/2022, 1:46 PM

hey, what do you mean by "does not work"? What happens if you do

context.add_output_metadata()

inside

dump_to_path

? also,

.get_metadata

had a bug (it crashed in some circumstances) in the last release which is fixed in master and the next release will not have it anymore. Keep this in mind if you are encountering it.

Kulanjith Deelaka

12/29/2022, 3:38 PM

Hi @Daniel Gafni, Thanks for the prompt response. On

dagster 1.1.7

, running

context.add_output_metadata()

doesn't do anything. Maybe I might be doing something terribly wrong. I will try to send you a Minimal Reproducible Example when I have time however I'll post my custom IO manager in case anyone can notice any mistakes in it: In the above XML IO Manager, the

Kulanjith Deelaka

12/29/2022, 3:44 PM

Moving the code to

get_metadata()

works but then I run into the other issue where the

UPathIOManager

class tries to call

get_metadata()

on input and it fails with

TypeError: Type 'InputContext' cannot be serialized.

, I assume this is the bug that you had mentioned and I noticed you had filed a PR for it here: https://github.com/dagster-io/dagster/pull/11110/files

➕ 1

Daniel Gafni

12/29/2022, 3:44 PM

I think you should be using MetaDataValue.text(“your-text”)

Kulanjith Deelaka

12/29/2022, 4:34 PM

In addition, I think I narrowed down a bug in the

1.1.7

codebase that has now been deleted on

master

with @Daniel Gafni's latest PR. On

1.1.7

UPathIOManager._load_single_input

the method

self.get_metadata()

is called incorrectly (Linked here) as depicted below:

Copy code

custom_metadata = self.get_metadata(obj, context)

The function definition of

get_metadata

is as follows

def get_metadata(self, context, obj)

however in the above line, however in the above depicted line, the object is passed onto

get_metadata

in the first position in place of the context, And the context is passed in the second position in place of the object. This causes the bug where I was experiencing TypeErrors on file loading. Fixing the line as follows, got rid of the bug:

Copy code

custom_metadata = self.get_metadata(context=context, obj=obj)

Anyways this should be fixed in the next release but in case anyone else gets TypeErrors on file loading with

UPathIOManager

this could be it.

➕ 1

Daniel Gafni

12/29/2022, 7:30 PM

So I made this simple test and it's passing for me

Copy code

def test_upath_io_manager_medatada_from_dump_to_path(tmp_path: Path):
    def get_length(obj: Any) -> int:
        try:
            return len(obj)
        except TypeError:
            return 0

    class MetadataIOManager(UPathIOManager):
        def dump_to_path(self, context: OutputContext, obj: Any, path: UPath):
            context.add_output_metadata({"test": MetadataValue.text("test")})
            return

        def load_from_path(self, context: InputContext, path: UPath) -> Any:
            return

    @io_manager(config_schema={"base_path": Field(str, is_required=False)})
    def metadata_io_manager(init_context: InitResourceContext):
        assert init_context.instance is not None
        base_path = UPath(
            init_context.resource_config.get("base_path", init_context.instance.storage_directory())
        )
        return MetadataIOManager(base_path=base_path)

    manager = metadata_io_manager(build_init_resource_context(config={"base_path": str(tmp_path)}))

    @asset
    def my_asset() -> Any:
        return 0

    result = materialize(
        [my_asset],
        resources={"io_manager": manager},
    )
    handled_output_events = list(filter(lambda evt: evt.is_handled_output, result.all_node_events))

    assert handled_output_events[0].event_specific_data.metadata_entries[  # type: ignore[index,union-attr]
        0
    ].entry_data.value == "test"

Kulanjith Deelaka

12/30/2022, 5:42 PM

Not sure if I'm doing something wrong since I'm somewhat unexperienced with pytest however running the test on my Windows 10 machine, it fails:

Copy code

test_assets.py::test_upath_io_manager_medatada_from_dump_to_path FAILED  [100%]2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_START - Started execution of run for "in_process_materialization_job".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Executing steps in process (pid: 8804)
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - LOGS_CAPTURED - Started capturing logs in process (pid: 8804).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_START - Started execution of step "my_asset".
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - my_asset - Writing file at: C:\Users\Dee\AppData\Local\Temp\pytest-of-Dee\pytest-2\test_upath_io_manager_medatada0\my_asset
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - ASSET_MATERIALIZATION - Materialized value my_asset.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - my_asset - STEP_SUCCESS - Finished execution of step "my_asset" in 11ms.
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - ENGINE_EVENT - Finished steps in process (pid: 8804) in 23ms
2022-12-30 23:02:44 +0530 - dagster - DEBUG - in_process_materialization_job - 14a2aadd-bd01-4854-bda0-834468e13045 - 8804 - RUN_SUCCESS - Finished execution of run for "in_process_materialization_job".

tk_dagster_tests\test_assets.py:6 (test_upath_io_manager_medatada_from_dump_to_path)
'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset' != 'test'

Expected :'test'
Actual   :'C:\\Users\\Dee\\AppData\\Local\\Temp\\pytest-of-Dee\\pytest-2\\test_upath_io_manager_medatada0\\my_asset'

I'll let you know if this is a Windows specific issue by running on Linux (OpenSuse) as well.

Daniel Gafni

12/30/2022, 11:06 PM

Yeah I’m rushing it from Linux. You can try running my test from docker if you have experience with it. I’m wondering if this bug can be platform-specific. @sandy seems like add_output_metadata may have some wrong behavior on windows (check the thread for more info)

👀 1

Kulanjith Deelaka

01/04/2023, 1:07 PM

Small update on this, I ran on Linux and I still get a similar error:

Copy code

E       AssertionError: assert '/tmp/pytest-...ada0/my_asset' == 'test'
E         - test
E         + /tmp/pytest-of-dee/pytest-0/test_upath_io_manager_medatada0/my_asset

Maybe I'm doing something wrong? I just ran with the

pytest

command. Python 3.10 and Dagster version

dagster==1.1.7

btw.

Daniel Gafni

01/04/2023, 8:56 PM

Unfortunately I can’t investigate deeper since I’m on vacation. I’ll be able to take a closer look next week. Perhaps someone from the Dagster team can help you ( @sandy )

sandy

01/07/2023, 1:52 AM

THis week's release included a change to

add_output_metadata

that might make this work now

Kulanjith Deelaka

01/10/2023, 3:05 PM

The test now passes on both Linux and Windows with the most recent dagster update

1.1.9

, Great work! Also maybe that test can be added to the test suite?

👍 1

sandy

01/10/2023, 4:44 PM

awesome

sandy

01/10/2023, 4:44 PM

I will look in to adding it

👏 1

63 Views

Open in Slack

Previous Next