https://dagster.io/ logo
Title
j

Jeff

12/14/2022, 9:49 PM
Hello, I think I have found a bug in
@asset
. Details in 🧵
I have an asset defined like:
@asset(
    metadata={"data1": ["foo", "bar"], "data2": []},
)
def example_asset(context: OpExecutionContext):
I have an IO manager that gets the metadata from an upstream asset like:
metadata = context.metadata if isinstance(context, OutputContext) else context.upstream_output.metadata
When I get the field
data1 = metadata["data1"]
the value of
data1
is a string
[list] (unserializable)
This works when using the IO manager for output, but doesn’t when getting upstream data for a downstream asset.
j

jamie

12/14/2022, 9:57 PM
hey @Jeff thanks for bringing this up. it might be related to another IO manager + metadata bug I’m investigating right now. I’ll keep you posted
:ty-spinny: 2
j

Jeff

12/19/2022, 5:41 PM
Hey @jamie, do you have any update on this? Thanks!
j

jamie

12/19/2022, 5:41 PM
i’m still debugging, but will keep you updated
👍 1
after digging into the other bug i was working on, i think it’s unrelated to this one. So i’ll work on debugging this next. just want to confirm the behavior you’re seeing - when you materialize an asset downstream of
example_asset
in the
load_input
function of your IO manager, the value of the metadata
data1
is the literal string
"[list] (unserializable)"
? and in other cases you get the list of strings as expected?
j

Jeff

12/19/2022, 8:18 PM
Yep. When I materialize
data1
on its own, the metadata is correct. This should be unique to
context.upstream_output.metadata
in
load_input
usage cases.
j

jamie

12/19/2022, 8:19 PM
ok great. will take a look at it soon
:ty-spinny: 1
I’m having trouble replicating this. Here’s the test i wrote
def test_metadata_io_manager():
    expected_metadata = {"data1": ["foo", "bar"], "data2": []}
    @asset(
        metadata=expected_metadata,
    )
    def upstream():
        return 1

    @asset
    def downstream(upstream):
        return upstream + 1


    class MyIOManager(IOManager):
        def __init__(self):
            self.values = dict()

        def handle_output(self, context, obj):
            keys = tuple(context.get_identifier())
            self.values[keys] = obj

        def load_input(self, context):
            assert context.upstream_output is not None
            metadata = context.metadata if isinstance(context, OutputContext) else context.upstream_output.metadata
            assert metadata == expected_metadata
            keys = tuple(context.get_identifier())
            return self.values[keys]


    materialize([upstream, downstream], resources={"io_manager":IOManagerDefinition.hardcoded_io_manager(MyIOManager())})
is there something i’m missing in my setup compared to what you’re doing? what version of dagster are you running?
j

Jeff

12/19/2022, 8:47 PM
I’m on dagster 1.1.7. I’m also using partitions and multi-partitions, although I don’t think that’s part of the problem? Let me see if there is anything else that could be missing.
j

jamie

12/19/2022, 8:48 PM
yeah i wouldn’t think partitions would be part of the problem
would you be able to copy this code snippet and confirm it’s working for you? just to base-line check that there isn’t something super funky going on
👍 1
j

Jeff

12/19/2022, 8:51 PM
Can confirm that your snippet works.
Just reran mine on same env and it failed with the unserializable error.
j

jamie

12/19/2022, 8:55 PM
ok. are you able to share the code you’re running? or a slimmed-down replication?
you can also dm if you don’t want to share the code publicly
j

Jeff

12/19/2022, 8:56 PM
I will need to ask compliance and I might have to rewrite a replication, will let you know in a bit.
j

jamie

12/19/2022, 8:56 PM
ok no worries
following up on this! so it turns out we are overriding list metadata with that string, i talked with the other engineer who worked on the metadata system, and there isn’t a good reason that we do this. we’ll update it in the next release so that list metadata is fully supported. In the meantime, if you need the metadata quickly, you could instead use a dictionary with a single key with the value that’s the list
@asset( 
metadata={"data1": {"val": ["foo", "bar"]}}
)
j

Jeff

12/21/2022, 4:50 PM
Thanks for the update and thanks for looking into this for me! To confirm, this release would be on 1/5, correct?
j

jamie

12/21/2022, 4:50 PM
yeah, 1/5
👍 1
j

Jeff

01/09/2023, 4:41 PM
Hey Jamie, I tested this on the new version and it doesn’t work still. I looked at the source code and the only thing I found that could create the string
[list] (unserializable)
is
normalize_metatdata
in
_core/definitions/metadata/__init__.py
; it appears that
normalize_metadata_value
does not support lists, so I suspect this is the source of the issue.
j

jamie

01/09/2023, 4:43 PM
hey - yeah the change to fix this didn’t get into last week’s release, and this thread was totally lost in my history so i couldn’t find it to let you know (sorry!). but it will be in this week’s release!
:ty-spinny: 1