# ask-community
g
how can I debug this stacktrace? I upgraded my dagster instance from 0.15.5 to 0.15.6 and have a hard time figuring out the details:
/lib/python3.9/site-packages/dagster/core/workspace/context.py:554: UserWarning: Error loading repository location dagster.core.errors.DagsterInvalidMetadata: Could not resolve the metadata value for "key" to a known type. Its type was <class 'list'>. Consider wrapping the value with the appropriate MetadataValue type.

Stack Trace:
  File "/lib/python3.9/site-packages/dagster/grpc/server.py", line 485, in _get_serialized_external_repository_data
    external_repository_data_from_def(recon_repo.get_definition())
  File "/lib/python3.9/site-packages/dagster/core/host_representation/external_data.py", line 791, in external_repository_data_from_def
    external_asset_graph_data=external_asset_graph_from_defs(
  File "/lib/python3.9/site-packages/dagster/core/host_representation/external_data.py", line 887, in external_asset_graph_from_defs
    normalize_metadata(metadata=metadata_by_asset_key[asset_key], metadata_entries=[]),
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 100, in normalize_metadata
    return [
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 101, in <listcomp>
    package_metadata_value(k, v)
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 145, in package_metadata_value
    raise DagsterInvalidMetadata(
j
hi @geoHeil are you manually attaching metadata where this error is being thrown? seeing that (or an equivalent structure if you don't want to share your exact code) would be helpful for debugging
g
Unfortunately, this is not part of the stacktrace - and I am currently searching the codebase for a suitable location
I do have something like: `metadata_entries=[ MetadataEntry.float(1.0, "completeness"), MetadataEntry.md(markdown_schema_of_dataframe, "schema"), ]`
o
I think this is the result of a bug introduced in that release, which should be a quick patch for this week -- is that metadata attached to a SourceAsset or a regular asset?
g
regular multi_asset
o
got it, thank you
g
is there any workaround for now?
or should I downgrade?
o
I think downgrading would be the easiest -- I don't see a convenient workaround
sorry about that!
g
confirmed - a downgrade makes it work again
is there an issue I can track?
j
o
https://github.com/dagster-io/dagster/issues/8944 just threw up an issue, and am currently looking into a fix
alright I have a fix out for this, and it should get in for the next release. thanks for the reports!
g
Great. Can you explain:
allow_invalid
does this mean I should define the metadata in a different way?
o
good question -- by default (regardless of whether `allow_invalid` is set to True or False), dagster will try to convert each element of a metadata dictionary to a `MetadataValue()` instance. For example, if you have `{"a": 1}` as your metadata, this will get converted to `{"a": MetadataValue.int(1)}`. This normalization applies to a bunch of different standard classes like strings, bools, dictionaries, etc. It happens that the two specific data types you both are running into issues with (`None` and `List`) are not handled by this automatic conversion, although it's possible that they should be.
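A toy sketch of the normalization behavior described above (this is not dagster's actual implementation; the wrapper-type names are illustrative):

```python
# Illustrative sketch of the metadata normalization described above.
# Each raw value is coerced to a tagged wrapper; types with no handler
# (here None and list, mirroring the 0.15.6 behavior) raise an error.

class InvalidMetadata(Exception):
    pass

def normalize_value(key, value):
    # check bool before int: bool is a subclass of int in Python
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, float):
        return ("float", value)
    if isinstance(value, str):
        return ("text", value)
    if isinstance(value, dict):
        return ("json", value)
    # None and list fall through to the error path
    raise InvalidMetadata(
        f'Could not resolve the metadata value for "{key}" to a known type. '
        f"Its type was {type(value)}."
    )
```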
g
So you disabled the check? Should I specify my types differently in the future? Or continue using the API like I am doing right now?
o
there's some fallback behavior that gets enabled when `allow_invalid` is set to true (basically it just converts the unknown entity into a string). In the near-ish future, you should probably convert to using the `metadata={"label1": value1, "label2": value2}` format for your metadata, as I believe we're deprecating the `metadata_entries=[MetadataEntry.foo(value1, "label1")]` format in 1.0.0.
you can continue using the current format until then. also, if you can track down a point in your code where you're specifying list-type metadata, you can likely fix the error you're seeing in 0.15.6 by converting that list to a string (although this will be unnecessary after the fix goes out in 0.15.7)
g
What about more complex types like the markdown formatting? It only seems to work (to get the nice display in dagit) when using this specific type and not a generic string.
o
for those cases, you can do `{"md_label": MetadataValue.md("metadata string here")}`
g
I have replaced the list with metadatavalue - but still cannot get rid of this error
o
hm is it still the exact same error message? (`Could not resolve the metadata value for "key" to a known type. Its type was <class 'list'>`)
and what's the original / new code?
g
I still do have some `yield MetadataEntry.int(value=row_count, label="row_count")` snippets, but they do not contain a list
In the fixed version it looks like: `metadata_entries={ "completeness": 1.0, "schema": MetadataValue.md(markdown_schema) }`
o
hm could there be metadata being added anywhere else? my interpretation of the error is that there's a metadata entry with the label "key", and a value of type list.
g
found it
`metadata={ "foo": ["foo"], "baz": [SOME_VARIABLE] }`
it is already in the dict syntax and deliberately meant to process the list (configuration)
Is there a better way to read this configuration for an asset (in current releases of dagster) than from the metadata?
o
metadata is definitely the right place to put this, and I think we should really just be coercing lists to a `MetadataValue.json()` (the same thing we do for dictionaries). A bit of a hacky workaround (if you're interested) would be to do `metadata={"foo": {"list": ["foo"]}, "bar": {"list": [SOME_VARIABLE]}}`, then just access `metadata["foo"]["list"]` instead of `metadata["foo"]`. Dagster can automatically convert dictionaries into metadata values, while it can't do the same for lists.
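The workaround above, written out as plain Python just to show the shape of the data (keys and values here are illustrative placeholders):

```python
# Wrap each list in a single-key dict so dagster's dict-to-JSON coercion
# can apply; consumers then index one level deeper. Plain-Python sketch
# of the data shape only -- no dagster APIs involved.
raw = {"foo": ["foo"], "baz": ["some", "values"]}

# workaround: wrap every list value in {"list": ...}
wrapped = {key: {"list": value} for key, value in raw.items()}

# consumers read metadata["foo"]["list"] instead of metadata["foo"]
assert wrapped["foo"]["list"] == ["foo"]
```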
g
Will this then work again with this week's update?
o
yep the update will restore the previous behavior, so regular lists will not cause errors. they'll show up a bit strangely in dagit I believe (although their dagit representation will be identical from 0.15.5 to 0.15.7, so if it wasn't an issue before, then it won't be after the update)
g
@owen would it be possible to pass a python function in the metadata? I.e. kind of as a higher order function to the IO manager?
Also do you think this would be sensible?
My use case is as follows: I am using spark to export a dataframe with a JSON column to postgres. Sadly, spark does not have a native JSON datatype and can only interoperate with postgres on a `text` datatype column. As a result the IO manager needs to 1) write to postgres 2) fix up the datatype to `jsonb` by executing some sql statements. However, most assets which are materialized by the IO manager do not need this treatment. I was thinking about enabling a generic higher-order cleanup function in case other assets need special treatment as well in the future.
Could not resolve the metadata value for "foo_function" to a known type. Its type was <class 'function'>. Consider wrapping the value with the appropriate MetadataValue type
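A hedged sketch of the "fixup" step described above: after spark writes the column as `text`, alter it to `jsonb` in postgres. The table and column names are placeholders; the `ALTER TABLE ... USING` cast is standard postgres DDL:

```python
# Build the postgres DDL that converts a text column to jsonb.
# Table/column names are illustrative; in practice this string would be
# executed against postgres (e.g. via psycopg2) inside the IO manager.
def jsonb_fixup_sql(table: str, column: str) -> str:
    return (
        f"ALTER TABLE {table} "
        f"ALTER COLUMN {column} TYPE jsonb USING {column}::jsonb"
    )
```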
@owen furthermore I think you have more bugs in 0.15.6: I switched over to the list hotfix you suggested. But now it fails with a different error when trying to run the job: `Param "metadata_entries" is not one of ['frozenlist', 'list'] which is type <class 'dict'>`
This time, the error resides in a different location though: in the yielded `Output` object of the `MultiAsset`. There, I needed to move back to `MetadataValue`-style notation of the metadata. Didn't you tell me that this is meant to be obsolete? What are your plans for the future? And why does it sound like input definition and output metadata could potentially be inconsistent?
o
@geoHeil there are two different arguments for list-based vs. dictionary-based metadata: the `metadata_entries` argument accepts a list (this will be deprecated), while the `metadata` argument accepts a dictionary (this is the recommended argument going forward). I think changing the argument name should resolve the issue you're seeing
Where are you defining the metadata_entries argument in this case? Is it on the Output object or somewhere else?
g
correct. On `Output`. Let me swap this back to dict then.
o
By 1.0, metadata is meant to just be metadata that is sensible to represent in the UI (so things that can be serialized and displayed, essentially), so having a function as metadata is not ideal. If the cleanup functions are part of the functionality of the IOManager, it might make more sense to put those functions inside the IOManager class itself, and "index" into those functions with some string-based metadata (i.e. you have some metadata `{"fixup_type": "jsonb"}`, then the IOManager reads the `fixup_type` value to execute one of a few possible cleanup functions).
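The string-keyed dispatch suggested above can be sketched in plain Python (names are illustrative, not a real dagster API; the real version would live inside the IO manager's `handle_output`):

```python
# Sketch of dispatching cleanup behavior off a string-valued metadata
# entry instead of passing a function through metadata. All names here
# are placeholders for illustration.

def _fixup_jsonb(table: str) -> str:
    # placeholder: in practice this would execute SQL against postgres
    return f"fixed up {table} to jsonb"

# registry of known fixup types; new ones can be added here
FIXUPS = {
    "jsonb": _fixup_jsonb,
}

def handle_output(table: str, metadata: dict) -> str:
    # read the string-valued "fixup_type" metadata and dispatch on it
    fixup_type = metadata.get("fixup_type")
    if fixup_type is None:
        # most assets need no special treatment
        return f"wrote {table}"
    return FIXUPS[fixup_type](table)
```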