# ask-community
g
how can I debug this stacktrace? I upgraded my dagster instance from 0.15.5 to 0.15.6 and have a hard time figuring out the details:
/lib/python3.9/site-packages/dagster/core/workspace/context.py:554: UserWarning: Error loading repository location dagster.core.errors.DagsterInvalidMetadata: Could not resolve the metadata value for "key" to a known type. Its type was <class 'list'>. Consider wrapping the value with the appropriate MetadataValue type.

Stack Trace:
  File "/lib/python3.9/site-packages/dagster/grpc/server.py", line 485, in _get_serialized_external_repository_data
    external_repository_data_from_def(recon_repo.get_definition())
  File "/lib/python3.9/site-packages/dagster/core/host_representation/external_data.py", line 791, in external_repository_data_from_def
    external_asset_graph_data=external_asset_graph_from_defs(
  File "/lib/python3.9/site-packages/dagster/core/host_representation/external_data.py", line 887, in external_asset_graph_from_defs
    normalize_metadata(metadata=metadata_by_asset_key[asset_key], metadata_entries=[]),
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 100, in normalize_metadata
    return [
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 101, in <listcomp>
    package_metadata_value(k, v)
  File "/lib/python3.9/site-packages/dagster/core/definitions/metadata/__init__.py", line 145, in package_metadata_value
    raise DagsterInvalidMetadata(
j
hi @geoHeil are you manually attaching metadata where this error is being thrown? seeing that (or an equivalent structure if you don't want to share your exact code) would be helpful for debugging
g
Unfortunately, this is not part of the stacktrace - and I am currently searching the codebase for a suitable location
I do have something like: `metadata_entries=[ MetadataEntry.float(1.0, "completeness"), MetadataEntry.md(markdown_schema_of_dataframe, "schema"), ]`
o
I think this is the result of a bug introduced in that release, which should be a quick patch for this week -- is that metadata attached to a SourceAsset or a regular asset?
g
regular multi_asset
o
got it, thank you
g
is there any workaround for now?
or should I downgrade?
o
I think downgrading would be the easiest -- I don't see a convenient workaround
sorry about that!
g
confirmed - a downgrade makes it work again
is there an issue I can track?
j
o
https://github.com/dagster-io/dagster/issues/8944 just threw up an issue, and am currently looking into a fix
alright I have a fix out for this, and it should get in for the next release. thanks for the reports!
g
Great. Can you explain:
allow_invalid
does this mean I should define the metadata in a different way?
o
good question -- by default (regardless of whether `allow_invalid` is set to True or False), dagster will try to convert each element of a metadata dictionary to a `MetadataValue()` instance. For example, if you have `{"a": 1}` as your metadata, this will get converted to `{"a": MetadataValue.int(1)}`. This normalization applies to a bunch of different standard classes like strings, bools, dictionaries, etc. It happens that the two specific data types you both are running into issues with (`None` and `List`) are not handled by this automatic conversion, although it's possible that they should be.
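A toy sketch of the normalization behavior described above (this is not dagster's actual implementation; the wrapper-type names are illustrative):

```python
# Illustrative sketch of the metadata normalization described above.
# Each raw value is coerced to a tagged wrapper; types with no handler
# (here None and list, mirroring the 0.15.6 behavior) raise an error.

class InvalidMetadata(Exception):
    pass

def normalize_value(key, value):
    # check bool before int: bool is a subclass of int in Python
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, int):
        return ("int", value)
    if isinstance(value, float):
        return ("float", value)
    if isinstance(value, str):
        return ("text", value)
    if isinstance(value, dict):
        return ("json", value)
    # None and list fall through to the error path
    raise InvalidMetadata(
        f'Could not resolve the metadata value for "{key}" to a known type. '
        f"Its type was {type(value)}."
    )
```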
g
So you disabled the check? Should I specify my types differently in the future? Or continue using the API like I am doing right now?
o
there's some fallback behavior that gets enabled when `allow_invalid` is set to true (basically it just converts the unknown entity into a string). In the near-ish future, you should probably convert to using the `metadata={"label1": value1, "label2": value2}` format for your metadata, as I believe we're deprecating the `metadata_entries=[MetadataEntry.foo(value1, "label1")]` format in 1.0.0.
you can continue using the current format until then. also, if you can track down a point in your code where you're specifying list-type metadata, you can likely fix the error you're seeing in 0.15.6 by converting that list to a string (although this will be unnecessary after the fix goes out in 0.15.7)
g
What about more complex types like the markdown formatting? It only seems to work (to get the nice display in dagit) when using this specific type and not a generic string.
o
for those cases, you can do `{"md_label": MetadataValue.md("metadata string here")}`
g
I have replaced the list with metadatavalue - but still cannot get rid of this error
o
hm is it still the exact same error message? (`Could not resolve the metadata value for "key" to a known type. Its type was <class 'list'>`)
and what's the original / new code?
g
I still do have some `yield MetadataEntry.int(value=row_count, label="row_count")` snippets, but they do not contain a list
In the fixed version it looks like: `metadata_entries={ "completeness": 1.0, "schema": MetadataValue.md(markdown_schema) }`
o
hm could there be metadata being added anywhere else? my interpretation of the error is that there's a metadata entry with the label "key", and a value of type list.
g
found it
`metadata={ "foo": ["foo"], "baz": [SOME_VARIABLE] }`
it is already in the dict syntax and deliberately meant to process the list (configuration)
Is there a better way to read this configuration for an asset (in current releases of dagster) than from the metadata?
o
metadata is definitely the right place to put this, and I think we should really just be coercing lists to a `MetadataValue.json()` (the same thing we do for dictionaries). A bit of a hacky workaround (if you're interested) would be to do `metadata={"foo": {"list": ["foo"]}, "bar": {"list": [SOME_VARIABLE]}}`, then just access `metadata["foo"]["list"]` instead of `metadata["foo"]`. Dagster can automatically convert dictionaries into metadata values, while it can't do the same for lists.
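The workaround above, written out as plain Python just to show the shape of the data (keys and values here are illustrative placeholders):

```python
# Wrap each list in a single-key dict so dagster's dict-to-JSON coercion
# can apply; consumers then index one level deeper. Plain-Python sketch
# of the data shape only -- no dagster APIs involved.
raw = {"foo": ["foo"], "baz": ["some", "values"]}

# workaround: wrap every list value in {"list": ...}
wrapped = {key: {"list": value} for key, value in raw.items()}

# consumers read metadata["foo"]["list"] instead of metadata["foo"]
assert wrapped["foo"]["list"] == ["foo"]
```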
g
Will this then work again with this week's update?
o
yep the update will restore the previous behavior, so regular lists will not cause errors. they'll show up a bit strangely in dagit I believe (although their dagit representation will be identical from 0.15.5 to 0.15.7, so if it wasn't an issue before, then it won't be after the update)
g
@owen would it be possible to pass a python function in the metadata? I.e. kind of as a higher order function to the IO manager?
Also do you think this would be sensible?
My use case is as follows: I am using spark to export a dataframe with a JSON column to postgres. Sadly, spark does not have a native JSON datatype and can only interoperate with postgres on a `text` datatype column. As a result the IO manager needs to 1) write to postgres 2) fix up the datatype to `jsonb` by executing some sql statements. However, most assets which are materialized by the IO manager do not need this treatment. I was thinking about enabling a generic higher-order cleanup function in case other assets need special treatment as well in the future.
Could not resolve the metadata value for "foo_function" to a known type. Its type was <class 'function'>. Consider wrapping the value with the appropriate MetadataValue type
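A hedged sketch of the "fixup" step described above: after spark writes the column as `text`, alter it to `jsonb` in postgres. The table and column names are placeholders; the `ALTER TABLE ... USING` cast is standard postgres DDL:

```python
# Build the postgres DDL that converts a text column to jsonb.
# Table/column names are illustrative; in practice this string would be
# executed against postgres (e.g. via psycopg2) inside the IO manager.
def jsonb_fixup_sql(table: str, column: str) -> str:
    return (
        f"ALTER TABLE {table} "
        f"ALTER COLUMN {column} TYPE jsonb USING {column}::jsonb"
    )
```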
@owen furthermore I think you have more bugs in 0.15.6: I switched over to the list hotfix you suggested. But now it fails with a different error when trying to run the job: `Param "metadata_entries" is not one of ['frozenlist', 'list'] which is type <class 'dict'>`
This time, the error resides in a different location though: in the yielded `Output` object of the `MultiAsset`. There, I needed to move back to `MetadataValue`-style notation of the metadata. Didn't you tell me that this is meant to be obsolete? What are your plans for the future? And why does it sound like input definition and output metadata could potentially be inconsistent?
o
@geoHeil there are two different arguments for list-based vs. dictionary-based metadata: the `metadata_entries` argument accepts a list (this will be deprecated), while the `metadata` argument accepts a dictionary (this is the recommended argument going forward). I think changing the argument name should resolve the issue you're seeing
Where are you defining the metadata_entries argument in this case? Is it on the Output object or somewhere else?
g
correct. On `Output`. Let me swap this back to dict then.
o
By 1.0, metadata is meant to just be metadata that is sensible to represent in the UI (so things that can be serialized and displayed, essentially), so having a function as metadata is not ideal. If the cleanup functions are part of the functionality of the IOManager, it might make more sense to put those functions inside the IOManager class itself, and "index" into those functions with some string-based metadata (i.e. you have some metadata `{"fixup_type": "jsonb"}`, then the IOManager reads the `fixup_type` value to execute one of a few possible cleanup functions).
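The string-keyed dispatch suggested above can be sketched in plain Python (names are illustrative, not a real dagster API; the real version would live inside the IO manager's `handle_output`):

```python
# Sketch of dispatching cleanup behavior off a string-valued metadata
# entry instead of passing a function through metadata. All names here
# are placeholders for illustration.

def _fixup_jsonb(table: str) -> str:
    # placeholder: in practice this would execute SQL against postgres
    return f"fixed up {table} to jsonb"

# registry of known fixup types; new ones can be added here
FIXUPS = {
    "jsonb": _fixup_jsonb,
}

def handle_output(table: str, metadata: dict) -> str:
    # read the string-valued "fixup_type" metadata and dispatch on it
    fixup_type = metadata.get("fixup_type")
    if fixup_type is None:
        # most assets need no special treatment
        return f"wrote {table}"
    return FIXUPS[fixup_type](table)
```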