Philippe Laflamme
02/16/2023, 10:05 PM
I have a UPathIOManager that can provide pandas.DataFrame or pyarrow.Table. This generally works fine and I can switch back and forth, but when I have a downstream asset that requires multiple upstream partitions, I get a type mismatch error, i.e. DagsterTypeCheckDidNotPass. The only way I can get it to work is to have the downstream asset use upstream: Dict[str, X], where X is the type annotation on the IO manager’s def load_from_path(...) -> X method. I can’t seem to find a way to annotate my methods to make this dynamic.
owen
02/16/2023, 10:30 PM
Philippe Laflamme
02/16/2023, 10:51 PM
Is there a way to annotate the load_from_path method in such a way that type checking works for both upstream: Dict[str, pd.DataFrame] and upstream: Dict[str, pa.Table]?
Philippe Laflamme
02/16/2023, 10:52 PM
I tried load_from_path() -> Union[pd.DataFrame, pa.Table] but that doesn’t type check.
owen
02/16/2023, 11:00 PM
Dict[str, A], where A is exactly equal to the return annotation of load_from_path.
owen
02/16/2023, 11:01 PM
load_from_path() as Any
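A minimal sketch of the exact-match rule described above, using only the standard library. It is a simplified stand-in and not Dagster's actual implementation: the helper annotations_match, the placeholder classes DataFrame and Table (standing in for pandas.DataFrame and pyarrow.Table), and the two loader functions are hypothetical names introduced for illustration.

```python
import inspect
from typing import Any, Callable, Dict, Union


class DataFrame:
    """Placeholder standing in for pandas.DataFrame (assumption)."""


class Table:
    """Placeholder standing in for pyarrow.Table (assumption)."""


def load_frame(path: str) -> DataFrame:
    """Hypothetical loader annotated with a single concrete type."""
    return DataFrame()


def load_either(path: str) -> Union[DataFrame, Table]:
    """Hypothetical loader annotated with a Union."""
    return DataFrame()


def annotations_match(input_annotation: Any, loader: Callable[..., Any]) -> bool:
    """Simplified stand-in for the rule: the input annotation must be
    exactly Dict[str, A], where A is the loader's return annotation."""
    return_annotation = inspect.signature(loader).return_annotation
    return input_annotation == Dict[str, return_annotation]


# Same type on both sides: the annotations compare equal.
exact = annotations_match(Dict[str, DataFrame], load_frame)
# Different concrete type downstream: no match.
other_type = annotations_match(Dict[str, Table], load_frame)
# Union on the loader only: a Union is not equal to one of its members.
union_vs_single = annotations_match(Dict[str, DataFrame], load_either)
# The same Union on both sides: the annotations compare equal again.
union_both_sides = annotations_match(
    Dict[str, Union[DataFrame, Table]], load_either
)

print(exact, other_type, union_vs_single, union_both_sides)
# prints: True False False True
```

The comparison is plain equality of typing objects, so a Union return annotation only matches a downstream annotation that spells out the same Union. This sketch covers the annotation comparison only; it says nothing about any runtime type check applied to the loaded values.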
owen
02/16/2023, 11:01 PM
Philippe Laflamme
02/16/2023, 11:01 PM
Philippe Laflamme
02/16/2023, 11:04 PM
CheckError in this case
owen
02/16/2023, 11:04 PM
Philippe Laflamme
02/16/2023, 11:04 PM
Philippe Laflamme
02/16/2023, 11:05 PM
dagster._check.CheckError: Failure condition: Received `typing.Dict[str, polars.internals.dataframe.frame.DataFrame]` type in input of DagsterType <dagster._core.types.python_dict._TypedPythonDict object at 0x7fff811c2800>, but `<bound method PartitionedParquetIOManager.load_from_path of <core.parquet_io_manager.PartitionedParquetIOManager object at 0x7fff80f9b670>>` has typing.Any type annotation for obj. They should be both specified with type annotations and match. If you are loading multiple partitions, the upstream asset type annotation should be a typing.Dict.
owen
02/16/2023, 11:07 PM
Philippe Laflamme
02/16/2023, 11:27 PM
Removing Dict[str, X] from the downstream asset gives me this:
dagster._check.CheckError: Failure condition: Inputs of type <dagster._core.types.dagster_type._Any object at 0x7fffe974afb0> not supported. Please specify a valid type for this input either on the argument of the @asset-decorated function.
Philippe Laflamme
02/16/2023, 11:27 PM
Philippe Laflamme
02/16/2023, 11:30 PM
Removing -> Any from load_from_path produces this:
dagster._check.CheckError: Failure condition: Received `typing.Dict[str, polars.internals.dataframe.frame.DataFrame]` type in input of DagsterType <dagster._core.types.python_dict._TypedPythonDict object at 0x7fff811c2800>, but `<bound method PartitionedParquetIOManager.load_from_path of <core.parquet_io_manager.PartitionedParquetIOManager object at 0x7fff80f9f670>>` has <class 'inspect._empty'> type annotation for obj. They should be both specified with type annotations and match. If you are loading multiple partitions, the upstream asset type annotation should be a typing.Dict.
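The two "Received ... type in input" errors above differ only in the annotation they report: typing.Any in one, <class 'inspect._empty'> in the other. The sketch below shows that these are the values the standard library's inspect.signature returns for a function annotated -> Any and for one with no return annotation. That Dagster reads the annotation this way is an assumption based on the error text; Frame and the three loader functions are hypothetical names.

```python
import inspect
from typing import Any, Dict


class Frame:
    """Placeholder for a DataFrame type (assumption)."""


def load_any(path: str) -> Any:
    """Hypothetical loader annotated with Any."""
    return Frame()


def load_unannotated(path: str):
    """Hypothetical loader with the return annotation removed."""
    return Frame()


def load_typed(path: str) -> Frame:
    """Hypothetical loader with a concrete return annotation."""
    return Frame()


def return_annotation(fn):
    """Return the annotation exactly as inspect reports it."""
    return inspect.signature(fn).return_annotation


any_case = return_annotation(load_any)
missing_case = return_annotation(load_unannotated)
typed_case = return_annotation(load_typed)

print(any_case is Any)                          # prints: True
print(missing_case is inspect.Signature.empty)  # prints: True
print(repr(missing_case))                       # prints: <class 'inspect._empty'>

# Downstream annotation that the loaders are compared against.
wanted = Dict[str, Frame]
matches = {
    "any": wanted == Dict[str, any_case],
    "missing": wanted == Dict[str, missing_case],
    "typed": wanted == Dict[str, typed_case],
}
print(matches)  # prints: {'any': False, 'missing': False, 'typed': True}
```

Under an exact-equality comparison, neither Any nor a missing annotation can match Dict[str, Frame]; only the concrete annotation does.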
owen
02/16/2023, 11:41 PM
load_from_path() -> Union[pandas.DataFrame, pyarrow.Table]
@asset
def downstream_asset(inp: Dict[str, Union[pandas.DataFrame, pyarrow.Table]]):
    ...
might do the trick
owen
02/16/2023, 11:41 PM
Philippe Laflamme
02/17/2023, 4:15 AM
owen
02/17/2023, 6:22 PM
Philippe Laflamme
02/17/2023, 6:58 PM