Mycchaka Kleinbort
03/16/2023, 12:43 PM
I have a function my_metadata(x) that returns the metadata I want for each object type x:
E.g. if x is a table, it gives me the column names
E.g. if x is an sklearn RandomForest, it gives me the features and hyper-parameters
E.g. if x is a LogisticRegression, it gives me the features and weights
Etc.
The implementation is basically:
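```python
import polars as pl
from catboost import CatBoost
from dagster import Output


def my_metadata(asset, extra_metadata=None):
    # metadata_from_* helpers are defined elsewhere (not shown)
    extra_metadata = extra_metadata or {}
    if isinstance(asset, pl.DataFrame):
        return Output(asset, metadata=metadata_from_polars(asset) | extra_metadata)
    elif isinstance(asset, list):
        return Output(asset, metadata=metadata_from_list(asset) | extra_metadata)
    elif isinstance(asset, CatBoost):
        return Output(asset, metadata=metadata_from_catboost(asset) | extra_metadata)
    elif isinstance(asset, str):
        return Output(asset, metadata=metadata_from_str(asset) | extra_metadata)
    # elif ...: (further types elided)
    else:
        # Unsupported type: return the raw value, so no metadata is attached
        print(f'No metadata will be written for asset of type {type(asset)}')
        return asset
```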
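As an aside, the isinstance chain above could be replaced with functools.singledispatch from the standard library, so that supporting a new type means registering one handler instead of editing the chain. This is a minimal sketch: collect_metadata, my_metadata_dict and the list/str handlers are hypothetical stand-ins for the metadata_from_* helpers, and it returns a plain dict rather than a Dagster Output to stay dependency-free.

```python
from functools import singledispatch
from typing import Any, Optional


@singledispatch
def collect_metadata(obj: Any) -> dict:
    # Fallback for unregistered types: no metadata (mirrors the `else` branch)
    print(f"No metadata will be written for asset of type {type(obj)}")
    return {}


@collect_metadata.register
def _(obj: list) -> dict:
    # Hypothetical handler standing in for metadata_from_list
    return {"length": len(obj)}


@collect_metadata.register
def _(obj: str) -> dict:
    # Hypothetical handler standing in for metadata_from_str
    return {"num_chars": len(obj)}


def my_metadata_dict(obj: Any, extra_metadata: Optional[dict] = None) -> dict:
    # Merge per-type metadata with caller-supplied extras (extras win on key clashes)
    return collect_metadata(obj) | (extra_metadata or {})
```

Real handlers could be registered the same way, e.g. collect_metadata.register(pl.DataFrame, metadata_from_polars). singledispatch also dispatches subclasses to their parent's handler via the MRO, much like isinstance does.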
However, most of my software-defined assets now have to call this function:
```python
@asset(...)
def some_asset():
    x_ans = ...
    return my_metadata(x_ans)
```
Is there a way to make the my_metadata call implicit, or to move it into the IO manager or somewhere else?
Malo PARIS
03/16/2023, 12:58 PM
You can use an IOManager to capture asset metadata automatically.
Here is an example of an IOManager class with a handle_output method:
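```python
import pandas as pd
from dagster import IOManager, MetadataValue
from sqlalchemy import create_engine


class MSSQLAlchemyIOManager(IOManager):
    def __init__(self, connection_string):
        self.engine = create_engine(connection_string, fast_executemany=True)

    def handle_output(self, context, obj):
        table_name = context.asset_key.path[-1]
        if isinstance(obj, pd.DataFrame):
            # Assumed write step (the original snippet only showed the metadata part)
            obj.to_sql(table_name, self.engine, if_exists="replace", index=False)
            # Attach metadata; the preview needs the optional `tabulate` package
            context.add_output_metadata(
                {"num_rows": len(obj),
                 "table_name": table_name,
                 "preview": MetadataValue.md(obj.head().to_markdown())})

    def load_input(self, context):
        return pd.read_sql_table(context.asset_key.path[-1], self.engine)
```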
Mycchaka Kleinbort
03/16/2023, 1:03 PM