Hi, somewhat of a simple question - how do I set e...
# ask-community
m
Hi, somewhat of a simple question - how do I set extra attributes on an IO manager that inherits from UPathIOManager? I am trying to set it up with a local cachedir, so my final signature will be something like:
python
'io_manager': RemoteIOManagerWithLocalCache(base_dir='az://dgster-assets/', cachedir='c://dagster-cachedir/')
c
This is something I tested out a while ago, so I'm not sure which version of Dagster it was written for, and it will probably need to be updated. However, for what it's worth:
import pandas as pd
from upath import UPath

from dagster import (
    Field,
    InitResourceContext,
    InputContext,
    OutputContext,
    UPathIOManager,
    io_manager,
)


class PandasParquetIOManager(UPathIOManager):
    extension: str = ".parquet"

    def dump_to_path(self, context: OutputContext, obj: pd.DataFrame, path: UPath):
        with path.open("wb") as file:
            obj.to_parquet(file)

    def load_from_path(self, context: InputContext, path: UPath) -> pd.DataFrame:
        print("loading file...")
        with path.open("rb") as file:
            return pd.read_parquet(file)



@io_manager(config_schema={"base_path": Field(str, is_required=False)})
def local_pandas_parquet_io_manager(
    init_context: InitResourceContext,
) -> PandasParquetIOManager:
    assert init_context.instance is not None  # to please mypy
    base_path = UPath(
        init_context.resource_config.get(
            "base_path", init_context.instance.storage_directory()
        )
    )
    return PandasParquetIOManager(base_path=base_path)
If this still works, you'd need to add cachedir to the config_schema and then edit your dump_to_path and load_from_path according to the cachedir logic you want.
m
I ended up smuggling in the cachedir as a storage option to the UPath - works well enough
I just didn't want to override the __init__ of UPathIOManager to track the optional cachedir
Though I probably will soon
👍 1