Brian Stroh
02/14/2024, 7:44 PM
Scout
02/14/2024, 7:44 PM
`LocalFileSystemIOManager` that is designed to handle the output of assets by saving them as CSV files on the local filesystem. Here's a simplified version of the example from the documentation:

```python
import os

import pandas as pd
from dagster import AssetKey, ConfigurableIOManager


class LocalFileSystemIOManager(ConfigurableIOManager):
    """Translates between Pandas DataFrames and CSVs on the local filesystem."""

    def _get_fs_path(self, asset_key: AssetKey) -> str:
        rpath = os.path.join(*asset_key.path) + ".csv"
        return os.path.abspath(rpath)

    def handle_output(self, context, obj: pd.DataFrame):
        """This saves the dataframe as a CSV."""
        fpath = self._get_fs_path(context.asset_key)
        obj.to_csv(fpath)

    def load_input(self, context):
        """This reads a dataframe from a CSV."""
        fpath = self._get_fs_path(context.asset_key)
        return pd.read_csv(fpath)
```
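One detail of the `to_csv` / `read_csv` pair in the example above is worth checking before relying on it: with default arguments, `to_csv` also writes the DataFrame index, so the frame that `read_csv` returns gains an extra `Unnamed: 0` column. The sketch below reproduces this with plain pandas, outside Dagster. The temporary file and the `index=False` variant are illustrative assumptions and are not part of the documented example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name; the IO manager derives its path from the asset key.
    fpath = os.path.join(tmp, "example.csv")

    # Same calls as handle_output / load_input in the example.
    df.to_csv(fpath)
    loaded = pd.read_csv(fpath)

    # Assumed variant: skip the index on write so the round trip keeps the columns.
    df.to_csv(fpath, index=False)
    loaded_no_index = pd.read_csv(fpath)

default_columns = list(loaded.columns)
no_index_columns = list(loaded_no_index.columns)
print(default_columns)
print(no_index_columns)
```

If downstream assets should see exactly the columns that were written, passing `index=False` in `handle_output` (or `index_col=0` in `load_input`) is one way to get there.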
This IO manager has methods for both writing (`handle_output`) and reading (`load_input`) Pandas DataFrames as CSV files. The `_get_fs_path` method constructs the file path where the CSV file will be stored or read from, based on the asset key. The `handle_output` method uses the Pandas `to_csv` method to write the DataFrame to a CSV file, and the `load_input` method uses the Pandas `read_csv` method to read a CSV file into a DataFrame.
Brian Stroh
02/14/2024, 8:43 PM
Scout
02/14/2024, 8:43 PM
`_get_path()` or any other internal method, are not intended to be called directly by user code during the execution of an asset or op. The IO manager's methods `handle_output` and `load_input` are called by the Dagster framework to manage the storage and retrieval of data.

If you need to access the path of an upstream asset's data within another asset, you typically would not do this by calling a method on the IO manager directly. Instead, you would rely on the IO manager to automatically load the required data when the asset is executed.

However, if you have a specific use case where you need to construct the path to an asset's data outside of the IO manager's standard methods, you would need to ensure that the logic for constructing the path is accessible in a way that is consistent with how the IO manager constructs it. This might involve duplicating some of the path construction logic in a utility function that can be used both by the IO manager and by your user code.

It's important to note that directly accessing or manipulating the paths used by an IO manager is not a common pattern in Dagster and could lead to issues with maintainability and clarity of your data pipeline. If you find yourself needing to do this, it may be worth reconsidering the design of your pipeline or exploring other features of Dagster that could better suit your needs.