Aasim Sani
02/09/2024, 9:20 PM
```
class S3DatasetItem:
    """Simple dataclass for the s3 csv file"""
    data: pd.core.frame.DataFrame
    filename: str
```
Scout
02/09/2024, 9:20 PM
```python
from dataclasses import dataclass

import pandas as pd
from dagster import asset


@dataclass
class S3DatasetItem:
    """Simple dataclass for the s3 csv file"""
    data: pd.DataFrame
    filename: str


@asset
def my_asset() -> S3DatasetItem:
    # Perform some computation to get a DataFrame
    data = pd.DataFrame(...)
    filename = "my_data.csv"
    # Return an instance of S3DatasetItem
    return S3DatasetItem(data=data, filename=filename)
```
In this example, `my_asset` is an asset function that returns an `S3DatasetItem` instance. The IOManager responsible for storing the output of this asset needs to know how to serialize and deserialize `S3DatasetItem` objects. If you're using a custom IOManager, you would implement the `handle_output` method to serialize the `S3DatasetItem` to persistent storage, and the `load_input` method to deserialize it when it is used as an input to another asset or operation.

Please note that the above example is conceptual and does not include the implementation details for the IOManager, which are necessary for this to work in a real Dagster pipeline. If you need to implement a custom IOManager for your dataclass, see the Dagster documentation on IO Managers for guidance.