# ask-community
h
Hello, I'm trying to load an asset to s3, but at the same time I want to pass the Pandas DataFrame generated in the asset-A function to asset-B for further processing. Just wondering how this could be done.
s
I don't fully understand what you mean, and I'm not a Dagster support person, but I would say you can define job dependencies to do this. https://docs.dagster.io/concepts/ops-jobs-graphs/jobs
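Something along these lines (a minimal untested sketch of the op/job dependency style from the linked docs; the op names are placeholders):
```python
import pandas as pd
from dagster import job, op

@op
def make_dataframe() -> pd.DataFrame:
    # Produce the DataFrame in the upstream op.
    return pd.DataFrame({"x": [1, 2, 3]})

@op
def process_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    # Receive the upstream op's output as an input.
    return df.assign(y=df["x"] * 2)

@job
def my_job():
    # Wiring the ops together defines the dependency.
    process_dataframe(make_dataframe())
```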
c
Hi Hassan. You could set asset A's IO manager to persist the output to s3 (maybe `s3_pickle_io_manager`). After asset A finishes executing, its output will be written to s3. Then, you can define asset B to depend on A (https://docs.dagster.io/concepts/assets/software-defined-assets#defining-basic-managed-loading-dependencies). When asset B executes, it will load asset A from s3 and execute from there
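Roughly like this (an untested sketch, assuming `dagster` and `dagster-aws` are installed; the bucket name is a placeholder):
```python
import pandas as pd
from dagster import asset, with_resources
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource

@asset
def asset_a() -> pd.DataFrame:
    # The IO manager pickles this return value to s3.
    return pd.DataFrame({"x": [1, 2, 3]})

@asset
def asset_b(asset_a: pd.DataFrame) -> pd.DataFrame:
    # Naming the parameter after asset A makes B depend on it;
    # the IO manager loads A back from s3 before B runs.
    return asset_a.assign(y=asset_a["x"] * 2)

assets = with_resources(
    [asset_a, asset_b],
    resource_defs={
        "io_manager": s3_pickle_io_manager,
        "s3": s3_resource,
    },
    resource_config_by_key={
        "io_manager": {"config": {"s3_bucket": "my-bucket"}},
    },
)
```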
h
thank you both. So I realized that there is an issue with the Pandas DataFrame when it's exported as a csv file object (or json) and loaded to S3. The generated file includes some extra characters, which makes it unusable for asset B downstream. :(
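For reference, a minimal sketch of a CSV round trip through s3 with boto3 (bucket and key names are placeholders, not from the thread). One common source of stray characters is the DataFrame index being written as an unnamed first column, which `index=False` avoids:
```python
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")
df = pd.DataFrame({"x": [1, 2, 3]})

# Write: serialize without the index so no extra leading column appears.
buf = io.StringIO()
df.to_csv(buf, index=False)
s3.put_object(Bucket="my-bucket", Key="asset_a.csv", Body=buf.getvalue())

# Read: load the object back for the downstream asset.
obj = s3.get_object(Bucket="my-bucket", Key="asset_a.csv")
df_back = pd.read_csv(io.BytesIO(obj["Body"].read()))
```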