# ask-community
Hi, I have a question regarding IO manager. I have an
in the raw data zone, and a
asset in the cleansed data zone, and both assets should be materialized as files in S3. When storing the upstream asset in S3, I want to have a timestamp (~= a version) in the file path, like
so that each time that asset is materialised, the old data is not overwritten. However, when building the downstream asset, only the latest version of the upstream asset should be used. I have two questions here: 1. Is there anything designed for IO managers to store that version, like a state, which is stored in
, and read in
? 2. When I execute my job in a single process, is there any way to avoid loading the upstream asset from S3 when building the downstream asset (i.e. having something like an in-memory IO manager while still having the upstream asset materialised in S3)? Thanks!
We're tracking functionality that would make this easier here: https://github.com/dagster-io/dagster/issues/8521. In that issue, there's a suggestion for a workaround.
Thanks @sandy. How about my 2nd question? Are there any recommendations for doing that?
Sorry, I missed your second question. We don't have out-of-the-box support for that pattern, but it would be possible to write your own IO manager that does that.
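For anyone landing here later, a rough sketch of what such a custom IO manager could look like. This is not the workaround from the linked issue and not an official Dagster pattern. To keep it runnable without dependencies, the class does not import Dagster; in a real project it would presumably subclass `dagster.IOManager`, which defines the same `handle_output(context, obj)` and `load_input(context)` methods. `InMemoryObjectStore` is a hypothetical stand-in for an S3 bucket, and the key layout, the pickle format, and the `clock` parameter are all assumptions made for illustration.

```python
import pickle
from datetime import datetime, timezone
from types import SimpleNamespace


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3 bucket (not a real boto3 API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class VersionedCachingIOManager:
    """Sketch only: in Dagster this would subclass dagster.IOManager."""

    def __init__(self, store, clock=None):
        self._store = store
        # Assumed injectable clock, so the timestamp is easy to control in tests.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._cache = {}  # latest object per asset, valid within this process only

    def _prefix(self, context):
        # Assumes the context exposes asset_key.path, a list of strings.
        return "/".join(context.asset_key.path) + "/"

    def handle_output(self, context, obj):
        prefix = self._prefix(context)
        # Fixed-width UTC timestamp, so lexicographic order equals time order.
        stamp = self._clock().strftime("%Y%m%dT%H%M%S%fZ")
        self._store.put(f"{prefix}{stamp}.pkl", pickle.dumps(obj))
        self._cache[prefix] = obj  # keep a copy so same-process readers skip the store

    def load_input(self, context):
        prefix = self._prefix(context)
        if prefix in self._cache:
            return self._cache[prefix]
        # Ignore keys that belong to nested assets under the same prefix.
        keys = [k for k in self._store.list_keys(prefix) if "/" not in k[len(prefix):]]
        if not keys:
            raise FileNotFoundError(f"no stored version under {prefix}")
        return pickle.loads(self._store.get(keys[-1]))  # newest version


# Usage with a fake context object (a real run would pass Dagster's contexts).
store = InMemoryObjectStore()
manager = VersionedCachingIOManager(store)
ctx = SimpleNamespace(asset_key=SimpleNamespace(path=["raw", "orders"]))
manager.handle_output(ctx, {"rows": 3})
latest = manager.load_input(ctx)  # served from the in-process cache
```

The cache only helps when upstream and downstream run in the same process (for example with an in-process executor); with multiprocess execution each step gets its own manager instance and falls back to reading the newest key from the store.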
thanks @sandy