dpad
05/26/2022, 5:50 AM
`AssetGroup.build_job().execute_in_process()`. I don't want to set up a dagster instance yet.
I have an asset that takes a long time to generate. I want to be able to configurably reuse the memoized results of this asset, preferably only when the user asks for it (i.e. by default, the long computation runs every time the asset is generated). Some other downstream assets later consume this asset.
• I've tried to use the memoization API but had various problems with it (among them, having to set up a temporary dagster instance with `Instance.local_temp()`, but also some other issues that I don't fully remember).
• An easy option would be to provide a configuration option to the asset, but it seems that `@asset` does not support configuration/config schemas?
• Another option would be to have a configurable resource that feeds into the asset, but this seems like a strange workaround?
• Previously I had two separate jobs built manually, with two assets sharing the same asset name (one doing the long computation, one simply reading the latest stored results), so downstream assets didn't care which of the two I had picked. However, now I'm using `AssetGroup.from_package`, which can't accept two assets with the same name, so I still have to set up separate AssetGroups manually, listing all of their upstream and downstream assets, which is a bit of a pain.
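The configurable-resource option and the old two-jobs setup share one idea: keep a single asset name and swap how its value is produced, so downstream assets never notice. A plain-Python sketch of that strategy pattern, deliberately avoiding Dagster APIs; `AssetSource`, `ComputeSource`, `StoredSource`, `select_source` and the `use_stored` key are all hypothetical. Supplying the selected source through a Dagster resource or run config is an assumption about wiring, not something shown here.

```python
from __future__ import annotations

from typing import Protocol


class AssetSource(Protocol):
    """Anything that can produce the slow asset's value."""

    def get(self) -> list[int]: ...


class ComputeSource:
    """Runs the (stand-in) long computation on every call."""

    def get(self) -> list[int]:
        return [n * n for n in range(5)]


class StoredSource:
    """Returns previously stored results instead of recomputing."""

    def __init__(self, stored: list[int]) -> None:
        self._stored = stored

    def get(self) -> list[int]:
        return list(self._stored)


def select_source(config: dict, stored: list[int] | None = None) -> AssetSource:
    """Pick the implementation from a config-like dict; recompute unless asked not to."""
    if config.get("use_stored", False):
        if stored is None:
            raise ValueError("use_stored=True but no stored result was provided")
        return StoredSource(stored)
    return ComputeSource()


def slow_asset(source: AssetSource) -> list[int]:
    # One asset name; only the injected source differs between runs.
    return source.get()


def downstream_asset(slow: list[int]) -> int:
    # Downstream logic sees the value, not how it was produced.
    return sum(slow)


if __name__ == "__main__":
    print(downstream_asset(slow_asset(select_source({}))))  # 30
    print(downstream_asset(slow_asset(select_source({"use_stored": True}, [1, 2, 3]))))  # 6
```

Because the choice lives in one selector rather than in two same-named assets, a single asset group can hold every asset, at the cost of threading the flag and stored results into that selector.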
Just wanted to see if there were any recommendations on doing something like this, or which of the above options feels cleanest?
sean
05/26/2022, 11:29 AM
dpad
05/27/2022, 1:10 AM
sean
05/27/2022, 12:33 PM