Travis DePriest
04/14/2023, 3:52 PM
Qwame
04/14/2023, 4:43 PM
claire
04/14/2023, 5:45 PM
pdfs_asset downstream of the csv asset that is partitioned with dynamic partitions. Then, in your sensor that evaluates the successful completion of the csv asset, you could:
1. Evaluate all of the pdfs, creating dynamic partitions for the pdfs that don't exist
2. Yield a partitioned run request for each new pdf
pdfs_asset to early exit if the PDF already exists.
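As a rough sketch of the logic behind those steps (plain Python only, no Dagster calls): `new_pdf_keys` picks out the PDFs that don't have a partition yet, which is the set the sensor would turn into new dynamic partitions and run requests, and `request_pdf` is the kind of early-exit file check that could sit inside pdfs_asset. Both function names, the `download` callable, and the on-disk layout are hypothetical, made up for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def new_pdf_keys(csv_pdf_names: Iterable[str], existing_keys: Set[str]) -> List[str]:
    """Return PDF names from the csv that have no partition yet, in first-seen order."""
    seen = set(existing_keys)
    fresh: List[str] = []
    for name in csv_pdf_names:
        if name not in seen:
            seen.add(name)  # also drops duplicates within the csv itself
            fresh.append(name)
    return fresh


def request_pdf(name: str, out_dir: Path, download: Callable[[str], bytes]) -> bool:
    """Write the PDF unless it already exists; return True only if a download happened."""
    target = out_dir / name
    if target.exists():
        # Early exit: the PDF was already requested, so don't request it again.
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(download(name))
    return True
```

In the sensor, the list returned by `new_pdf_keys` would be what you register as new partitions and yield run requests for; the file check only matters as a backstop if a run for an existing PDF gets kicked off anyway.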
In terms of using ops versus assets, I think with both it's possible to re-request PDFs accidentally, i.e. by kicking off additional runs. But with assets you'll get additional observability like knowing if you've already requested one PDF, whereas with ops you can't see when/if it's been run for a given PDF. So I'd advocate for using assets instead, and baking in the file check if the sensor approach doesn't offer you a full guarantee.
Travis DePriest
04/14/2023, 6:28 PM
claire
04/18/2023, 9:00 PM
Definitions.load_asset_value function: https://docs.dagster.io/concepts/assets/software-defined-assets#loading-asset-values-outside-of-dagster-runs
If loading the asset ends up being an expensive operation, you could also add metadata (e.g. the new file paths) to the csv asset's materialization, then within your sensor load the attached metadata.