Travis DePriest 04/14/2023, 3:52 PM
Qwame 04/14/2023, 4:43 PM
claire 04/14/2023, 5:45 PM
downstream of the CSV asset that is partitioned with dynamic partitions. Then, in the sensor that fires when the CSV asset completes successfully, you could:
1. Evaluate all of the PDFs, creating dynamic partitions for any PDFs that don't have one yet.
2. Yield a partitioned run request for each new PDF.
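The two steps above can be sketched in plain Python, assuming the sensor can list the current PDF paths and the partition keys that already exist. `PlannedRunRequest`, `pdf_partition_key` (which uses the file stem as the key) and `plan_new_pdf_runs` are hypothetical names for illustration, not Dagster APIs.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class PlannedRunRequest:
    """Hypothetical stand-in for one partitioned run request (one per new PDF)."""
    partition_key: str
    run_key: str


def pdf_partition_key(path: str) -> str:
    # Hypothetical convention: the file name without its extension is the partition key.
    return PurePosixPath(path).stem


def plan_new_pdf_runs(
    pdf_paths: Iterable[str], existing_keys: Sequence[str]
) -> List[PlannedRunRequest]:
    """Step 1: find PDFs without a dynamic partition. Step 2: plan one run per new PDF."""
    known = set(existing_keys)
    requests: List[PlannedRunRequest] = []
    for path in sorted(set(pdf_paths)):
        key = pdf_partition_key(path)
        if key in known:
            continue  # partition already exists, so don't request this PDF again
        known.add(key)  # also guards against two paths mapping to the same key in one tick
        # A stable run_key per PDF lets the scheduler deduplicate repeated requests.
        requests.append(PlannedRunRequest(partition_key=key, run_key=f"pdf-{key}"))
    return requests
```

Inside a Dagster sensor, `existing_keys` could come from `context.instance.get_dynamic_partitions("pdfs")` (where `"pdfs"` is a hypothetical `DynamicPartitionsDefinition` name), the new keys could be registered with `context.instance.add_dynamic_partitions("pdfs", keys)`, and each plan could be yielded as `RunRequest(partition_key=..., run_key=...)`; check these calls against your Dagster version. Because Dagster won't launch a second run for a `run_key` the sensor has already used, the stable key is an extra guard against re-requesting a PDF.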
to early exit if the PDF already exists. In terms of ops versus assets, I think with both it's possible to re-request PDFs accidentally, e.g. by kicking off additional runs. But with assets you get additional observability, such as knowing whether you've already requested a given PDF, whereas with ops you can't see when or if a run has happened for a given PDF. So I'd advocate for using assets instead, and baking in the file check if the sensor approach doesn't give you a full guarantee.
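Baking the file check into the PDF asset could look roughly like this, assuming PDFs are written to a local filesystem path. `write_pdf_once` and its `render` callback are hypothetical names; the temp-file-then-rename step is an added precaution so that a crashed run doesn't leave a partial PDF that later runs would treat as finished.

```python
import os
from pathlib import Path
from typing import Callable


def write_pdf_once(output_path: Path, render: Callable[[], bytes]) -> bool:
    """Generate the PDF only if it doesn't exist yet.

    Returns True if the file was written, False if it was skipped.
    """
    if output_path.exists():
        return False  # early exit: an earlier run already produced this PDF
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then atomically rename it into place,
    # so an interrupted run never leaves a partial file that looks finished.
    tmp_path = output_path.with_name(output_path.name + ".tmp")
    tmp_path.write_bytes(render())
    os.replace(tmp_path, output_path)
    return True
```

Called from the PDF asset's compute function, a `False` return means the partition still materializes but no duplicate work is done.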
Travis DePriest 04/14/2023, 6:28 PM
claire 04/18/2023, 9:00 PM
function: https://docs.dagster.io/concepts/assets/software-defined-assets#loading-asset-values-outside-of-dagster-runs If loading the asset turns out to be expensive, you could also add metadata (e.g. the new file paths) to the CSV asset's materialization, then read that metadata within your sensor.
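A sketch of the metadata round trip, assuming the CSV asset knows which PDF paths are new when it materializes. The key name `new_pdf_paths` and both helpers are hypothetical. The producer returns a plain dict of JSON-compatible values; the reader also accepts wrapper objects that expose `.value`, which is an assumption about how stored metadata values come back to the sensor.

```python
from typing import Any, Dict, List, Mapping, Sequence

NEW_PATHS_KEY = "new_pdf_paths"  # hypothetical metadata key


def build_csv_metadata(new_pdf_paths: Sequence[str]) -> Dict[str, Any]:
    """Producer side: metadata the CSV asset could attach to its materialization."""
    paths = sorted(set(new_pdf_paths))
    return {NEW_PATHS_KEY: paths, "num_new_pdfs": len(paths)}


def read_new_pdf_paths(metadata: Mapping[str, Any]) -> List[str]:
    """Sensor side: recover the paths from metadata instead of loading the whole CSV."""
    entry = metadata.get(NEW_PATHS_KEY)
    if entry is None:
        return []  # e.g. materializations recorded before this key was added
    # Assumption: stored metadata values may be wrapper objects with a `.value`
    # attribute; plain lists (as produced above) are used directly.
    raw = getattr(entry, "value", entry)
    return [str(p) for p in raw]
```

On the asset side, a dict like this could be passed as `Output(value, metadata=...)`; on the sensor side, the metadata mapping from the CSV asset's latest materialization would be handed to `read_new_pdf_paths`, so the sensor never has to load the CSV itself.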