Alexander Buck
08/23/2023, 8:17 PM

Summary:
1. Detect a new Summary file -> use a @sensor.
2. Start a job with that file that creates several assets -> create a job that the @sensor triggers, with the Summary file in the RunConfig and an appropriate run_key that distinguishes this particular Summary file from summary files that exist in other folders.
3. One of those assets is a list of other files. Start an instance of a given job for every file in this list. -> ?? Not sure how an asset can trigger another job (@asset_sensor?).
   1. These jobs can fail because those other files may not yet exist, i.e. the file upload process is ongoing and these files may not have been copied yet. I’d like this to keep retrying until they do exist.
   2. The assets in this second job set all depend on assets materialized in the first job.
I just don’t know how to do step 3. If this is not at all the right way of tackling this problem with Dagster, I’m open to suggestions on how to re-think this task.

Alexander Buck
08/23/2023, 8:20 PM
@asset_sensor
doesn’t seem like the right fit, because it runs once per new materialization. If the additional files don’t yet exist (i.e. they’re still copying into the object storage), then their RunRequests will fail and never get re-attempted.
jamie
08/23/2023, 8:31 PM

Alexander Buck
08/23/2023, 8:57 PM
@run_status_sensor
I'll give that a look! Thanks! I think using assets with IO managers to handle the database transactions should be the same as an op that interacts with the table directly.
Writing the list of subsequent files to parse to a database was an idea a coworker had: a second sensor would query that table and try to dispatch jobs for those files. The run status sensor could come in handy here, since I can update that table when a job succeeds and its file is no longer needed!