Chris Comeau
07/25/2023, 10:42 PM
context.node_handle.to_string() for unpartitioned assets, and
f"{context.node_handle.to_string()}__{context.asset_partition_key_for_output()}" if partitioned.
I'm wondering if I'm missing something fundamental about the execution model here. It looks like I would need to add a unique concurrency limit for every asset name to achieve this in the experimental setup, which doesn't seem right (it's so surprising that Dagster doesn't do this already that I feel like I'm missing something basic).
In the screenshot here, just showing what happens if I repeatedly click "Materialize" on the same asset: those runs and ops all start overlapping execution and lead to various messes in the target table depending on timing (e.g. one run drops the table while another is inserting). It might make sense to have this asset-level op concurrency as a sensible default when the global concurrency limits are enabled, so these asset ops would have queued instead, just as a result of being the same asset op (not pooling for other reasons like resource utilization, just avoiding this kind of data-processing error from concurrent execution of the same thing).
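The per-asset guard idea (one materialization of a given asset at a time) can be illustrated with a simple cross-process advisory lock. This is a hedged sketch of the general technique, not Dagster's implementation or the exact semaphore used in this thread; AssetLock is a made-up name, and it assumes all runs share a filesystem:

```python
import os
import time


class AssetLock:
    """Advisory mutual-exclusion lock keyed by asset name.

    Relies on the atomicity of os.open with O_CREAT | O_EXCL: only one
    process can create the lock file, so a second materialization of the
    same asset waits instead of running concurrently.
    """

    def __init__(self, asset_key: str, lock_dir: str = "/tmp") -> None:
        self.path = os.path.join(lock_dir, f"{asset_key}.lock")

    def __enter__(self) -> "AssetLock":
        # Spin until we can atomically create the lock file.
        while True:
            try:
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except FileExistsError:
                time.sleep(0.1)

    def __exit__(self, *exc) -> None:
        os.remove(self.path)
```

A real deployment would also need crash recovery (a stale lock file left by a killed run blocks forever), which is one reason a scheduler-level default, as suggested below, is preferable to ad hoc locking.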
My approach with the semaphore has this situation covered in the meantime. Ideally, I think this would be a default safeguard when the global limits are enabled, so all assets would avoid concurrent execution with themselves without needing to add those asset-level keys for all of them. It looks like it doesn't work that way yet, since this still happens with the experimental feature enabled, but maybe it could.

sandy
07/25/2023, 11:12 PM

prha
07/25/2023, 11:17 PM
by default?

Chris Comeau
07/25/2023, 11:23 PM

Abhishek Agrawal
07/26/2023, 12:43 AM