# integration-snowflake
Does Dagster include any logic for improving the efficiency of simultaneous asset materializations? For instance, if I have two jobs that run at the same time, each with 10 assets that can be materialized independently, does Dagster track performance associated with simultaneous materializations? Say Asset A and Asset B can be materialized together faster than Asset B and Asset C, and so on, so that over time the execution of the jobs becomes more efficient.
I'm working in a Snowflake context, which is why I'm asking here, but I suppose it isn't a Snowflake-specific question. In my scenario, all of the assets are materialized through SQL queries on the same Snowflake warehouse, so the "limiting reagent" for job completion is warehouse capacity. Some assets are better materialized alone because of the load they put on the warehouse, while others can easily be materialized together.
For Dagster to optimize and fine-tune concurrency, I think it would have to know the warehouse size and other parameters (auto-scaling, max concurrency level, cache, etc.). These parameters could be changed by the user at any point when facing performance issues (or to reduce costs), and if Dagster doesn't know about them, its concurrency "optimization" would be off.
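As far as I know, Dagster doesn't do adaptive optimization like that, but it does expose static concurrency limits you can tune by hand via run tags in the instance config. A sketch of a `dagster.yaml` fragment using the queued run coordinator's `tag_concurrency_limits` (the `warehouse` tag key and its values here are made up for illustration):

```yaml
# dagster.yaml -- hypothetical instance config; tag names are ours
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      # At most one run tagged warehouse=heavy at a time
      - key: "warehouse"
        value: "heavy"
        limit: 1
      # Up to four lighter warehouse runs concurrently
      - key: "warehouse"
        value: "light"
        limit: 4
```

You'd still have to decide yourself which jobs are "heavy", which is exactly the knowledge about the warehouse that Dagster lacks.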
I have not tried it, but maybe

```sql
show parameters in warehouse <warehouse_name>;
```

would provide these parameters.
For sure there are potential problems. I wonder whether query complexity could be derived from an EXPLAIN statement and then used in conjunction with the warehouse parameters. This is all probably beyond Dagster's scope. 🙂
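As a hypothetical sketch of that idea: Snowflake's `EXPLAIN USING JSON <query>` returns a plan whose `GlobalStats` section includes fields like `partitionsTotal`, `partitionsAssigned`, and `bytesAssigned`, which could be folded into a crude cost score. The sample plan string and the weighting below are entirely made up:

```python
import json

# Stand-in for the JSON string a real `EXPLAIN USING JSON` would return.
SAMPLE_PLAN = """
{
  "GlobalStats": {
    "partitionsTotal": 800,
    "partitionsAssigned": 120,
    "bytesAssigned": 536870912
  },
  "Operations": [[{"id": 0, "operation": "Result"}]]
}
"""

def cost_score(explain_json: str) -> float:
    """Crude single-number cost estimate from an explain plan."""
    stats = json.loads(explain_json)["GlobalStats"]
    # Weight scanned partitions and bytes; the constants are arbitrary.
    return stats["partitionsAssigned"] + stats["bytesAssigned"] / 1e8

print(round(cost_score(SAMPLE_PLAN), 2))  # → 125.37
```

A scheduler could then, in principle, avoid running two high-scoring queries on the warehouse at once, though the score says nothing about caching or auto-scaling.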
You could potentially add it as metadata to the asset observation, so that you could visualize it over time?
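Hypothetically, if each materialization logged its duration and the set of assets running on the warehouse at the same time as observation metadata, you could aggregate those records offline to spot good and bad pairings. A self-contained sketch (the record layout is invented, not a Dagster API):

```python
from collections import defaultdict
from statistics import mean

# Invented record format: one entry per materialization, noting which
# other assets were running on the warehouse concurrently and how long
# this one took (seconds).
observations = [
    {"asset": "a", "concurrent_with": frozenset({"b"}), "duration_s": 40},
    {"asset": "a", "concurrent_with": frozenset({"c"}), "duration_s": 95},
    {"asset": "a", "concurrent_with": frozenset({"b"}), "duration_s": 44},
]

def avg_duration_by_pairing(records):
    """Average duration of each asset, grouped by what ran alongside it."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["asset"], r["concurrent_with"])].append(r["duration_s"])
    return {key: mean(vals) for key, vals in buckets.items()}

for (asset, others), avg in sorted(
    avg_duration_by_pairing(observations).items(), key=str
):
    print(f"{asset} alongside {sorted(others)}: {avg:.1f}s")
```

Here the aggregation would suggest that asset `a` runs faster next to `b` (42.0s on average) than next to `c` (95.0s), which is the kind of signal you could chart over time.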