Reid Beels
10/19/2022, 1:34 AM
AssetsDefinition.from_graph
no longer have independent keys and I don’t see an obvious way to set them.
e.g. this results in multiple assets named {env}_db due to the use of configured:
refresh_assets = [
    AssetsDefinition.from_graph(
        refresh_db.configured(
            {"environment_name": env}, name=f"{env}_db"
        ),
        resource_defs={
            "db_engine": resources.dev_db_engine,
        },
        group_name="dev_env_databases",
    )
    for env in ENVS
]
this results in duplicate asset keys because there’s no way to set a name or key in from_graph:
refresh_assets = [
    AssetsDefinition.from_graph(
        refresh_db,
        resource_defs={
            "db_engine": resources.dev_db_engine.configured({"environment_name": env}),
        },
        group_name="dev_env_databases",
    )
    for env in ENVS
]
using with_resources instead of the experimental resource_defs also doesn’t seem to provide a way to modify the asset name/key
anything I’m missing?
chris
10/19/2022, 5:12 AM
assets.py or something; and then you might have two different repos, dev and prod, with definitions like so:
@repository
def dev_repo():
    from .assets import refresh_assets
    # a bare `return *...` is a syntax error; unpack into a list instead
    return [*with_resources(refresh_assets, resource_defs=...)]
Does that make sense?
Reid Beels
10/19/2022, 6:48 AM
I would think that AssetsDefinition.from_graph is the place where the SDA is being defined and would provide a way to set the asset key independently from the graph name. Similar APIs like graph.to_job support naming, so why not here?
I can fake it out, of course, by giving refresh_db a no-op config map and calling configured to create separate named copies of the graph. This kinda works, but then I run into another issue that probably indicates I’m going about this wrong 😉
Once I have separate graph names and asset keys, I hit this error:
Conflicting versions of resource with key 'db_engine' were provided to different assets. When constructing a job, all resource definitions provided to assets must match by reference equality for a given key.
I am indeed trying to pass different configured resources to different assets under the db_engine key, but I don’t believe I’m ever constructing a single job that references multiple assets.
All of this is coming as a follow up on this thread (https://dagster.slack.com/archives/C01U954MEER/p1665533857788599) and the suggestion I received there from @yuhan.
I need to generate a temporary database name and have that available both within ops and within a failure hook. The suggestion was to use a resource to pass this value along the chain and access that resource from the hook.
The resource that I’m passing along constructs a temporary DB name on initialization, based on the environment_name passed in the resource config and the run_id on the init_context.
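For illustration, the name derivation described above might look roughly like the sketch below. This is plain Python with no Dagster dependency: in the real setup the two arguments would come from the resource config and the init_context, and both the function name temp_db_name and the tmp_{env}_{run} naming scheme are assumptions made for this example, not something stated in the thread.

```python
import re


def temp_db_name(environment_name: str, run_id: str) -> str:
    """Build a temporary database name from an environment name and a run id.

    Hypothetical helper and naming scheme, used only for illustration.
    """
    # Replace anything that is not a lowercase letter, digit, or underscore.
    safe_env = re.sub(r"[^a-z0-9_]", "_", environment_name.lower())
    # Strip separators from the run id and keep a short, readable prefix.
    short_run = re.sub(r"[^a-z0-9]", "", run_id.lower())[:8]
    return f"tmp_{safe_env}_{short_run}"
```

Because the function depends only on its two inputs, an op and a failure hook that are handed the same environment name and run id would arrive at the same database name.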
If resources can’t be configured per-asset, what’s the point of resource configuration?
environment_name config back to the graph/op level
2. pass that environment name to methods on the resource at runtime
chris
10/20/2022, 9:01 PM
> I would think that AssetsDefinition.from_graph is the place where the SDA is being defined and would provide a way to set the asset key independently from the graph name. Similar APIs like graph.to_job support naming, so why not here?
So SDAs don't have names in the same way that jobs do. While a name uniquely identifies a job within a repository, the computation that produces a particular asset key is not necessarily uniquely identified by that asset key. A graph, for example, can produce an arbitrary number of asset keys (specified here by the keys_by_output_name arg in from_graph). So it wouldn't make sense to have just a single asset_key argument, since the graph can produce multiple. What you can do, however, is change which asset keys are mapped to a particular output, and in that way produce multiple different software-defined assets for the same graph.
> Once I have separate graph names and asset keys, I hit this error:
> Conflicting versions of resource with key 'db_engine' were provided to different assets. When constructing a job, all resource definitions provided to assets must match by reference equality for a given key.
> I am indeed trying to pass different configured resources to different assets under the db_engine key, but I don’t believe I’m ever constructing a single job that references multiple assets.
This is a known and unfortunate incompatibility, and stems from how the repository functions under the hood. Ideally, yes, you should be able to pass different resources for the same key for each asset, but in order to power the global materialization button, we need to be able to construct jobs from any combination of assets specified on the repository, and we don't currently have asset-level scoping of resources. No super clean workaround there aside from having separate repos that house the different resource sets, unfortunately.
Reid Beels
10/28/2022, 4:47 PM
from_graph call.
I didn’t consider keys_by_output_name, because it seemed like that required actually using outputs?