Uri Laserson
12/24/2022, 3:58 PM
sandy
12/27/2022, 4:56 PM
Uri Laserson
12/28/2022, 2:27 AM
If I understand the docs correctly, this "limitation" of static partitions applies only to "assets", but not to non-asset jobs, is that right? With a non-asset job, I basically configure a run with whatever the new sequencing run identifier would be?
If that's all correct, then what is the role of AssetMaterialization? Would this be a workaround for the current limitation, where I could explicitly tell Dagster in some arbitrary parametrized job that I just created a new asset?
Another somewhat related question: how does this interact with "staleness" of assets? If I can bump a code_version which makes assets stale, does that only happen upon dagit reloading the code? (Is this a common/lightweight operation?) Would another workaround here be that I can keep some sort of config object in the git repo that lists all the sequencing runs we have (and generates assets from them)? So when we perform another experiment, we'd add it to the config and reload the code?
Though currently we track lots of experiments in a Notion database. It would be awesome if dagit could just read from it and define all of our assets from it.
sandy
12/28/2022, 5:59 PM
> If I understand the docs correctly, this "limitation" of static partitions applies only to "assets", but not to non-asset jobs, is that right? With a non-asset job, I basically configure a run with whatever the new sequencing run identifier would be?
Exactly.
> If that's all correct, then what is the role of AssetMaterialization? Would this be a workaround for the current limitation, where I could explicitly tell Dagster in some arbitrary parametrized job that I just created a new asset?
Exactly. This is purely for observability. You can attach metadata entries to these AssetMaterializations and view them in the asset catalog. You could also put your sequencing run in the partition field, which would allow them to show up nicely when we eventually ship runtime asset partitions.
> Another somewhat related question: how does this interact with "staleness" of assets? If I can bump a code_version which makes assets stale, does that only happen upon dagit reloading the code? (Is this a common/lightweight operation?)
Right. And yes, it's pretty lightweight - it should generally happen whenever you git push to master.
> Would another workaround here be that I can keep some sort of config object in the git repo that lists all the sequencing runs we have (and generates assets from them)? So when we perform another experiment, we'd add it to the config and reload the code?
Yeah - that's sometimes what we recommend in your situation. You can trigger a reload over GraphQL if you want to automate it. If that's an option for you, I would probably build a StaticPartitionsDefinition that contains all the sequencing runs, rather than have an asset for each sequencing run, so that the asset graph doesn't get too unwieldy.
> One more thought: would another approach be to make one of my assets the list of experiments? Every time I rematerialize it, it pulls the latest data from my resource. Basically, every other asset would have this one as an upstream dependency. Perhaps this would just defeat the purpose because every time I rematerialize the upstream list of experiments, every single downstream asset (i.e., all of them) would be marked stale?
When you say "Basically, every other asset would have this one as an upstream dependency.", is there some circularity there? Because the question of what downstream assets (or asset partitions) even exist would depend on that list-of-experiments asset?
Uri Laserson
12/28/2022, 8:19 PM
> is there some circularity there? Because the question of what downstream assets (or asset partitions) even exist would depend on that list-of-experiments asset?
Interesting. Isn't that also kinda how it works with date partitions? There is something that computes which assets should exist based on some information outside of the asset definition itself, no? (i.e., the current date)
> This is purely for observability. You can attach metadata entries to these AssetMaterializations and view them in the asset catalog.
So, if I create an asset through emitting an AssetMaterialization, is there a way for me to write a software-defined asset that is downstream of it?
sandy
12/29/2022, 12:09 AM
> There is something that computes which assets should exist based on some information outside of the asset definition itself, no? (i.e., the current date)
Right - but that's basically a special case that's built into the framework.
> And relatedly, is the set of asset materializations etc (and I guess all the metadata tracked by dagit) persisted somewhere and across reloads of dagit?
Yes - Dagit runs on top of a database, SQLite by default, usually Postgres in production.
> So, if I create an asset through emitting an AssetMaterialization, is there a way for me to write a software-defined asset that is downstream of it?
Yeah, you can do that.
Uri Laserson
12/29/2022, 12:14 AM
sandy
12/29/2022, 12:21 AM