Hi, I am trying to use the lazy loading repository...
# ask-community
s
Hi, I am trying to use the lazy-loading repository and I am looking at this class: `class RepositoryData(ABC):`. But I don't see a method like "get_all_assets", similar to "get_all_pipelines", "get_all_jobs", "get_all_sensors", and "get_all_schedules". What method can I use to load my list of assets? e.g. [asset, asset, asset]
c
Hmm... `RepositoryData` is an abstract base class, does the subclass have the assets definitions you need? The default subclass is `CachingRepositoryData`, which has a `get_assets_defs_by_key()` method
s
I was trying to follow this example. Do you have a link in your docs that shows the proper way to load assets via lazy loading?
CachingRepositoryData does not appear in your docs
I also tried it this way, adding an "assets" keyword, but it didn't work. If you can guide me, that would be great
c
Unfortunately there isn't a way to lazy load assets at the moment. Though we recently released a (very much experimental) feature which allows you to split up your AssetsDefinition creation into two steps -- step one creates a serializable version of your definition, step two takes that and turns it into the normal AssetsDefinition class. The first step runs only when the repository is loaded, so slow processes such as pinging an external API can happen in there and not slow down actual execution. If that's of value to you, I can connect you to someone who will be able to help. Otherwise, I'd recommend filing an issue for your use case.
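For readers following along, the two-step idea described above can be sketched roughly as follows. This is a plain-Python illustration only, not the experimental Dagster API: `compute_cacheable_data`, `build_definitions`, and the `fetch_tables` callable are hypothetical names invented for this sketch, and the "definitions" are ordinary functions rather than real AssetsDefinition objects.

```python
import json
from typing import Callable, Dict, List


def compute_cacheable_data(fetch_tables: Callable[[], List[str]]) -> str:
    """Step one (hypothetical): do the slow work once, return serializable data."""
    entries = [{"key": name, "description": f"table {name}"} for name in fetch_tables()]
    return json.dumps(entries)


def build_definitions(cached: str) -> Dict[str, Callable[[], str]]:
    """Step two (hypothetical): rebuild definitions from cached data, with no I/O."""
    definitions: Dict[str, Callable[[], str]] = {}
    for entry in json.loads(cached):
        # Bind the current entry as a default argument so each function keeps its own key.
        def materialize(entry: Dict[str, str] = entry) -> str:
            return f"materialized {entry['key']}"

        definitions[entry["key"]] = materialize
    return definitions
```

The point of the split is that the serialized output of step one can be handed to every process that needs the definitions, so only the first step ever talks to the external API.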
s
Thanks so much for your help. I will discuss it with the team
Hi @claire, can you connect us with that person so we can see a simple example of this "experimental feature"? Thanks in advance
c
Yep! @owen knows the details about being able to cache assets definitions
o
hi @Saul Burgos ! just taking a step back -- what are you trying to accomplish with the lazy-loading behavior, and what issues are you running into with the default approach? In most cases, I'd recommend against building on top of an experimental/undocumented feature so I first want to make sure that there aren't other options
s
We are on the latest version, 1.1.20. Use case: we are using "RepositoryData" to load jobs and sensors because all of these elements need to be loaded from an external API. Now we want to load assets using the same pattern, but we have found that "RepositoryData" does not support assets. @claire told me: "Unfortunately there isn't a way to lazy load assets at the moment." We would like to check whether this "experimental feature" can work. Maybe you have a better alternative to suggest before we try to use it.
o
ah I see -- the `RepositoryData` base class has a `get_assets_defs_by_key` function, which probably should be called in that `load_all_definitions` call, but isn't. So you should be able to implement that method (and maybe override `load_all_definitions` to call it)
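A minimal sketch of that suggestion, for illustration. To keep it runnable without Dagster installed, `RepositoryDataStandIn` below is a stand-in that only mimics the method names mentioned in this thread; it is not Dagster's real `RepositoryData`, whose abstract methods and signatures vary by version. The `fetch_asset_keys` callable and the string "definitions" are also assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


class RepositoryDataStandIn:
    """Stand-in mimicking the shape discussed in the thread; NOT Dagster's class."""

    def get_all_pipelines(self) -> List[str]:
        return []

    def get_all_sensors(self) -> List[str]:
        return []

    def get_all_schedules(self) -> List[str]:
        return []

    def get_assets_defs_by_key(self) -> Dict[str, str]:
        return {}

    def load_all_definitions(self) -> None:
        # Mirrors the behavior described above: assets are never touched here.
        self.get_all_pipelines()
        self.get_all_sensors()
        self.get_all_schedules()


class LazyAssetRepositoryData(RepositoryDataStandIn):
    def __init__(self, fetch_asset_keys: Callable[[], List[str]]) -> None:
        # fetch_asset_keys is a hypothetical call to the external API.
        self._fetch_asset_keys = fetch_asset_keys
        self._assets_by_key: Optional[Dict[str, str]] = None

    def get_assets_defs_by_key(self) -> Dict[str, str]:
        # Build the definitions on first access, then reuse the cached result.
        if self._assets_by_key is None:
            self._assets_by_key = {
                key: f"asset definition for {key}" for key in self._fetch_asset_keys()
            }
        return self._assets_by_key

    def load_all_definitions(self) -> None:
        # Keep the base behavior, then also force the assets to load.
        super().load_all_definitions()
        self.get_assets_defs_by_key()
```

With a real subclass, the same two overrides would return actual AssetsDefinition objects keyed by asset key instead of strings.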
s
Hi. Am I doing something wrong here? I am overriding "`load_all_definitions`" and "`get_assets_defs_by_key`", and I am printing a log to check the execution. But as you can see, "`load_all_definitions`" is executed only once, at the beginning, and the newly added call to "`get_assets_defs_by_key`" is also made only once. Additionally, the dummy asset is not loaded in dagit, and when I click the "*reload definitions*" button in dagit, only "`get_all_pipelines`" and "`get_all_sensors`" are executed. I believed that pressing the "reload definitions" button in dagit would execute "`load_all_definitions`" again
o
I think at least part of what you're seeing (the fact that `111111` is only printed a single time) might be related to this issue: https://github.com/dagster-io/dagster/issues/12581. As for why the asset isn't showing up, I think this is due to a quirk in how we forward the asset information to dagit. In particular, any asset needs to be part of at least one job to show up. When using the default CachingRepositoryData object, we do this for you automatically (we build some jobs implicitly so that all assets are covered). Similarly (I believe you'll run into this soon), once you start using `define_asset_job`, those will need to be resolved into regular JobDefinitions and returned from `get_all_jobs()`. You can see the full implementation of how the default CachingRepositoryData does all of that resolution here: https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster/[…]tory_definition/repository_data_builder.py?L37:5&subtree=true, but the most important parts are centered around calling unresolved_asset_job.resolve() and get_base_asset_jobs(). Certainly going to be a bit of an adventure to get working but should be doable -- definitely let me know if you run into roadblocks.
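The "every asset must belong to at least one job" condition can be checked with a small helper like the one below. This is a plain-Python sketch under stated assumptions, not Dagster code: asset keys are modeled as strings, jobs as a mapping from job name to the set of asset keys they select, and `uncovered_asset_keys`, `with_implicit_job`, and the `__implicit_asset_job` name are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def uncovered_asset_keys(asset_keys: Iterable[str], jobs: Dict[str, Set[str]]) -> List[str]:
    """Return the asset keys that no job selects, in sorted order."""
    covered: Set[str] = set()
    for selected in jobs.values():
        covered |= selected
    return sorted(set(asset_keys) - covered)


def with_implicit_job(
    asset_keys: Iterable[str],
    jobs: Dict[str, Set[str]],
    implicit_name: str = "__implicit_asset_job",  # hypothetical name
) -> Dict[str, Set[str]]:
    """Add one catch-all job for uncovered assets, similar in spirit to the implicit jobs above."""
    missing = uncovered_asset_keys(asset_keys, jobs)
    if not missing:
        return dict(jobs)
    return {**jobs, implicit_name: set(missing)}
```

In a custom RepositoryData subclass, the equivalent step would be to build real job definitions for the uncovered assets and return them from `get_all_jobs()` alongside the resolved asset jobs.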