Peter Rietzler
03/06/2023, 4:27 PM
load_definitions function in this example is executed for every single materialization of any of the given assets? I see the startup_time is different for the assets if I materialize both of them in a single run and I would just like to verify if I can rely on this behaviour in any execution case (locally, one pod to materialize all assets, one pod per asset materialization, CacheableAssetsDefinition etc.). Thank you very much for your help!
import datetime

from dagster import Definitions, asset

# Evaluated when this module is imported, i.e. once per Python process.
startup_time = datetime.datetime.utcnow()

def load_definitions() -> Definitions:
    @asset
    def asset_one(context):
        return repr({"startup_time 1": startup_time})

    @asset
    def asset_two(context):
        return repr({"startup_time 2": startup_time})

    return Definitions(
        assets=[asset_one, asset_two]
    )

defs = load_definitions()
Peter Rietzler
03/06/2023, 4:33 PM
load_definitions is executed? Is there an environment variable, parameter or anything else available?
chris
03/06/2023, 9:56 PM
load_definitions to be called once per process, not necessarily once per asset. For example, if you switch to using the in_process_executor, then load_definitions would not be guaranteed to load for every single asset materialization; likewise, if you use a graph-backed asset which contains multiple underlying ops, it may call load_definitions multiple times for the execution of that graph-backed asset.
chris
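A minimal sketch of the once-per-process behaviour described above, using only the standard library rather than Dagster itself. MODULE_SOURCE and run_in_fresh_process are hypothetical names for illustration; the module body stands in for a definitions module whose startup_time is captured at import time.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a definitions module: the timestamp is captured
# once, when the module body runs, and both functions close over it.
MODULE_SOURCE = textwrap.dedent(
    """
    import datetime

    startup_time = datetime.datetime.now(datetime.timezone.utc)

    def asset_one():
        return startup_time

    def asset_two():
        return startup_time

    print(asset_one().isoformat())
    print(asset_two().isoformat())
    """
)


def run_in_fresh_process():
    """Run the module body in a new interpreter and return both timestamps."""
    completed = subprocess.run(
        [sys.executable, "-c", MODULE_SOURCE],
        capture_output=True,
        text=True,
        check=True,
    )
    first, second = completed.stdout.split()
    return first, second


run_a = run_in_fresh_process()
run_b = run_in_fresh_process()

# Within one process, both functions report the same startup_time.
print(run_a[0] == run_a[1])
# A second process runs the module body again, so its timestamp differs.
print(run_a[0] != run_b[0])
```

Whether two assets share a startup_time therefore depends on whether they run in the same process, which is what the choice of executor controls.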
03/06/2023, 10:04 PM
Peter Rietzler
03/07/2023, 7:51 AM
load_definitions would also be called in the case CacheableAssetsDefinition is used? As far as I understood, CacheableAssetsDefinition should be used in the case that load_definitions takes a lot of time or resources to actually compute the asset graph? I have not used it so far and thus wonder how this would behave.
Peter Rietzler
03/07/2023, 9:04 AM
Mhd Mousa Hamad
03/07/2023, 9:41 AM
import datetime

from dagster import Definitions, asset

class MyDataPipeline:
    def __init__(self):
        self.service = ...

    def load_definitions(self) -> Definitions:
        startup_time = datetime.datetime.utcnow()

        @asset
        def asset_one(context):
            # Can we reliably use `self.service` here without creating a resource for it?
            self.service.do_something()
            return repr({"startup_time 1": startup_time})

        @asset
        def asset_two(context):
            return repr({"startup_time 2": startup_time})

        return Definitions(
            assets=[asset_one, asset_two]
        )

pipeline = MyDataPipeline()
dfs = pipeline.load_definitions()
Is it still an acceptable practice to use the service as defined compared to defining a resource for it? Any known pros/cons for this approach?
Thank you!
owen
03/07/2023, 10:07 PM
CacheableAssetsDefinition is an internal API and generally not recommended to be subclassed by users (as the interface may be broken even on minor releases). But to give a bit more context, its use does not impact how and when load_definitions will be called. When cacheable assets are present, information about the cacheable assets definitions is generated/serialized when your user code server starts up. This serialized information is then passed along to subprocesses when runs are launched. These subprocesses will take a slightly different internal code path (reading from the serialized data to generate their assets rather than having to create that serialized data from scratch). This abstraction is useful mostly when the act of generating that serialized data may take a long time (e.g. it hits some slow API).
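The compute-once, serialize, pass-to-subprocess pattern described above can be sketched in plain Python. This is only an illustration of the idea and not Dagster's internal implementation; slow_compute_asset_specs, CHILD_SOURCE and launch_run are hypothetical names.

```python
import json
import subprocess
import sys


def slow_compute_asset_specs():
    """Hypothetical stand-in for a slow API call that discovers assets."""
    return [{"name": "asset_one"}, {"name": "asset_two"}]


# "Server startup": do the slow work once and serialize the result.
cached = json.dumps(slow_compute_asset_specs())

# "Run subprocess": rebuild from the serialized data instead of
# repeating the slow computation.
CHILD_SOURCE = """
import json, sys
specs = json.loads(sys.argv[1])
print(",".join(spec["name"] for spec in specs))
"""


def launch_run(serialized):
    """Start a fresh interpreter and hand it the cached, serialized specs."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, serialized],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip().split(",")


asset_names = launch_run(cached)
print(asset_names)
```

The child process never calls slow_compute_asset_specs, which is the saving this kind of caching is meant to provide.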
As for the most recent message, the resource abstraction is useful in cases where a) you might want to reuse a resource for multiple different assets, and have them all share some sort of config, or b) you want to substitute out different definitions of the resource between environments (e.g. the local service hits a mock endpoint, and the prod service hits the real thing). If neither of these is useful to you, then there should be no issues in just using the service inline.
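The two cases above (sharing one service across assets, and swapping it per environment) can be sketched without Dagster as ordinary dependency injection. This is not Dagster's resource API; Service, MockService, ProdService, build_assets, SERVICES_BY_ENV and the endpoint URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class Service(Protocol):
    def do_something(self) -> str: ...


@dataclass
class MockService:
    def do_something(self) -> str:
        return "mock response"


@dataclass
class ProdService:
    endpoint: str  # shared config that every asset using the service sees

    def do_something(self) -> str:
        # A real implementation would call the endpoint here.
        return f"called {self.endpoint}"


def build_assets(service: Service) -> Dict[str, Callable[[], str]]:
    """Case a): both hypothetical assets share the one injected service."""

    def asset_one() -> str:
        return service.do_something()

    def asset_two() -> str:
        return service.do_something().upper()

    return {"asset_one": asset_one, "asset_two": asset_two}


# Case b): pick the service definition per environment.
SERVICES_BY_ENV = {
    "local": MockService(),
    "prod": ProdService(endpoint="https://example.invalid/api"),
}

local_assets = build_assets(SERVICES_BY_ENV["local"])
prod_assets = build_assets(SERVICES_BY_ENV["prod"])
print(local_assets["asset_one"]())
print(prod_assets["asset_one"]())
```

If a pipeline only ever needs one service in one environment, this indirection buys little, which matches the advice that using the service inline is fine in that case.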
However, is there a particular reason that you're going with this class-based approach? In general, the Definitions object will map to the entire collection of assets / jobs in a code location, and so its scope should usually be larger than a single data pipeline.
Mhd Mousa Hamad
03/10/2023, 12:47 PM
Mhd Mousa Hamad
03/10/2023, 12:50 PM