Philip Gunning
05/23/2022, 1:23 PM
run_id on our @graph decorated master job list. The run_id is cast to a str on creation in the utilities for dagster. I don't think we want to implement amended libraries, as that can get messy, but we really do need that value to be a UUID type. The OpExecutionContext is available in each @op, but there are dozens of those that we use, and we do not want to implement casting manually in each one.
Is there a way to:-
• access this at the graph level?
• set this to be defined as a UUID type at time of pipeline run?
• or perhaps set our own UUID run_id and leave the behaviour of the library as is?
Broad pseudocode here:
@op
def build_list(context: OpExecutionContext, arg1: pd.DataFrame, arg2: pd.DataFrame) -> pd.DataFrame:
    run_id = context.run_id  # Where we need a UUID, but there are many ops
    df = ...  # DataFrame with run_id in each row
    return df

@graph
def load_master():
    ...
    # many ops
    ...
    build_list(a1, a2)

@repository
def pipeline_local():
    return [load_master, other_graphs]
sean
05/23/2022, 6:25 PM
Philip Gunning
05/24/2022, 11:51 AM
sean
05/25/2022, 7:48 PM
run_id using Python’s uuid.uuid4, but converts this to a string, where you need the actual UUID type.
Since this can be accomplished by just doing uuid.UUID(context.run_id), I’m a little unclear on what you’re looking for: is it just that you prefer to write context.run_id instead of uuid.UUID(context.run_id)? Have I understood you correctly?
If so, I’m afraid the best I can do is recommend you write a utility function:
def run_id_as_uuid(context: OpExecutionContext) -> uuid.UUID:
    return uuid.UUID(context.run_id)
And use run_id_as_uuid(context) in place of context.run_id. I don’t think this is unreasonable; it’s common to need to convert data types when managing metadata from an external library in your own data structures.
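To make the suggestion concrete, here is a minimal, self-contained sketch of that utility. It assumes only that the context exposes run_id as a string, as described above; the SimpleNamespace object is a hypothetical stand-in for OpExecutionContext so the snippet runs without Dagster installed.

```python
import uuid
from types import SimpleNamespace


def run_id_as_uuid(context) -> uuid.UUID:
    """Parse the string run_id carried by the context into a uuid.UUID."""
    return uuid.UUID(context.run_id)


# Hypothetical stand-in for OpExecutionContext: only the run_id attribute is modelled.
fake_context = SimpleNamespace(run_id=str(uuid.uuid4()))

parsed = run_id_as_uuid(fake_context)
print(type(parsed).__name__)  # UUID
```

Inside a real op you would pass the op's own context to run_id_as_uuid(context) rather than the stand-in.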
But pls let me know if I didn’t fully understand you-- your post contains some references to code that doesn’t appear in your snippet (e.g. EPHEMERAL test str).
Philip Gunning
05/26/2022, 2:02 PM
get_op_metadata function which includes run_id and allows us to do some specific handling within it without the minutiae of casting on every op that we have.
sean
05/26/2022, 2:10 PM
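For anyone reading later, a rough sketch of what a get_op_metadata-style helper might look like. The actual function was not posted in the thread, so the body, the returned keys, and the tag_rows helper are all assumptions for illustration; the context is again a hypothetical stand-in exposing only a string run_id, and plain dicts stand in for DataFrame rows.

```python
import uuid
from types import SimpleNamespace
from typing import Any, Dict, List


def get_op_metadata(context) -> Dict[str, Any]:
    """Hypothetical helper: gather per-op metadata in one place.

    The str -> UUID cast happens here, so individual ops never repeat it.
    """
    return {"run_id": uuid.UUID(context.run_id)}


def tag_rows(rows: List[Dict[str, Any]], context) -> List[Dict[str, Any]]:
    """Hypothetical op body: attach the UUID run_id to every row."""
    meta = get_op_metadata(context)
    return [{**row, "run_id": meta["run_id"]} for row in rows]


# Stand-in for OpExecutionContext (assumption: only run_id is needed).
context = SimpleNamespace(run_id=str(uuid.uuid4()))
tagged = tag_rows([{"name": "a"}, {"name": "b"}], context)
```

Keeping the cast inside one helper means any later change to how run_id is handled touches a single function rather than dozens of ops.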