Hi guys, relatively new to dagster pipelines. We ...
# ask-community
p
Hi guys, relatively new to dagster pipelines. We have quite a large set of connected jobs and graphs in a repo. All runs great, but we have a small issue with the `run_id` on our `@graph` decorated master job list. The `run_id` is cast to a `str` on creation in the utilities for dagster. I don't think we want to implement amended libraries as that can get messy, but we really do need that value to be a `UUID` type. The `OpExecutionContext` is available in each `@op`, but there are dozens of those that we use and we do not want to implement casting manually in each one. Is there a way to:
• access this at the graph level?
• set this to be defined as a UUID type at time of pipeline run?
• or perhaps set our own UUID run_id and leave the behaviour of the library as is?
Broad pseudocode here:
import pandas as pd
from dagster import OpExecutionContext, graph, op, repository

@op
def build_list(context: OpExecutionContext, arg1: pd.DataFrame, arg2: pd.DataFrame) -> pd.DataFrame:
    run_id = context.run_id  # Where we need a UUID, but there are many ops
    df = ...  # DataFrame with run_id in each row
    return df

@graph
def load_master():
    # ...
    # many ops
    # ...
    build_list(a1, a2)

@repository
def pipeline_local():
    return [load_master, other_graphs]
s
Hi Philip, I’m checking with some others on our team about our philosophy here and a suggested solution for you. My instinct says that we’re unlikely to change our internal run ID typing, but there may be a solution for you at the context level.
p
Our workaround so far has been to do manual casting in the affected ops, then patch the "EPHEMERAL" test str to a nominal value in pytest. But still curious about a better integrated solution if you had one.
Hey @sean any further ideas on this?
s
So I did a little research and some thinking on this. As I understand it, you are tracking `run_id`s in dataframes that use UUID and not string. Dagster generates `run_id` using Python’s `uuid.uuid4`, but converts this to a string, where you need the actual UUID type. Since this can be accomplished by just doing `uuid.UUID(context.run_id)`, I’m a little unclear on what you’re looking for: is it just that you prefer to write `context.run_id` instead of `uuid.UUID(context.run_id)`? Have I understood you correctly? If so, I’m afraid the best I can do is recommend you write a utility function:
def run_id_as_uuid(context: OpExecutionContext) -> uuid.UUID:
    return uuid.UUID(context.run_id)
And use `run_id_as_uuid(context)` in place of `context.run_id`. I don’t think this is unreasonable; it’s common to need to convert data types when managing metadata from an external library in your own data structures. But pls let me know if I didn’t fully understand you: your post contains some references to code that doesn’t appear in your snippet (e.g. the `EPHEMERAL` test str).
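A minimal sketch of the round trip described above, using only the standard library. `FakeContext` is a hypothetical stand-in for `OpExecutionContext` so the snippet runs without Dagster installed; the only thing assumed about the real context is that `run_id` is the string form of a UUID.

```python
import uuid
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for OpExecutionContext; only run_id is modeled."""
    run_id: str


def run_id_as_uuid(context) -> uuid.UUID:
    # Parse the string run ID back into a UUID object.
    return uuid.UUID(context.run_id)


# Simulate a run ID that was generated with uuid4 and then stringified.
original = uuid.uuid4()
context = FakeContext(run_id=str(original))
parsed = run_id_as_uuid(context)
```

Parsing the string form returns a `uuid.UUID` equal to the one that was stringified, so nothing is lost by storing the run ID as a `str` and converting at the point of use.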
p
Thanks for getting back. We have ended up wrapping a few different metrics together in a `get_op_metadata` function, which includes `run_id` and allows us to do some specific handling within it without the minutiae of casting in every op that we have.
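One way such a wrapper might look, as a rough sketch only. The actual `get_op_metadata` was not shared in this thread, so the returned fields, the `FALLBACK_RUN_ID` constant, and the fallback behaviour for non-UUID strings (such as the `EPHEMERAL` test value mentioned earlier) are all assumptions. `SimpleNamespace` stands in for the op context so the snippet runs without Dagster.

```python
import uuid
from types import SimpleNamespace
from typing import Any, Dict

# Hypothetical nominal value used when run_id is not a valid UUID string,
# for example a placeholder run ID seen in tests.
FALLBACK_RUN_ID = uuid.UUID(int=0)


def get_op_metadata(context: Any) -> Dict[str, Any]:
    """Collect per-op metadata in one place (illustrative fields only)."""
    try:
        run_id = uuid.UUID(context.run_id)
    except ValueError:
        # uuid.UUID raises ValueError for malformed strings.
        run_id = FALLBACK_RUN_ID
    return {"run_id": run_id}


# Stand-in contexts: one with a real UUID string, one with a placeholder.
real = get_op_metadata(SimpleNamespace(run_id=str(uuid.uuid4())))
placeholder = get_op_metadata(SimpleNamespace(run_id="EPHEMERAL"))
```

Keeping the conversion in one helper means each op calls `get_op_metadata(context)` once, and any special handling lives in a single place.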
s
Great