Calling an op from another op (dagster #ask-community)

Stefan Adelbert

03/19/2023, 10:27 PM

It turns out that it would be really useful in some limited cases to call an op from another op, mainly to help with the execution order of loosely coupled ops. Something like this:

import dagster

@dagster.op
def child_op(context):
    context.log.debug("child op")

@dagster.op
def parent_op(context):
    context.log.debug("parent op")
    child_op() # Compute function of op 'child_op' has context argument, but no context was provided when invoking.
    #child_op(context) # Compute function of op 'child_op' has context argument, but no context was provided when invoking.

@dagster.graph
def graph():
    parent_op()

job = graph.to_job()
job.execute_in_process()

But I'm getting an error,

dagster._core.errors.DagsterInvalidInvocationError: Compute function of op 'child_op' has context argument, but no context was provided when invoking.

File ~/.cache/pypoetry/virtualenvs/westhaven-notebooks-b3nFTxXM-py3.10/lib/python3.10/site-packages/dagster/_core/definitions/op_definition.py:410, in OpDefinition.__call__(self, *args, **kwargs)
    408 if len(args) > 0:
    409     if args[0] is not None and not isinstance(args[0], UnboundOpExecutionContext):
--> 410         raise DagsterInvalidInvocationError(
    411             f"Compute function of {node_label} '{self.name}' has context argument, "
    412             "but no context was provided when invoking."
    413         )
    414     context = args[0]
    415     return op_invocation_result(self, context, *args[1:], **kwargs)

A solution could be for `child_op` to be a plain function, but in reality `child_op` needs a context to be able to log and access resources. Any thoughts?

Oliver Sellwood

03/19/2023, 10:59 PM

You could make the child function just accept context and pass it in, for the plain-function solution you mentioned. I think there is probably a better solution, though, but there is not enough info in your example to really make a call. What is the actual use case here? You can see some examples of different dependency patterns here: https://docs.dagster.io/0.14.4/concepts/ops-jobs-graphs/jobs-graphs#examples

Stefan Adelbert

03/19/2023, 11:19 PM

@Oliver Sellwood Thanks for the response. I'll give some more details of my use case. I have ops that access a headless web browser and scrape data. The login and logout ops get used all over the place and should ideally be generic, but the "extract data" op is use-case specific. All three ops need access to the same web browser (resource) instance, as there is implicit state, so the order in which those three ops execute is important, even though there is no data dependency between them.

Stefan Adelbert

03/19/2023, 11:20 PM

Those three ops should function as a unit in the context of a larger job.

Stefan Adelbert

03/19/2023, 11:32 PM

• White rounded rects are ops
• Grey circles are resources

If you "zoom out" a bit, the overall job looks more like this. Data is scraped from a website and other data is extracted from an API. The two datasets are transformed together and the result is loaded into a database. From a data perspective,

• the load op depends on the transform op
• the transform op depends on both the extract ops
• but the logout op has no dependants, so there is no guarantee that the logout op will be performed before the transform op

My real use case is still more complex than this, but suffice to say, I am looking for a way to group the login-extract-logout ops such that I can ensure execution ordering, in particular that the logout op is executed before the transform op, while also maintaining reusability of the ops, e.g. login and logout. I have tried various different strategies, including `graph`s.

Oliver Sellwood

03/20/2023, 4:22 AM

hm, can the log in/log out operations be performed within the resource? something like this https://docs.dagster.io/concepts/resources#context-manager-resources

Stefan Adelbert

03/20/2023, 4:58 AM

Yes, in theory they could. In fact, that's probably quite a nice way of doing it. There could be a session resource for each site (which handles set up and tear down, like login and logout), which in turn depends on a browser resource. There would need to be a browser resource per session resource in the case of a single job having to access two different sites, i.e. the browser resource instance could not be recycled.
