#dagster-feedback

Médéric Hurier

11/20/2021, 12:25 PM
Hello. I started to use dagster and I have a question regarding the context argument. Using context seems to "break" Python functions, as you have to provide this argument when you call the function directly. On the other hand, this argument is not explicit when you call an op from a graph/job. It's an important buy-in to migrate an existing code base to Dagster. Is there a way around it? Would it be better to use context as an optional keyword argument instead of a positional argument?

sandy

11/20/2021, 10:01 PM
Hi mederic - you’re not required to define a context argument when defining an op. E.g., you can do @op def my_op(): …

Médéric Hurier

11/21/2021, 8:44 AM
Yes, but if I do that, I can't use the configuration/resource system properly. It's either "adapt all your code to get more features" or "keep it as is, but you don't get our features".
I thought of limiting the use of the context argument to graphs and jobs, but I haven't tried it yet.
This is similar to what is shown in the testing docs: either you call the function as-is, or you have to call build_op_context to mock this argument: https://docs.dagster.io/concepts/testing#testing-ops
And if I want to move from one approach to the other (no context -> context), I need to adapt all my code and tests to the new model.

Stefan Adelbert

11/22/2021, 2:07 AM
Hi @Médéric Hurier I'm evaluating dagster as a framework for our business process automations. We have been using our homegrown framework for this for a couple of years, but we have new requirements that dagster can probably help with. I'm about to start migrating existing (python) business processes to dagster. Interestingly, our own framework was broken down into tasks and functions, where the functions are like ops and take a single context parameter. Because of this similarity, I'm hoping that the migration process will be relatively easy... In your case, where your existing code is quite different, maybe it would make sense to take a gradual approach where you write dagster ops which interact with the context, getting the logger and relevant resources, and then call into your existing functionality - i.e. the ops would be thin wrappers which conform your existing code, like shims.
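A minimal sketch of the shim idea, using only the standard library so that it runs without dagster installed. The names normalize_name, normalize_name_op, and fake_context are hypothetical; fake_context is a stand-in that exposes only op_config and log, and in a real project the wrapper would presumably carry the @op decorator.

```python
import logging
from types import SimpleNamespace


def normalize_name(raw: str, suffix: str = "") -> str:
    """Existing business logic: knows nothing about the orchestrator."""
    return raw.strip().lower() + suffix


def normalize_name_op(context, raw: str) -> str:
    """Thin shim: read config and logger from the context, then delegate.

    Assumption: in a real code base this function would be decorated with @op.
    """
    suffix = context.op_config["suffix"]
    context.log.info(f"normalizing {raw!r}")
    return normalize_name(raw, suffix=suffix)


# Hypothetical stand-in for the context object (not a dagster class).
fake_context = SimpleNamespace(
    op_config={"suffix": "_v1"},
    log=logging.getLogger("shim_demo"),
)

# The business function is still callable as-is, with no context at all.
result_plain = normalize_name("  Alice ")
# Only the shim needs a context.
result_shim = normalize_name_op(fake_context, "  Alice ")
```

With this split, existing tests keep calling normalize_name directly, and only the thin wrapper needs a real or mocked context.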

Médéric Hurier

11/22/2021, 7:27 AM
Thanks for your feedback @Stefan Adelbert. Indeed, I think it makes more sense to clearly separate business functions from dagster ops. Even if dagster ops behave 90% like plain Python functions, they have subtle differences that could generate conflicts.
On one hand, I think the dagster team can be proud of making a DSL so close to the familiar concepts of a Python developer. On the other hand, I understand that ops have more things to do than plain Python functions, like handling configs, resources, op dependencies, ...
To be honest, I struggled with how developers should mix existing codebases with dagster. The documentation advertises that you can use dagster ops like plain Python functions, but I find it "deceptive".
And I would say that beyond that point, all the abstractions and orchestration mechanisms of dagster seem great 🙂

sandy

11/22/2021, 3:38 PM
@Médéric Hurier - is there a way you envision ops could work that would allow them to work more like plain Python functions?

Médéric Hurier

11/22/2021, 4:27 PM
I see 3 possibilities:
1) pass the context object as an optional keyword argument to limit refactoring (e.g., def f(a, b, context=None))
2) get the context object inside the function body (like in flask, and like the new dagster logging in 0.13.0)
3) prevent the use of context objects inside ops altogether, and rely either on 3.1) parameters to pass config arguments (like prefect) or 3.2) letting jobs manage configurations, as a job is the only object launched with a config in the end.
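To make possibility 2) concrete, here is a rough sketch of a flask-style ambient context built on the standard-library contextvars module. Nothing here is dagster API: use_context, get_context, and the config attribute are assumptions made up for illustration.

```python
import contextvars
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical ambient slot holding "the current context"; not part of dagster.
_current_context = contextvars.ContextVar("current_context", default=None)


@contextmanager
def use_context(context):
    """Make `context` visible to any code called inside the with-block."""
    token = _current_context.set(context)
    try:
        yield context
    finally:
        # Restore whatever was active before entering the block.
        _current_context.reset(token)


def get_context():
    """Return the active context, or an empty one when none has been set."""
    active = _current_context.get()
    return active if active is not None else SimpleNamespace(config={})


def scale(a, b):
    """The signature stays (a, b); config is looked up inside the body."""
    factor = get_context().config.get("factor", 1)
    return (a + b) * factor


plain_result = scale(1, 2)  # no context set: factor falls back to 1
with use_context(SimpleNamespace(config={"factor": 10})):
    configured_result = scale(1, 2)  # the framework would set this per run
after_result = scale(1, 2)  # context is cleared again after the block
```

The trade-off is the "magic" mentioned elsewhere in this thread: the function signature no longer shows that the body depends on configuration.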
I think I understand better what the dagster team achieved with ops and context: provide a software-defined workflow many times more powerful than airflow, but with a syntax as nice as plain Python.
At some point, the team had to make a trade-off to allow more powerful features (type checking, configs, context passing, ...) and break the compatibility of ops with Python callables.
The consequence is that you need to develop small functions, and then wrap them in ops and jobs. There is duplication in code and parameters, but this may be unavoidable if you want all the powers of ops.
The only alternative I see is to completely separate ops from functions, similar to what Kedro does with Nodes. However, this is more verbose (and I think Kedro is too prescriptive in its approach compared to dagster overall).

sandy

11/22/2021, 4:49 PM
Thanks for the detailed response - that's helpful. The way we've been thinking about it, there are two situations that any particular op can be in:
• The body of the op uses configuration and/or resources. In this case, the op needs a context argument, because that's the way for it to access them (without doing flask-style magic).
• The body of the op does not use configuration or resources. In this case, the op doesn't need a context argument.
When you talk about context as an optional keyword argument, are you envisioning a middle ground where some ops will use configuration and/or resources if they're provided, but can get by without them if they're not provided?

Médéric Hurier

11/23/2021, 7:38 AM
@sandy Indeed, the two situations are well explained in the documentation. However, there is no easy transition from one to the other. On one hand, you have to refactor the code outside jobs (testing, debugging, ...). On the other hand, I find it "magical" to automatically pass the context argument in jobs, whether the ops use a context or not.
The trick would be to declare def f(a, b, context=None), but never let it be None in reality. The op decorator could always pass an empty context, or warn the user before calling the user function if the context is None/empty and there are required fields.
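A rough sketch of that trick, assuming a hypothetical with_context decorator (this is not dagster's @op, and the config attribute on the stand-in context is invented for the example). The decorator substitutes an empty context when none is given and warns when required fields are missing.

```python
import functools
import warnings
from types import SimpleNamespace


def with_context(required=()):
    """Hypothetical decorator: guarantees `context` is never None in the body."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, context=None, **kwargs):
            if context is None:
                # Always hand the function an (empty) context object.
                context = SimpleNamespace(config={})
            missing = [key for key in required if key not in context.config]
            if missing:
                warnings.warn(f"{fn.__name__}: missing config fields {missing}")
            return fn(*args, context=context, **kwargs)
        return wrapper
    return decorator


@with_context()
def add(a, b, context=None):
    return (a + b) * context.config.get("factor", 1)


@with_context(required=("factor",))
def strict_add(a, b, context=None):
    return (a + b) * context.config.get("factor", 1)


# Direct call, exactly like a plain function.
direct = add(1, 2)
# Same function, with a context supplied by keyword.
configured = add(1, 2, context=SimpleNamespace(config={"factor": 3}))

# Calling without the required field still works, but emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = strict_add(1, 2)
```

Existing call sites such as add(1, 2) keep working unchanged, which is the point of making the argument keyword-only in practice.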
As a dagster novice, I was really puzzled by the use of context as the first positional argument in the beginning. In my current code base, I decided to always declare a context to let my ops be more consistent.
Another alternative to what we discussed would be to use a function-oriented approach for context-free ops, and an object-oriented approach for context-ready ops. The dagster team could pass the context as an init parameter of the parent class.
But I'm not a big fan of the object-oriented approach. Too verbose and cumbersome, in my opinion.
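For comparison, the object-oriented alternative could look roughly like the sketch below. ContextOp is a hypothetical base class that dagster does not provide, and the config attribute is again an invented stand-in.

```python
from types import SimpleNamespace


class ContextOp:
    """Hypothetical parent class: receives the context as an init parameter."""

    def __init__(self, context=None):
        self.context = context if context is not None else SimpleNamespace(config={})

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)

    def run(self, *args, **kwargs):
        raise NotImplementedError


def add(a, b):
    """Context-free logic stays a plain function."""
    return a + b


class ScaledAdd(ContextOp):
    """Context-ready logic becomes a class that reads self.context."""

    def run(self, a, b):
        return add(a, b) * self.context.config.get("factor", 1)


oo_default = ScaledAdd()(1, 2)
oo_configured = ScaledAdd(SimpleNamespace(config={"factor": 2}))(1, 2)
```

The extra class and run method for a one-line computation illustrate the verbosity concern raised above.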
Option 2) (flask context) would be the most familiar to end users and the easiest to implement (I think), since you laid the groundwork in v0.13 with loggers. Options 3.1) and 3.2) might be the most future-proof, but they would require more refactoring and investment.