Hello :wave:, I am reading about the new pythonic ...
# ask-community
f
Hello 👋, I am reading about the new pythonic Resources (the ergonomic improvements are very exciting!) and have some questions and feedbacks about them. The section Defining resources which require state is making a questionable statement and recommendation :
"Since ConfigurableResource is a dataclass meant to encapsulate config, it is not a good fit for managing state. The recommended pattern in this case is to build a separate class which manages the state of the resource which is provided by the ConfigurableResource class."
To begin with, it's not clear how the proposed pattern is more stateful than just using a ConfigurableResource alone? I'm guessing that in both cases, the class(es) (and their "state") are preserved during the duration of the run? For multiprocess executors there might be a nuance, like the one described in the Legacy Resource guide, but the new concept page does not give explanation. Moreover, the example shows that the second class is used to initialize the resource (i.e., the api_token is fetched and stored in the init method of the second class). For any special initialization of the resource, I don't think the proposed pattern is necessary. I would prefer to keep only one class and use (i.e., override) its init method like so
Copy code
def __init__(self, **kwargs): 
    super().__init__(**kwargs)
    self._api_token = ...
It's a common pattern with Pydantic models. I'm interested to know the reasons behind the different approach you are proposing. Unfortunately from the docs alone, there's not really an explanation. The section about stateful Resources does not make a lot of sense to me. I think that further clarifications and explanations would be beneficial. Thank you! 😃
d
I could be reading this wrong, but it looks like custom init methods are not supported in pydantic v2, which is also maybe bad since I think dagster itself uses this for its own pydantic subclasses -
Copy code
Custom __init__ overrides won't be called. This should be replaced with a @root_validator.
https://docs.pydantic.dev/blog/pydantic-v2-alpha/
👌 1
a
I’ve been using
PrivateAttr
from Pydantic for storing some internal state, eg. session for a resource that wraps an API:
Copy code
from pydantic import PrivateAttr
from dagster import ConfigurableResource
import requests

class MyResource(ConfigurableResource):
    _session: requests.Session | None = PrivateAttr(None)

    @property
    def session(self):
        if self._session is None:
            self._session = requests.Session()
            # add headers and default parameters to session, etc.

        return self._session

    def query(self, url, params):
        self.session.get(url, params=params)
❤️ 1
@Félix Tremblay you can also do similar by adding your own
__enter__
and
__exit__
methods to the resource, it worked for me in several cases (for the previous example, you can create your session in
__enter__
and tear it down in
__exit__
).
❤️ 1
f
Thank you Daniel for the info about Pydantic V2!
Thank you Andras for the 2 solutions. I like them both
@Andras Somi, when you define the
enter
and
exit
methods for the ConfigurableResource, does Dagster automatically call these methods for you, or do you have to use the
with
yourself in your op/asset?
With Legacy Resources, you could implement them with a context manager, and Dagster would automatically initiate &terminate them https://docs.dagster.io/concepts/resources-legacy#context-manager-resources
c
You would have to use with in your op/asset - in general I think the plan is to move away from the system managing that state for you
b
Hi Felix, thanks for your feedback! A few pieces of clarification which should hopefully help explain how we’re thinking about this First, on context managers, our goal here was to move away from the world where Dagster would implicitly manage contexts under the hood for users, since this behavior was often poorly defined (e.g. when and how often is the context opened and closed across an in-process vs multiprocess run?) We believe that moving this management into “user-space” is more explicit:
Copy code
@op
def my_op(my_context_resource: MyContextResource):
   with my_context_resource.open():
       pass
Second, on managing state in a
ConfigurableResource
. We initially shipped a constrained API with the aim of clearly separating the resource config schema and stateful implementation, with the knowledge that we could relax this constraint later on. Based on feedback, for many users this seems like a heavy-handed prescription and is something we’re re-examining. In particular, we are considering support for an initialization method which would allow users to set private fields on a resource directly.
f
Hi @ben thank you very much for your detailed response. I personally think there are valid use cases for context managers (managed by Dagster) : e.g. when a resource 1) needs to be initialized and closed, 2) is used in multiple ops or assets, 3) the user wants to avoid initializing and closing it for every ops/assets, but ensure that it is closed when the run is finished or failed
b
Hi Felix, we’ve gotten a lot of feedback along these lines and are considering changing our recommendations around state management + providing lifecycle hooks which would hopefully fit better for your use-case. The RFC is here: https://github.com/dagster-io/dagster/pull/13938 Feel free to leave any feedback about whether you prefer these changes
f
Hi @ben, I've read the RFC and I believe it's a big step in the right direction. The introduction of Lifecycle hooks is particularly useful, and I appreciate the clarity provided by the three methods (setup_for_execution, teardown_after_execution, and yield_for_execution). The method names are a bit verbose but very clear, although setup_before_execution might have been slightly more coherent with teardown_after_execution. The part about PrivateAttr is also great. However, I still see some added friction compared to the old resources, though it's relatively minor. We can mutate attributes, but in order to do so, we need to 1) explicitly set it as a PrivateAttr (one more thing to learn during the learning curve) and 2) lose the ability to configure this attribute. In contrast, with the old resources, any attributes, even configurable attributes, could be mutated, and doing so did not require any additional specification (e.g. adding PrivateAttr). Despite this minor friction, I think the overall proposal offers significant improvements. That being said, I do believe it would be beneficial for Dagster to provide some arguments to explain the advantages of making resource attributes immutable and explain WHY users should defer state management to a separate class. This would help users better understand the reasoning behind this design choice and potentially alleviate some concerns.