How should one think about resources vs assets fro...
# ask-community
a
How should one think about resources vs assets from a conceptual/philosophical perspective? E.g., right now, we have an
@asset
that pulls stock prices daily from an API. Should that be a resource instead? When should we make it a resource instead? CC: @Son Do
a
The way we use them, resources wrap external deps (external APIs, DB connections, etc.). They are important for isolation during testing. In fact, a lot of the split in responsibility becomes evident when you try to test your pipelines. Resources are mocked fairly easily and can then be tested on their own, independently. Also, resources can be used by multiple assets at once, so you can apply synchronization primitives to them, caching, or any global state for tracking resource usage. Assets (for us) use resources. Resources also provide a lot of shared functionality that we use across asset pipelines. Calling external deps in the asset may work for now, but as you write more pipelines, you'll probably see common error handling and caching opportunities related to the external APIs that you'd want to pull out.
Also, state in assets is torn down once the asset is done, while resources persist state as long as your Dagster instance is up, which makes them viable for caching, for example, or API usage tracking & client-side throttling.
c
^ exactly what Abhinav said. Resources provide an interface for interacting with the external environment, which you might want to switch in / out in testing, staging, prod, etc., whereas assets should encapsulate your actual business logic.
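A minimal sketch of that split, in plain Python rather than Dagster itself so it runs without any dependencies. The names `PriceSource`, `FakePriceSource`, and `daily_stock_prices` are made up for illustration. In Dagster terms, the price source would be a resource (for example a `ConfigurableResource` subclass) and `daily_stock_prices` would be the `@asset` that receives it.

```python
from dataclasses import dataclass
from typing import Protocol


class PriceSource(Protocol):
    """Interface the asset depends on: the 'resource' side of the split."""

    def fetch_close(self, symbol: str) -> float: ...


@dataclass
class FakePriceSource:
    """Test double standing in for the real API client (hypothetical)."""

    prices: dict[str, float]

    def fetch_close(self, symbol: str) -> float:
        return self.prices[symbol]


def daily_stock_prices(source: PriceSource, symbols: list[str]) -> dict[str, float]:
    """Business logic only: which symbols to pull and how to shape the result."""
    return {symbol: round(source.fetch_close(symbol), 2) for symbol in symbols}


# In a test, the fake is swapped in; in prod, a client that calls the real API.
fake = FakePriceSource(prices={"AAA": 10.129, "BBB": 20.5})
result = daily_stock_prices(fake, ["AAA", "BBB"])
```

The asset logic never learns which implementation it was given, which is what makes swapping test / staging / prod clients cheap.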
a
Awesome thanks
z
@Abhinav Dhulipala / @chris - is it true now that you can share resources across ops and jobs to track global state? This would seem to mean that they now have a lifecycle independent of ops / assets - seemingly for the lifecycle of an entire deployment. I thought resources only lived as long as the op / asset that they were tied to, and are constructed / garbage collected for each op / asset, but perhaps things have changed.
c
hey - apologies not the case yet zach - although resource lifecycle hooks are something being actively considered
z
Okay yeah that would be a pretty big change, figured it would've been better signaled
Thanks!
a
That's a misunderstanding on my part zach, I apologize. I saw some examples of API caching and assumed that it cached across ops. Also, the way we pass Pydantic resource instances into our definitions contributed to my confusion.
z
No worries! It's a question that comes up once or twice a week in #dagster-support; a lot of people assume resources are shared across assets / ops.
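A small plain-Python simulation of why that assumption breaks down. It does not use Dagster's actual machinery. `CachingApiClient` and `run_step` are hypothetical names, and `run_step` only mimics the behavior described above, where each op / asset execution gets a freshly constructed resource.

```python
class CachingApiClient:
    """Hypothetical resource that caches responses in instance memory."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.remote_calls = 0

    def get(self, key: str) -> str:
        if key not in self.cache:
            self.remote_calls += 1  # stands in for a real network request
            self.cache[key] = f"value-for-{key}"
        return self.cache[key]


def run_step(key: str) -> int:
    """Simulate one op / asset execution with its own resource instance."""
    client = CachingApiClient()
    client.get(key)
    client.get(key)  # second read is served from this step's cache
    return client.remote_calls


# Two steps asking for the same key each pay for their own remote call.
calls_per_step = [run_step("AAA"), run_step("AAA")]
total_remote_calls = sum(calls_per_step)
```

The cache helps within a step but not across steps, so anything that must survive between ops has to live outside the resource instance (a database, a file, an external cache).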
a
Has anyone heard what the roadmap looks like regarding global state for resources? Currently hitting a wall myself where an API only allows one session at a time and will immediately sign out if another connection attempt is made. Obviously this makes it challenging when multiple assets are being materialized simultaneously since I can't just tap into the existing session as part of the resource.
z
No idea - I gather it's a pretty hard problem though, due to the difficulty of sharing state across process boundaries in Python in general. Seems like it might require something like a gRPC server specifically for resources, through which you'd interact with them.
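Short of a dedicated server, one possible workaround for the one-session-at-a-time case is to serialize access with a lock that lives outside any single process. The sketch below uses a lock file and only the standard library. `exclusive_session` and the lock path are hypothetical, it assumes all runs share a filesystem, and it is not a Dagster feature. It does not share the session between assets; it only makes them take turns.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def exclusive_session(lock_path: str, timeout: float = 5.0, poll: float = 0.05):
    """Hold a cross-process lock file while the single API session is in use."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so creating the file is the atomic "acquire".
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release so the next waiter can proceed


lock_path = os.path.join(tempfile.mkdtemp(), "stock_api.lock")

with exclusive_session(lock_path):
    # Log in, make requests, log out; other processes wait meanwhile.
    held_during_session = os.path.exists(lock_path)

released_after_session = not os.path.exists(lock_path)
```

A process that crashes while holding the lock leaves a stale file behind, so a real setup would need some expiry or cleanup on top of this.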