How should one think about resources vs assets from a concep dagster #ask-community

Anthony Yim

07/11/2023, 5:45 PM

How should one think about resources vs assets from a conceptual/philosophical perspective? E.g., right now, we have an @asset that pulls stock prices daily from an API. Should that be a resource instead? When should we make it a resource instead? CC: @Son Do

Abhinav Dhulipala

07/11/2023, 5:50 PM

The way we use them, resources wrap external deps (external APIs, DB connections, etc.). They are important for isolation during testing. In fact, a lot of the split in responsibility becomes evident when you try to test your pipelines. Resources are mocked fairly easily and can then be tested independently. Also, resources can be used by multiple assets at once, so you can apply synchronization primitives to them, caching, or any global state for tracking resource usage. Assets (for us) use resources. Resources also provide a lot of shared functionality that we use across asset pipelines. Calling external deps in the asset may work for now, but as you write more pipelines, you'll probably see common error handling and caching opportunities related to the external APIs that you'd want to pull out.
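To make the split concrete, here is a minimal plain-Python sketch of the pattern described above. It deliberately avoids importing Dagster so it runs anywhere; in a Dagster project the resource class would typically subclass ConfigurableResource and the function would carry the @asset decorator. All names here (StockApiResource, FakeStockApi, daily_stock_prices, the symbols and prices) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class PriceSource(Protocol):
    """Interface the asset depends on (hypothetical)."""

    def fetch_close(self, symbol: str) -> float: ...


@dataclass
class StockApiResource:
    """Hypothetical resource wrapping the external price API."""

    base_url: str

    def fetch_close(self, symbol: str) -> float:
        # A real implementation would make the HTTP call here, together with
        # shared retry and error-handling logic. Omitted in this sketch.
        raise NotImplementedError("network call omitted in this sketch")


class FakeStockApi:
    """Test double with the same interface and no network access."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self.prices = prices
        self.calls = 0

    def fetch_close(self, symbol: str) -> float:
        self.calls += 1
        return self.prices[symbol]


def daily_stock_prices(stock_api: PriceSource) -> Dict[str, float]:
    """Asset body: business logic only. In Dagster this would be an @asset."""
    symbols = ["AAPL", "MSFT"]
    return {symbol: stock_api.fetch_close(symbol) for symbol in symbols}


# In a test, the fake is swapped in for the real resource.
fake = FakeStockApi({"AAPL": 190.0, "MSFT": 340.5})
prices = daily_stock_prices(fake)
```

The asset never learns whether it is talking to the real API or the fake, which is what makes the business logic testable in isolation.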

Abhinav Dhulipala

07/11/2023, 5:51 PM

Also, state in assets is torn down once the asset is done, while resources persist state as long as your Dagster instance is up. That makes them viable for caching, for example, or for API tracking and client-side throttling.

chris

07/11/2023, 5:54 PM

^ exactly what Abhinav said. Resources provide an interface for interacting with the external environment, which you might want to switch in / out in testing, staging, prod, etc., whereas assets should encapsulate your actual business logic.

Anthony Yim

07/11/2023, 5:58 PM

Awesome thanks

Zach

07/12/2023, 5:23 PM

@Abhinav Dhulipala / @chris - is it true now that you can share resources across ops and jobs to track global state? This would seem to mean that they now have a lifecycle independent of ops / assets - seemingly for the lifecycle of an entire deployment. I thought resources only lived as long as the op / asset that they were tied to, and are constructed / garbage collected for each op / asset, but perhaps things have changed.

chris

07/12/2023, 5:29 PM

hey - apologies not the case yet zach - although resource lifecycle hooks are something being actively considered

Zach

07/12/2023, 5:36 PM

Okay yeah that would be a pretty big change, figured it would've been better signaled

Zach

07/12/2023, 5:36 PM

Thanks!

Abhinav Dhulipala

07/12/2023, 6:04 PM

That's a misunderstanding on my part, Zach, I apologize. I saw some examples of API caching and assumed that it cached across ops. Also, the way we pass Pydantic resource instances into our definitions contributed to my confusion.

Zach

07/12/2023, 6:09 PM

No worries! It's a question that comes up once or twice a week in #dagster-support , a lot of people assume resources are shared across assets / ops
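The lifecycle point above can be illustrated with a small plain-Python simulation. It is not Dagster code: run_step is a hypothetical stand-in for one op or asset execution, and it assumes, as described in this thread, that a fresh resource instance is built for each execution and discarded afterwards. Under that assumption an in-memory cache helps within a step but never across steps.

```python
from typing import Dict


class CachingApiResource:
    """Hypothetical resource with an in-memory cache."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.remote_calls = 0

    def get(self, key: str) -> str:
        if key not in self.cache:
            self.remote_calls += 1  # stands in for a real API request
            self.cache[key] = f"value-for-{key}"
        return self.cache[key]


def run_step(key: str) -> int:
    """Simulates one op/asset execution: build the resource, use it, drop it."""
    resource = CachingApiResource()
    resource.get(key)
    resource.get(key)  # second lookup in the same step is served from cache
    return resource.remote_calls


# Two steps asking for the same key each pay for their own remote call,
# because the cache built in the first step is gone by the second.
calls_per_step = [run_step("AAPL"), run_step("AAPL")]
total_remote_calls = sum(calls_per_step)
```

A cache that must survive across steps would have to live outside the resource instance, for example in a file, a database, or an external cache service.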

AJ Floersch

07/18/2023, 8:18 PM

Has anyone heard what the roadmap looks like regarding global state for resources? Currently hitting a wall myself where an API only allows one session at a time and will immediately sign out if another connection attempt is made. Obviously this makes it challenging when multiple assets are being materialized simultaneously since I can't just tap into the existing session as part of the resource.

Zach

07/18/2023, 8:35 PM

No idea - I gather it's a pretty hard problem though due to the difficulty in sharing state across process boundaries in python in general. Seems like it might require something like a grpc server specifically for resources through which you'd interact with them
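Since in-memory state does not cross process boundaries, any coordination for a one-session-at-a-time API has to go through something all processes can see. The sketch below is one possible workaround, not a Dagster feature and not the gRPC approach mentioned above: a lock file created with exclusive-create semantics. SessionLock and the lock path are hypothetical names, and the approach assumes all steps run on one host with a shared filesystem. A crashed process would leave a stale lock file behind, which this sketch does not handle.

```python
import os
import tempfile


class SessionLock:
    """Hypothetical cross-process lock based on exclusive file creation."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.held = False

    def try_acquire(self) -> bool:
        try:
            # O_CREAT | O_EXCL makes os.open fail if the file already exists,
            # so only one process can create it.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        self.held = True
        return True

    def release(self) -> None:
        # Only the holder removes the file.
        if self.held:
            os.remove(self.path)
            self.held = False


lock_path = os.path.join(tempfile.mkdtemp(), "api_session.lock")
first = SessionLock(lock_path)   # e.g. held by one asset's process
second = SessionLock(lock_path)  # e.g. attempted by another asset's process

got_first = first.try_acquire()
got_second = second.try_acquire()  # refused while the first holder is active
first.release()
got_second_after_release = second.try_acquire()
second.release()
```

A resource could call try_acquire before opening the API session and retry with a delay when it returns False, so that simultaneous materializations take turns instead of signing each other out.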
