# announcements
x
In the case where I need to run multiple Dagit instances to scale out, is there a way to configure those instances to store everything (including intermediate results) in the same DB, so that the results of a solid with the same parameters can be shared across multiple Dagit instances on different hosts?
s
Yes, just make sure they’re all configured with the same dagster.yaml configuration: https://docs.dagster.io/overview/instances/dagster-instance#instance-configuration-yaml
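Concretely, that means pointing every host’s dagster.yaml at the same database. A rough sketch, assuming you back the instance with Postgres via the dagster-postgres library (the connection string is a placeholder):
```yaml
# dagster.yaml -- identical on every Dagit host (sketch only)
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"

event_log_storage:
  module: dagster_postgres.event_log_storage
  class: PostgresEventLogStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"
```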
And make sure you use an intermediate store that can work across machines, such as the s3 intermediate store
x
I don’t think I have access to s3, and the intermediate results are not really files in my case. They are most likely only Python dictionaries/JSON.
s
Do you care about being able to re-execute pipelines? Or just about being able to see pipeline runs across all your Dagit instances?
The reason you’d need to use something like the s3 intermediate store is that we need some remote place to store serialized intermediates. We don’t support a DB-backed intermediate store today
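If you do end up with shared object storage later, the pipeline side looks roughly like this. This is only a sketch against the 0.9.x-era dagster-aws APIs; exact names may differ in your version, and the bucket name is a placeholder:
```python
# Sketch: an S3-backed intermediate storage mode, so any host can read
# another run's serialized intermediates (0.9.x-era APIs; adjust per version).
from dagster import ModeDefinition, execute_pipeline, pipeline, solid
from dagster_aws.s3 import s3_plus_default_intermediate_storage_defs, s3_resource


@solid
def produce(_):
    # Plain dicts/JSON values serialize fine as intermediates.
    return {"a": 1, "b": 2}


@solid
def consume(context, data):
    context.log.info(f"got {data}")
    return sum(data.values())


@pipeline(
    mode_defs=[
        ModeDefinition(
            resource_defs={"s3": s3_resource},
            intermediate_storage_defs=s3_plus_default_intermediate_storage_defs,
        )
    ]
)
def my_pipeline():
    consume(produce())


if __name__ == "__main__":
    # Select the S3 intermediate storage via run config (bucket is a placeholder).
    execute_pipeline(
        my_pipeline,
        run_config={
            "intermediate_storage": {"s3": {"config": {"s3_bucket": "my-shared-bucket"}}}
        },
    )
```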
x
I see. So I don’t really care about re-execution; I care more about the latter one you mentioned, and also about shareable results, so that the same solid executed with the same arguments in another pipeline can return its result instantly
Like a cache
s
cc @sandy @yuhan re: memoized solids
y
Hi @Xu Zhang, that’s a limitation of the current system. We are working on improving it (e.g. being able to configure intermediates to be stored in / retrieved from a DB), which will ship in our next major release (early Dec).
x
Awesome! I guess in the meantime I can at least memoize it on the local instance using the @cache decorator that Python provides natively
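Probably something along these lines. It’s purely a local, per-process cache, and dict arguments have to be turned into a hashable key first; lru_cache stands in for @cache on Pythons older than 3.9:
```python
# Local, per-process memoization sketch using functools (functools.cache is the
# unbounded variant added in Python 3.9; lru_cache(maxsize=None) is equivalent).
import json
from functools import lru_cache

from dagster import solid


@lru_cache(maxsize=None)
def _expensive_computation(params_key: str) -> str:
    # Stand-in for the real expensive work; returns JSON so callers don't
    # accidentally mutate the cached object.
    params = json.loads(params_key)
    return json.dumps({"doubled": {k: v * 2 for k, v in params.items()}})


@solid
def cached_solid(context, params):
    # Canonical JSON encoding gives a hashable, order-independent cache key.
    key = json.dumps(params, sort_keys=True)
    result = json.loads(_expensive_computation(key))
    context.log.info(f"cache info: {_expensive_computation.cache_info()}")
    return result
```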