# announcements
x
In the case where I need to run multiple Dagit instances to scale out, is there a way to configure those instances to store everything (including intermediate results) in the same DB, so that the results of a solid with the same parameters can be shared across multiple Dagit instances on different hosts?
s
Yes, just make sure they’re all configured with the same dagster.yaml configuration: https://docs.dagster.io/overview/instances/dagster-instance#instance-configuration-yaml
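Concretely, that means pointing every host’s dagster.yaml at the same database. A rough sketch, assuming you back the instance with Postgres via the dagster-postgres library (the connection string is a placeholder):
```yaml
# dagster.yaml -- identical on every Dagit host (sketch only)
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"

event_log_storage:
  module: dagster_postgres.event_log_storage
  class: PostgresEventLogStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: "postgresql://dagster:password@shared-db-host:5432/dagster"
```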
And make sure you use an intermediate store that can work across machines, such as the s3 intermediate store
x
I don’t think I have access to s3, and the intermediate results are not really files in my case. They are most likely only Python dictionaries/JSON.
s
Do you care about being able to re-execute pipelines? Or just about being able to see pipeline runs across all your Dagit instances?
The reason you’d need to use something like the s3 intermediate store is that we need some remote place to store serialized intermediates. We don’t support a DB-backed intermediate store today
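If you do end up with shared object storage later, the pipeline side looks roughly like this. This is only a sketch against the 0.9.x-era dagster-aws APIs; exact names may differ in your version, and the bucket name is a placeholder:
```python
# Sketch: an S3-backed intermediate storage mode, so any host can read
# another run's serialized intermediates (0.9.x-era APIs; adjust per version).
from dagster import ModeDefinition, execute_pipeline, pipeline, solid
from dagster_aws.s3 import s3_plus_default_intermediate_storage_defs, s3_resource


@solid
def produce(_):
    # Plain dicts/JSON values serialize fine as intermediates.
    return {"a": 1, "b": 2}


@solid
def consume(context, data):
    context.log.info(f"got {data}")
    return sum(data.values())


@pipeline(
    mode_defs=[
        ModeDefinition(
            resource_defs={"s3": s3_resource},
            intermediate_storage_defs=s3_plus_default_intermediate_storage_defs,
        )
    ]
)
def my_pipeline():
    consume(produce())


if __name__ == "__main__":
    # Select the S3 intermediate storage via run config (bucket is a placeholder).
    execute_pipeline(
        my_pipeline,
        run_config={
            "intermediate_storage": {"s3": {"config": {"s3_bucket": "my-shared-bucket"}}}
        },
    )
```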
x
I see. So I don’t really care about re-execution; I care more about the latter one you mentioned, and also about shareable results, so that the same solid executed with the same arguments in another pipeline can return its result instantly
Like a cache
s
cc @sandy @yuhan re: memoized solids
y
Hi @Xu Zhang, that’s a limitation of the current system. We are working on improving it (e.g. being able to configure intermediates to be stored in / retrieved from a DB), which will ship in our next major release (early Dec).
x
Awesome! I guess in the meantime I can at least memoize it on the local instance using the @cache decorator that Python provides natively
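Probably something along these lines. It’s purely a local, per-process cache, and dict arguments have to be turned into a hashable key first; lru_cache stands in for @cache on Pythons older than 3.9:
```python
# Local, per-process memoization sketch using functools (functools.cache is the
# unbounded variant added in Python 3.9; lru_cache(maxsize=None) is equivalent).
import json
from functools import lru_cache

from dagster import solid


@lru_cache(maxsize=None)
def _expensive_computation(params_key: str) -> str:
    # Stand-in for the real expensive work; returns JSON so callers don't
    # accidentally mutate the cached object.
    params = json.loads(params_key)
    return json.dumps({"doubled": {k: v * 2 for k, v in params.items()}})


@solid
def cached_solid(context, params):
    # Canonical JSON encoding gives a hashable, order-independent cache key.
    key = json.dumps(params, sort_keys=True)
    result = json.loads(_expensive_computation(key))
    context.log.info(f"cache info: {_expensive_computation.cache_info()}")
    return result
```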