Is there a comprehensive list of available interme...
# announcements
a
Is there a comprehensive list of available intermediate storage definitions? I am building a distributed pipeline, and the S3 intermediate from `dagster_aws` won't be a valid option in production.
y
Hi Andy, what is your use case?
a
We are pushing pipelines through Dask as an intermediary layer to a Slurm cluster. To submit them with dask.distributed, I have to supply a `ModeDefinition` that provides an intermediate storage definition, and execution fails with an error if I use the filesystem intermediate. I'm OK using S3 in development, but that won't be an option when we hit production.
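A plausible reason the filesystem intermediate is rejected (an assumption, not something stated in this thread) is that Dask workers spread across Slurm nodes may not see each other's local disks, so a downstream step cannot read an upstream step's output. The toy model below is plain Python with hypothetical class and node names and no Dagster code; it contrasts node-local directories with one shared store standing in for S3.

```python
import tempfile
from pathlib import Path


class IntermediateStore:
    """Toy stand-in for intermediate storage (hypothetical, not Dagster's API).

    `roots` maps each node name to the directory that node can see.
    """

    def __init__(self, roots):
        self.roots = roots

    def write(self, node, key, value):
        (self.roots[node] / key).write_text(value)

    def read(self, node, key):
        # Raises FileNotFoundError if this node cannot see the file.
        return (self.roots[node] / key).read_text()


def run_two_steps(store):
    """Run an upstream step on node-a, then a downstream step on node-b."""
    store.write("node-a", "upstream.out", "42")
    try:
        return store.read("node-b", "upstream.out")
    except FileNotFoundError:
        return None  # downstream step could not find its input


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Node-local disks: each node gets its own private directory.
    local_a, local_b = base / "node-a", base / "node-b"
    # Shared store: both nodes see the same directory (stands in for S3).
    shared = base / "shared"
    for d in (local_a, local_b, shared):
        d.mkdir()

    node_local = IntermediateStore({"node-a": local_a, "node-b": local_b})
    shared_store = IntermediateStore({"node-a": shared, "node-b": shared})

    local_result = run_two_steps(node_local)
    shared_result = run_two_steps(shared_store)

print(local_result)   # None: node-b cannot see node-a's output
print(shared_result)  # 42
```

Any storage that every worker can reach, whether AWS S3 or a self-hosted S3-compatible service, behaves like the shared case here.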
y
What would production be using?
a
That's what we're not sure of yet. I was hoping to find documentation listing the available intermediates so that we could decide how to proceed.
m
Hey Andy! Just curious: why doesn't S3 fit your production needs? And what about a self-hosted S3?
a
Thanks @yuhan, much appreciated.
@matas Self-hosted S3 might work; I wasn't aware of such a thing. If we can build an on-prem S3 service, that could be our answer.
m
We've used MinIO (https://github.com/minio/minio) and Zenko (https://github.com/scality/cloudserver). Both are self-deployed, containerized solutions compatible with dagster_aws. MinIO is nicer thanks to its GUI, but it is 50x slower with dagster_aws compute_logs (https://github.com/dagster-io/dagster/issues/2438) due to some strange boto3 behaviour, so we've switched to Zenko for now.
You can look for deployment inspiration in our boilerplate repo https://github.com/bestplace/cube. It is quite outdated now, but still valid regarding S3 connections.
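Since MinIO and Zenko speak the S3 API, moving from AWS to a self-hosted server should mostly come down to pointing the S3 client at a different endpoint. The sketch below builds a run-config dict for that. The helper `self_hosted_s3_run_config`, the hostnames, and the bucket name are hypothetical, and it assumes your dagster_aws version accepts an `endpoint_url` field on `s3_resource` and uses the `intermediate_storage` run-config key; both are worth checking against the installed version's config schema.

```python
def self_hosted_s3_run_config(endpoint_url, bucket):
    """Build a run-config dict pointing the "s3" resource at an
    S3-compatible server such as MinIO or Zenko CloudServer.

    ASSUMPTION: the installed dagster_aws exposes an `endpoint_url` field
    on `s3_resource`, and the mode uses the `intermediate_storage`
    run-config key. Verify both against your version.
    """
    if not endpoint_url.startswith(("http://", "https://")):
        raise ValueError("endpoint_url must include a scheme, e.g. http://")
    return {
        "resources": {"s3": {"config": {"endpoint_url": endpoint_url}}},
        "intermediate_storage": {"s3": {"config": {"s3_bucket": bucket}}},
    }


# Hypothetical on-prem hosts: MinIO listens on port 9000 by default,
# Zenko CloudServer on port 8000.
minio_config = self_hosted_s3_run_config(
    "http://minio.internal:9000", "dagster-intermediates"
)
zenko_config = self_hosted_s3_run_config(
    "http://zenko.internal:8000", "dagster-intermediates"
)
```

Credentials would still come from boto3's usual sources, such as the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, set to the keys configured on the MinIO or Zenko server.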
a
Awesome, thanks @matas -- I'll check this out