What is your experience with how "big" the servers (for a Kubernetes cluster) need to be for Dagster? Which component needs what min/optimal/max resources (CPU, RAM)?
I understand that this could depend on the operations inside the pipelines.
We want to achieve:
- get data from the source (PostgreSQL);
- do data transformations with pandas/numpy ...
- save to the DB.
10/23/2020, 3:14 PM
You’re right that this depends on the operations within your solids and pipelines; that will be the biggest factor here. The setup and processes that run on your laptop are pretty similar to what runs in k8s, so the Dagster processes themselves are pretty lightweight.
cc @cat or @nate who might have more insight here
I’ve been fine using nodes with low specs (such as
on AWS), but I’ve also been running light workloads.
10/23/2020, 5:56 PM
Yeah, I think @cat has thought about this more than I have, but as Sashank mentioned, the overhead of the Dagster components is fairly small, so this will mostly depend on the size of your workloads. Unless you’re working on really massive datasets in pandas/numpy, I think you’ll be able to get away with fairly small nodes.
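Since node size comes down to how much data the transformations hold in memory, a rough back-of-envelope estimate can help when picking nodes. This is only a sketch: the function names and the `HEADROOM` factor are made up for illustration (not from Dagster or pandas), and it assumes an all-numeric table of 8-byte values.

```python
# Back-of-envelope RAM estimate for a numeric table held in pandas/numpy.
# HEADROOM is an assumed safety factor, not a Dagster or pandas constant.

BYTES_PER_VALUE = 8  # numpy float64 / int64 use 8 bytes per value
HEADROOM = 3         # assumption: room for intermediate copies made during transforms


def estimate_ram_bytes(rows: int, numeric_cols: int, headroom: int = HEADROOM) -> int:
    """Rough peak memory for a rows x numeric_cols table of 8-byte values."""
    raw = rows * numeric_cols * BYTES_PER_VALUE
    return raw * headroom


def fits_on_node(rows: int, numeric_cols: int, node_ram_gib: float) -> bool:
    """True if the estimate fits within the node's RAM (given in GiB)."""
    return estimate_ram_bytes(rows, numeric_cols) <= node_ram_gib * 1024**3


if __name__ == "__main__":
    need = estimate_ram_bytes(10_000_000, 20)
    print(f"~{need / 1024**3:.1f} GiB needed")
    print("fits on a 4 GiB node:", fits_on_node(10_000_000, 20, 4))
```

String/object columns usually take more than 8 bytes per value, so on a real DataFrame it is better to measure a sample with `df.memory_usage(deep=True).sum()` and scale from there.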