# ask-community
h
Hi there. I have been trawling through the documentation but I can’t seem to find much about how to control which environment each run executes in. Suppose I first want to use 100 machines to distribute the preparation of a dataset using a custom Docker image, and then want to train a neural network on a single multi-GPU machine. How do I define such environments and system requirements? And might this use case be a bad fit for Dagster?
s
How are you planning the deployment? Kubernetes? Bare metal with docker?
h
Kubernetes is a possibility
s
You can configure the Kubernetes jobs associated with each of your Dagster jobs to select nodes that meet your criteria. For example, for your large parallel job, choose nodes (using labels or taints) in a pool that supports autoscaling; for your training job, target a node with GPU support. You could ask for more help over in #dagster-kubernetes. If you know who will supply your Kubernetes service (GCP, AWS, on-prem, etc.), that information would be useful to include.
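A minimal sketch of how the two placements above could be expressed, using Dagster's `dagster-k8s/config` tag to attach per-job Kubernetes settings. The label values (`autoscaling-pool`, `gpu-pool`) and image name are placeholders for your own cluster's labels and registry, not anything from the thread.

```python
# Per-job Kubernetes settings supplied to Dagster via the "dagster-k8s/config"
# tag. Node-pool labels and the image name below are hypothetical examples.

# Dataset-preparation job: run from a custom image and schedule pods onto an
# autoscaling node pool so 100 parallel pods can scale the cluster up.
prep_job_k8s_config = {
    "container_config": {
        "image": "my-registry/dataset-prep:latest",  # placeholder image
    },
    "pod_spec_config": {
        "node_selector": {"pool": "autoscaling-pool"},  # placeholder label
    },
}

# Training job: request GPUs so Kubernetes places the pod on a GPU node.
train_job_k8s_config = {
    "container_config": {
        "resources": {"limits": {"nvidia.com/gpu": "4"}},
    },
    "pod_spec_config": {
        "node_selector": {"pool": "gpu-pool"},  # placeholder label
    },
}

# These dicts would typically be attached as job tags, e.g.:
#   @job(tags={"dagster-k8s/config": prep_job_k8s_config})
#   def prepare_dataset(): ...
```

Taints would additionally need matching `tolerations` under `pod_spec_config`; the sketch shows only the node-selector route.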
d
You might also find these k8s ops useful if you have images that aren’t Dagster ops (or aren’t even in Python) that you want to orchestrate from Dagster: https://docs.dagster.io/_apidocs/libraries/dagster-k8s#ops
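For concreteness, a sketch of the kind of configuration one might pass to the `k8s_job_op` from those docs, which launches an arbitrary container as a Kubernetes Job, so the image need not contain Dagster or even Python. The image and script here are hypothetical placeholders.

```python
# Example configuration for launching a non-Dagster container as a k8s Job.
# The image and entrypoint are placeholders for your own workload.
k8s_job_op_config = {
    "image": "my-registry/legacy-etl:latest",  # any container image
    "command": ["/bin/sh", "-c"],
    "args": ["./run_etl.sh"],  # hypothetical script inside the image
}

# This dict would typically be supplied via something like:
#   from dagster_k8s import k8s_job_op
#   run_legacy_etl = k8s_job_op.configured(k8s_job_op_config,
#                                          name="run_legacy_etl")
```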