# ask-community
m
Hi team! I have to create a simple job that periodically gets some JSON data from an HTTP resource and puts it to AWS S3 with a timestamp in the key. My runtime is K8s. At first I created two ops:
put_data_to_s3(get_data_from_http())
but with K8s there is overhead from creating a run worker per op and from serializing data between ops. Could you advise how to design this better, avoiding that overhead while keeping the spirit of Dagster?
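For concreteness, a minimal sketch of the two-op job described above; the source URL, bucket name, and key format are placeholder assumptions, not from the original question:
```python
import json
from datetime import datetime, timezone

import boto3
import requests
from dagster import job, op

# Hypothetical source and destination, for illustration only.
SOURCE_URL = "https://example.com/data.json"
BUCKET = "my-bucket"

@op
def get_data_from_http() -> dict:
    # Fetch the JSON payload from the HTTP resource.
    resp = requests.get(SOURCE_URL, timeout=30)
    resp.raise_for_status()
    return resp.json()

@op
def put_data_to_s3(data: dict) -> None:
    # Write the payload to S3 under a timestamped key.
    key = f"snapshots/{datetime.now(timezone.utc).isoformat()}.json"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=json.dumps(data))

@job
def http_to_s3():
    put_data_to_s3(get_data_from_http())
```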
d
You can run a job in a single process if you want. Just set the executor to in_process
m
@Daniel Mosesson I'm not sure I understand your answer. I run jobs using the K8sRunLauncher. Can I use the in_process executor for ops in this case? I can't find an example in the docs...
v
@Mykola Palamarchuk https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#executor describes how, if you run the in_process_executor with the K8s run launcher, the whole run executes in a single pod without that overhead. Set
```python
executor_def=in_process_executor
```
on your job
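As a minimal sketch of where that argument goes, reusing the op names from the question above:
```python
from dagster import in_process_executor, job

# Run every op of this job in the single run-worker pod.
@job(executor_def=in_process_executor)
def http_to_s3():
    put_data_to_s3(get_data_from_http())
```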
d
To expand a bit on the other responses here: the default behavior if you're using the K8sRunLauncher is that there's a single pod for each run, and all the ops within that run execute in that single pod. Setting the k8s_job_executor on your job will make each op happen in its own pod, but you don't need to use that executor if you don't want to.
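For contrast, a sketch of the opt-in per-op-pod behavior, using the k8s_job_executor from the dagster-k8s package (job name is illustrative):
```python
from dagster import job
from dagster_k8s import k8s_job_executor

# Each op in this job executes in its own Kubernetes pod.
@job(executor_def=k8s_job_executor)
def http_to_s3_per_op_pods():
    put_data_to_s3(get_data_from_http())
```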
m
@daniel, is it possible to configure it per OP (or per subgraph)?
d
Right now you have to pick per job
But it's an entirely reasonable feature request to want more granularity there
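To illustrate the per-job granularity daniel describes: the same graph can back two jobs with different executors (a sketch; the names are hypothetical):
```python
from dagster import graph, in_process_executor
from dagster_k8s import k8s_job_executor

@graph
def http_to_s3_graph():
    put_data_to_s3(get_data_from_http())

# The executor is chosen per job, so one graph can produce
# both a single-pod job and a pod-per-op job.
in_process_job = http_to_s3_graph.to_job(
    name="http_to_s3_in_process", executor_def=in_process_executor
)
per_op_pod_job = http_to_s3_graph.to_job(
    name="http_to_s3_per_op_pods", executor_def=k8s_job_executor
)
```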
m
Thank you for your answers! I've tested it and it works.