# announcements
any tips and tricks on how to use dagster for tasks with different resource specifications, without "idle" workers but with an on-demand compute env? we are on aws (aws batch, could go aws eks)
our use case is jobs with dependencies between them; some jobs need X vCPU, some need Y vCPU but Z mem; some jobs have different package deps
If all you need is a single step of a pipeline that submits a job to AWS Batch, you can have a solid submit it using boto3; the vcpu/mem requirements become config options for that solid and are passed through the client:
```python
import boto3

client = boto3.client('batch')
# submit_job also requires jobName and jobQueue; 'job_name'/'job_queue' config keys are illustrative
response = client.submit_job(jobName=context.solid_config['job_name'],
                             jobQueue=context.solid_config['job_queue'],
                             jobDefinition=context.solid_config['job_definition'],
                             containerOverrides={'vcpus': context.solid_config['vcpus'],
                                                 'memory': context.solid_config['memory']})
```
From there you can use a botocore waiter to hold the pipeline up until the batch job completes, or use a dagster sensor to trigger another pipeline to start when the output appears on S3.
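For the "hold the pipeline up" part: Batch doesn't define prebuilt waiters, so in practice this is either a custom botocore waiter or a simple polling loop over `describe_jobs`. A minimal sketch of the polling version — `wait_for_batch_job` and its poll interval are hypothetical names, not from the thread:

```python
import time

def wait_for_batch_job(client, job_id, poll_seconds=15):
    """Poll describe_jobs until the Batch job reaches a terminal state.

    Returns the final status string ('SUCCEEDED' or 'FAILED'); the solid
    can then raise on 'FAILED' to fail the pipeline step.
    """
    while True:
        resp = client.describe_jobs(jobs=[job_id])
        status = resp['jobs'][0]['status']
        if status in ('SUCCEEDED', 'FAILED'):
            return status
        time.sleep(poll_seconds)
```

The same check (`status` in a terminal state) is what you'd plug into a custom botocore waiter if you prefer that route.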
To make the job more flexible, I pass the location of a bash script on S3 in the environment variables in containerOverrides, which triggers a "fetch and run" job similar to this:
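For reference, AWS's "fetch and run" example image reads the script location from `BATCH_FILE_S3_URL`/`BATCH_FILE_TYPE` environment variables; a small helper to build the containerOverrides for that pattern might look like the sketch below (`fetch_and_run_overrides` is a hypothetical name, not from the thread):

```python
def fetch_and_run_overrides(script_s3_url, vcpus, memory):
    """Build containerOverrides for the AWS 'fetch and run' pattern,
    where the image's entrypoint downloads and executes the script
    pointed at by BATCH_FILE_S3_URL."""
    return {
        'vcpus': vcpus,
        'memory': memory,
        'environment': [
            {'name': 'BATCH_FILE_S3_URL', 'value': script_s3_url},
            {'name': 'BATCH_FILE_TYPE', 'value': 'script'},
        ],
    }
```

The result is what you'd pass as `containerOverrides=` to `client.submit_job(...)`.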
thanks a lot for your ideas!
that fetch & run idea is cool, but i need to figure out how to manage scripts on S3 via CI/CD
but honestly, if i do my steps in pipelines - each step is some s3 python script run in a docker container - don't i lose the benefits of dagster?
like i can set up an autoscaled k8s cluster yada yada, but for example i needed 256 vCPU / 1 TB RAM for one step and aws batch was brilliant for that (aside from the UI, which really sucks)
but i doubt celery can be as dynamic as we need
I guess it really depends on how heavy a workload each solid has to deal with. I've been using dagster directly for the steps with lighter loads/bookkeeping and submitting the heavier steps to Batch. But I've been keeping any additional scripts the batch job needs contained in the docker image used for the job, and the "fetch and run" script just wraps the script with the location of the inputs. It was a lot easier to update the docker image than to constantly push updates to s3.