# announcements
m
any tips and tricks on how to use dagster for tasks with different resource specifications, without "idle" workers but with an on-demand computation env? we are on aws (aws batch, could go aws eks)
our use case is jobs with dependencies between them; some jobs need X vcpu, some need Y vcpu but Z mem; some jobs have different package deps
c
If all you need is to have a single step of a pipeline submit a job to AWS Batch, you can have a solid submit it using boto3, with the vcpu/mem requirements as config options for that solid that are passed to the client:
import boto3

client = boto3.client('batch')
response = client.submit_job(
    jobDefinition=context.solid_config['job_definition'],
    jobName=context.solid_config['job_name'],
    jobQueue=context.solid_config['job_queue'],
    containerOverrides={'vcpus': context.solid_config['vcpus'],
                        'memory': context.solid_config['memory']},
)
From there you can use a botocore waiter to hold the pipeline up until the batch job is completed, or use a dagster sensor to trigger another pipeline when the output appears on s3.
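A minimal sketch of the waiting side, assuming the job id comes from the submit_job response above; rather than a waiter object, this just polls describe_jobs until the job reaches a terminal state (the poll interval is arbitrary):

import time
import boto3

def wait_for_batch_job(job_id, poll_interval=30):
    # block until the AWS Batch job reaches a terminal state
    client = boto3.client('batch')
    while True:
        job = client.describe_jobs(jobs=[job_id])['jobs'][0]
        if job['status'] == 'SUCCEEDED':
            return job
        if job['status'] == 'FAILED':
            raise Exception(f"batch job {job_id} failed: {job.get('statusReason')}")
        time.sleep(poll_interval)

# inside the solid, after submit_job:
# wait_for_batch_job(response['jobId'])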
To make the job more flexible, I pass the location of a bash script on s3 in the env variables in containerOverrides that triggers a ‘fetch and run’ job that’s similar to this: https://aws.amazon.com/blogs/compute/creating-a-simple-fetch-and-run-aws-batch-job/
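For reference, a rough sketch of how the script location might be passed through containerOverrides; the BATCH_FILE_TYPE / BATCH_FILE_S3_URL variable names follow the fetch-and-run blog post above, and script_s3_url is a hypothetical config key:

containerOverrides={
    'vcpus': context.solid_config['vcpus'],
    'memory': context.solid_config['memory'],
    # env vars read by the fetch-and-run entrypoint inside the container
    'environment': [
        {'name': 'BATCH_FILE_TYPE', 'value': 'script'},
        {'name': 'BATCH_FILE_S3_URL', 'value': context.solid_config['script_s3_url']},  # hypothetical key
    ],
}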
m
thanks a lot for your ideas!
that fetch & run idea is cool, but i need to figure out how to manage scripts on S3 via CI/CD
but honestly, if each step in my pipelines is some s3 python script run in a docker container, don't i lose the benefits of dagster?
like i can set up an autoscaled k8s cluster yada yada, but for example i needed 256 vcpu / 1tb ram for one step and aws batch was brilliant for that (aside from the UI, which really sucks)
but i doubt celery can be as dynamic as we need
c
I guess it really depends on how heavy of a workload each solid has to deal with. I've been working with dagster directly for the steps with lighter loads/bookkeeping and submitting the heavier steps to batch. But I've been keeping any additional scripts that the batch job needs contained in the docker image that is used for the job, and the "fetch and run" script just wraps the script with the location of the inputs. It was a lot easier to update the docker image than to constantly push updates to s3.