Hi Dagster folks! We have a question about hybrid cloud using the ECS agent. We are wondering whether we can configure the specific worker (EC2 instance type, Docker image, and Python dependencies) for a specific op?
Hi Said - we don't currently have this functionality but there's a feature request issue tracking it here: https://github.com/dagster-io/dagster/issues/9671
Is there any visibility on when this will be implemented?
Is the docker image that you want to run in the specific op running Dagster python code?
Basically, we want the op to run in an isolated environment, with its own Python dependencies and EC2 instance type configured …
So you would build and deploy multiple images: a 'main' one for loading your jobs and code, and another for running this particular op?
No, every op can run on a different image, with its own dependencies and EC2 configuration (instance type, region, …).
How would you reference the op in the graph if it's defined in a different image?
I’m still familiarizing myself with Dagster concepts, coming from an Airflow background. Conceptually, what I’m trying to solve is how the operations that produce one or more assets can run using a specific Docker image while other assets are produced using a different Docker image. From the Airflow point of view, every workflow can run in a particular Docker image to produce multiple datasets.
Absolutely - this is a workflow that we're thinking about making better right now, actually. Today there are two main options (and both of them currently have much better support in Kubernetes than in ECS):
• Have the body of the op run the underlying container directly. In Kubernetes there is a k8s_job_op that is very similar to Airflow's KubernetesPodOperator, and at some point we'll likely have an ecs_task_op that behaves similarly. This is extremely generic, but it loses many of the unique benefits of Dagster since it's really just running an arbitrary image.
• Customize the image that's used for the op while keeping it as a regular Dagster op that's part of the same graph. This can work, but referencing it in the graph can be tricky for the reasons I mentioned above: it still needs to be imported as part of the main graph, which can defeat the whole point of wanting a separate Python environment.
We're looking right now into a third way that gives you the benefits of both options, where you can reference ops or assets within the graph that are defined in a totally different Python environment or image. I can keep you posted when we have more to share about the details of that approach.
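For reference, the two options described above might look roughly like this on Kubernetes. This is a minimal sketch, not an official recipe: the image name `my-registry/heavy-deps:latest` and the names `run_busybox`, `heavy_op`, and `mixed_job` are hypothetical, and it assumes the `dagster-k8s/config` op tag accepts an `image` entry under `container_config` when the job runs with `k8s_job_executor`. The Dagster wiring is guarded so the snippet still loads where `dagster` and `dagster-k8s` are not installed.

```python
# Hypothetical image name, used only for illustration.
HEAVY_IMAGE = "my-registry/heavy-deps:latest"

# Option 1: static config for a container that the op simply launches.
BUSYBOX_CONFIG = {
    "image": "busybox",
    "command": ["/bin/sh", "-c"],
    "args": ["echo HELLO"],
}

# Option 2: per-op tag asking the Kubernetes executor for a different image.
# Assumption: the "dagster-k8s/config" tag honors "image" in container_config.
PER_OP_IMAGE_TAGS = {
    "dagster-k8s/config": {"container_config": {"image": HEAVY_IMAGE}}
}

try:
    from dagster import job, op
    from dagster_k8s import k8s_job_executor, k8s_job_op
except ImportError:
    # Dagster is not installed; the dicts above can still be inspected.
    k8s_job_op = None

if k8s_job_op is not None:
    # Option 1: the op body is just "run this container".
    run_busybox = k8s_job_op.configured(BUSYBOX_CONFIG, name="run_busybox")

    # Option 2: a regular Dagster op that asks for its own image.
    @op(tags=PER_OP_IMAGE_TAGS)
    def heavy_op():
        # Still part of the main graph, so this module must be importable
        # from the main image as well as from HEAVY_IMAGE.
        return 1

    @job(executor_def=k8s_job_executor)
    def mixed_job():
        run_busybox()
        heavy_op()
```

The comment inside `heavy_op` is the catch mentioned in the second bullet: the op's module has to import cleanly in both images.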
Hi @daniel and @Mark Fickett, I have a few questions regarding this:
1. What are the unique benefits of Dagster that we would lose since it's just running an arbitrary image? Can you please explain more?
2. How do we pass parameters to the k8s_job_op op definition?
3. Can this only be used with K8sRunLauncher, and not with CeleryK8sRunLauncher?
4. How will assets be written and consumed downstream?
5. How can config maps or environment values be shared from the repository image to the k8s_job_op image?
6. Will this graph and job work?
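from dagster import graph, job, op
from dagster_k8s import k8s_job_op

@op
def step1():
    return True

first_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
    },
    name="first_op",
)

@graph
def my_graph():
    bool_val = step1()
    first_op()  # how can we pass bool_val to first_op?

@job
def full_job():
    my_graph()  # assumed body; the original message is cut off here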
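On the `bool_val` question in the snippet above: as far as I can tell, `k8s_job_op` reads its image, command, and args from config and exposes only a `Nothing`-typed `start_after` input, so an upstream output can order the steps but its value is not handed to the container. A minimal sketch under that assumption, where `echo_config` and `ordered_graph` are hypothetical names, and the Dagster part is guarded so the snippet loads without `dagster` installed:

```python
def echo_config(message: str) -> dict:
    """Static k8s_job_op config; the message is fixed when the op is defined."""
    return {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": [f"echo {message}"],
    }

try:
    from dagster import graph, op
    from dagster_k8s import k8s_job_op
except ImportError:
    # Dagster is not installed; echo_config can still be called on its own.
    k8s_job_op = None

if k8s_job_op is not None:
    @op
    def step1():
        return True

    first_op = k8s_job_op.configured(echo_config("HELLO"), name="first_op")

    @graph
    def ordered_graph():
        # Ordering only: first_op waits for step1 but never receives bool_val.
        first_op(start_after=step1())
```

Getting a runtime value into the container would need something other than a plain input here, which is worth confirming with the Dagster team.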
Would it be possible to make a new post with these questions? That will ensure that the right people see it.
Also I think those questions sound like they're for the Dagster team, I'm just a Dagster user.
