# dagster-plus
s
Hi Dagster folks! We have questions about hybrid cloud using the ECS agent. We are wondering: can we configure a specific worker (EC2 instance type, Docker image, and Python dependencies) for a specific op?
d
Hi Said - we don't currently have this functionality, but there's a feature request issue tracking it here: https://github.com/dagster-io/dagster/issues/9671
s
Is there any visibility on when this will be implemented?
d
Is the Docker image that you want to run in the specific op running Dagster Python code?
s
Yes
Basically, we want the op to be run in an isolated env, configuring its Python dependencies and EC2 instance type …
d
So you would build and deploy multiple images: one 'main' image for loading your jobs and code, and another for running this particular op?
s
No, every op can run on a different image, with its own dependencies and EC2 configuration (instance type, region, …).
d
How would you reference the op in the graph if it's defined in a different image?
s
I'm still familiarizing myself with Dagster concepts, coming from an Airflow background. Conceptually, what I'm attempting to solve is how the operations that produce a single asset or multiple assets can run using a specific Docker image, while other assets are produced using a different Docker image. From the Airflow POV, every workflow can run in a particular Docker image to produce multiple datasets.
d
Absolutely - this is a workflow that we're thinking about making better right now, actually. Today there are two main options (and both of them currently have much better support in Kubernetes than in ECS):
• Have the body of the op run the underlying container directly - in Kubernetes there is a k8s_job_op that is very similar to Airflow's KubernetesPodOperator, and at some point we'll likely have an ecs_task_op that behaves similarly. This is extremely generic, but it loses many of the unique benefits of Dagster since it's really just running an arbitrary image.
• Customize the image that's used for the op while keeping it as a regular Dagster op that's part of the same graph - this can work, but referencing it in the graph can be tricky for the reasons I mentioned above: it still needs to be imported as part of the main graph, which can defeat the whole point of wanting a separate Python environment.
We're looking right now into a third way that gives you the benefits of both options - where you can reference ops or assets within the graph that are defined in a totally different Python environment or image. I can keep you posted when we have more to share about the details of that approach.
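(Editor's note: for concreteness, here is a minimal sketch of both options on Kubernetes, assuming dagster-k8s is installed. k8s_job_op and k8s_job_executor are real dagster-k8s exports; the op names and the image my-registry/custom-op-image:latest are hypothetical, and option 2 assumes the job runs with the k8s_job_executor so the per-op dagster-k8s/config tag takes effect.)

```python
from dagster import job, op
from dagster_k8s import k8s_job_executor, k8s_job_op

# Option 1: run an arbitrary image as a step. Dagster just launches the
# container and waits for it to exit; no Dagster code runs inside it.
echo_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
    },
    name="echo_op",
)

# Option 2: a regular Dagster op whose step pod uses a custom image via the
# dagster-k8s/config tag. The image must still contain this code location's
# Python code, which is the tricky part mentioned above.
@op(
    tags={
        "dagster-k8s/config": {
            "container_config": {"image": "my-registry/custom-op-image:latest"}
        }
    }
)
def custom_image_op():
    return "computed inside the custom image"

@job(executor_def=k8s_job_executor)
def mixed_job():
    echo_op()
    custom_image_op()
```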
l
Hi @daniel and @Mark Fickett, I have a few questions regarding k8s_job_op:
1. What are those unique benefits of Dagster we will lose since it's just running an arbitrary image? Can you please explain more on this?
2. How do we pass parameters to the k8s_job_op op definition?
3. Can this k8s_job_op only be used with K8sRunLauncher, not with CeleryK8sRunLauncher?
4. How will assets be written and consumed downstream?
5. How is a config map or environment values shared from the repository image to the k8s_job_op image?
6. Will this graph and job work?
```python
from dagster import graph, job, op
from dagster_k8s import k8s_job_op


@op
def step1():
    return True


first_op = k8s_job_op.configured(
    {
        "image": "busybox",
        "command": ["/bin/sh", "-c"],
        "args": ["echo HELLO"],
    },
    name="first_op",
)


@graph
def my_graph():
    bool_val = step1()
    first_op()  # how can we pass bool_val to first_op?


@job
def full_job():
    my_graph()
```
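(Editor's note on the question embedded in the snippet: the k8s_job_op container is a separate process, so a Python value like bool_val can't be passed to it as a regular input; you'd typically bake it into the configured command/args or hand it off via external storage. If all you need is ordering, Dagster's Nothing dependency expresses "run after" without passing data. A minimal sketch of that pattern with plain ops, names hypothetical:)

```python
from dagster import In, Nothing, job, op

@op
def step1():
    return True

# A Nothing input enforces ordering only; no value reaches the op body.
@op(ins={"start": In(Nothing)})
def second_step():
    print("runs only after step1 has finished")

@job
def ordered_job():
    # step1's output is used purely for sequencing here; it is not
    # passed into second_step's body.
    second_step(start=step1())
```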
d
Would it be possible to make a new post with these questions? That will ensure that the right people see it.
m
Also, I think those questions sound like they're for the Dagster team; I'm just a Dagster user.
l
Done @daniel