# ask-community
a
I have a graph of assets where one of the assets needs to mount a Kubernetes PVC. Currently, we do this by using `define_asset_job` and assigning the `"dagster-k8s/config"` tag to specify the necessary `container_config`/`pod_spec_config` in order to mount the volume. I'd like to use AutoMaterialization for this graph. How can I set the asset to use the correct `tags`? `op_tags` doesn't appear to operate at that level, unless I'm mistaken.
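For context, the current setup looks roughly like this (the job name, asset selection, claim name, and mount path are placeholders):

```python
from dagster import AssetSelection, define_asset_job

pvc_asset_job = define_asset_job(
    name="pvc_assets",
    selection=AssetSelection.groups("needs_pvc"),
    tags={
        "dagster-k8s/config": {
            # Snake-cased Kubernetes V1PodSpec fields: declare the volume
            # backed by the PVC.
            "pod_spec_config": {
                "volumes": [
                    {
                        "name": "data-volume",
                        "persistent_volume_claim": {"claim_name": "my-claim"},
                    }
                ]
            },
            # Snake-cased V1Container fields: mount that volume into the
            # step container.
            "container_config": {
                "volume_mounts": [
                    {"name": "data-volume", "mount_path": "/mnt/data"}
                ]
            },
        }
    },
)
```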
If I use the `k8s_job_executor` for the entire code location, then it appears to work, which is encouraging. Is this an unintended workaround? More generally, is there a reason that executors cannot be defined at the `asset` level, when they can be defined at the `job` and `op` levels? Using a `job` doesn't seem appropriate, since the sensor, freshness policy, and auto-materialization features all operate at the `asset` level.
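For concreteness, the code-location-level setup that appears to work is roughly this (with a placeholder asset standing in for ours):

```python
from dagster import Definitions, asset
from dagster_k8s import k8s_job_executor


@asset
def example_asset():  # placeholder asset
    return 1


# With the executor set at the Definitions level, every run in this code
# location launches each step as its own Kubernetes job, and per-asset
# pod overrides can be supplied via tags.
defs = Definitions(
    assets=[example_asset],
    executor=k8s_job_executor,
)
```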
c
Hey Alec, this is a factor of the way we execute assets together: every asset in a single run currently needs to use the same executor, although loosening that restriction is something we've been thinking about.
If you have an asset that you really need to execute in a different way, as in your case, the best approach is actually to encapsulate it in a job.
a
I see, this is an efficiency thing for how the graph automatically groups related asset runs together, yes? Is it correct that jobs don't take advantage of freshness policies and auto-materialization? I may end up moving my whole project to use the `k8s_job_executor`. While I dislike the overhead of spinning up the containers, this lets me avoid having to define jobs, sensors, and schedules, and instead use auto-materialization and `op_tags` to specify pod resource requirements at the asset level.
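That combination per asset would look something like this (the asset name and resource numbers are made up):

```python
from dagster import AutoMaterializePolicy, asset


@asset(
    # Declarative "when": let the daemon materialize this asset eagerly.
    auto_materialize_policy=AutoMaterializePolicy.eager(),
    # Per-step pod overrides, picked up by the k8s_job_executor.
    op_tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": "500m", "memory": "2Gi"},
                    "limits": {"cpu": "1", "memory": "4Gi"},
                }
            }
        }
    },
)
def heavy_asset():
    ...
```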
p
FWIW: I had a similar question a while back. I ended up writing my own executor that looks at the tags on the steps of a run to pick the executor to use for the whole run. It defaults to the `multiprocess` executor unless at least one step has a custom tag on it that picks the `k8s` executor. It's still for the whole run, but this at least lets the normal case avoid paying the extra cost of the exceptional cases (launching k8s jobs per step). The file is about 70 lines long, so it's not too bad; but it does use "private" methods from a few dagster packages, so it may break when upgrading.
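The shape of it is roughly this; the tag name is made up, and the `init_context` accessors plus the delegation at the end are exactly the "private" bits that may break between versions:

```python
from dagster import executor, multiprocess_executor
from dagster_k8s import k8s_job_executor

EXECUTOR_TAG = "my-org/executor"  # made-up opt-in tag


@executor(name="tag_routing")
def tag_routing_executor(init_context):
    # If any op in the job opted into k8s via the tag, run the whole run
    # with the k8s executor; otherwise use the cheap multiprocess one.
    # NOTE: these accessors are version-dependent internals.
    job_def = init_context.job.get_definition()
    wants_k8s = any(
        node_def.tags.get(EXECUTOR_TAG) == "k8s"
        for node_def in job_def.graph.node_defs
    )
    chosen = k8s_job_executor if wants_k8s else multiprocess_executor
    # Delegate construction of the actual Executor to the chosen definition.
    return chosen.executor_creation_fn(init_context)
```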
a
Thanks @Philippe Laflamme, that makes a lot of sense.
p
But I fully agree with your original point: we can specify *when* an `@asset` should be materialized in a declarative way using freshness policies and auto-materialization. This is great and avoids the need to think about ops and jobs entirely. But when you want to specify *how* the asset should be materialized, you're forced to declare it elsewhere, which is unfortunate.
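The "when" side is pleasantly compact, e.g. (hypothetical asset):

```python
from dagster import AutoMaterializePolicy, FreshnessPolicy, asset


# Declares *when* to materialize; *how* (executor, pod spec) still has
# to be configured elsewhere.
@asset(
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=60),
    auto_materialize_policy=AutoMaterializePolicy.eager(),
)
def fresh_asset():
    ...
```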
a
Yes, I think for now it may work for us to live with the overhead of separate containers even for assets that don't absolutely need them. Using `op_tags` with the k8s executor is really nice since I can declare the cpu/ram/disk that individual assets need.