Jack Zeitoun

05/21/2023, 8:31 PM
Hi all, is there an easy way to configure the local_artifact_storage for the entire instance when deploying on kubernetes through helm?

daniel

05/22/2023, 5:27 PM
Hi Jack - can you share more about the use case here? Typically the filesystem isn't shared between different pods in kubernetes - so i'd be worried that even if this was an option it would be very easy to lose the artifacts that are stored there, unless you are doing some fancy things with volumes to persist them

Jack Zeitoun

05/22/2023, 5:36 PM
Hi Daniel. We are processing large image data (~300GB). Our data lives on a distributed storage server (ceph), but is copied to our worker nodes for processing so that we can make use of local NVMe drives. I have set up a custom run launcher that will allocate a new PVC at the start of each run and mount it in the job pod. I want to set the local artifact storage path for each of these jobs to the mounted PVC path. There doesn't seem to be an easy way to do this so I settled on just setting the dagster home globally.

daniel

05/22/2023, 5:39 PM
I see - another thing you could do is configure the IO manager in code to point at the directory that you want?

Jack Zeitoun

05/22/2023, 5:40 PM
Yeah that's an option, but it's preferred to be able to configure this at the deployment level

daniel

05/22/2023, 5:41 PM
makes sense - I don't think we expose local_artifact_storage in the helm chart currently - but if you file an issue here we can incorporate it into future improvements: https://github.com/dagster-io/dagster/issues

Jack Zeitoun

05/22/2023, 5:43 PM
Yeah I haven't seen it in the templates but I'll go ahead and file that. I also reported an issue recently regarding the dagster home path in the user deployment being overridden by the global dagster path. https://github.com/dagster-io/dagster/issues/13669
It would also be great to get some more documentation on the config data being passed to the run launcher. For instance, dagster_home is passed in, but I'm not clear why as I would expect the configured dagster home in the user deployment to take precedence

daniel

05/22/2023, 5:48 PM
I think that dagster_home parameter may not actually be used by the run launcher anymore

Jack Zeitoun

05/22/2023, 5:50 PM
I think it is. I have to double check but I believe the value I passed there showed up as the value for the DAGSTER_HOME env variable on the job. But strangely enough, the job still used the global dagster home path

daniel

05/22/2023, 5:52 PM
It may set the environment variable, but I don’t think anything in the run worker will read from it when loading the instance - the instance is serialized and passed into the run worker command instead

Jack Zeitoun

05/22/2023, 5:54 PM
Ok and when you say instance, you're talking about the dagster.yaml config?

daniel

05/22/2023, 5:55 PM
That’s right

Jack Zeitoun

05/22/2023, 6:00 PM
Ok I guess this is the point of confusion. Should the global instance configuration be used in the run? I can see certain values being necessary, like the postgres config. But things like resource requests and storage paths should, imo, be determined by the code location / user code deployment

daniel

05/22/2023, 6:01 PM
There are docs on how to set up configuration for certain fields here: https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#step-61-configure-the-deployment
But local artifact storage locations can’t currently be set per code location in the helm chart

Jack Zeitoun

05/22/2023, 6:06 PM
Gotcha. Yeah I've been through those docs. Maybe I'm misinterpreting what's going on. Is the config for the user code deployment merged with the global config when the job launches?

daniel

05/22/2023, 6:07 PM
that's right - the run launcher config is the base, and locations can override it
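As a rough mental model of "base plus override", the sketch below merges two plain dictionaries so that values from the code location win over the run launcher's. The function name, the keys, and the values are all hypothetical, and the recursive merge is an assumption: Dagster's actual merge rules (for example, how nested keys and lists are combined) may differ, so check them against your version.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
run_launcher_config = {
    "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}},
    "env_vars": ["POSTGRES_HOST"],
}
code_location_config = {
    "resources": {"requests": {"memory": "8Gi"}},
}

effective = merge_config(run_launcher_config, code_location_config)
print(effective)
```

In this sketch the code location only raises the memory request; the CPU request and the environment variable list fall through from the base config unchanged.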

Jack Zeitoun

05/22/2023, 6:14 PM
Ok that makes sense. I'm gonna run a quick test to confirm whether that's working as expected.
Ah this just reminded me of another issue lol. Is there a way to set the job-specific configuration for the code location?

daniel

05/22/2023, 6:17 PM
There's not currently a way to do that, no
that needs to go in @job tags currently
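A minimal sketch of what such job tags might look like, assuming the dagster-k8s convention of a "dagster-k8s/config" tag with a "container_config" section. The helper name and the resource sizes are hypothetical, and the block only builds the dictionary, so it runs without Dagster installed.

```python
from typing import Any, Dict


def k8s_resource_tags(cpu: str, memory: str) -> Dict[str, Any]:
    """Build a tags dict that requests CPU and memory for a job's run pod.

    The "dagster-k8s/config" key and its "container_config" section follow
    the dagster-k8s tag convention; verify the exact schema for your version.
    """
    return {
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                }
            }
        }
    }


# Hypothetical sizing for a large image-processing job.
tags = k8s_resource_tags(cpu="4", memory="32Gi")
print(tags)
```

With Dagster installed, the dictionary would be passed as the tags argument of the job decorator, for example @job(tags=tags).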

Jack Zeitoun

05/22/2023, 6:18 PM
Gotcha. So seems like, at the moment, most job specific configuration should go in the job definition right?

daniel

05/22/2023, 6:18 PM
that's right, yeah

Jack Zeitoun

05/22/2023, 6:22 PM
Ok cool. Well you've answered my many questions. Thanks! The only unexplained issue I still see is the dagster home problem. If that were fixed then the local artifact storage would be less of a concern since that defaults to the dagster home path

daniel

05/22/2023, 6:23 PM
i'm still a little unclear what's going on in your runs that requires DAGSTER_HOME to be set to any particular value
could you use a different environment variable instead that you set?
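One way the "different environment variable" idea could look: the op code resolves its save path from a variable that the custom run launcher sets on the job pod, rather than from DAGSTER_HOME. The variable name RUN_ARTIFACT_DIR, the helper, and the fallback path are all hypothetical; this is a sketch of the pattern, not something Dagster provides.

```python
import os
from pathlib import Path

# Hypothetical variable name; the run launcher would set it to the PVC mount path.
ARTIFACT_DIR_ENV = "RUN_ARTIFACT_DIR"


def resolve_artifact_dir(run_id: str, default_root: str = "/tmp/artifacts") -> Path:
    """Return the directory where a run should write its outputs.

    Uses the root given by the environment variable when it is set,
    otherwise falls back to `default_root`. Nothing is created on disk.
    """
    root = Path(os.environ.get(ARTIFACT_DIR_ENV, default_root))
    return root / run_id


# Example: inside an op, the run id could come from the op context.
print(resolve_artifact_dir("example-run-id"))
```

Because the variable is owned by the deployment rather than by Dagster, its value can differ per run pod without touching the instance config.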

Jack Zeitoun

05/22/2023, 6:25 PM
So each of our ops produces output that needs to be saved to disk. We use the op context to determine the save path. That path will point to the location determined by the local artifact storage (dagster home by default)

daniel

05/22/2023, 6:25 PM
Ah ok i think that's the confusion - changing the value of DAGSTER_HOME in the run worker won't change that path.
(since the run worker doesn't check DAGSTER_HOME at all - for that or any other reason)

Jack Zeitoun

05/22/2023, 6:27 PM
Gotcha. So what is the purpose of DAGSTER_HOME in the code location deployment?

daniel

05/22/2023, 6:27 PM
I think it has no purpose.
And is only around for backwards compatibility

Jack Zeitoun

05/22/2023, 6:28 PM
Lol oh. Well that would explain the behavior
Well, I think it would make more sense for the run to use the value configured in the user code location. That would keep the behavior consistent with the base config + location config behavior

daniel

05/22/2023, 6:29 PM
Fair enough - but I think the most likely next step would be to remove the environment variable altogether, since it isn't used
as opposed to tweaking the way in which it is set in order to not be used

Jack Zeitoun

05/22/2023, 6:31 PM
That's fine. And would that be replaced with an option to configure local artifact storage at the code location?

daniel

05/22/2023, 6:31 PM
potentially - I don't think i've heard that particular request before, but certainly to start being able to set it on the instance at all from the helm chart makes sense

Jack Zeitoun

05/22/2023, 6:32 PM
Cool. I'll open an issue then so it's on your radar.
Last question: If we wanted to bundle dagster in a desktop app, is there a recommended way to do that?

daniel

05/22/2023, 6:33 PM
would it be possible to make a new post for that one?

Jack Zeitoun

05/22/2023, 6:33 PM
Yup, np