# ask-community
j
Hi all, is there an easy way to configure the `local_artifact_storage` for the entire instance when deploying on kubernetes through helm?
d
Hi Jack - can you share more about the use case here? Typically the filesystem isn't shared between different pods in kubernetes - so i'd be worried that even if this was an option it would be very easy to lose the artifacts that are stored there, unless you are doing some fancy things with volumes to persist them
j
Hi Daniel. We are processing large image data (~300GB). Our data lives on a distributed storage server (ceph), but is copied to our worker nodes for processing so that we can make use of local NVMe drives. I have set up a custom run launcher that will allocate a new PVC at the start of each run and mount it in the job pod. I want to set the local artifact storage path for each of these jobs to the mounted PVC path. There doesn't seem to be an easy way to do this so I settled on just setting the dagster home globally.
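For illustration, the per-run scratch setup Jack describes might look like the following PVC spec, created by the custom run launcher at run start and mounted into the job pod (the claim name, storage class, and sizes are all hypothetical):

```yaml
# Hypothetical PVC allocated per run by a custom run launcher
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: run-scratch-<run_id>    # one claim per run; <run_id> filled in at launch
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-nvme  # assumed storage class backed by local NVMe
  resources:
    requests:
      storage: 500Gi            # headroom for ~300GB of image data
```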
d
I see - another thing you could do is configure the IO manager in code to point at the directory that you want?
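As a sketch of that suggestion: if the job uses Dagster's configurable filesystem IO manager, the target directory can also be supplied through run config rather than hardcoded in code (the path here is an assumption standing in for the mounted PVC):

```yaml
resources:
  io_manager:
    config:
      base_dir: /mnt/scratch   # hypothetical mounted PVC path
```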
j
Yeah that's an option, but it's preferred to be able to configure this at the deployment level
d
makes sense - I don't think we expose local_artifact_storage in the helm chart currently - but if you file an issue here we can incorporate it into future improvements: https://github.com/dagster-io/dagster/issues
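For reference, the underlying instance setting being discussed is the `local_artifact_storage` key in `dagster.yaml`; a minimal sketch of what the helm chart would need to render (the `base_dir` shown is just an example):

```yaml
local_artifact_storage:
  module: dagster.core.storage.root
  class: LocalArtifactStorage
  config:
    base_dir: /opt/dagster/dagster_home/storage
```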
j
Yeah I haven't seen it in the templates but I'll go ahead and file that. I also reported an issue recently regarding the dagster home path in the user deployment being overridden by the global dagster path. https://github.com/dagster-io/dagster/issues/13669
It would also be great to get some more documentation on the config data being passed to the run launcher. For instance, `dagster_home` is passed in, but I'm not clear why, as I would expect the configured dagster home in the user deployment to take precedence
d
I think that dagster_home parameter may not actually be used by the run launcher anymore
j
I think it is. I have to double check, but I believe the value I passed there showed up as the value for the DAGSTER_HOME env variable on the job. But strangely enough, the job still used the global dagster home path
d
It may set the environment variable, but I don’t think anything in the run worker will read from it when loading the instance - the instance is serialized and passed into the run worker command instead
j
Ok and when you say instance, you're talking about the dagster.yaml config?
d
That’s right
j
Ok I guess this is the point of confusion. Should the global instance configuration be used in the run? I can see certain values being necessary, like the postgres config. But things like resource requests and storage paths should, imo, be determined by the code location / user code deployment
d
There are docs on how to set up configuration for certain fields here: https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#step-61-configure-the-deployment
But local artifact storage locations can’t currently be set per code location in the helm chart
j
Gotcha. Yeah I've been through those docs. Maybe I'm misinterpreting what's going on. Is the config for the user code deployment merged with the global config when the job launches?
d
that's right - the run launcher config is the base, and locations can override it
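A sketch of that layering in Helm values, assuming a chart version that supports `runK8sConfig` at both levels (deployment name and resource values are hypothetical, and keys are abbreviated):

```yaml
# Base config the run launcher applies to every run
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      runK8sConfig:
        containerConfig:
          resources:
            requests: {cpu: "1", memory: "2Gi"}

# Per-code-location config merged on top of the base
dagster-user-deployments:
  deployments:
    - name: image-pipeline   # hypothetical deployment name
      runK8sConfig:
        containerConfig:
          resources:
            requests: {cpu: "4", memory: "16Gi"}
```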
j
Ok that makes sense. I'm gonna run a quick test to confirm whether that's working as expected.
Ah this just reminded me of another issue lol. Is there a way to set the job-specific configuration for the code location?
d
There's not currently a way to do that, no
that needs to go in @job tags currently
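A sketch of those per-job tags (all resource values, volume names, and the claim name are hypothetical); in real code the dict is passed as `tags=` to the `@job` decorator:

```python
# Per-job Kubernetes overrides go in the "dagster-k8s/config" tag.
# Everything below is illustrative, not a working deployment config.
job_k8s_tags = {
    "dagster-k8s/config": {
        "container_config": {
            "resources": {"requests": {"cpu": "2", "memory": "8Gi"}},
            "volume_mounts": [
                {"name": "scratch", "mountPath": "/mnt/scratch"}
            ],
        },
        "pod_spec_config": {
            "volumes": [
                {
                    "name": "scratch",
                    "persistentVolumeClaim": {"claimName": "run-scratch"},
                }
            ]
        },
    }
}

# Usage in a job definition would look roughly like:
# @job(tags=job_k8s_tags)
# def process_images(): ...
```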
j
Gotcha. So seems like, at the moment, most job specific configuration should go in the job definition right?
d
that's right, yeah
j
Ok cool. Well you've answered my many questions. Thanks! The only unexplained issue I still see is the dagster home problem. If that were fixed then the local artifact storage would be less of a concern since that defaults to the dagster home path
d
i'm still a little unclear what's going on in your runs that requires DAGSTER_HOME to be set to any particular value
could you use a different environment variable that you set instead?
j
So each of our ops produces output that needs to be saved to disk. We use the op context to determine the save path. That path will point to the location determined by the local artifact storage (dagster home by default)
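A rough sketch of the path resolution Jack describes, assuming the default layout where artifacts land under a `storage` directory inside `DAGSTER_HOME` (the exact directory layout here is an assumption for illustration, not Dagster's documented contract):

```python
from pathlib import Path


def op_output_path(dagster_home: str, run_id: str, step_key: str) -> Path:
    """Resolve a save path under the instance's local artifact storage.

    Mirrors the behavior described above: local artifact storage defaults
    to a 'storage' directory under DAGSTER_HOME, keyed by run and step.
    """
    return Path(dagster_home) / "storage" / run_id / step_key


# Example: an op saving large image output onto the mounted scratch PVC
path = op_output_path("/mnt/scratch/dagster_home", "run123", "segment_images")
```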
d
Ah ok i think that's the confusion - changing the value of DAGSTER_HOME in the run worker won't change that path.
(since the run worker doesn't check DAGSTER_HOME at all - for that or any other reason)
j
Gotcha. So what is the purpose of DAGSTER_HOME in the code location deployment?
d
I think it has no purpose.
And is only around for backwards compatibility
j
Lol oh. Well that would explain the behavior
Well, I think it would make more sense for the run to use the value configured in the user code location. That would keep the behavior consistent with the base config + location config behavior
d
Fair enough - but I think the most likely next step would be to remove the environment variable altogether, since it isn't used
as opposed to tweaking the way in which it is set in order to not be used
j
That's fine. And would that be replaced with an option to configure local artifact storage at the code location?
d
potentially - I don't think i've heard that particular request before, but certainly, as a start, being able to set it on the instance at all from the helm chart makes sense
j
Cool. I'll open an issue then so it's on your radar.
Last question: If we wanted to bundle dagster in a desktop app, is there a recommended way to do that?
d
would it be possible to make a new post for that one
j
Yup, np