# ask-community
k
Hey all, I was wondering about 3 issues:
1. Is there a way to disable daemons selectively? In my case, I only need the run queue daemon.
2. By default, dagster will place all kinds of storages into the DAGSTER_HOME directory. Is there a way to place all of these storages into a custom directory?
3. Similar to 2, but is it even possible to explicitly ask for an ephemeral instance in `dagster.yaml` (such that run concurrencies can be defined, but no storages are persisted to disk)?
Thanks a lot in advance!
d
At least for 2, I just came across `custom_path_fs_io_manager`, which you can set to put things in a custom path.
k
Thanks @Daniel Mosesson! Does that make the folders in the `DAGSTER_HOME` directory disappear as well?
d
As I understand it, what is going on is that when you want to pass data between ops, if you are using the multi process executor, dagster needs to store the "things" you are passing somewhere so the other op can get it. It does not impact anything else. If you want to really remove those files from ever getting created, you might have to use the in_process executor. What other folders are you talking about?
k
I know that the storage locations are configurable, but it seems like I can either have an environment variable OR a path as the `base_dir`. Ideally I'd like a mix of environment variable and path, something like the snippet below. Is something like that possible?
run_storage:
  module: dagster.core.storage.runs
  class: SqliteRunStorage
  config:
    base_dir: 
      env: LOCALAPPDATA/dagster/history
d
I'd be curious to know if it works, but I have no idea 🙂
k
> What other folders are you talking about?
The folders `.logs_queue`, `history`, `logs`, and `schedules` -- all of them are created in the same directory where my `dagster.yaml` sits and that `DAGSTER_HOME` points to.
@owen can you help? Thanks so much 🙂
o
hi @Kobroli -- for the first question, this is not possible unfortunately (it's really just one daemon that does a bunch of things). for the other questions, I'm curious about your use case -- is there a requirement to keep DAGSTER_HOME stable in these scenarios?
for the env var configuration, there's no "in between" state (although I see how that would be useful here). If you really don't need any persisted data, you can probably circumvent this by mimicking the ephemeral dagster instance (as you mention). You can take a look at all the classes the ephemeral instance uses here: https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster/dagster/core/instance/__init__.py?L347:9
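One way to work around the missing "in between" state could be a small launcher step that computes the full per-user paths and exports each one as its own environment variable before dagster starts, so that every `base_dir` in `dagster.yaml` can point at a single variable via `env:`. The Python below is only a sketch under that assumption: the function name and the `DAGSTER_*_DIR` variable names are made up for illustration and are not something dagster defines.

```python
from pathlib import Path


def derived_storage_env(environ, subdirs=("history", "logs", "schedules")):
    """Build one environment variable per storage folder.

    `environ` is any mapping that contains LOCALAPPDATA (e.g. os.environ).
    The DAGSTER_<NAME>_DIR names are hypothetical; pick whatever names
    your dagster.yaml refers to.
    """
    base = Path(environ["LOCALAPPDATA"]) / "dagster"
    return {f"DAGSTER_{name.upper()}_DIR": str(base / name) for name in subdirs}


# Example: merge the derived variables into the environment of the
# process that will start dagster, e.g.
#   os.environ.update(derived_storage_env(os.environ))
```

With something like this in place, the run storage entry in `dagster.yaml` could use `env: DAGSTER_HISTORY_DIR` as its `base_dir` instead of trying to combine a variable and a path in one value.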
k
That's helpful, thank you! As for my use case, I'd like to ship an application (that communicates with the dagster instance) to my end users, but I'd like for every user to see their own run history, if any (which is not possible if `DAGSTER_HOME` is set to one directory which every user gets to see, hence the `LOCALAPPDATA` environment variable in `dagster.yaml`, which is user-specific).
I'm aware that I could just leave `DAGSTER_HOME` unset, but that would block me from using the `dagster-daemon`, which is required in my case. It'd be cool if dagster offered a bit more flexibility in this matter (e.g. one central `base_dir` that all storages use, or the option to choose an ephemeral instance, or an option to configure the daemon and the storages separately), but I totally understand that that might not be high prio 🙂
> you can probably circumvent this by mimicking the ephemeral dagster instance
This sounds interesting -- do you mean mimicking it via `dagster.yaml` or programmatically? For the latter, it'd be helpful if you could point me to a starting point 🙂
o
ahh I see what you mean -- and now that I think about it, mimicking the ephemeral dagster instance (by which I meant setting values in `dagster.yaml`) wouldn't quite work, as some of the daemon operations depend on having access to the same run storage as dagit has (and if this run storage was in memory, then there would be no way to accomplish this).
on the point of the `$DAGSTER_HOME` thing, I was actually thinking that each user might have their own individual `DAGSTER_HOME` (roughly the same path as whatever `LOCALAPPDATA` was going to be). This would require N copies of that `dagster.yaml` file (or at least some sort of symlink setup), but depending on how much control you have over those things, that might work out ok.
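The "N copies" idea could be automated by the shipped application itself. The Python below is a rough sketch, not an official dagster recipe: the helper names and the `dagster` subfolder are assumptions, and it only uses the standard library to copy a shared template `dagster.yaml` into a per-user folder and build an environment with `DAGSTER_HOME` pointing there.

```python
import shutil
from pathlib import Path


def prepare_user_home(template_yaml, user_base):
    """Create <user_base>/dagster and copy the shared dagster.yaml into it.

    `user_base` would typically be the value of LOCALAPPDATA; the
    "dagster" subfolder name is an arbitrary choice for this sketch.
    """
    home = Path(user_base) / "dagster"
    home.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template_yaml, home / "dagster.yaml")
    return home


def user_environment(home, environ):
    """Return a copy of `environ` with DAGSTER_HOME set to the user's folder."""
    env = dict(environ)
    env["DAGSTER_HOME"] = str(home)
    return env


# The resulting mapping can be passed as `env=` to whatever starts the
# daemon and dagit for that user (for example via subprocess.Popen).
```

Because every user gets their own folder, the storages that dagster creates next to `dagster.yaml` would end up separated per user, at the cost of keeping the copies in sync with the template.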
just sort of spewing out options (maybe one will be appealing), but if you really want to customize the configuration for these classes, you could potentially subclass the existing implementations (for example, with the run storage you could override the `from_config_value()` function). probably more effort than it's worth but I figured I'd at least mention it
and yeah this use case is admittedly not super high priority at the moment -- we don't have a ton of users doing this sort of 1 machine: N users thing, and those that do generally just maintain a single dagster instance which has separate repository locations for each user to keep things organized