Andy Carter
03/06/2023, 12:27 PM
DAGSTER_HOME (DH) as I am a little confused. Do I have this right?
- Without DH set, dagster outputs logs to a local tmp directory, which is cleaned up after dagit shuts down. No info regarding runs is persisted when dagit is shut down.
- Not setting DH is useful when you are still developing/breaking things and don't care about logging long term.
- DH should probably be a directory outside of your Python project, because...
- dagster.yaml should not be under change control; it's something you can customize on the fly for each deployment location, more like a .env file in that sense.
- It's common to have DH set but NOT create a dagster.yaml as some kind of halfway between testing and deployment.
- DH is both
1) a place to keep persistent data between dagit sessions, AND
2) a place to find dagster.yaml, which can redirect dagster data & logs that would otherwise go to DH to other IO sinks (postgres/S3 etc).
chris
03/06/2023, 9:51 PM
Vinnie
03/07/2023, 12:13 PM
dagster.yaml shouldn’t be in change control?
Here’s my current setup in full OSS, roughly (might be forgetting a thing or two):
• dagster_deployment image with dagit, the daemon, dagster.yaml and workspace.yaml; DAGSTER_HOME is set within the container
• project/user code locations don’t have DAGSTER_HOME set or a dagster.yaml file, as they just run the gRPC server, and all logs and storage are managed by the central dagster instance
• docker-compose for local development that mimics my entire prod/staging environment: it overrides my dagster.yaml from the dagster_deployment package, which is especially relevant for the RunLauncher. Storage is a postgres spun up with docker-compose and persisted on my local machine
• for quicker iteration/whenever I don’t need to see how the whole system will run, I might just run dagster dev from a project folder. In this case I have my DAGSTER_HOME set to a location on my disk as well, but only because I needed it for DynamicPartitions (IIRC)
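
For readers following along, here is a minimal sketch of the two roles DAGSTER_HOME plays in this thread: a directory that persists between sessions, and the place where dagster.yaml is looked up. It uses only the Python standard library. The helper name prepare_dagster_home, the example path in the usage note, and the DAGSTER_PG_* variable names are illustrative assumptions, not part of either poster's setup. The storage block follows the Postgres storage layout as I understand it from the Dagster docs, so check it against your Dagster version.

```python
import os
from pathlib import Path
from typing import Optional

# Illustrative instance config (an assumption, not taken from the thread):
# points run/event/schedule storage at Postgres instead of files under DAGSTER_HOME.
EXAMPLE_DAGSTER_YAML = """\
storage:
  postgres:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
"""


def prepare_dagster_home(home: Path, instance_yaml: Optional[str] = None) -> Path:
    """Create a persistent DAGSTER_HOME and optionally drop a dagster.yaml into it.

    Hypothetical helper for illustration only.
    """
    # Dagster expects DAGSTER_HOME to be an absolute path.
    home = home.expanduser().resolve()
    home.mkdir(parents=True, exist_ok=True)

    # With no dagster.yaml, Dagster falls back to default storage under this directory.
    if instance_yaml is not None:
        (home / "dagster.yaml").write_text(instance_yaml)

    # Only affects this process and any child processes it starts.
    os.environ["DAGSTER_HOME"] = str(home)
    return home
```

Calling prepare_dagster_home(Path("~/dagster_home")) with no YAML corresponds to the "DH set but no dagster.yaml" halfway case from the first message. Passing EXAMPLE_DAGSTER_YAML corresponds to redirecting storage elsewhere. Because the environment variable is set only for the current process, dagster dev would have to be started from that process, or the variable exported in the shell instead.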