Rene Czepluch
09/04/2023, 7:56 AM
dagster dev and a production run with dagster-webserver?
Rene Czepluch
09/04/2023, 8:02 AM
Rene Czepluch
09/04/2023, 8:07 AM
resources and save the logs file there? seems like a good way to split between prod and dev: https://docs.dagster.io/concepts/resources
DB
09/04/2023, 4:49 PM
dagster_dev.bat
set DAGSTER_HOME=C:\dagster\dev_home
call .venv\scripts\activate
dagster dev ... -p 8080
dagster_prod.bat
set DAGSTER_HOME=C:\dagster\prod_home
call .venv\scripts\activate
dagster dev ... -p 8081
Note that if you want to run both at the same time, you need to use different ports with the -p argument. Now you can navigate to localhost:8080 for dev, and localhost:8081 for prod.
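If you would rather not maintain two nearly identical batch files, the same idea can be sketched as a small Python launcher. This is only an illustration under a few assumptions: the `ENVIRONMENTS` table and the `build_launch` / `launch` helpers are hypothetical names, the paths and ports are copied from the batch files above, and the `dagster` executable is assumed to be on the PATH (for example because the venv is already activated). Any further arguments you normally pass to `dagster dev` go in `extra_args`.

```python
import os
import subprocess

# Hypothetical per-environment settings, mirroring the two batch files.
ENVIRONMENTS = {
    "dev": {"home": r"C:\dagster\dev_home", "port": 8080},
    "prod": {"home": r"C:\dagster\prod_home", "port": 8081},
}


def build_launch(name, extra_args=()):
    """Return (argv, env) for starting `dagster dev` in the named environment."""
    settings = ENVIRONMENTS[name]
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["DAGSTER_HOME"] = settings["home"]  # separate storage per environment
    argv = ["dagster", "dev", *extra_args, "-p", str(settings["port"])]
    return argv, env


def launch(name, extra_args=()):
    """Start the Dagster UI for one environment and block until it exits."""
    argv, env = build_launch(name, extra_args)
    return subprocess.run(argv, env=env, check=True)


# Example (not executed here): launch("dev", ["-f", "my_dag.py"])
```

Keeping the command construction in `build_launch` separate from the actual `subprocess.run` call makes it easy to check what would be executed before starting anything.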
It might also make sense to keep the venvs separate and use a package manager like Poetry to avoid dependency conflicts, but if that's too involved you can just leave everything in the same venv.
Later on, when constant availability becomes important, you might want to look into setting up docker to run dagster-webserver, dagster-daemon, and a gRPC code server in separate, restartable containers. You may even want to include a postgres db container for run logs if you have many assets with many materializations, because the default sqlite db can get quite slow.
I cannot go the docker route in my current project due to how my external dependencies are set up, but there is a tutorial in the docs iirc. For now, the dagster_prod.bat approach works fine for my needs.
Rene Czepluch
09/05/2023, 6:42 AM
Rene Czepluch
09/05/2023, 6:45 AM
.env file? I also see you have a \dev_home and a \prod_home, it seems a little brute-force to have two different dagster installations.
jamie
09/05/2023, 2:30 PM
DB
09/06/2023, 9:43 AM
DAGSTER_HOME just tells dagster to make two separate run storages, one for dev and one for prod. I think this is exactly what you wanted?
Rene Czepluch
09/07/2023, 10:19 AM
DB
09/09/2023, 8:49 AM
dagster/
├── dev_home/
│   └── dagster.yaml
├── prod_home/
│   └── dagster.yaml
└── src/
    ├── .venv/
    ├── my_dag.py
    ├── dagster_dev.bat
    └── dagster_prod.bat
If you want to use different data sources in dev and prod, you should look into resources and how to configure them using `EnvVar`: https://docs.dagster.io/concepts/resources
You can then set an additional environment variable in the batch scripts, e.g. SET DAGSTER_DATA_DIR=C:\dagster_data\prod_data and SET DAGSTER_DATA_DIR=C:\dagster_data\dev_data, and configure your data source accordingly. So you end up with something like:
# my_dag.py
# run with dagster dev -f my_dag.py
from dagster import (
    AssetExecutionContext,
    ConfigurableResource,
    Definitions,
    EnvVar,
    asset,
)


class MyDataSource(ConfigurableResource):
    data_dir: str

    def read_data(self):
        ...


@asset
def data_from_source(context: AssetExecutionContext, my_source: MyDataSource):
    context.log.info(f"Reading data from {my_source.data_dir}!")
    data = my_source.read_data()
    ...


defs = Definitions(
    assets=[data_from_source],
    resources={
        "my_source": MyDataSource(data_dir=EnvVar("DAGSTER_DATA_DIR")),
    },
)
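The body of `read_data` is left open above because it depends on what your data looks like. Purely as an illustration, and assuming the data directory holds CSV files (an assumption, nothing in this thread says so), a helper along these lines could be called from `read_data` with `self.data_dir`. The name `read_csv_rows` is hypothetical and the function only uses the standard library.

```python
import csv
from pathlib import Path


def read_csv_rows(data_dir):
    """Read every *.csv file in data_dir and return all rows as dicts."""
    directory = Path(data_dir)
    if not directory.is_dir():
        # Fail early with a clear message if the environment variable
        # points at a directory that does not exist.
        raise FileNotFoundError(f"Data directory does not exist: {directory}")
    rows = []
    # Sort the file names so the result order is deterministic.
    for csv_path in sorted(directory.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Because the helper takes the directory as a plain argument, the same code runs against the dev and prod directories, and only the DAGSTER_DATA_DIR value set in the batch script decides which one is read.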
.venv simply contains the virtual environment where you installed dagster. If you're not already using virtual environments for different projects, I would suggest having a look here: https://docs.python.org/3/tutorial/venv.html
Rene Czepluch
09/10/2023, 2:01 PM