Hi guys! Thanks for your work on dagster, everything looks really cool.
@matas and I are trying to deploy dagster at bestplace.ai to facilitate pipeline creation and execution for our analysts.
Our analysts don't like writing Python code; they want to create pipelines described in YAML configuration files. So we want to build a system around dagster to make this possible.
We deployed dagster using Docker, with the following containers:
- dagster-master with dagit process;
- several dagster-worker containers with dagster-celery executor;
- rabbitmq container for communication between master and workers;
- postgres and minio (s3) containers for persistent storage.
The dagster-master and dagster-worker containers use the following directory structure:
solids/
pipelines/
repository.py
repository.yaml
- solids/: directory with the solids library, i.e. solids written in Python (in the future we will add Jupyter solids);
- pipelines/: directory with YAML pipelines that use solids from the solids library;
- repository.py: contains a define_repo() function that reads all the YAML files and creates pipelines from them (I used code similar to https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/dep_dsl/pipeline.py);
- repository.yaml: refers to define_repo() in repository.py.
Everything mostly works fine, but we have run into several problems and have some questions that we would like to discuss with you.
1. We want users to be able to add their own YAML pipelines to dagit, so that everyone can then explore and execute them. What is the best way to reload YAML pipelines in dagit? As I understand it, pipelines are reloaded when a user clicks the reload button in the dagit UI. Is this the best way? What happens after the reload button is clicked? Will define_repo() be called again, as it is at dagit startup?
2. What happens to already running pipelines when a user clicks the reload button in the dagit UI? What if a user starts executing a pipeline and then immediately clicks reload? I tried this, and the pipeline seemed to get stuck in a running state indefinitely. I attached screenshots of this problem.
3. Is it possible to inform a user about an error in their pipeline.yaml through the dagit interface? I can catch the exceptions raised while creating pipelines from the YAML files, but how can I show them in dagit?
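To make question 3 more concrete, here is a minimal sketch of what I mean by catching the exceptions. It is not our actual code: `load_pipelines`, `parse` and `build` are hypothetical names. I assume `parse` would be something like PyYAML's `yaml.safe_load` and `build` would be whatever turns the parsed YAML into a dagster pipeline definition. No dagster API is used here, because how to surface the collected errors in dagit is exactly the open question.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple


def load_pipelines(
    pipelines_dir: Path,
    parse: Callable[[str], Any],   # assumption: e.g. yaml.safe_load
    build: Callable[[Any], Any],   # assumption: parsed YAML -> pipeline definition
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Build one pipeline per *.yaml file, collecting errors instead of raising."""
    pipelines: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for path in sorted(pipelines_dir.glob("*.yaml")):
        try:
            pipelines[path.stem] = build(parse(path.read_text()))
        except Exception as exc:
            # A broken file should not prevent the other pipelines from loading.
            errors[path.name] = f"{type(exc).__name__}: {exc}"
    return pipelines, errors
```

With something like this, define_repo() could still return the pipelines that loaded correctly, and `errors` would hold one message per broken file that we would like to show to the user somewhere.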
Thanks