
Tiri Georgiou

04/30/2021, 1:16 PM
Hi there, I have a couple of (quite dense) questions about project structure and Docker deployment on ECS. I have two projects that I want to deploy in their own containers (based on this idea, but extending the docker_example_pipelines service in docker-compose to something like docker_project_A and docker_project_B). The logic here would be to have each project (which would be its own package, like here) running in its own container as a gRPC server:

```
Root/
  src/
    project_A/
      my_packageA/
        __init__.py
        repository.py
        solids/
        pipelines/
        …
      my_package_testA/
      setup.py
      requirements.txt
      workspace-projectA.yaml   # For local development purposes
    project_B/
      my_packageB/
        __init__.py
        repository.py
        solids/
        pipelines/
        …
      my_package_testB/
      setup.py
      requirements.txt
      workspace-projectB.yaml   # For local development purposes
  dagster.yaml                  # This will be copied over in the docker images
  docker-compose.yaml
  Dockerfile_dagster
  Dockerfile_projectA
  Dockerfile_projectB
  workspace.yaml                # This will be copied over in the docker images
```

My argument here is that if we containerise project_A (Dockerfile_projectA) and project_B (Dockerfile_projectB) separately (in ECS, say), then if someone blunders and pushes broken code to project_B, my project_A repository with all its pipeline code will still run and not fall apart? Is this a more standard approach to a multi-project Dagster workflow? I watched a video of your talk here (33:40) about reloading repositories, but I felt the example was too simple: two separate repository modules with all the logic inside them doesn't really extend to larger projects.

Question 2, extending the above: if we make a change to project_A, say adding another pipeline, I would have to rebuild the image and push it. I've read about some git-sync options with Kubernetes (though Kubernetes isn't really an option for us); would an alternative be to mount a volume like EFS (read-only) to both containers holding project_A and project_B, and then have some CI/CD set up to push to EFS on every change? I'm hoping this might solve the problem of having to rebuild the image every time there is a change. Sorry for the long-winded questions; I would really appreciate the feedback. Thanks.
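For concreteness, the layout above implies a docker-compose.yaml along these lines, with one user-code gRPC server per project. This is only a sketch: the service names, module paths, and ports are illustrative assumptions, not taken from the thread.

```yaml
version: "3.7"

services:
  # One user-code gRPC server per project, so a broken deploy of one
  # project cannot take down the other.
  docker_project_a:
    build:
      context: .
      dockerfile: Dockerfile_projectA
    # Hypothetical entrypoint: serve my_packageA's repository over gRPC.
    command: dagster api grpc -m my_packageA.repository -h 0.0.0.0 -p 4000
    expose:
      - "4000"

  docker_project_b:
    build:
      context: .
      dockerfile: Dockerfile_projectB
    command: dagster api grpc -m my_packageB.repository -h 0.0.0.0 -p 4001
    expose:
      - "4001"

  # dagit reaches both servers via the workspace.yaml baked into its image.
  dagit:
    build:
      context: .
      dockerfile: Dockerfile_dagster
    ports:
      - "3000:3000"
    depends_on:
      - docker_project_a
      - docker_project_b
```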

daniel

04/30/2021, 2:07 PM
Hi Tiri - for question 1, the two repositories will be able to operate independently: an error in project_B shouldn't affect project_A as long as project_A doesn't depend on the broken code. For question 2, you'll need to restart the container whenever the underlying code changes (so that it can reload the code), but there's no requirement in Dagster that you rebuild the image.
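To illustrate the no-rebuild option, here is a minimal sketch (assuming the compose setup above) that keeps the project code outside the image, so restarting the container picks up new code without rebuilding anything. The mount paths are assumptions.

```yaml
services:
  docker_project_a:
    build:
      context: .
      dockerfile: Dockerfile_projectA
    command: dagster api grpc -m my_packageA.repository -h 0.0.0.0 -p 4000
    working_dir: /opt/dagster/app
    volumes:
      # Hypothetical mount: the code lives on the host (or an EFS mount on
      # ECS), so a plain restart of this service reloads it; the image
      # itself never changes.
      - ./src/project_A:/opt/dagster/app:ro
```

After a code push, restarting the service (e.g. docker-compose restart docker_project_a, or replacing the ECS task) is enough to pick up the change.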

Tiri Georgiou

04/30/2021, 2:11 PM
So for question 2, if I have my container with the pipeline code mounted to a volume, and I push new code to that volume, all I would need to do is refresh in the dagit UI? Thanks for this!

daniel

04/30/2021, 2:41 PM
Almost - refreshing in the dagit UI doesn't restart the container. You'd need to restart the container yourself, then dagit should show a prompt that lets you refresh with a single click.

Akshat Nigam

08/04/2021, 6:26 PM
@daniel won’t calling the GraphQL reloadWorkspace mutation sync up the code? Or even just triggering a pipeline execution? Because I tried it with my gRPC server configuration, and every time it picked up all the changes I was making to my pipeline code: regardless of whether I refreshed from the UI or called the GraphQL mutation, it executed only the latest pipeline code. If by “underlying code” you mean something other than pipeline code, then my statement above is not correct.

daniel

08/04/2021, 6:31 PM
Hi Akshat - my understanding is that if you're running your own gRPC server (you have grpc_server: in your workspace.yaml and are separately running your own server with dagster api grpc), then calling reloadWorkspace does not reload your code on that server. If you are letting dagit manage your gRPC server (your workspace.yaml contains python_file: or python_package:), then calling reloadWorkspace will restart the server and thus reload your code. Let me know if that's different from what you're seeing - we'd like to make this part of the system clearer.
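For reference, a sketch of the two workspace.yaml shapes being distinguished here; the module name, host, and port are illustrative assumptions.

```yaml
# Case 1: you run the gRPC server yourself, e.g. with
#   dagster api grpc -m my_packageA.repository -h 0.0.0.0 -p 4000
# Calling reloadWorkspace does NOT reload code on this server; you
# restart the server process yourself to pick up new code.
load_from:
  - grpc_server:
      host: localhost
      port: 4000
      location_name: project_A

# Case 2: dagit manages the gRPC server for you. Calling reloadWorkspace
# restarts that managed server, which reloads your code.
# load_from:
#   - python_package: my_packageA
```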

Akshat Nigam

08/04/2021, 7:26 PM
AFAIK, I am sure it was the first configuration; I tried it on my local computer, with both of them running on different ports. Question 1: also, does dagster spin up its own gRPC server internally in the 2nd case you mentioned above? Question 2: and if the 2nd approach restarts the server, will the code reload still occur on calling reloadWorkspace if the code is on EFS, mounted as a PV/PVC to Kubernetes Dagster?

daniel

08/04/2021, 7:31 PM
Yeah, in the 2nd case dagster runs its own gRPC server. I’m not deeply familiar with EFS, but I don’t think that would make a difference. If you have exact repro steps for reloadWorkspace reloading your code despite not restarting your gRPC server, I’d be happy to take a closer look.
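For the EFS-on-Kubernetes variant being discussed, one possible shape is sketched below. Every name here, including the storage class and image, is an assumption; EFS is typically exposed to Kubernetes through the EFS CSI driver.

```yaml
# Hypothetical PVC backed by EFS, mounted read-only into the user-code pod
# so CI/CD can push code to EFS from outside the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dagster-user-code-pvc
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: efs-sc          # assumed EFS storage class
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: dagster-user-code
spec:
  containers:
    - name: user-code
      image: my-org/dagster-user-code:latest    # hypothetical image
      command: ["dagster", "api", "grpc",
                "-m", "my_packageA.repository",
                "-h", "0.0.0.0", "-p", "4000"]
      volumeMounts:
        - name: pipeline-code
          mountPath: /opt/dagster/app
          readOnly: true
  volumes:
    - name: pipeline-code
      persistentVolumeClaim:
        claimName: dagster-user-code-pvc
```

As noted above, the mount shouldn't change the reload semantics; it only changes where the code files physically live.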

Akshat Nigam

08/04/2021, 8:14 PM
Ok. I tried it a while back so I can’t give you a repro now. But we are going to take the reloadWorkspace approach with EFS and a dagit-managed gRPC server with python_package:, so I can get back to you with the result, whether it works or not.
Thanks for the quick responses though.
I have one more question that I will ask in the channel.