03/30/2021, 4:26 AM
Hi-- I'm new to data engineering and I'm looking for a push in the right direction. I have been tasked with writing a system where our users (chemists) will upload a set of experiments to the cloud, where we'll run simulations and report back. I think dagster is a good fit for running the simulation/analysis pipeline. I'm thinking: I'll write a Flask app to upload the data to a GCP bucket, then use the GraphQL api to start the pipeline. The client would then long-poll the GraphQL api until the job completes. Is this roughly the right approach? Any advice?

Rubén Lopez Lozoya

03/30/2021, 6:43 AM
Although this seems like a reasonable approach to me, you might want to have a look at sensors. They're designed to detect new files being added to your bucket and trigger your desired pipeline accordingly.


03/30/2021, 7:01 AM
I agree with @Rubén Lopez Lozoya, sensors would be a great fit here. Dagster will take care of all the polling and run launching for you, and you'll get a specialized monitoring UI for free


03/30/2021, 5:13 PM
Say I upload a file to a bucket from the client-side. How would I notify the client when the pipeline terminates? Is there a way to query for the pipeline runs triggered by that file?

Agon Shabi

04/01/2021, 10:47 PM
Could you access the file's metadata to identify the original uploader, and hence the target for your notification?
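One way to carry that information is GCS custom object metadata: stamp the uploader's identity onto the blob at upload time, then read it back when the pipeline finishes. A sketch — the `uploader` metadata key is an arbitrary choice for this example:

```python
def upload_experiment(bucket_name, object_name, local_path, uploader_email):
    """Upload a file and record who uploaded it as custom blob metadata."""
    from google.cloud import storage  # lazy import keeps the helper below stdlib-only

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    # Custom metadata travels with the object (served as x-goog-meta-* headers).
    blob.metadata = {"uploader": uploader_email}
    blob.upload_from_filename(local_path)


def uploader_of(blob):
    """Read the uploader identity back from a blob's custom metadata."""
    return (blob.metadata or {}).get("uploader")
```

The pipeline (or a hook that runs on completion) can then call `uploader_of` on the source blob to work out who to notify.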


04/02/2021, 1:40 PM
Just an open thought: in your pipeline, on a successful file drop you can trigger a Python script that emails out the location/file name, or send the same via a webhook to Slack, if that's your communicator.
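The Slack-webhook half of that idea needs nothing beyond the standard library. A sketch, assuming a hypothetical incoming-webhook URL from your Slack workspace (the `opener` parameter is injectable so the function can be exercised without hitting the network):

```python
import json
import urllib.request

# Hypothetical incoming-webhook URL; create a real one in your Slack workspace.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def notify_slack(text, url=SLACK_WEBHOOK_URL, opener=urllib.request.urlopen):
    """POST a message to a Slack incoming webhook."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    opener(req)
```

You could call this from a final solid in the pipeline, or attach it with Dagster's `@success_hook` so it fires automatically whenever a run succeeds.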