# announcements
j
Hi -- I'm new to data engineering and I'm looking for a push in the right direction. I have been tasked with writing a system where our users (chemists) will upload a set of experiments to the cloud, where we'll run simulations and report back. I think Dagster is a good fit for running the simulation/analysis pipeline. I'm thinking: I'll write a Flask app to upload the data to a GCP bucket, then use the GraphQL API to start the pipeline. The client would then long-poll the GraphQL API until the job completes. Is this roughly the right approach? Any advice?
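For reference, my upload endpoint would look roughly like this (bucket name, form fields, and metadata are just placeholders):

```python
from flask import Flask, request, jsonify
from google.cloud import storage

app = Flask(__name__)
BUCKET = "experiment-uploads"  # placeholder bucket name

@app.route("/experiments", methods=["POST"])
def upload_experiment():
    """Accept an experiment file and drop it into the GCS bucket the pipeline will read from."""
    uploaded = request.files["experiment"]
    blob = storage.Client().bucket(BUCKET).blob(uploaded.filename)
    # Record who uploaded the file so we can notify them when the run finishes.
    blob.metadata = {"uploader": request.form.get("uploader", "unknown")}
    blob.upload_from_file(uploaded)
    return jsonify({"gcs_path": f"gs://{BUCKET}/{uploaded.filename}"}), 201
```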
r
Although this seems like a reasonable approach to me, you might want to have a look at sensors. These are designed so that they can detect new files being added to your bucket and trigger your desired pipeline accordingly: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors
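For example, something along these lines (a rough sketch: the pipeline name, solid name, and config layout are placeholders, and the exact decorators depend on your Dagster version):

```python
from dagster import sensor, RunRequest
from google.cloud import storage

BUCKET = "experiment-uploads"  # placeholder; the bucket your Flask app uploads to

@sensor(pipeline_name="simulation_pipeline")
def new_experiment_sensor(context):
    client = storage.Client()
    for blob in client.list_blobs(BUCKET):
        # run_key makes the sensor idempotent: each file launches at most one run
        yield RunRequest(
            run_key=blob.name,
            run_config={
                "solids": {
                    "load_experiment": {
                        "config": {"gcs_path": f"gs://{BUCKET}/{blob.name}"}
                    }
                }
            },
        )
```

Dagster evaluates the sensor on an interval, so new uploads get picked up automatically without you writing any polling code.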
s
I agree with @Rubén Lopez Lozoya, sensors would be a great fit here. Dagster will take care of all the polling and run launching for you, and you'll get a specialized monitoring UI for free
j
Say I upload a file to a bucket from the client side. How would I notify the client when the pipeline terminates? Is there a way to query for the pipeline runs triggered by that file?
a
Could you access the file's metadata to identify the original uploader, and hence the target for your notification?
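For example, the sensor could copy that metadata into run tags; then the runs for a given file or uploader can be filtered in Dagit or through the GraphQL runs query. A rough sketch, assuming the uploader was stored as blob metadata at upload time (names are placeholders):

```python
from dagster import sensor, RunRequest
from google.cloud import storage

@sensor(pipeline_name="simulation_pipeline")
def tagged_experiment_sensor(context):
    client = storage.Client()
    for blob in client.list_blobs("experiment-uploads"):
        blob.reload()  # pulls down the metadata set at upload time
        yield RunRequest(
            run_key=blob.name,
            run_config={},  # fill in as in the earlier sketch
            tags={
                "experiment_file": blob.name,
                "uploaded_by": (blob.metadata or {}).get("uploader", "unknown"),
            },
        )
```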
r
Just an open thought: in your pipeline, set a flag, and when a file is successfully processed you can trigger a Python script to send an email with the location/file name, or send the same via a webhook to Slack if that's your communicator.
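As a rough sketch of the Slack-webhook idea, a final solid that posts the result location when the run reaches that step (the webhook URL and result path are placeholders, and whether you use solid or op depends on your Dagster version):

```python
import requests
from dagster import solid

# Placeholder: keep the real webhook URL in an env var or a Dagster resource.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

@solid
def notify_completion(context, result_path: str):
    """Last step of the pipeline: post the result location to Slack."""
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": f"Simulation finished, results at {result_path} (run {context.run_id})."},
    )
```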