# announcements
j
Hi -- I'm new to data engineering and I'm looking for a push in the right direction. I have been tasked with writing a system where our users (chemists) will upload a set of experiments to the cloud, where we'll run simulations and report back. I think Dagster is a good fit for running the simulation/analysis pipeline. I'm thinking: I'll write a Flask app to upload the data to a GCP bucket, then use the GraphQL API to start the pipeline. The client would then long-poll the GraphQL API until the job completes. Is this roughly the right approach? Any advice?
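For reference, my upload endpoint would look roughly like this (bucket name, form fields, and metadata are just placeholders):

```python
from flask import Flask, request, jsonify
from google.cloud import storage

app = Flask(__name__)
BUCKET = "experiment-uploads"  # placeholder bucket name

@app.route("/experiments", methods=["POST"])
def upload_experiment():
    """Accept an experiment file and drop it into the GCS bucket the pipeline will read from."""
    uploaded = request.files["experiment"]
    blob = storage.Client().bucket(BUCKET).blob(uploaded.filename)
    # Record who uploaded the file so we can notify them when the run finishes.
    blob.metadata = {"uploader": request.form.get("uploader", "unknown")}
    blob.upload_from_file(uploaded)
    return jsonify({"gcs_path": f"gs://{BUCKET}/{uploaded.filename}"}), 201
```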
r
Although this seems like a reasonable approach to me, you might want to have a look at sensors. These are designed so that they can detect new files being added to your bucket and trigger your desired pipeline accordingly: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors
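For example, something along these lines (a rough sketch: the pipeline name, solid name, and config layout are placeholders, and the exact decorators depend on your Dagster version):

```python
from dagster import sensor, RunRequest
from google.cloud import storage

BUCKET = "experiment-uploads"  # placeholder; the bucket your Flask app uploads to

@sensor(pipeline_name="simulation_pipeline")
def new_experiment_sensor(context):
    client = storage.Client()
    for blob in client.list_blobs(BUCKET):
        # run_key makes the sensor idempotent: each file launches at most one run
        yield RunRequest(
            run_key=blob.name,
            run_config={
                "solids": {
                    "load_experiment": {
                        "config": {"gcs_path": f"gs://{BUCKET}/{blob.name}"}
                    }
                }
            },
        )
```

Dagster evaluates the sensor on an interval, so new uploads get picked up automatically without you writing any polling code.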
s
I agree with @Rubén Lopez Lozoya, sensors would be a great fit here. Dagster will take care of all the polling and run launching for you, and you'll get a specialized monitoring UI for free
j
Say I upload a file to a bucket from the client side. How would I notify the client when the pipeline terminates? Is there a way to query for the pipeline runs triggered by that file?
a
Could you access the file's metadata to identify the original uploader, and hence the target for your notification?
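For example, the sensor could copy that metadata into run tags; then the runs for a given file or uploader can be filtered in Dagit or through the GraphQL runs query. A rough sketch, assuming the uploader was stored as blob metadata at upload time (names are placeholders):

```python
from dagster import sensor, RunRequest
from google.cloud import storage

@sensor(pipeline_name="simulation_pipeline")
def tagged_experiment_sensor(context):
    client = storage.Client()
    for blob in client.list_blobs("experiment-uploads"):
        blob.reload()  # pulls down the metadata set at upload time
        yield RunRequest(
            run_key=blob.name,
            run_config={},  # fill in as in the earlier sketch
            tags={
                "experiment_file": blob.name,
                "uploaded_by": (blob.metadata or {}).get("uploader", "unknown"),
            },
        )
```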
r
Just an open thought: in your pipeline, set a flag, and when a file is successfully processed you can trigger a Python script to send an email with the location/file name, or send the same via a webhook to Slack if that's your communicator.
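As a rough sketch of the Slack-webhook idea, a final solid that posts the result location when the run reaches that step (the webhook URL and result path are placeholders, and whether you use solid or op depends on your Dagster version):

```python
import requests
from dagster import solid

# Placeholder: keep the real webhook URL in an env var or a Dagster resource.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

@solid
def notify_completion(context, result_path: str):
    """Last step of the pipeline: post the result location to Slack."""
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": f"Simulation finished, results at {result_path} (run {context.run_id})."},
    )
```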