# random
t
Random question: Does dagster support webhook events from 3rd-party services? Use case: my goal is to write a solid that calls a fivetran connector and then waits until the fivetran sync (to snowflake) is complete. When I look at the airflow-provider-fivetran sensor, it looks like it uses long polling (poke_interval: “Time in seconds that the job should wait in between each tries”) to check the status of the sync job. My goal: fivetran POSTs webhooks/events to my dagster application, so dagster is told when a sync is done (instead of long polling to see if a sync is done). This obviously adds complexity on the dagster side: I’d need the solid to store state (in a db) that links the solid instance with the fivetran sync job. Then the dagster “application” would need a webserver to receive the webhook, look up that state, and then complete the corresponding solid. What this sounds like: the ability to add a small Flask application inside of dagster to handle custom processing (dare I say “serverless” from the perspective of the data engineer). Does anything close to this exist in dagster? Or should I build a custom app that handles all of this and calls the dagster API?
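The state described in the question (a record linking the waiting solid to its fivetran sync, which the webhook receiver later looks up) could be sketched with the standard library’s sqlite3. This is a minimal sketch under assumptions: `PendingSyncStore` and the `pending_syncs` table are hypothetical names, and keying on connector id assumes at most one pending sync per connector. A real deployment would pass a file path or use a shared database instead of `:memory:`, so that the receiver and the solid see the same rows.

```python
import sqlite3
from typing import Optional


class PendingSyncStore:
    """Links a waiting run to the fivetran connector it is waiting on.

    Hypothetical helper; table and column names are illustrative only.
    Assumes at most one pending sync per connector (connector_id is the key).
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS pending_syncs (
                   connector_id TEXT PRIMARY KEY,
                   run_id       TEXT NOT NULL,
                   status       TEXT NOT NULL DEFAULT 'waiting'
               )"""
        )
        self.conn.commit()

    def register(self, connector_id: str, run_id: str) -> None:
        # Called by the solid right after it triggers the sync.
        self.conn.execute(
            "INSERT OR REPLACE INTO pending_syncs (connector_id, run_id, status) "
            "VALUES (?, ?, 'waiting')",
            (connector_id, run_id),
        )
        self.conn.commit()

    def complete(self, connector_id: str, status: str) -> Optional[str]:
        # Called by the webhook receiver; returns the waiting run id, if any.
        row = self.conn.execute(
            "SELECT run_id FROM pending_syncs "
            "WHERE connector_id = ? AND status = 'waiting'",
            (connector_id,),
        ).fetchone()
        if row is None:
            return None  # unknown connector, or a duplicate delivery
        self.conn.execute(
            "UPDATE pending_syncs SET status = ? WHERE connector_id = ?",
            (status, connector_id),
        )
        self.conn.commit()
        return row[0]

    def status_of(self, connector_id: str) -> Optional[str]:
        # Lets the waiting side check a cheap local table instead of the fivetran API.
        row = self.conn.execute(
            "SELECT status FROM pending_syncs WHERE connector_id = ?",
            (connector_id,),
        ).fetchone()
        return row[0] if row else None
```

Returning `None` on a second `complete` call makes duplicate webhook deliveries harmless, which matters if the sender retries.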
a
Dagster has its own variant of sensors that’s a little different from Airflow’s (https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#sensors), but nothing for receiving webhooks at this time. You would need to have your own webserver receive the hook and call the dagster APIs. Though I am curious: what is the reason polling does not work for your use case?
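A minimal sketch of the “own webserver” option, using only the standard library’s `http.server`. The payload shape (an `event` of `"sync_end"`, a `connector_id`, and `data.status`) is an assumption about fivetran’s webhook format that should be checked against their docs. `on_sync_end` is a hypothetical callback: that is where the sibling app would call the dagster APIs.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Optional, Tuple


def parse_fivetran_event(body: bytes) -> Optional[Tuple[str, str]]:
    """Return (connector_id, status) for a sync-end event, else None.

    Assumed payload shape (not verified against fivetran's docs):
    {"event": "sync_end", "connector_id": "...", "data": {"status": "..."}}
    """
    try:
        payload = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and bad UTF-8
        return None
    if not isinstance(payload, dict) or payload.get("event") != "sync_end":
        return None
    connector_id = payload.get("connector_id")
    if not connector_id:
        return None
    data = payload.get("data") or {}
    status = data.get("status", "UNKNOWN") if isinstance(data, dict) else "UNKNOWN"
    return connector_id, status


def make_handler(on_sync_end: Callable[[str, str], None]):
    """Build a request handler class bound to a (hypothetical) callback."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            event = parse_fivetran_event(self.rfile.read(length))
            if event is not None:
                # The sibling app would call the dagster APIs from here.
                on_sync_end(*event)
            # Acknowledge every delivery so the sender does not keep retrying.
            self.send_response(200)
            self.end_headers()

    return WebhookHandler
```

To serve it locally, something like `HTTPServer(("127.0.0.1", 8080), make_handler(print)).serve_forever()` (with `HTTPServer` from the same module) would print each finished sync. Keeping the parsing in a plain function makes it testable without a running server.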
t
@alex thanks for the quick response! That 100% makes sense. Polling: for the MVP (and beyond), polling will work fine and I’ll probably launch with that. But I know I’m going to get asked “Why is this thing waiting 30 seconds to see if it’s done? Can we speed it up by reversing the responsibilities?” (The quick fix is just to drop the polling interval, but that gets chatty on the fivetran side.) Note: I understand there is a ton of logic in the code that receives the webhook (what data do you parse out of it? how do you verify its signature? what happens when you receive it? what state do you store in the db? etc.). So I’d be crazy to think there should be some opinionated “fivetran_webhook_sensor” functionality. The question really becomes: if I need to write custom “webserver” code, is there a way for that code to live inside my dagster app? That would make life easier from a deployment/monitoring/hosting perspective: a more monolithic approach versus a microservice approach (monolith = Flask app inside the dagster app, microservice = sibling app that calls the dagster API). BTW, thanks for all the hard work. This is my first time really getting into data engineering (coming from product engineering), and I’m pretty stoked on dagster’s philosophy on data platforms. :)
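Of the questions listed above, signature verification is the most mechanical. A minimal sketch with the standard library, assuming the sender signs the raw request body with HMAC-SHA256 and sends the hex digest in a header; the exact header name and encoding are provider-specific (check fivetran’s webhook docs), and `sign` / `verify_signature` are hypothetical names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    # Hex-encoded HMAC-SHA256 of the exact raw request body.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Check a webhook signature header against the raw body.

    Assumption: the sender uses HMAC-SHA256 over the raw body and sends a hex
    digest. Case is normalized because senders differ on hex casing.
    """
    if not received:
        return False
    expected = sign(secret, body).encode("ascii")
    candidate = received.strip().lower().encode("utf-8")
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the expected signature matched.
    return hmac.compare_digest(expected, candidate)
```

Verification has to run on the body bytes exactly as received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and break the match.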
a
yea, we’ve brainstormed a bit on the idea in the past: a sibling “instigation policy” to sensors & schedules that would work by providing simple endpoints in dagit that could be used for:
• receiving webhooks
• dead simple curl / API requests
• an autogenerated web form for non-technical users
m
Chiming in to say I have a very similar use case. We’re exporting data from Braze, but the API doesn’t support polling. It only allows you to pass in a webhook that will be called once the export is complete.
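For callback-only APIs like the Braze export described here, one common pattern is to give each request its own callback URL carrying a one-time token, then map the incoming callback back to the waiting run. A minimal sketch, assuming the export API accepts an arbitrary callback URL; `CallbackRegistry`, the base URL, and the `token` query parameter are hypothetical, the base URL is assumed to have no query string of its own, and the in-memory dict stands in for a persistent store.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse


class CallbackRegistry:
    """Correlates export requests with the webhook that reports their completion.

    Hypothetical helper: each export gets a one-time token in its callback URL.
    """

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url  # assumed to carry no query string
        self._pending: Dict[str, str] = {}  # token -> run id waiting on the export

    def new_callback_url(self, run_id: str) -> str:
        # Unguessable token, so third parties cannot forge a completion.
        token = secrets.token_urlsafe(16)
        self._pending[token] = run_id
        return f"{self.base_url}?{urlencode({'token': token})}"

    def resolve(self, request_url: str) -> Optional[str]:
        # Map an incoming callback back to the waiting run; tokens are single use.
        tokens = parse_qs(urlparse(request_url).query).get("token")
        if not tokens:
            return None
        return self._pending.pop(tokens[0], None)
```

Popping the token on first use means a replayed or duplicated callback resolves to nothing instead of triggering downstream work twice.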
t
@alex that makes a lot of sense! I’ll keep you updated on how my Flask app works out, and then that’ll be something tangible to look at as one use case for an “instigation” module.
a
let me file a GH issue, and you all can chime in if you like
t
👍