Should I use sensors or schedules for polling a data source dagster #ask-ai

# ask-ai

Simon Ammer

03/06/2024, 9:23 AM

Should I use sensors or schedules for polling a data source every x minutes?

Scout

03/06/2024, 9:23 AM

When deciding whether to use sensors or schedules for polling a data source at regular intervals, it's important to consider the nature of the task and the capabilities of each tool. Schedules in Dagster are well-suited for tasks that need to occur at predictable time intervals. If you want your data to be updated every morning or at a specific time, for example, schedules are a traditional choice. They allow you to specify how often and when you want a job to run, such as daily, weekly, or hourly, and support any type of cron scheduling. Here's an example from the documentation on how to define a basic schedule:

```python
from dagster import AssetSelection, define_asset_job, ScheduleDefinition

asset_job = define_asset_job("asset_job", AssetSelection.groups("some_asset_group"))

basic_schedule = ScheduleDefinition(job=asset_job, cron_schedule="0 9 * * *")
```

Sensors, on the other hand, are designed to trigger jobs based on some external change or event, such as a new file in an S3 bucket or a system status change. They periodically check for these changes and kick off a job when a specific condition is met.

For a use case where you need to poll an API every few minutes, you could technically use either a schedule or a sensor. A schedule is appropriate if you want the polling to happen at exact, regular intervals, regardless of whether there is new data, and it is usually the more straightforward option to implement. A sensor fits better if you want to check for new data and only trigger a job when there is something to process, or if the polling needs to be more dynamic based on other conditions.

The documentation provides examples of both schedules and sensors, but it does not explicitly state which to use for polling every x minutes. The choice depends on the specific requirements of your workflow and whether you need the flexibility to trigger on conditions other than time.
