Shane Kalepp
02/29/2024, 10:17 PM

Scout
02/29/2024, 10:17 PM
You can use the `get_s3_keys` function provided by the `dagster_aws.s3.sensor` module. Here's a step-by-step guide based on the documentation:

1. Import the necessary functions and decorators from Dagster:

```python
from dagster_aws.s3.sensor import get_s3_keys
from dagster import sensor, RunRequest, SkipReason
```
2. Define your sensor function using the `@sensor` decorator. Inside the function, use `get_s3_keys` to check for new files in the S3 bucket. If new files are found, create a `RunRequest` object for each new key and update the sensor's cursor to the last key processed. Here's an example sensor definition from the documentation:

```python
@sensor(job=my_job)
def my_s3_sensor(context):
    since_key = context.cursor or None
    new_s3_keys = get_s3_keys("my_s3_bucket", since_key=since_key)
    if not new_s3_keys:
        return SkipReason("No new s3 files found for bucket my_s3_bucket.")
    last_key = new_s3_keys[-1]
    run_requests = [RunRequest(run_key=s3_key, run_config={}) for s3_key in new_s3_keys]
    context.update_cursor(last_key)
    return run_requests
```
In this example:
- `my_job` is the job that you want to run when new S3 keys are detected.
- `my_s3_bucket` is the name of the S3 bucket you are monitoring.
- `since_key` is the cursor that keeps track of the last processed S3 key to ensure idempotence.
- `new_s3_keys` is the list of new S3 keys detected since the last cursor position.
- `run_requests` is a list of `RunRequest` objects that the sensor returns to trigger a run for each new S3 key.
- `context.update_cursor(last_key)` updates the sensor's cursor to the last processed key.

3. Attach the sensor to your repository definition so that Dagster knows to evaluate it periodically.

Remember to replace `my_job` with the actual job you want to trigger and `my_s3_bucket` with the name of your S3 bucket. The `run_config` in the `RunRequest` can be customized to pass specific configuration to the job when it runs. For more examples of sensors, including those that interact with S3, you can refer to the Hacker News example provided in the documentation.
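If it helps, the cursor flow in the sensor above can be exercised locally without AWS access by stubbing out `get_s3_keys`. Everything below (the bucket contents, `fake_get_s3_keys`, `evaluate_tick`) is a hypothetical sketch of the control flow, not Dagster or `dagster_aws` API:

```python
# Hypothetical bucket contents, in the order S3 would list them.
BUCKET_KEYS = ["logs/a.json", "logs/b.json", "logs/c.json"]

def fake_get_s3_keys(bucket, since_key=None):
    # Stand-in for dagster_aws's get_s3_keys: return only the keys
    # that come after the cursor position.
    if since_key is None:
        return list(BUCKET_KEYS)
    idx = BUCKET_KEYS.index(since_key)
    return BUCKET_KEYS[idx + 1:]

def evaluate_tick(cursor):
    # Same control flow as my_s3_sensor: skip when nothing is new,
    # otherwise request one run per key and advance the cursor.
    new_keys = fake_get_s3_keys("my_s3_bucket", since_key=cursor)
    if not new_keys:
        return cursor, []          # corresponds to returning a SkipReason
    return new_keys[-1], new_keys  # corresponds to RunRequests + update_cursor

cursor, runs = evaluate_tick(None)     # first tick sees all three keys
cursor, again = evaluate_tick(cursor)  # second tick finds nothing new
```

Because the cursor persists between ticks, a key is never requested twice, which is what makes the sensor idempotent.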
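As one sketch of that customization, the empty `run_config={}` in the example can instead carry per-run configuration derived from the triggering key. The op name `process_file` and its config shape here are hypothetical stand-ins for whatever your job actually defines:

```python
def build_run_config(s3_key):
    # Hypothetical config shape: pass the triggering S3 key to an op
    # named "process_file" via Dagster's standard run_config layout.
    return {"ops": {"process_file": {"config": {"s3_key": s3_key}}}}

# Each RunRequest in the sensor would then be constructed as, e.g.:
#   RunRequest(run_key=s3_key, run_config=build_run_config(s3_key))
config = build_run_config("logs/a.json")
```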