# integration-dbt
m
Hi there, is it possible to trigger a sensor that would start a run of a dbt pipeline to create some materialized views by checking for new data in a table in the database? The "source tables" from which the materialized views will be created are created and populated from other pipelines, not part of the dbt framework.
How can a sensor query the database without having a resource defined, like for an op?
a
In our case, we are setting the bq_client, dataset and table names at the time of creating the sensor:
from dagster import RunRequest, sensor

# bigquery_functions is our own helper module wrapping the BigQuery client.


def gen_dbquery_sensor(bq_client, bq_dataset, job_to_run):
    @sensor(
        job=job_to_run,
        name="db_query_sensor",
    )
    def _sensor():
        query = ""  # your SQL query against the source table goes here
        query_job = bigquery_functions.run_query(
            bq_client=bq_client, dataset=bq_dataset, query=query
        )
        query_job.result()
        if <your condition is satisfied>:
            yield RunRequest(
                run_key=<your_unique_run_id>,
            )

    return _sensor
m
Thanks for the feedback - so do you then use gen_dbquery_sensor() to define the final sensors elsewhere, like in the Definitions?
a
Correct! 🙂
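For example, roughly like this - the client, dataset, and job names below are just placeholders for your own, and gen_dbquery_sensor is the factory from above:

from google.cloud import bigquery

from dagster import Definitions

# my_dbt_job and "my_dataset" are placeholders for your own job and dataset.
bq_client = bigquery.Client()

defs = Definitions(
    jobs=[my_dbt_job],
    sensors=[gen_dbquery_sensor(bq_client, "my_dataset", my_dbt_job)],
)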
m
Great, thanks. I think my case is a bit more convoluted as I also have different config that I need to supply to the resource to connect to the postgres database (i.e. there are 3 databases to update, each with a different config file with connection details). Normally, with schedules, I create a schedule per database and supply the config file with the connection details at run time. Just not sure if this is possible with a sensor as well.
f
Hello @Megan Beckett, are you using new (pythonic) Resources or the legacy Resources?
For new pythonic Resources, you can follow Using resources in sensors in the docs. For legacy Resources, you can check an older version (1.2.7) of the docs.
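Roughly, with the pythonic approach the sensor just declares the resource it needs as a typed parameter. A minimal sketch, where PostgresResource and my_dbt_job are made-up names standing in for your own:

from dagster import ConfigurableResource, RunRequest, sensor


class PostgresResource(ConfigurableResource):
    # Illustrative stand-in for your own Postgres resource.
    connection_string: str

    def has_new_rows(self) -> bool:
        # Query the source table here and return True when new data has arrived.
        ...


@sensor(job=my_dbt_job)  # my_dbt_job is a placeholder for your dbt job
def db_query_sensor(postgres: PostgresResource):
    if postgres.has_new_rows():
        yield RunRequest(run_key=None)

The resource then needs to be passed to Definitions under the matching key (postgres here) so the sensor can receive it.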
For legacy resources that need to be configured, you can simply provide a "configured" resource to build_resources.
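Something along these lines - db_resource below stands in for your own legacy @resource, and the config keys are made up:

from dagster import build_resources

# db_resource is a placeholder for your existing legacy @resource definition.
configured_db = db_resource.configured({"connection_string": "postgresql://..."})

with build_resources({"db": configured_db}) as resources:
    conn = resources.db  # whatever object your resource yields (e.g. a connection)
    # run your "is there new data?" query against conn here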
m
Thanks @Félix Tremblay, that is useful. I am using the legacy resources. I don't quite understand how to use build_resources. I provide the config for the resources at run time with a schedule, normally using a yaml file. I'm going to investigate the new pythonic Resources. Just not sure if it's still possible to only supply the required config to connect to the resource at run time using a yaml file... i.e. I don't think I want to have to provide the resource to the Definitions, as the connection details differ depending on the run required (i.e. connecting to different databases).
f
Can you explain the logic behind providing the configs at run time? And how many different sets of configs do you have?
Depending on the use case and the number of configs to manage, there would be different approaches to consider
If there are not too many sets of configs, I would create a configured resource for each of them and choose descriptive names. Then in the sensor logic you can choose which resource to "build"
Even if there are many sets of configs, you could read the yaml file(s), then create, configure, and add all of the resources to the Definitions programmatically with a few lines of code. Afterwards you can import them in the file where your sensors are defined. If there's a large number of them, you could store them in a dictionary to access them more easily
Another approach would be to design a "master" resource that gets configured once and can connect to the appropriate destination at run time. However, the first approach would be simpler and easier to debug
In any case, I currently don't see why you would want to actually configure the resource at run time
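As a rough sketch of the first approach, with made-up names (postgres_resource is your legacy resource, has_new_data your own check, my_dbt_job your job):

from dagster import RunRequest, build_resources, sensor

# One configured copy of the legacy resource per database.
dev_db = postgres_resource.configured({"host": "dev-host", "database": "dev"})
staging_db = postgres_resource.configured({"host": "staging-host", "database": "staging"})
prod_db = postgres_resource.configured({"host": "prod-host", "database": "prod"})


def gen_db_sensor(name, configured_db, job_to_run):
    @sensor(job=job_to_run, name=name)
    def _sensor():
        with build_resources({"db": configured_db}) as resources:
            if has_new_data(resources.db):  # your own check against the source table
                yield RunRequest(run_key=None)

    return _sensor


dev_db_sensor = gen_db_sensor("dev_db_sensor", dev_db, my_dbt_job)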
m
We have three different databases - dev, staging and prod - and each needs to be updated with the data pulled from an external source, after which the dbt models are run. So, I have a yaml config file with the database connection details for each postgresql database (resource) that is supplied to the schedules like so:
from dagster import RunRequest, config_from_files, file_relative_path, schedule


# update_db_timeseriesdata_dhis2_job is defined elsewhere in our project.
@schedule(
    job=update_db_timeseriesdata_dhis2_job,
    execution_timezone="Africa/Johannesburg",
    cron_schedule="0 1 28 * *",
)
def demo_health_timeseriesdata_dev():
    return RunRequest(
        run_key=None,
        run_config=config_from_files(
            [
                file_relative_path(__file__, "config/demo_health/database_config_dev.yaml"),
                file_relative_path(__file__, "config/demo_health/op_config_timeseries.yaml"),
            ]
        ),
    )
There is also op config that is supplied at run time, as the op config varies depending on the type of data pull, so I have just found this the easiest way. I want to change these to sensors though, rather than schedules - hence my questions. But it looks like a lot of refactoring to move to the new pythonic resources, and the tutorial in the Dagster docs for migrating hasn't been that useful, as there are several pieces that aren't explained or aren't clear to me yet.
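For reference, what I'm imagining is roughly the sensor equivalent of the schedule above, keeping the same yaml files for the run config (new_data_available is just a placeholder for whatever check ends up querying the source table):

from dagster import RunRequest, config_from_files, file_relative_path, sensor


@sensor(job=update_db_timeseriesdata_dhis2_job, name="demo_health_timeseriesdata_dev_sensor")
def demo_health_timeseriesdata_dev_sensor():
    if new_data_available():  # placeholder check against the dev database
        yield RunRequest(
            run_key=None,
            run_config=config_from_files(
                [
                    file_relative_path(__file__, "config/demo_health/database_config_dev.yaml"),
                    file_relative_path(__file__, "config/demo_health/op_config_timeseries.yaml"),
                ]
            ),
        )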