# integration-dbt
m
Hi there, is it possible to trigger a sensor that would start a run of a dbt pipeline to create some materialized views by checking for new data in a table in the database? The "source tables" from which the materialized views will be created are created and populated from other pipelines, not part of the dbt framework.
How can a sensor query the database without having a resource defined, like for an op?
a
In our case, we are setting the bq_client, dataset and table names at the time of creating the sensor:
from dagster import RunRequest, sensor

# bigquery_functions is our own helper module wrapping the BigQuery client.


def gen_dbquery_sensor(bq_client, bq_dataset, job_to_run):
    @sensor(
        job=job_to_run,
        name="db_query_sensor",
    )
    def _sensor():
        query = ""  # your SQL query against the source table goes here
        query_job = bigquery_functions.run_query(
            bq_client=bq_client, dataset=bq_dataset, query=query
        )
        query_job.result()
        if <your condition is satisfied>:
            yield RunRequest(
                run_key=<your_unique_run_id>,
            )

    return _sensor
m
Thanks for the feedback - so do you then use gen_dbquery_sensor() to define the final sensors elsewhere, like in the Definitions?
a
Correct! 🙂
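For example, roughly like this - the client, dataset, and job names below are just placeholders for your own, and gen_dbquery_sensor is the factory from above:

from google.cloud import bigquery

from dagster import Definitions

# my_dbt_job and "my_dataset" are placeholders for your own job and dataset.
bq_client = bigquery.Client()

defs = Definitions(
    jobs=[my_dbt_job],
    sensors=[gen_dbquery_sensor(bq_client, "my_dataset", my_dbt_job)],
)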
m
Great, thanks. I think my case is a bit more convoluted as I also have different config that I need to supply to the resource to connect to the postgres database (i.e. there are 3 databases to update, each with a different config file with connection details). Normally, with schedules, I create a schedule per database and supply the config file with the connection details at run time. Just not sure if this is possible with a sensor as well.
f
Hello @Megan Beckett, are you using new (pythonic) Resources or the legacy Resources?
For new pythonic Resources, you can follow Using resources in sensors in the docs. For legacy Resources, you can check an older version (1.2.7) of the docs.
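Roughly, with the pythonic approach the sensor just declares the resource it needs as a typed parameter. A minimal sketch, where PostgresResource and my_dbt_job are made-up names standing in for your own:

from dagster import ConfigurableResource, RunRequest, sensor


class PostgresResource(ConfigurableResource):
    # Illustrative stand-in for your own Postgres resource.
    connection_string: str

    def has_new_rows(self) -> bool:
        # Query the source table here and return True when new data has arrived.
        ...


@sensor(job=my_dbt_job)  # my_dbt_job is a placeholder for your dbt job
def db_query_sensor(postgres: PostgresResource):
    if postgres.has_new_rows():
        yield RunRequest(run_key=None)

The resource then needs to be passed to Definitions under the matching key (postgres here) so the sensor can receive it.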
For legacy resources that need to be configured, you can simply provide a "configured" resource to build_resources.
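Something along these lines - db_resource below stands in for your own legacy @resource, and the config keys are made up:

from dagster import build_resources

# db_resource is a placeholder for your existing legacy @resource definition.
configured_db = db_resource.configured({"connection_string": "postgresql://..."})

with build_resources({"db": configured_db}) as resources:
    conn = resources.db  # whatever object your resource yields (e.g. a connection)
    # run your "is there new data?" query against conn here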
m
Thanks @Félix Tremblay, that is useful. I am using the legacy resources. I don't quite understand how to use build_resources. I provide the config for the resources at run time with a schedule, normally using a yaml file. I'm going to investigate the new pythonic Resources. Just not sure if it's still possible to only supply the required config to connect to the resource at run time using a yaml file... i.e. I don't think I want to have to provide the resource to the Definitions, as the connection details differ depending on the run required (i.e. connecting to different databases).
f
Can you explain the logic behind providing the configs at run time? And how many different sets of configs do you have?
Depending on the use case and the number of configs to manage, there would be different approaches to consider
If there are not too many sets of configs, I would create a configured resource for each of them and choose descriptive names. Then in the sensor logic you can choose which resource to "build"
Even if there are many sets of configs, you could read the yaml file(s), then create, configure, and add all of the resources to the Definitions programmatically with a few lines of code. Afterwards you can import them in the file where your sensors are defined. If there's a large number of them, you could store them in a dictionary to access them more easily
Another approach would be to design a "master" resource that gets configured once and can connect to the appropriate destination at run time. However, the first approach would be simpler and easier to debug
In any case, I currently don't see why you would want to actually configure the resource at run time
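As a rough sketch of the first approach, with made-up names (postgres_resource is your legacy resource, has_new_data your own check, my_dbt_job your job):

from dagster import RunRequest, build_resources, sensor

# One configured copy of the legacy resource per database.
dev_db = postgres_resource.configured({"host": "dev-host", "database": "dev"})
staging_db = postgres_resource.configured({"host": "staging-host", "database": "staging"})
prod_db = postgres_resource.configured({"host": "prod-host", "database": "prod"})


def gen_db_sensor(name, configured_db, job_to_run):
    @sensor(job=job_to_run, name=name)
    def _sensor():
        with build_resources({"db": configured_db}) as resources:
            if has_new_data(resources.db):  # your own check against the source table
                yield RunRequest(run_key=None)

    return _sensor


dev_db_sensor = gen_db_sensor("dev_db_sensor", dev_db, my_dbt_job)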
m
We have three different databases - dev, staging and prod - and each needs to be updated with the data pulled from an external source, after which the dbt models are run. So, I have a yaml config file with the database connection details for each postgresql database (resource) that is supplied to the schedules like so:
from dagster import RunRequest, config_from_files, file_relative_path, schedule


# update_db_timeseriesdata_dhis2_job is defined elsewhere in our project.
@schedule(
    job=update_db_timeseriesdata_dhis2_job,
    execution_timezone="Africa/Johannesburg",
    cron_schedule="0 1 28 * *",
)
def demo_health_timeseriesdata_dev():
    return RunRequest(
        run_key=None,
        run_config=config_from_files(
            [
                file_relative_path(__file__, "config/demo_health/database_config_dev.yaml"),
                file_relative_path(__file__, "config/demo_health/op_config_timeseries.yaml"),
            ]
        ),
    )
There is also op config that is supplied at run time, as the op config varies depending on the type of data pull, so I have just found this the easiest way. I want to change these to sensors though, rather than schedules - hence my questions. But it looks like a lot of refactoring to move to the new pythonic resources, and the tutorial in the Dagster docs for migrating hasn't been that useful, as there are several pieces that aren't explained or aren't clear to me yet.
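For reference, what I'm imagining is roughly the sensor equivalent of the schedule above, keeping the same yaml files for the run config (new_data_available is just a placeholder for whatever check ends up querying the source table):

from dagster import RunRequest, config_from_files, file_relative_path, sensor


@sensor(job=update_db_timeseriesdata_dhis2_job, name="demo_health_timeseriesdata_dev_sensor")
def demo_health_timeseriesdata_dev_sensor():
    if new_data_available():  # placeholder check against the dev database
        yield RunRequest(
            run_key=None,
            run_config=config_from_files(
                [
                    file_relative_path(__file__, "config/demo_health/database_config_dev.yaml"),
                    file_relative_path(__file__, "config/demo_health/op_config_timeseries.yaml"),
                ]
            ),
        )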