Marko Obert

03/07/2024, 12:08 PM
@Scout How can I use a DuckDB resource in an op?


03/07/2024, 12:08 PM
We recommend using software-defined assets instead of ops when working with DuckDB in Dagster. However, if you specifically need to use a DuckDB resource within an op, you would define a resource in your Dagster repository and then access it within the body of your op using the context object. Here's an example based on the documentation.

First, you would define the DuckDB resource:
```python
from dagster_duckdb import DuckDBResource

# Configure the DuckDB resource
duckdb_resource = DuckDBResource(
    database="path/to/my_duckdb_database.duckdb",  # required
)
```
Then, you would define an op that uses this resource:
```python
from dagster import op


@op(required_resource_keys={"duckdb"})
def my_duckdb_op(context):
    # Get a connection from the DuckDB resource
    with context.resources.duckdb.get_connection() as conn:
        # conn is a duckdb.DuckDBPyConnection
        # You can now use this connection to execute SQL queries
        conn.execute("YOUR SQL QUERY HERE")
```
Finally, you would attach the resource to a job and include that job in your repository definition:
```python
from dagster import job, repository


@job(resource_defs={"duckdb": duckdb_resource})
def my_job():
    my_duckdb_op()


@repository
def my_repository():
    return [my_job]
```
In this setup, the `my_duckdb_op` op requires the `duckdb` resource, which is provided to the job through the `resource_defs` argument. When the op is executed, it uses the DuckDB connection to perform database operations.

Please note that while this example shows how to use a DuckDB resource within an op, we encourage the use of assets when working with data in Dagster, as they provide a more declarative and modular approach to data orchestration.