Todd de Quincey
05/24/2022, 11:09 AM
file_handle_to_s3 op.
I am evaluating the available orchestration tools, but I'm finding the Dagster docs a little sparse and hard to follow.
The end goal is to create a quick POC DAG that loads some data into S3, stages the data into Snowflake, and then runs some dbt transformations.
Stephen Bailey
05/24/2022, 1:57 PM
ops hard to grok out of the box because they skip some steps. If you're new, I would recommend learning about using resources, which are a really useful abstraction, but writing ops that script things out more or less how you normally would.
For example, here's some pseudocode for how I would tackle your problem. You have 2-3 external systems, so in your ops you would reference them as resources to simplify working with them. At job runtime, you could inject different configurations for those resources:
import boto3
from dagster import op

@op(config_schema={"my_filepath": str, "s3_bucket": str, "s3_key": str})
def upload_file_to_s3(context):
    cfg = context.op_config
    s3 = boto3.client("s3")
    # boto3's upload_file takes (Filename, Bucket, Key) and returns None
    s3.upload_file(cfg["my_filepath"], cfg["s3_bucket"], cfg["s3_key"])
    return f"s3://{cfg['s3_bucket']}/{cfg['s3_key']}"

@op(required_resource_keys={"snowflake"})
def stage_snowflake(context, s3_destination):
    some_query = f"blah blah with {s3_destination}"
    # get_connection() yields a raw Snowflake connection, so run the query through a cursor
    with context.resources.snowflake.get_connection() as conn:
        conn.cursor().execute(some_query)
    return "some_dbt_cloud_job_id"  # placeholder for the dbt Cloud job to trigger

@op(required_resource_keys={"dbt_cloud"})
def run_dbt(context, job_id):
    context.resources.dbt_cloud.run_job(job_id)
Then, at runtime, you reference the actual resources that are going to be used in those ops:
from dagster import job
from dagster_dbt import dbt_cloud_resource
from dagster_snowflake import snowflake_resource

@job(resource_defs={"snowflake": snowflake_resource.configured(...), "dbt_cloud": dbt_cloud_resource.configured(...)})
def run_full_pipeline():
    s3_path = upload_file_to_s3()
    dbt_cloud_job_to_run = stage_snowflake(s3_path)
    run_dbt(dbt_cloud_job_to_run)
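One reason the resource pattern above pays off is that an op only ever reaches its external system through `context.resources`, so a stand-in can be swapped in without touching the op. The following is a framework-free sketch of that idea in plain Python; it deliberately does not use Dagster's API. `FakeSnowflake`, `make_context`, and the `COPY INTO my_table` query are hypothetical names made up for illustration, and the real Snowflake resource may expose a different interface.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeSnowflake:
    """Hypothetical stand-in for a Snowflake resource: records queries instead of running them."""

    executed: list = field(default_factory=list)

    def execute_query(self, sql):
        self.executed.append(sql)


def stage_snowflake(context, s3_destination):
    # The op looks the resource up by key and does not care how it was configured.
    query = f"COPY INTO my_table FROM '{s3_destination}'"  # illustrative query only
    context.resources.snowflake.execute_query(query)
    return query


def make_context(**resources):
    # Minimal imitation of an op context: each resource is reachable as an attribute.
    return SimpleNamespace(resources=SimpleNamespace(**resources))


# Inject the fake where a job would normally inject a configured Snowflake resource.
fake = FakeSnowflake()
stage_snowflake(make_context(snowflake=fake), "s3://my-bucket/data.csv")
print(fake.executed)
```

The same op body would run against a real connection if a different object were bound to the `snowflake` key, which is roughly what `resource_defs` does at the job level.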
Todd de Quincey
05/24/2022, 2:04 PM
Stephen Bailey
05/24/2022, 2:13 PM
resources to be an easier entrypoint than the pre-built ops, because the ops can include lots of concepts in a single place
Todd de Quincey
05/24/2022, 2:15 PM
Stephen Bailey
05/24/2022, 2:26 PM
Todd de Quincey
05/24/2022, 2:26 PM
Todd de Quincey
05/24/2022, 2:27 PM
sean
05/25/2022, 5:08 PM