# ask-community
t
Hi, Are there any solid examples of how to use the aws library (or any of the libraries, really)? In particular, the `file_handle_to_s3` op. I am doing an evaluation of the available orchestration tools out there, but finding the Dagster docs a little hard to follow / sparse. The end goal is to create a quick POC DAG which loads some data into S3, then stages the data into Snowflake, and then runs some dbt transformations.
s
I personally have found some of the pre-built `ops` hard to grok out of the box because they skip some steps. If you're new, I would recommend learning about `resources`, which are a really useful abstraction, but writing ops that script things out more or less how you normally would. For example, here's some pseudocode for how I would tackle your problem. You have 2-3 external systems, so in your ops, you would reference them as resources to simplify working with them. At job runtime, you could inject different configurations for those resources:
```python
import boto3
from dagster import op


# split the destination into bucket + key, since boto3's upload_file wants them separately
@op(config_schema={"s3_bucket": str, "s3_key": str, "my_filepath": str})
def upload_file_to_s3(context):
    s3 = boto3.client("s3")
    s3.upload_file(
        context.op_config["my_filepath"],
        context.op_config["s3_bucket"],
        context.op_config["s3_key"],
    )
    return f"s3://{context.op_config['s3_bucket']}/{context.op_config['s3_key']}"


@op(required_resource_keys={"snowflake"})
def stage_snowflake(context, s3_destination):
    # e.g. a COPY INTO statement that points Snowflake at the uploaded file
    some_query = f"blah blah with {s3_destination}"
    context.resources.snowflake.execute_query(some_query)
    return "some_dbt_cloud_job_id"


@op(required_resource_keys={"dbt_cloud"})
def run_dbt(context, job_id):
    context.resources.dbt_cloud.run_job(job_id)
```
then, at runtime, you reference the actual resources that are going to be used in those ops
```python
from dagster import job
from dagster_dbt import dbt_cloud_resource
from dagster_snowflake import snowflake_resource

@job(resource_defs={"snowflake": snowflake_resource.configured(...), "dbt_cloud": dbt_cloud_resource.configured(...)})
def run_full_pipeline():
    s3_path = upload_file_to_s3()
    dbt_cloud_job_to_run = stage_snowflake(s3_path)
    run_dbt(dbt_cloud_job_to_run)
```
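and to actually kick it off with concrete values, you supply run config for the configured op -- a sketch with made-up bucket/key/path values, assuming you've filled in the resource config above:
```python
# e.g. in a script or test -- the values here are hypothetical
result = run_full_pipeline.execute_in_process(
    run_config={
        "ops": {
            "upload_file_to_s3": {
                "config": {
                    "s3_bucket": "my-bucket",
                    "s3_key": "raw/my_file.csv",
                    "my_filepath": "/tmp/my_file.csv",
                }
            }
        }
    }
)
```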
t
Hi Stephen, Thanks for the reply 🙂 Your code snippet makes total sense, and is very easy to read and understand. Which is great. So in your experience, it’s not worth leveraging the built-in ops (aws, snowflake etc)? I come from an Airflow background, so have traditionally relied on lots of the community Hooks and Operators to avoid having to reinvent the wheel for common operations. Looking at the Dagster repo, it looked like I could do the same.
s
yeah, you totally can! and I do so for a few operations -- for example, the dbt cloud operator has some really nice functionality to automatically generate asset metadata. im suggesting the simpler approach more from a learning standpoint -- there is a lot going on in the dagster framework, and i find `resources` to be an easier entrypoint than the pre-built `ops`, because the ops can include lots of concepts in a single place
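for reference, the pre-built route looks roughly like this -- a sketch using `dbt_cloud_run_op` from dagster_dbt, with made-up account and job ids:
```python
from dagster import job
from dagster_dbt import dbt_cloud_resource, dbt_cloud_run_op

# hypothetical config values -- swap in your own account id, token env var, and job id
my_dbt_cloud_resource = dbt_cloud_resource.configured(
    {"auth_token": {"env": "DBT_CLOUD_API_TOKEN"}, "account_id": 12345}
)
run_my_dbt_job = dbt_cloud_run_op.configured({"job_id": 67890}, name="run_my_dbt_job")

@job(resource_defs={"dbt_cloud": my_dbt_cloud_resource})
def my_dbt_cloud_job():
    run_my_dbt_job()
```
(that's the op that generates the asset metadata i mentioned -- it polls the dbt cloud run for you)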
t
Thanks, Stephen. Makes sense. Do you know if there are plans to beef up the documentation at all? I have been evaluating Dagster alongside Prefect, and I was able to get up and running very quickly with Prefect. Have struggled to get a good bird's-eye view of Dagster. Or is it more a matter of dumpster diving through the code?
s
yeah, there's a bit of both. i think it depends on what part of the documentation you're talking about -- i found the deployment docs quite useful, and the integration docs (dagster_snowflake, etc.) the most lacking. i do know that documentation and technical resources are a priority for the team, but you're also going to be doing some learning on your own. if you decide you want to invest time in it post-decision, id be happy to show you how we set up our repos, etc.
t
Interesting, as I came to the exact same conclusion! Deployment docs were great.
Thanks for sparing the time to give me some insights into your experience, greatly appreciated
s
Hi Todd, just wanted to chime in as an Elementl engineer here. We’re well aware that some of our docs, particularly around integrations, are quite thin. Over the next 2 months we’re prioritizing work in documentation and other aspects of Dagster where improvements can lead to decreased ramp-up time. We appreciate your feedback, and you should feel free to post more threads in our support channel if you have other questions while evaluating Dagster.