# ask-community
t
Hi, Are there any solid examples of how to use the aws library (or any of the libraries, really)? In particular, the `file_handle_to_s3` op. I am doing an evaluation of the available orchestration tools out there, but finding the Dagster docs a little hard to follow / sparse. The end goal is to create a quick POC DAG which loads some data into S3, then stages the data into Snowflake, and then runs some dbt transformations.
s
I personally have found some of the pre-built `ops` hard to grok out of the box because they skip some steps. If you're new, I would recommend learning about `resources`, which are a really useful abstraction, but writing ops that script things out more or less how you normally would. For example, here's some pseudocode for how I would tackle your problem. You have 2-3 external systems, so in your ops, you would reference them as resources to simplify working with them. At job runtime, you could inject different configurations for those resources:
```python
import boto3
from dagster import op


# split the destination into bucket + key, since boto3's upload_file wants them separately
@op(config_schema={"s3_bucket": str, "s3_key": str, "my_filepath": str})
def upload_file_to_s3(context):
    s3 = boto3.client("s3")
    s3.upload_file(
        context.op_config["my_filepath"],
        context.op_config["s3_bucket"],
        context.op_config["s3_key"],
    )
    return f"s3://{context.op_config['s3_bucket']}/{context.op_config['s3_key']}"


@op(required_resource_keys={"snowflake"})
def stage_snowflake(context, s3_destination):
    # e.g. a COPY INTO statement that points Snowflake at the uploaded file
    some_query = f"blah blah with {s3_destination}"
    context.resources.snowflake.execute_query(some_query)
    return "some_dbt_cloud_job_id"


@op(required_resource_keys={"dbt_cloud"})
def run_dbt(context, job_id):
    context.resources.dbt_cloud.run_job(job_id)
```
then, at runtime, you reference the actual resources that are going to be used in those ops
```python
from dagster import job
from dagster_dbt import dbt_cloud_resource
from dagster_snowflake import snowflake_resource

@job(resource_defs={"snowflake": snowflake_resource.configured(...), "dbt_cloud": dbt_cloud_resource.configured(...)})
def run_full_pipeline():
    s3_path = upload_file_to_s3()
    dbt_cloud_job_to_run = stage_snowflake(s3_path)
    run_dbt(dbt_cloud_job_to_run)
```
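and to actually kick it off with concrete values, you supply run config for the configured op -- a sketch with made-up bucket/key/path values, assuming you've filled in the resource config above:
```python
# e.g. in a script or test -- the values here are hypothetical
result = run_full_pipeline.execute_in_process(
    run_config={
        "ops": {
            "upload_file_to_s3": {
                "config": {
                    "s3_bucket": "my-bucket",
                    "s3_key": "raw/my_file.csv",
                    "my_filepath": "/tmp/my_file.csv",
                }
            }
        }
    }
)
```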
t
Hi Stephen, Thanks for the reply 🙂 Your code snippet makes total sense, and is very easy to read and understand. Which is great. So in your experience, it’s not worth leveraging the built-in ops (aws, snowflake etc)? I come from an Airflow background, so have traditionally relied on lots of the community Hooks and Operators to avoid having to reinvent the wheel for common operations. Looking at the Dagster repo, it looked like I could do the same.
s
yeah, you totally can! and I do so for a few operations -- for example, the dbt cloud operator has some really nice functionality to automatically generate asset metadata. im suggesting the simpler approach more from a learning standpoint -- there is a lot going on in the dagster framework, and i find `resources` to be an easier entrypoint than the pre-built `ops`, because the ops can include lots of concepts in a single place
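for reference, the pre-built route looks roughly like this -- a sketch using `dbt_cloud_run_op` from dagster_dbt, with made-up account and job ids:
```python
from dagster import job
from dagster_dbt import dbt_cloud_resource, dbt_cloud_run_op

# hypothetical config values -- swap in your own account id, token env var, and job id
my_dbt_cloud_resource = dbt_cloud_resource.configured(
    {"auth_token": {"env": "DBT_CLOUD_API_TOKEN"}, "account_id": 12345}
)
run_my_dbt_job = dbt_cloud_run_op.configured({"job_id": 67890}, name="run_my_dbt_job")

@job(resource_defs={"dbt_cloud": my_dbt_cloud_resource})
def my_dbt_cloud_job():
    run_my_dbt_job()
```
(that's the op that generates the asset metadata i mentioned -- it polls the dbt cloud run for you)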
t
Thanks, Stephen. Makes sense. Do you know if there are plans to beef up the documentation at all? I have been evaluating Dagster alongside Prefect, and I was able to get up and running very quickly with Prefect. Have struggled to get a good bird's-eye view of Dagster. Or is it more a matter of dumpster diving through the code?
s
yeah, there's a bit of both. i think it depends on what part of the documentation you're talking about -- i found the deployment docs quite useful, and the integration docs (dagster_snowflake, etc.) the most lacking. i do know that documentation and technical resources are a priority for the team, but you're also going to be doing some learning on your own. if you decide you want to invest time in it post-decision, id be happy to show you how we set up our repos, etc.
t
Interesting, as I came to the exact same conclusion! Deployment docs were great.
Thanks for sparing the time to give me some insights into your experience, greatly appreciated
s
Hi Todd, just wanted to chime in as an Elementl engineer here. We’re well aware that some of our docs, particularly around integrations, are quite thin. Over the next 2 months we’re prioritizing work in documentation and other aspects of Dagster where improvements can lead to decreased ramp-up time. We appreciate your feedback, and you should feel free to post more threads in our support channel if you have other questions while evaluating Dagster.