#dagster-support

Emmanuel Ogunwede

01/26/2023, 1:47 PM
Hi @sandy Good Morning! Here again to ask about anti-patterns! 😅 So I finally implemented a pipeline using assets. Quick question: by merely looking at this diagram, would you say my use case for Dagster is more suited to task-based orchestration (i.e. using ops and creating a job out of them)?
CONTEXT
• The first step materializes the data from the API.
• The second step cleans the data and (explicitly) writes it to an S3 bucket; the second step only returns/materializes an S3 URL (wrong use of an asset??).
• The third step picks up the S3 URL from the second step, stages the data on Snowflake, and eventually copies the data into a Snowflake table.
I should also note that I'm using the S3 IO manager (can that just be my data lake, or should I actually be writing data myself into another S3 bucket, as I'm currently doing in my second step?). Thanks in advance!

sandy

01/26/2023, 3:36 PM
Hi Emmanuel - how do you decide what S3 bucket the second step is going to write to? If you know it ahead of time, I think it's a good use case for assets. If you decide it dynamically at runtime, ops are likely the best way to go

Emmanuel Ogunwede

01/26/2023, 3:43 PM
Hey Sandy, I know the bucket name ahead of time. I only generate the filename dynamically at runtime, e.g.:
• if the run is for 2022-12-01
• the file name will be 2022-12-01.parquet
• the S3 URL will become s3://known_Bucket_Name/2022-12-01.parquet
@sandy also, is it okay for an asset to represent an S3 URL, as in my case? A more general way of putting it: can a file path string be an asset, or should the asset be the file itself? (In some sense, they both represent the file, I guess?) What are your thoughts?
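The mapping Emmanuel describes is deterministic, which turns out to be the key point: given the partition date, the full path is knowable before the run starts. A tiny sketch (the bucket name is hypothetical):

```python
def make_partition_path(bucket: str, partition_key: str) -> str:
    # The bucket is fixed ahead of time; only the file name varies per run.
    return f"s3://{bucket}/{partition_key}.parquet"


print(make_partition_path("known_bucket_name", "2022-12-01"))
# -> s3://known_bucket_name/2022-12-01.parquet
```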

sandy

01/26/2023, 11:02 PM
if you have a run for every day, I'd recommend using partitioned assets for this. have you had a chance to look at those before?

Emmanuel Ogunwede

01/27/2023, 1:49 AM
Yup, they're currently partitioned actually...just wanted to be sure I wasn't misusing assets here 🙌🏽

sandy

01/27/2023, 1:57 AM
would something like this work?
from dagster import asset, DailyPartitionsDefinition


partitions_def = DailyPartitionsDefinition(start_date="2020-01-01")


def _make_file_path(context, asset_name):
    return f"s3://my_bucket_name/{asset_name}/{context.partition_key}"


@asset(partitions_def=partitions_def)
def asset1(context) -> None:
    output_path = _make_file_path(context, "asset1")
    # write to S3


@asset(partitions_def=partitions_def, non_argument_deps={"asset1"})
def asset2(context) -> None:
    input_path = _make_file_path(context, "asset1")
    # read from S3
we usually recommend against @asset-decorated functions returning URLs and paths, because for it to be a good use case for software-defined assets, the path is normally known ahead of time, so there's no need for the downstream step to discover it dynamically

Emmanuel Ogunwede

01/27/2023, 12:38 PM
yes, this makes perfect sense! the "private" function can pick up the actual context from within the asset's compute_fn. thank you!!!