Robert Wade

10/27/2022, 9:55 PM
I’m defining a variety of assets, many of which require identical config params, such as an S3 bucket or DB host value. I plan to create a single job that materializes these assets on a set schedule. If I use a config yml for my job, I will have to duplicate the config params in each asset’s config, so I am looking at using make_values_resource so that multiple assets can access those “common” values. So when I think about my resource file, would it look like this?
resources:
  common:
    config:
      s3_bucket: "some bucket"
      db_host: "some db"
And would my assets all look similar, like this?
@asset(required_resource_keys={"common"})
def an_asset(context):
    s3bucket = context.resources.s3_bucket
    # etc
Now what if I wanted to retrieve a non-common value in each asset, such as a filename (each asset would need to retrieve its own unique filename config value)? Where would that configuration go? The docs get close to explaining some of this, but I don’t think they go all the way. Thanks in advance.

jamie

10/28/2022, 4:01 PM
hi @Robert Wade a follow-up/clarification question: are the common config values you want to provide being used for other resources? for example, you have an s3_bucket in your code snippet, are you also using an s3 resource or IO manager? or are you writing code in your asset that uses the s3_bucket name? for specifying a non-common value for each asset, you can still provide that as op/asset config:
ops:
  an_asset:
    config:
      filename: "a_unique_file_name"
  other_asset:
    config:
      filename: "another_filename"
resources:
  <resource config>
within your asset you then access the value like this
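from dagster import asset

@asset(config_schema={"filename": str})
def an_asset(context):
    # per-asset value, supplied under ops -> an_asset -> config in the run config
    filename = context.op_config["filename"]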
note that in the config, the asset config is under the ops key. this is because assets are built on top of ops

Robert Wade

10/28/2022, 4:21 PM
Hi @jamie, thank you for your reply. My asset example wasn't complete: I should have noted that an_asset takes in an upstream asset (a file, perhaps) and moves it to S3, so an_asset represents the file saved on S3. In reality we have a common pattern: query data from a DB, then save the result to a file on S3 (for use by other downstream processes that aren't relevant to this discussion). This two-asset pipeline is repeated over and over. The first asset needs DB connection information and the second needs S3 information. I am recreating each of these assets in slightly different forms (name, query params, etc.), but each will require (nearly) identical config params.
Before seeing your response, the solution we came up with was to programmatically copy the "common" config section into all the other sections when we read in our yml, creating a copy of each config value for use by each asset. This has the same effect as what you showed in your yml example but doesn't require us to keep separate ops and resources sections in the yml.
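A rough sketch of that copy step, in plain Python. The thread doesn't show the actual yml layout, so the section names (db_extract, s3_upload), the "common" key, and the helper spread_common_config are all hypothetical; the parsed yml is represented here as an ordinary dict.

```python
import copy


def spread_common_config(raw, common_key="common"):
    """Return a new dict where every non-common section also carries the common values."""
    common = raw.get(common_key, {})
    merged = {}
    for section, values in raw.items():
        if section == common_key:
            continue
        # Section-specific values win over common ones if a key clashes.
        merged[section] = {**copy.deepcopy(common), **copy.deepcopy(values)}
    return merged


# Hypothetical parsed yml: one shared section plus one section per asset.
raw = {
    "common": {"s3_bucket": "some bucket", "db_host": "some db"},
    "db_extract": {"query": "select 1"},
    "s3_upload": {"filename": "a_unique_file_name"},
}

# Wrap each merged section in the ops -> <asset> -> config shape of a run config.
ops_config = {
    "ops": {
        name: {"config": cfg} for name, cfg in spread_common_config(raw).items()
    }
}
```

One thing to keep in mind with this approach: every asset receives every common key, so each asset's config_schema would have to accept all of them, or the copy would need to be filtered per asset.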