# ask-community
Dan Meyer:
Anyone have tips for how to use non-argument dependencies more safely? 🧵
We have a lot of assets that directly execute Snowflake queries, e.g.
```python
from dagster import OpExecutionContext, asset

@asset(
    non_argument_deps={"bar", "baz"},
    # ...
)
def foo(context: OpExecutionContext) -> None:
    query = """
      CREATE OR REPLACE TABLE foo AS (
        SELECT * FROM bar JOIN baz ON bar.id = baz.id
      )
    """
    context.resources.snowflake_dagster.execute_query(query, fetch_results=False)
```
Right now, human vigilance is the only thing ensuring that the `bar` and `baz` (and `foo`, for that matter) in the query match the listed dependencies. (And we recently had a bug because our vigilance, as you might predict, wasn't foolproof.)
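Narrowly, I could imagine using variables for the deps and the asset name (whether taken from the context or built myself):
```python
from dagster import OpExecutionContext, asset

# Define every table name once so the decorator and the query can't drift apart.
FOO_NAME = "foo"
FOO_DEPS = {"bar": "bar", "baz": "baz"}  # query alias -> upstream table/asset name

@asset(
    name=FOO_NAME,
    non_argument_deps=set(FOO_DEPS.values()),
    # ...
)
def foo(context: OpExecutionContext) -> None:
    # f-string, so the names are actually interpolated into the SQL
    query = f"""
      CREATE OR REPLACE TABLE {FOO_NAME} AS (
        SELECT * FROM {FOO_DEPS['bar']} AS bar JOIN {FOO_DEPS['baz']} AS baz ON bar.id = baz.id
      )
    """
    context.resources.snowflake_dagster.execute_query(query, fetch_results=False)
```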
At least then you could try to never hard-code table names.
I could also imagine a super smart linter helping with this.
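A very rough sketch of what such a check could look like, assuming the SQL is simple enough that table names can be pulled out with a regular expression after `FROM`/`JOIN` and `CREATE [OR REPLACE] TABLE` (a real linter would want a proper SQL parser, and this one compares unquoted names literally, so `analytics.bar` would not match a dep named `bar`). The function name and the regexes are hypothetical; this is not a Dagster feature.

```python
import re
from typing import Dict, Set

# Hypothetical patterns: only simple, unquoted identifiers such as bar or schema.bar.
_READ_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
_WRITE_RE = re.compile(
    r"\bCREATE\s+(?:OR\s+REPLACE\s+)?TABLE\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def check_query_deps(query: str, asset_name: str, declared_deps: Set[str]) -> Dict[str, Set[str]]:
    """Compare the tables a query reads and writes against what the asset declares."""
    read = {name.lower() for name in _READ_RE.findall(query)}
    written = {name.lower() for name in _WRITE_RE.findall(query)}
    declared = {dep.lower() for dep in declared_deps}
    return {
        # read in the SQL but not declared as a dependency
        "undeclared": read - declared,
        # declared as a dependency but never read in the SQL
        "unused": declared - read,
        # tables created by the query that don't match the asset's own name
        "wrong_target": written - {asset_name.lower()},
    }
```

Run over each asset's SQL in a unit test, this would fail the build whenever any of the three sets is non-empty, which is the kind of drift that caused the bug described above.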
But this also has me wondering if there's a deeper structural issue.
We started out using dataframes, but we have a lot of simple-but-large assets that just took too long to load into and out of dataframes.
(A theme I saw mentioned a couple of times when I searched this Slack, so I'd guess it's on the radar.)
Zach:
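You might also be able to just use the `asset_key` property of the assets you want to have as non-argument deps:
```python
from dagster import asset

from other_assets import assetBar, assetBaz

# non_argument_deps accepts AssetKey objects directly; str() of an AssetKey gives a
# repr-style "AssetKey([...])" string rather than the asset's name.
@asset(non_argument_deps={assetBar.asset_key, assetBaz.asset_key})
def foo():
    ...
```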
Daniel Gafni:
or use actual assets and store the table name as the asset output (I'm doing this with directory names)
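A minimal sketch of that pattern, with the Dagster decorators stripped off so it runs on its own: each builder returns the name of the table it created, and the downstream builder takes that name as an argument instead of retyping it. In Dagster these would be `@asset` functions and the upstream name would arrive as a regular input; the `execute` callback and the `qux` asset are hypothetical stand-ins (for the Snowflake resource and a downstream asset).

```python
from typing import Callable

# Hypothetical stand-in for something like a Snowflake resource's execute_query.
Execute = Callable[[str], None]


def foo(bar: str, baz: str, execute: Execute) -> str:
    """Create FOO from the upstream tables and return FOO's table name as the output."""
    table = "FOO"
    execute(
        f"CREATE OR REPLACE TABLE {table} AS "
        f"(SELECT * FROM {bar} JOIN {baz} ON {bar}.id = {baz}.id)"
    )
    return table


def qux(foo: str, execute: Execute) -> str:
    """Downstream asset: the upstream table name arrives as an argument, never retyped."""
    table = "QUX"
    execute(f"CREATE OR REPLACE TABLE {table} AS (SELECT * FROM {foo})")
    return table
```

Calling `qux(foo("BAR", "BAZ", queries.append), queries.append)` with a plain list `queries` collects the two generated statements, so the wiring can be checked without touching Snowflake.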
Dan Meyer:
@Daniel Gafni, so you're saying that instead of having the asset `foo` return `None`, have it return the string `"FOO"`, since that's the name of the table it's creating?
And thanks, @Zach - I think your version is strictly better than my idea (since I don't have to risk typing out the asset's name again).
Daniel Gafni:
Yeah I'm suggesting that. Not sure if it fits your case tho.
Dan Meyer:
I'm not sure either 🙂, but that's an interesting idea. Will ponder!
jamie:
hey @Dan Meyer, this is something we are actively working on right now, so I don't have a complete solution to all of the issues raised here, but we do have a couple of things that should make some of this a bit easier:
1. We recently added a `deps` parameter to `@asset` to replace `non_argument_deps`. `deps` can take other assets as members of the list, so you don't need to rely on string matching:
```python
from dagster import asset

@asset
def foo():
    ...

@asset(
    deps=[foo]
)
def bar():
    ...
```
2. We're introducing an `asset_key` property on the context to get the key of the current asset. One of those dumb things we all thought existed and then it turns out it didn't 🤦 It should get released next week. Note that `context.asset_key` will return an `AssetKey`, so depending on whether you use key prefixes, you may have to do some manipulation to get from the `AssetKey` to the name of the table:
```python
from dagster import asset

@asset
def foo():
    ...

@asset(
    deps=[foo]
)
def bar(context):
    # f-string, so the key is interpolated into the SQL
    query = f"""
      CREATE OR REPLACE TABLE {context.asset_key.to_user_string()} ...
    """
```
3. For using table names in your queries, we don't have a great solution right now. It's on our minds, but we don't have a cohesive solution to implement yet. You could experiment with using the `.asset_key` syntax within your asset functions, though. I haven't tried this myself yet, so do some experimenting first:
```python
from dagster import asset

@asset
def foo():
    ...

@asset(
    deps=[foo]
)
def bar(context):
    # Both the target and the upstream table names come from asset keys
    query = f"""
      CREATE OR REPLACE TABLE {context.asset_key.to_user_string()} AS (
        SELECT * FROM {foo.asset_key.to_user_string()}
      )
    """
```
Daniel's recommendation of passing the table name around as the output of an asset is also a perfectly valid solution.
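An `AssetKey` holds its components in `.path` (a list of strings), and `to_user_string()` joins them with `/`, which isn't a valid SQL identifier once key prefixes are involved. Here's a sketch of the "manipulation" step, assuming a convention where leading key elements map to database/schema and the last element is the table name; the helper name and that convention are assumptions, not Dagster API. It takes the path as a plain sequence so it runs without Dagster.

```python
from typing import Sequence


def table_name_from_key_path(path: Sequence[str], quote: bool = False) -> str:
    """Map an asset key path such as ["analytics", "foo"] to "analytics.foo".

    Assumed convention: up to two leading elements are database/schema, and the
    last element is the table name.
    """
    if not path:
        raise ValueError("asset key path must not be empty")
    if len(path) > 3:
        raise ValueError(f"expected at most database.schema.table, got {list(path)}")
    if quote:
        # Quoted identifiers keep their exact case; embedded quotes are doubled.
        parts = ['"' + part.replace('"', '""') + '"' for part in path]
    else:
        parts = list(path)
    return ".".join(parts)
```

Inside `bar`, that would look like `table_name_from_key_path(context.asset_key.path)` for the target and `table_name_from_key_path(foo.asset_key.path)` for the upstream table. Whether to quote depends on how the tables were created, since quoted identifiers in Snowflake are case-sensitive.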
Sean Davis:
I'm not sure what your use case is, but I have a bunch of URLs that I'm loading into our data warehouse. I use a small constructor function that has the naming logic built in, something like:
```python
from dagster import asset

def asset_factory(spec):
    # Every name derives from spec["name"], so deps always match the asset names.
    @asset(name=spec["name"] + "base")
    def _asset1():
        ...

    @asset(name=spec["name"] + "stage", deps=[spec["name"] + "base"])
    def _asset2():
        ...

    @asset(name = ...)
    def _asset3():
        ...

    return [_asset1, _asset2, _asset3]
```
From there, you can guarantee the names match, etc. For hundreds of URLs that I want to pull into the data warehouse, I can list the URLs and then construct all the assets programmatically.
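The core of that approach is that every name is derived from the spec in one place. Here's a Dagster-free sketch of just the naming logic, reusing the same `"base"`/`"stage"` suffixes as the factory above; the helper names are hypothetical. With hundreds of specs, it also seems worth failing fast if two specs would generate the same asset name.

```python
from typing import Dict, Iterable, List, Mapping


def layer_names(spec: Mapping[str, str]) -> Dict[str, str]:
    """Derive every asset name for one spec in a single place."""
    return {"base": spec["name"] + "base", "stage": spec["name"] + "stage"}


def stage_deps(spec: Mapping[str, str]) -> List[str]:
    """The stage asset's deps come from the same helper, so they always match."""
    return [layer_names(spec)["base"]]


def all_asset_names(specs: Iterable[Mapping[str, str]]) -> List[str]:
    """List every generated name, raising if two specs would collide."""
    names: List[str] = []
    for spec in specs:
        names.extend(layer_names(spec).values())
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate asset names: {duplicates}")
    return names
```

The factory could then use `layer_names(spec)["stage"]` for `name=` and `stage_deps(spec)` for `deps=`, so the decorator arguments never repeat the string logic.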
Dan Meyer:
Thanks @jamie - those are exactly the pieces I was hoping to find to build some super basic guardrails. Those guardrails probably get us to "good enough for now", and then as y'all add more stuff in this area, we'll evolve with that.
@Sean Davis I don't think that fits our current use case, but I can imagine stealing that idea for some upcoming work, thanks! (We're going to be pulling data from a slowly growing set of our users' CRMs, so this could be a way for us to make assets out of each of them.)