# dagster-feedback
w
In general I feel dagster features the concept of `asset`, but it's strange that schedules do not support assets natively and we have to convert them to jobs. Sensors recently added support for assets; shall we do the same for schedules?
s
I've seen a lot of our users start doing:
from dagster import AssetSelection, ScheduleDefinition, define_asset_job

schedule = ScheduleDefinition(
    name="run_asset_group",
    # define_asset_job takes the job name first; "run_asset_group_job" is a placeholder
    job=define_asset_job("run_asset_group_job", selection=AssetSelection.groups("my_group")),
    cron_schedule="...",
)
I don't love the asset selection thing, but it does make it pretty lightweight to spin up a custom asset schedule
s
hey @William - I recently filed an issue to track adding this: https://github.com/dagster-io/dagster/issues/11055
@Stephen Bailey any particular reasons you don't love the asset selection thing?
s
I don't know how to use it 😂
honestly have tried learning it 5 or 6 times and i can never get it to come out right, and i haven't found any docs on how it works besides these examples
all of our asset keys are lists, and i think that plays into it. we have also historically been multi-repo
s
Hmm good point, lists are poorly documented. I'll fix that. Here's how you do it:
AssetSelection.assets(*my_assets_list)
s
if i have two assets:
asset_a_key = AssetKey(["snowflake", "db", "schema", "table_a"])
asset_b_key = AssetKey(["snowflake", "db", "schema", "table_b"])
how do i select those two assets?
s
fwiw @Stephen Bailey my experience matches yours. Trying to select assets (esp with schemas as prefixes) is always a PITA that I am loath to do often.
@sandy when I use dbt to load assets I don't have a python object to reference, so I have to do weird things like
AssetSelection.keys(AssetKey(["schema", "dbt_model"]), AssetKey(["schema", "dbt_model2"])) | AssetSelection.assets(python_asset_object)
And I never get it right the first, second, or third time. And I never remember if `keys` or `assets` is the string or the object, or whether the arguments are a list or a list of lists. The IDE tries to help, but the type hint is basically "thing coercible to keys" and IDK what "thing" or "coerce" means in this context.
The `upstream`, `downstream` stuff is cool
And, similar to Stephen, I find this selection stuff lacking in docs considering it's used all over for jobs and sensors
s
@Sean Lopp - anywhere in particular you would expect to see more docs for this?
s
Maybe worth its own section in the assets concept page? BTW Stephen your use case could be:
asset_a_key = AssetKey(["snowflake", "db", "schema", "table_a"])
asset_b_key = AssetKey(["snowflake", "db", "schema", "table_b"])

AssetSelection.assets(asset_a_key, asset_b_key)

# OR

AssetSelection.assets(asset_a_key) | AssetSelection.assets(asset_b_key)
Note that in this context `&` is restrictive and `|` (or) is additive. You could also probably do some invocation of `AssetSelection.keys`, though I'd have to play around to see if I could figure out whether keys is a list of AssetKeys or a list of lists or something else
s
cc @sandy
s
hmmmmm. I would expect `AssetSelection.keys(*asset_key_list)` to work. Is it possible that one of your assets used to be partitioned, but is no longer partitioned? Or vice versa?
s
hmm, that's possible. I was playing around with specifying different assets, but using the same sensor and just reloading it after changes. One of those was a partitioned asset. Are you thinking that the cursor may have become partitioned, then when it tried to submit new runs, it was parsing it incorrectly?
s
Exactly (though still a case we should handle more gracefully)
s
Ok, well, in that case I'll retract my asset selection complaint... but is there a way for me to test that it's working as expected? Something like
def test_selection():
    selection = AssetSelection.groups("my_group_with_four_assets")

    # asset_count is the kind of attribute I'd like to have, not an existing API
    assert selection.asset_count == 4
o
One option would be
from my_project import my_repo

def test_selection():
    selection = AssetSelection.groups("my_group")
    
    selected_keys = selection.resolve(my_repo.asset_graph)
    assert len(selected_keys) == 4
๐Ÿ™ 1
n
(Sorry to keep derailing the original question.) I also frequently have trouble finding the right way to select assets, because I get confused between all the different ways to refer to them: definitions, keys, asset groups, key prefixes... I think it needs some simplification work. I also don't like using hard-coded, string-based references like `AssetSelection.groups("my_group")`, because strings are hard to refactor automatically, and if I make a typo my IDE cannot detect at coding time that the object doesn't actually exist. So instead, I try to use things like:
asset_keys=[asset.key for asset in my_assets]
Similarly, I try to avoid hard-coded op/asset names in my job config, so it doesn't break if I decide to change my op/asset names. So I try to use things like:
my_op.name: { "config": {...}}
s
This is useful feedback. FYI, instead of doing `AssetSelection.keys(*[asset.key for asset in my_assets])`, you can do `AssetSelection.assets(*my_assets)`.
๐Ÿ‘ 1
n
This is a good example of the difficulty of discovering the best way to select assets. Maybe we could have only one parameter, `asset_selection`, instead of both `asset_selection` and `asset_keys`, where we can pass anything that clearly points to assets: an AssetSelection, a list of Assets, a list of AssetKeys, or just strings, and then Dagster handles the interpretation of it.
s
> Maybe we could have only one parameter, `asset_selection`, instead of both `asset_selection` and `asset_keys`, where we can pass anything that clearly points to assets: an AssetSelection, a list of Assets, a list of AssetKeys, or just strings, and then Dagster handles the interpretation of it
Are there particular functions / classes that you're talking about for this? `define_asset_job`?
n
My most recent use case is selecting the assets in an `@multi_asset_sensor()`, but my point is general for all signatures where assets need to be selected.
s
got it - yes, the `@multi_asset_sensor` params especially are a bit of a mess. I filed an issue for addressing this: https://github.com/dagster-io/dagster/issues/11558
I also filed https://github.com/dagster-io/dagster/issues/11559. I think that's a good suggestion
n
Thank you for the tickets! Continuing on the idea of removing abstractions, instead of having to call a specific class `AssetSelection` like you told me:
@multi_asset_sensor(
    asset_selection=AssetSelection.assets(*my_asset_sequence),
    job=run_assets,
)
I would expect to be able to just pass my sequence of assets:
@multi_asset_sensor(
    asset_selection=my_asset_sequence,
    job=run_assets,
)
Why does my sequence of assets need to go through another abstraction for a sensor to understand it? If the sensor needs a specific attribute of the asset, it's the sensor's job to look for it.
s
The abstraction is useful because other functions use it (`define_asset_job`), and it does allow mixing and matching selection via `groups` and `assets`, which could be really useful for more complex cases. (It could also be useful, for example, to have tag matching in a future version.)
n
`define_asset_job` could also accept a sequence of assets directly. I'm OK with having an additional abstraction when one needs the extra complexity, like mixing groups and keys in the selection. But I think the syntax for the default usage, providing a sequence of assets, could be easier.
๐Ÿ‘ 1
s
here's a proposed change for multi_asset_sensor: https://github.com/dagster-io/dagster/pull/11567
๐Ÿ‘ 1
and here's one for define_asset_job: https://github.com/dagster-io/dagster/pull/11568
๐Ÿ‘ 1