In general I feel dagster features the concept of `asset` bu dagster #dagster-feedback

William

12/16/2022, 2:17 AM

In general I feel dagster features the concept of `asset`, but it's strange that schedules do not natively support assets and we have to convert them to jobs. Sensors recently added support for assets; shall we do the same for schedules?

Stephen Bailey

12/16/2022, 2:44 AM

I've seen a lot of our users start doing:

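from dagster import AssetSelection, ScheduleDefinition, define_asset_job

# define_asset_job requires a job name; "run_asset_group_job" is a placeholder
schedule = ScheduleDefinition(
    name="run_asset_group",
    job=define_asset_job("run_asset_group_job", selection=AssetSelection.groups("my_group")),
    cron_schedule="...",
)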

I don't love the asset selection thing, but it does make it pretty lightweight to spin up a custom asset schedule

sandy

12/16/2022, 4:20 PM

hey @William - I recently filed an issue to track adding this: https://github.com/dagster-io/dagster/issues/11055

sandy

12/16/2022, 4:21 PM

@Stephen Bailey any particular reasons you don't love the asset selection thing?

Stephen Bailey

12/16/2022, 4:22 PM

I don't know how to use it 😂

Stephen Bailey

12/16/2022, 4:23 PM

honestly have tried learning it 5 or 6 times and i can never get it to come out right, and i haven't found any docs on how it works besides these examples

Stephen Bailey

12/16/2022, 4:24 PM

all of our asset keys are lists, and i think that plays into it. we have also historically been multi-repo

sandy

12/16/2022, 4:28 PM

Hmm good point, lists are poorly documented. I'll fix that. Here's how you do it:

AssetSelection.assets(*my_assets_list)

Stephen Bailey

12/16/2022, 7:33 PM

if i have two assets:

asset_a_key = AssetKey(["snowflake", "db", "schema", "table_a"])
asset_b_key = AssetKey(["snowflake", "db", "schema", "table_b"])

how do i select those two assets?

Sean Lopp

12/16/2022, 9:21 PM

fwiw @Stephen Bailey my experience matches yours. Trying to select assets (esp. with schemas as prefixes) is always a PITA that I am loath to do often. @sandy when I use dbt to load assets I don't have a python object to reference, so I have to do weird things like

AssetSelection.keys(AssetKey(["schema", "dbt_model"]), AssetKey(["schema", "dbt_model2"])) | AssetSelection.assets(python_asset_object)

And I never get it right the first, second, or third time. And I never remember if keys or assets is the string or the object, or whether the arguments are a list or a list of lists. The IDE tries to help, but the type hint is basically "thing coercible to keys" and IDK what "thing" or "coerce" means in this context. The `upstream` / `downstream` stuff is cool.

plus2 2

Sean Lopp

12/16/2022, 9:21 PM

And, similar to Stephen, I find this selection stuff lacking in docs considering it's used all over for jobs and sensors

sandy

12/16/2022, 9:40 PM

@Sean Lopp - anywhere in particular you would expect to see more docs for this?

Sean Lopp

12/16/2022, 9:46 PM

Maybe worth its own section in the assets concept page? BTW Stephen your use case could be:

asset_a_key = AssetKey(["snowflake", "db", "schema", "table_a"])
asset_b_key = AssetKey(["snowflake", "db", "schema", "table_b"])

AssetSelection.assets(asset_a_key, asset_b_key)

# OR

AssetSelection.assets(asset_a_key) | AssetSelection.assets(asset_b_key)

Note that in this context `&` (and) is restrictive and `|` (or) is additive. You could also probably do some invocation of `AssetSelection.keys`, though I'd have to play around to see if I could figure out whether keys is a list of AssetKeys or a list of lists or something else

🤯 2
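
Editor's sketch: Sean's point about restrictive versus additive combination can be pictured with plain Python sets, where intersection (`&`) narrows a selection and union (`|`) widens it. This is only an analogy under stated assumptions, not Dagster's implementation: asset keys are modeled as tuples of path components, and the names `snowflake_tables` and `picked_by_hand` are made up for illustration.

```python
# Plain-Python analogy: each "selection" is modeled as a set of asset keys,
# and each asset key is modeled as a tuple of path components.
table_a = ("snowflake", "db", "schema", "table_a")
table_b = ("snowflake", "db", "schema", "table_b")
dbt_model = ("schema", "dbt_model")

# Two hypothetical selections that overlap on table_b.
snowflake_tables = {table_a, table_b}
picked_by_hand = {table_b, dbt_model}

# "|" is additive: the result contains every key from either selection.
additive = snowflake_tables | picked_by_hand

# "&" is restrictive: the result contains only keys present in both selections.
restrictive = snowflake_tables & picked_by_hand

print(sorted(additive))
print(sorted(restrictive))
```

Combining two selections with `|` can only grow the result, while `&` can only shrink it, which is why the `|` form is the one that picks up both tables in Stephen's case.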

Stephen Bailey

01/03/2023, 3:40 PM

cc @sandy

sandy

01/03/2023, 9:49 PM

hmmmmm. I would expect `AssetSelection.keys(*asset_key_list)` to work. Is it possible that one of your assets used to be partitioned, but is no longer partitioned? or vice versa?

Stephen Bailey

01/04/2023, 2:18 AM

hmm, that's possible. I was playing around with specifying different assets, but using the same sensor and just reloading it after changes. One of those was a partitioned asset. Are you thinking that the cursor may have become partitioned, then when it tried to submit new runs, it was parsing it incorrectly?

sandy

01/04/2023, 5:22 AM

Exactly (though still a case we should handle more gracefully)

Stephen Bailey

01/04/2023, 1:57 PM

Ok, well, in that case I'll retract my asset selection complaint... but is there a way for me to test that it's working as expected? Something like

def test_selection():
    selection = AssetSelection.groups("my_group_with_four_assets")

    # asset_count is a wished-for attribute, shown to illustrate the request
    assert selection.asset_count == 4

owen

01/04/2023, 9:36 PM

One option would be

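from dagster import AssetSelection

from my_project import my_repo

def test_selection():
    selection = AssetSelection.groups("my_group")

    # resolve the selection against the repository's assets to get the selected keys
    selected_keys = selection.resolve(my_repo.asset_graph)
    assert len(selected_keys) == 4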

🙏 1

Nicolas Parot Alvarez

01/05/2023, 5:04 PM

(Sorry to keep derailing the original question.) I also frequently have trouble finding the right way to select assets because I get confused by all the different ways to refer to them: definitions, keys, asset groups, key prefixes... I think it needs some simplification work. I also don't like using hard-coded, string-based relations like `AssetSelection.groups("my_group")` because strings are hard to refactor automatically, and if I make a typo, my IDE cannot detect at coding time that the object doesn't actually exist. So instead, I try to use things like `asset_keys=[asset.key for asset in my_assets]`. Similarly, I try to avoid hard-coded op/asset names in my job config, so it doesn't break if I decide to change my op/asset names. So I try to use things like `my_op.name: { "config": {...}}`.

sandy

01/05/2023, 6:06 PM

This is useful feedback. FYI, instead of doing `AssetSelection.keys(*[asset.key for asset in my_assets])`, you can do `AssetSelection.assets(*my_assets)`

👍 1

Nicolas Parot Alvarez

01/06/2023, 4:05 PM

This is a good example of the difficulty of discovering the best way to select assets. Maybe we could have only one parameter `asset_selection` instead of both `asset_selection` and `asset_keys`, where we can pass anything that clearly points to assets: AssetSelection, a list of Assets, a list of AssetKeys, or just strings, and then Dagster handles the interpretation of it.

sandy

01/06/2023, 4:36 PM

"Maybe we could have only one parameter `asset_selection` instead of both `asset_selection` and `asset_keys`, where we can pass anything that clearly points to assets: AssetSelection, a list of Assets, a list of AssetKeys, or just strings, and then Dagster handles the interpretation of it"

Are there particular functions / classes that you're talking about for this? `define_asset_job`?

Nicolas Parot Alvarez

01/06/2023, 6:08 PM

My most recent use case is selecting the assets in a `@multi_asset_sensor()`, but my point is general for all signatures where assets need to be selected.

sandy

01/06/2023, 6:31 PM

got it - yes, the `@multi_asset_sensor` params especially are a bit of a mess. I filed an issue for addressing this: https://github.com/dagster-io/dagster/issues/11558

sandy

01/06/2023, 6:32 PM

I also filed https://github.com/dagster-io/dagster/issues/11559. I think that's a good suggestion

Nicolas Parot Alvarez

01/06/2023, 7:16 PM

Thank you for the tickets! Continuing on the idea of removing abstractions, instead of having to call a specific class `AssetSelection` like you told me:

@multi_asset_sensor(
    asset_selection=AssetSelection.assets(*my_asset_sequence),
    job=run_assets,
)

I would expect to be able to just pass my sequence of assets:

@multi_asset_sensor(
    asset_selection=my_asset_sequence,
    job=run_assets,
)

Why does my sequence of assets need to go through another abstraction for a sensor to understand it? If the sensor needs a specific attribute of the asset, it's its job to look for it.

Stephen Bailey

01/06/2023, 7:57 PM

The abstraction is useful because other functions use it (`define_asset_job`), and it does allow mixing and matching selection via `groups` and `assets`, which could be really useful for more complex cases. (It could also be useful, for example, to have tag matching in a future version.)

Nicolas Parot Alvarez

01/06/2023, 8:07 PM

`define_asset_job` could also accept a sequence of assets directly. I'm OK with having an additional abstraction when one needs additional complexity, like mixing groups and keys in the selection. But I think the syntax for the default usage of providing a sequence of assets could be easier.

👍 1

sandy

01/06/2023, 10:50 PM

here's a proposed change for multi_asset_sensor: https://github.com/dagster-io/dagster/pull/11567

👍 1

sandy

01/07/2023, 1:46 AM

and here's one for define_asset_job: https://github.com/dagster-io/dagster/pull/11568

👍 1
