Is it a good practice to use asset as an op in graph instead dagster #ask-community

Son Giang

05/09/2022, 2:46 PM

Is it good practice to use an asset as an op in a graph instead of using AssetGroup? I want to use ops in parallel with assets, but I also want to bind assets to some ops.

owen

05/09/2022, 5:31 PM

Hi @Son Giang ! In last week's release, we added preliminary support for graph-backed assets which is likely the best pattern to use for workflows like this. You can do something like:

# define some complex computation to produce an asset
@graph
def foo_asset():
   x = regular_op(regular_op(...))
   return create_foo_op(x)

my_asset_group = AssetGroup([AssetsDefinition.from_graph(foo_asset)])

👍 1

sandy

05/09/2022, 7:57 PM

@Son Giang - would you be open to telling us a little bit more about what you're aiming to accomplish? E.g. do you want to execute ops after assets or before? What kinds of things are the ops doing?

👍 1

Son Giang

05/10/2022, 2:05 AM

@sandy "would you be open to telling us a little bit more about what you're aiming to accomplish?" I wrote the pipeline in ops/graphs first. Then I realized that assets help me keep track of some of the external entities. However, not all ops in my graph have to be assets, so I want to migrate only several ops to assets and keep the others as plain ops. The second reason I want to use graphs is that they make it easy for new people (especially non-engineers such as data analysts) to come in and understand how the pipeline runs. The asset abstraction is somewhat too complex and ambiguous (it took me days to understand it).

"E.g. do you want to execute ops after assets or before?" I would say both.

"What kinds of things are the ops doing?" Just normal tasks that don't produce or manipulate external entities. I don't think the graph should consist entirely of assets linked to external entities, so I think the ability to use both ops and assets in a graph is a good thing.

Son Giang

05/10/2022, 2:07 AM

@owen it seems cool. Can I do something like this?

# define some complex computation to produce an asset
@graph
def foo_asset():
   x = op1()
   y = asset1(x)
   z = op2(y)
   return asset2(z)

my_asset_group = AssetGroup([AssetsDefinition.from_graph(foo_asset)])

And can you provide the version of the release? Thank you

Son Giang

05/10/2022, 7:32 AM

@owen I tried that, and it seems like it cannot deal with a job that uses PartitionedConfig. Is there something that I'm missing? It seems like the `build_job` function does not provide the config parameter.

owen

05/10/2022, 9:03 PM

hi @Son Giang ! For your first question, if you're creating multiple assets from a single graph, they must all be outputs of that graph. So that would result in a slight refactor:

@graph(out={"asset1": GraphOut(), "asset2": GraphOut()})
def foo_asset():
   x = op1()
   y = asset1(x)
   z = op2(y)
   return {"asset1": y, "asset2": asset2(z)}

👍 1

owen

05/10/2022, 9:05 PM

for your second question, unfortunately you're correct that graph-backed assets don't currently support partitioning (cc @sandy if you have thoughts on if that would be complicated to support)

👍 1

sandy

05/11/2022, 12:32 AM

I'm working on a change that should be able to land in Thursday's release that will make this more possible: instead of passing in a partitioned config, you'd supply a partitions definition and use `context.partition_key` to access the partition. Here's what this could look like:

@op
def my_op(context):
    assert context.partition_key == "a"

@graph
def my_graph():
    return my_op()

assets_def = AssetsDefinition.from_graph(
    graph_def=my_graph, partitions_def=StaticPartitionsDefinition(["a", "b", "c"])
)

AssetGroup([assets_def]).build_job("abc").execute_in_process(partition_key="a")

👍 1
