Having trouble trying to understand how I could use assets in dagster #ask-community


Geoffrey Greenleaf

09/15/2022, 9:30 PM

Having trouble understanding how I could use assets in this new ETL pipeline I'm working on, and was wondering if I could get some advice on how to approach the problem. Essentially, we have an ETL for a multi-tenant app where the transforms will be the same but the underlying source and destination will be slightly different. The process is basically:
1. Download a .tgz file
2. Extract the TSVs, split the larger files, and upload them to ADLS
3. Run a Spark job in Databricks (currently the Spark job inserts into a SQL database and runs a merge sproc as the last step)
I can wrap my head around how I would configure these jobs using an op for each step, and I'm currently working on doing just that. But I would also like to use the new asset stuff if applicable. I'm new to the data engineering process, so any tips or advice on how to approach these problems would be awesome. Thanks!
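For reference, the local part of step 2 (extracting the TSVs and splitting the larger files) might look roughly like the sketch below, using only the Python standard library. The function names, the row-count threshold, and the assumption that each TSV starts with a header row are all hypothetical; the download and the ADLS upload are left out because they depend on the storage client in use.

```python
import tarfile
from pathlib import Path


def extract_tsvs(archive_path: Path, out_dir: Path) -> list[Path]:
    """Extract only the .tsv members of a .tgz archive into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".tsv"):
                # Keep only the base name so nothing is written outside out_dir.
                target = out_dir / Path(member.name).name
                source = tar.extractfile(member)
                target.write_bytes(source.read())
                extracted.append(target)
    return extracted


def split_tsv(path: Path, max_rows: int) -> list[Path]:
    """Split a TSV into parts of at most max_rows data rows each.

    Assumes the first line is a header, which is repeated in every part.
    Files that already fit within max_rows are returned unchanged.
    """
    with path.open("r", encoding="utf-8") as f:
        header = f.readline()
        rows = f.readlines()
    if len(rows) <= max_rows:
        return [path]
    parts = []
    for start in range(0, len(rows), max_rows):
        part = path.with_name(f"{path.stem}_part{start // max_rows:04d}.tsv")
        part.write_text(header + "".join(rows[start:start + max_rows]), encoding="utf-8")
        parts.append(part)
    return parts
```

Each function could then become the body of an op or an asset; this version reads a whole file into memory, so very large files would need a streaming variant.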

Stephen Bailey

09/16/2022, 12:54 PM

Two thoughts:
1. You can use `AssetsDefinition.from_graph(...)` to take an existing op/graph definition and turn it into a single asset. When you materialize the asset, it will kick off the graph with all the steps in it.
2. You can simply turn your ops into steps in the asset function, sort of a "fat asset" approach, where the asset holds a lot of logic that isn't split over individual ops. In practice, the only cases I've seen where a process really needs to be split into separate ops are when you need to retry steps or do a dynamic fan-out.
In either case, one pattern I really like is the asset factory pattern, where you define a function that generates and returns an asset definition.


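from dagster import asset

def create_cool_asset(asset_name, **kwargs):
    # Factory: builds and returns a new asset definition named `asset_name`.
    @asset(name=asset_name, **kwargs)
    def _generated_asset(context):
        # do some stuff
        some_stuff = ...  # placeholder for the real result
        return some_stuff

    return _generated_asset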

🌈 1

Geoffrey Greenleaf

09/16/2022, 2:44 PM

Interesting ideas. I like the asset factory. I was also reading about partitions, which may also solve my problem if I can have two-part partitions.

Geoffrey Greenleaf

09/16/2022, 2:46 PM

Would need to partition by our tenant and date.

Geoffrey Greenleaf

09/16/2022, 2:46 PM

Thanks for the ideas!

🌈 1

Geoffrey Greenleaf

09/16/2022, 10:12 PM

After thinking about assets a little more and how we are going to approach this new ETL, I was able to set up a lot of the new process using assets. Now trying to figure out how best to make it a multi-tenant solution.

👌 1
