# ask-community
g
I'm having trouble understanding how I could use assets in a new ETL pipeline I'm working on, and was wondering if I could get some advice on how to approach the problem. Essentially, we have an ETL for a multi-tenant app where the transforms will be the same but the underlying source and destination will be slightly different. The process is basically:
1. Download a .tgz file
2. Extract the TSVs, split the larger files, and upload to ADLS
3. Run a Spark job in Databricks (currently the Spark job inserts into a SQL database and runs a merge sproc as the last step)
I can wrap my head around how I would configure these jobs using an op for each step, and I'm currently working on doing just that. But I would also like to use the new asset stuff if applicable. I'm new to data engineering, so any tips or advice on how to approach these problems would be awesome. Thanks!
s
Two thoughts:
1. You can use `AssetsDefinition.from_graph(...)` to take an existing op/graph definition and turn it into a single asset. When you materialize the asset, it will kick off the graph with all the steps in it.
2. You can simply turn your ops into steps in the asset function, sort of like a "fat asset" approach, where the asset holds a lot of logic that isn't split over individual ops. In practice, the only cases I've seen where a process really needs to be split into separate ops are when you need to retry steps or do a dynamic fan-out.
In either case, one pattern I really like is the asset factory pattern, where you define a function that builds and returns an asset definition.
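from dagster import asset

def create_cool_asset(asset_name, **kwargs):
    # Factory: builds and returns a new asset definition named `asset_name`.
    # Extra keyword arguments are passed straight through to @asset.
    @asset(name=asset_name, **kwargs)
    def _generated_asset(context):
        # do some stuff
        some_stuff = None  # placeholder: replace with the real result
        return some_stuff

    return _generated_asset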
🌈 1
g
Interesting ideas. I like the asset factory. I was also reading about partitions, which may also solve my problem if I can have two-part partitions.
We would need to partition by tenant and date.
Thanks for the ideas!
🌈 1
After thinking about assets a little more and how we are going to approach this new ETL, I was able to set up a lot of the new process using assets. Now I'm trying to figure out how best to make it a multi-tenant solution.
👌 1