#ask-community

Jacob Marcil

02/16/2024, 3:31 PM
I don’t know if it’s just me, but all of my use cases look like this and I can’t find a good solution to the problem. If anyone has ideas on the best way to achieve this, please post in this thread 🙂
1. I need to query an external system using an API.
2. I need to query it using, as my inputs, data that is stored in a table in my data warehouse.
   a. The stakeholders add rows to that table, and for each of those rows I need to fetch the data from the API.
3. I need to fetch new data each day for each of them.
4. If a new row is added to the table, I need to backfill all days for that specific data point.
So at first it seems easy:
1. I create a resource for my API client.
2. I use an op to get the records and issue a request for each of them.
3. I create a daily partition.
4. I store the results in my table each day.
The issues:
1. If I stay with a daily partition, I have no way of knowing which rows need to be backfilled.
2. If I create a multipartition definition, I’ll end up with too many partitions, and that would probably break the Dagster UI. (I once tried 60 partitions combined with hourly partitions over 1 year, and Dagster wasn’t able to load the UI because it was calculating partitions; ~500k partitions were created (24*60*365).)
3. If I store the data for all API calls in the same asset (which, from my point of view, is what I should do, because the data is of the same nature and I don’t want to create an asset each time a new row is added to the table), I won’t be able to store it using an IO manager, because if the external call fails for any of the rows, it would break everything and I would lose all the data already queried.
4. I can use `map` and `collect` and build a graph asset. This would allow me to create a separate process for each of the rows and add retries for each of them. The issue here is that if any of the calls fails, the collect won’t work and this would break my pipeline.
5. If I use a graph asset and configure my last op to insert the data directly through my data warehouse resource instead of using the IO manager, I won’t be able to create a graph asset, since no data would be returned.
6. If I use a plain graph, the process seems to work, but I lose the concept of an asset, and I don’t know how partitions would work when I insert new records into the DB.
So here I am, lost among all the ways I could achieve this. It seems like a process everyone does, but I can’t figure it out 😞
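For reference, the ~500k figure in issue 2 is just the product of the two partition dimensions. A quick check, assuming 60 rows, hourly partitions, and a 365-day year (the function name is only illustrative):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def multipartition_count(row_count: int, years: int = 1) -> int:
    """Number of (row, hour) partition keys over the given number of years."""
    hourly_partitions = HOURS_PER_DAY * DAYS_PER_YEAR * years
    return hourly_partitions * row_count


if __name__ == "__main__":
    # 60 rows crossed with one year of hourly partitions.
    print(multipartition_count(60))
```

With daily instead of hourly partitions the same 60 rows give 60 * 365 keys per year, which is smaller but still grows with every row the stakeholders add.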

Zach

02/16/2024, 5:03 PM
Yeah, this one is a bit tricky. Seems like you'll have to make some compromises. I would split it into two different assets.
For the rows that currently exist in the database, I think I would do a daily-partitioned graph asset using map and collect, with op retries to help mitigate failures for a given row. If you're really concerned about an op failing and can't just fix those with retries or a re-run from failure, then I'd have each op in the mapping step write directly to your database using a resource. Graph assets don't have to return any meaningful data; you can use assets to just model something conceptually. You could still use an IO manager for loading the data into a downstream step, though.
For new rows that need to be backfilled, I would model them as a separate dynamically partitioned asset; each time a sensor detects a new row, it runs a backfill for that row.

Jacob Marcil

02/16/2024, 6:40 PM
OK, so I’m not crazy, it’s not that straightforward. Thanks for the explanation. Having each mapped op write directly into the table seems to make more sense. I would lose the concept of a `successful` partition, right? Since all my mapped ops would technically mark the same partition as completed while inserting more rows into the table? As for the graph asset part, I was under the impression that it has to return data, because if I change my working `@graph` to a `@graph_asset` without changing anything else, I receive the error:
`@graph 'my_graph_asset' has unmapped output 'result'. Remove it or return a value from the appropriate op/graph invocation.`

Zach

02/16/2024, 6:41 PM
Yeah you have to return something, but it doesn't have to be anything particularly meaningful
😅 1
Sometimes you just have to make the machine happy
😆 1

Jacob Marcil

02/16/2024, 6:42 PM
Ahahahah nice, OK, makes sense.
Thank you for all your help again. Have a great Friday afternoon.
🎉 1

Zach

02/16/2024, 6:50 PM
You too!