#ask-community

Harrison Conlin

03/16/2023, 11:24 AM
hi! Loving the dynamic partitions feature. I'm working with web APIs to do some governance and oversee the use of the business intelligence platform my team manages at my day job. There is a mix of API calls: some return all items with basic metadata (e.g. get all the reports in the organisation), but other times I need to query per item (e.g. get the developers for a report ID). As such, I've got an asset `reports` which gets all the reports plus basic metadata and updates the dynamic partitions for the individual asset `report`. `report` depends on `reports`, and I then materialise all the `report` partitions, each of which gets its metadata entry. It's a bit slow and painful. Ideally I'd get rid of the `reports` asset, but I want to keep IO managers, so my plan was to move the API call into a job, loop through the results, create an `OutputContext` via `dagster.build_output_context()` for each report, pass that to the IO manager's `handle_output` function and fire off `AssetMaterialization`s. Happy times, I was hoping, but as `build_output_context()` doesn't create a step context, `OutputContext.has_asset_partitions` fails. Admittedly I am going down an ugly route, but can you see any alternatives?

claire

03/16/2023, 11:59 PM
Hi Harrison. Is there a reason why you are explicitly outputting asset materializations? Wondering if it's possible instead for you to:
• use a sensor to query the API
• update the partitions per report asset
• kick off a run request for each new partition

Harrison Conlin

03/18/2023, 1:40 AM
Hi @claire, I was explicitly outputting asset materializations because the initial API call has all the data I need and, due to some aggressive rate limiting, I can't necessarily afford to launch a new request for every new partition. However, I like the sensor idea. Would it be un-dagsteric (think pythonic, but for dagster) to save the output of the API call to a temporary directory and have the report asset read from it if it exists? That way my report asset can call the GetGroup API when needed, but if the results of GetGroups are available, it can use those instead.

claire

03/20/2023, 9:30 PM
Ahhh I see. I think generally saving the output of the API call to a temp dir may be tricky, as you'll also have to find a way to delete the contents after all of the runs conclude. Thinking about this more, I think a cleaner way to do this would be similar to what you initially implemented:
• define an unpartitioned `reports` asset that queries for the initial API call, yielding all the data you need as output and creating all the dynamic partitions you need via `context.instance.add_dynamic_partitions(...)`
• have each `report` asset with its own dynamic partitions def depend on the `reports` asset
• in a schedule, update the `reports` asset as frequently as desired based on the rate limiting
• in a sensor, check whenever the `reports` asset is materialized, and then kick off a run request for each different `report` asset
This approach I think is cleaner and will allow you to automatically update all of the downstream `report` assets after you update `reports`. And you'll be able to load the result of the latest API call in each `report` asset.
👍 1

Harrison Conlin

03/22/2023, 5:35 AM
yeah, I was thinking about it more away from the computer and I think you're right. I think part of me was just trying to find a way to reduce the penalty that is spinning up a new process for each run.