# ask-community

Jordan

04/26/2023, 5:56 PM
Hi! Is there any particular reason why the `context.instance.add_dynamic_partitions` method can add multiple keys, while the `context.instance.delete_dynamic_partition` method deletes a single key? Maybe a `context.instance.delete_dynamic_partitions` method could be added?

claire

04/26/2023, 6:12 PM
Hi Jordan, yep, we've talked about this too and I agree it would be good to add. Curious: where are you adding/deleting partitions from? Is it within a sensor or an asset/op?
If it's within a sensor, you could do something like:
```python
return SensorResult(
    dynamic_partitions_requests=[
        dynamic_partitions_def.build_delete_request(list_of_partition_keys)
    ]
)
```
which accepts a list of partition keys.

Jordan

04/27/2023, 8:28 PM
Thanks! Yes, I use these functions in an asset because I build my partitions by querying a DB (the query sometimes takes a long time). This DB changes very little, so I want to refresh the partitions every 24 hours, while keeping the option to update them manually from an asset for more flexibility. Besides updating the partitions, I'd like the asset to take advantage of the DB call to produce a resource I could use in any asset (even non-partitioned assets, for example). Currently I am using a CSV file as an intermediary. Do you see another way to do this that makes better use of Dagster concepts? I have tested Pythonic resources, but it seems a DB call is made for each run. My current solution with a CSV file:
```python
import pandas as pd
from dagster import asset


@asset
def synchronyze(context):
    df = get_df_with_query()

    # Update partitioning with df
    ...

    df.to_csv(path)


@asset
def other_asset(context):
    ...
    df = pd.read_csv(path)
    ...
```

claire

04/28/2023, 3:06 PM
Hey Jordan, unfortunately resources are constructed once per process. This means they are constructed in each asset/op step (assuming you are using the multiprocess executor), so a resource can't be initialized once and used globally. Another option would be to yield the dataframe as an output of `synchronyze`, and have `other_asset` be downstream of `synchronyze`. This would allow all downstream assets to load the latest output of `synchronyze` as an input instead of re-querying the database.

sandy

04/28/2023, 10:18 PM
@Jordan - looking at your code example, neither of those assets is dynamically partitioned. Is there a third asset that's dynamically partitioned? What's the relationship between the asset that you're calling `add_dynamic_partitions` from and the asset that's dynamically partitioned?