geoHeil

06/06/2022, 6:19 AM
I recently had a longer and not fully completed discussion around dynamic partitions in #dagster-support. In particular, the idea is based around state derived from a recent asset materialization. Think of, e.g., a timestamp like a last update date. If I, for whatever reason, need to reload the data, I would need to delete an old, previous materialization. How can I achieve this in Cloud?

jordan

06/06/2022, 4:42 PM
do you have a link to the original discussion?

sandy

06/06/2022, 10:03 PM
@geoHeil do you have a working version of this in open source? Is there a reason that you think it wouldn't work in Cloud?

geoHeil

06/07/2022, 3:45 AM
no. also not
I mean: yes, I can go to the Postgres database and delete stuff ... but I think this is not ideal
Also: https://dagster.slack.com/archives/C01U954MEER/p1654525370028899 I think solving this as a configuration from the pipeline would be best
Though, I do not know how quickly you will fix/change the two mentioned issues. As a workaround, I would still be keen to learn how to delete something in Cloud.

sandy

06/07/2022, 3:18 PM
I see - in general we recommend not deleting history because it can make it difficult to reason about what happened. I think we should try to figure out whether there's a different way to accomplish the same goal, e.g. using AssetObservations. You've explained pieces of your goal to me in the past, but would you mind explaining it from the top?

geoHeil

06/07/2022, 4:48 PM
Sure. When working with dynamic partitioning, I prefer the term stateful partitions. The intention is that, for example, a data source does not historize its data (assume a table or spreadsheet) but exposes some metadata about when it was last updated (as a timestamp). The pipeline should:
1. If the asset was never materialized, ingest the current data.
2. If the asset was already materialized, ingest only new data, i.e. data where the update timestamp is larger than the last seen one. This is the stateful part. So far I have attached the last update timestamp as a metadata entry to the asset observation.
3. However, in some circumstances I may need to re-trigger loading of the current data from the source. Normally, when Dagster uses a cursor, that cursor can be set (and overwritten, as far as I know). In my case, I would need to modify (delete) the last asset materialization so that (1) and (2) start to work again, or alternatively provide some override mechanism (from the configuration of the asset).
The points discussed here https://dagster.slack.com/archives/C01U954MEER/p1654525370028899 regarding configuration handling, i.e. the config override, would greatly simplify this case. From what I have read and heard, they would also be appreciated by others.
👍 1
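The three cases described above can be sketched in plain Python. This is only an illustration under assumptions, not code from the thread: the function name `rows_to_ingest`, the `updated_at` field, and the in-memory list of rows are hypothetical, and the `ignore_last_materialization` flag stands in for the override mechanism asked for in point 3. In a real pipeline the last seen timestamp would come from the metadata attached to the previous materialization or observation.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def rows_to_ingest(
    rows: List[dict],
    last_seen: Optional[datetime],
    ignore_last_materialization: bool = False,
) -> Tuple[List[dict], Optional[datetime]]:
    """Pick the rows to load and compute the new watermark (hypothetical helper)."""
    if last_seen is None or ignore_last_materialization:
        # Case 1 (never materialized) or case 3 (forced full reload):
        # take everything the source currently holds.
        selected = list(rows)
    else:
        # Case 2: only rows updated after the last seen timestamp.
        selected = [r for r in rows if r["updated_at"] > last_seen]

    # The new watermark is the newest timestamp among the loaded rows;
    # if nothing was loaded, the previous watermark is kept.
    new_watermark = max((r["updated_at"] for r in selected), default=last_seen)
    return selected, new_watermark


rows = [
    {"id": 1, "updated_at": datetime(2022, 6, 1)},
    {"id": 2, "updated_at": datetime(2022, 6, 5)},
]
first_load, watermark = rows_to_ingest(rows, last_seen=None)
incremental, _ = rows_to_ingest(rows, last_seen=datetime(2022, 6, 1))
reload_all, _ = rows_to_ingest(
    rows, last_seen=watermark, ignore_last_materialization=True
)
```

With a flag like this, a full reload does not require deleting any materialization history; the stored timestamp is simply ignored for that one run.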

sandy

06/07/2022, 8:46 PM
Got it - thanks for the context. That makes a lot of sense. Have you considered just using the Materialize button on an asset and/or the Launchpad for a job to kick off a run? Is that what you're trying to use configuration for?

geoHeil

06/08/2022, 3:54 AM
Yes, and this also works, as discussed in the other thread. But it is inconvenient because Dagit does not scaffold the configuration including the default values.

sandy

06/08/2022, 3:15 PM
got it - if I understand correctly, what's annoying is that any time you want to kick off one of these runs, you need to type out the following:
ops:
  my_asset:
    config:
      ignore_last_materialization: True
is that right? if so, we could provide the ability to provide default config when constructing an asset job
ignore_last_materialization_job = AssetGroup.build_job(name=..., selection=..., default_config={"ops": {"my_asset": {"config": {"ignore_last_materialization": True}}}})
would that be helpful for you?
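The idea of a job-level default config that can still be overridden at launch time can be illustrated with a plain-Python deep merge. This is a sketch of the concept only, not Dagster's implementation: `merge_config` is a hypothetical helper, and the `default_config` argument proposed above did not exist at the time of this thread.

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or type mismatch): the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Default config taken from the proposal in the message above.
DEFAULT_CONFIG = {
    "ops": {"my_asset": {"config": {"ignore_last_materialization": True}}}
}

# Launching with no extra config uses the defaults unchanged.
run_config = merge_config(DEFAULT_CONFIG, {})

# A user can still flip a single knob without retyping the whole structure.
overridden = merge_config(
    DEFAULT_CONFIG,
    {"ops": {"my_asset": {"config": {"ignore_last_materialization": False}}}},
)
```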

geoHeil

06/08/2022, 3:15 PM
yes
ideally the scaffold would fill this
currently, when a default value is provided (and no value is missing), it scaffolds only an empty '{}'
but if the defaults need to be changed, the user does not necessarily know the structure, only that some knobs/settings need to be changed.

sandy

06/08/2022, 4:59 PM
that makes sense. I think we can provide something like the above within the next couple releases
:partydagster: 1
❤️ 1

geoHeil

06/08/2022, 5:58 PM
Can you please share an issue to track this?
❤️ 1

sandy

06/08/2022, 6:29 PM