#ask-community

George Pearse

06/30/2022, 1:46 PM
Is there a way to support an additional custom step for backfills (e.g. truncate the SDA)? My pipelines are idempotent, but it's still a bit easier to track the process if I empty everything to start with instead of overwriting one partition at a time. Will accept design advice if this is silly of me.

chris

06/30/2022, 7:55 PM
do you mean like wiping the existing assets for each partition?

George Pearse

06/30/2022, 8:55 PM
Yeah just truncate and load style.

chris

06/30/2022, 9:11 PM
We have the dagster asset wipe CLI (https://docs.dagster.io/_apidocs/cli#dagster-asset), which will remove the Dagster logs of the assets, but if you're talking about deleting the actual code artifacts for old runs, I think you're left with including additional steps at the beginning of runs to do so.

George Pearse

06/30/2022, 9:12 PM
Well I'm most after a way to customise a backfill for an asset

chris

06/30/2022, 10:06 PM
I don't think there's a way to explicitly add custom additional functionality for backfills only, if I'm correct in that description. Will poll the team for advice on your use case though.

prha

06/30/2022, 10:28 PM
The workaround that I can think of is that we currently expose tags (including backfill tags) off of the run on the context. You might be able to add a truncation step to your job and check to see if the currently executing run is a backfill run.
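from dagster import op

@op
def truncate_something(context):
    # Backfill runs carry the "dagster/backfill" tag; other runs do not.
    if not context.pipeline_run.tags.get("dagster/backfill"):
        return  # not a backfill run, so do nothing
    context.log.info("Backfill run detected; truncating")
    # truncate the target here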
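
One caveat with the tag check: a backfill launches a run per partition, so truncating on every tagged run would also wipe partitions loaded by earlier runs of the same backfill. A rough sketch of "truncate once per backfill" in plain Python follows. It assumes the value of the "dagster/backfill" tag identifies the backfill, and the in-memory table, the `truncated` set, and the function names are all hypothetical stand-ins (a real job would need to persist that state between runs).

```python
from typing import Dict, List, Mapping, Optional, Set

BACKFILL_TAG = "dagster/backfill"  # tag key mentioned in the thread


def backfill_id(tags: Mapping[str, str]) -> Optional[str]:
    """Return the backfill tag value, or None for a non-backfill run."""
    return tags.get(BACKFILL_TAG) or None


def load_partition(
    table: List[Dict],
    rows: List[Dict],
    partition: str,
    tags: Mapping[str, str],
    truncated: Set[str],
) -> None:
    """Hypothetical loader: truncate once per backfill, else overwrite one partition."""
    bid = backfill_id(tags)
    if bid is not None and bid not in truncated:
        # First run seen for this backfill: empty the whole table.
        table.clear()
        truncated.add(bid)
    else:
        # Normal run, or a later run of the same backfill: replace this partition only.
        table[:] = [r for r in table if r["partition"] != partition]
    table.extend(rows)
```

Inside an op, `tags` would be `context.pipeline_run.tags` as in the snippet above.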

sandy

06/30/2022, 10:37 PM
@George Pearse would you ideally do the entire backfill in a single step, instead of a step per partition?

George Pearse

07/01/2022, 7:57 AM
Hey @sandy, that is exactly the sort of thing I'm thinking. For backfills I'm likely to want to use a different loading process that would be more optimised for the volume of data, and if the query to select the time window is slow, just not using it could probably save a fair chunk of time. These are all design considerations with trade-offs though.
@prha's workaround doesn't look horribly workaround-ish, but I'm not sure how it'd fit into my workflow with SDAs, or how I'd get the 'right' metadata output to the Dagit UI to properly represent what I've done.

sandy

07/01/2022, 3:47 PM
@George Pearse - that makes sense. We've built the internals with an eye towards eventually enabling this: we pass around asset partition ranges instead of single partition keys. Here's an issue to track this: https://github.com/dagster-io/dagster/issues/8706.