# ask-community
h
hi all! had a quick question about the best way to think about using dagster to do efficient incremental updates. basically: i have a collection of N records. i write a transformation pipeline of various steps to load from a source, transform each record (they are independent), and then dump into a destination. the next time i run it, there could be some records added, some removed, and some changed. i want to avoid running transformation logic on records that did not change, but eventually have the destination updated to reflect the output of the final step. are there any good examples or resources on how to accomplish this?
s
what's the order of magnitude of the number of records that you're dealing with? do you imagine a separate pipeline run for each record, or handling all the records together?
h
like O(1000) records
some steps may be faster with batching, so i had been imagining all together, but open to suggestions if one is easier than the other
s
got it - I'd basically recommend implementing this incremental update logic within your `@asset`-decorated function. this won't work very well with IO managers (because they generally assume you're overwriting an entire asset or partition), so you'd need to use this general way of using Dagster: https://docs.dagster.io/tutorial/managing-your-own-io#tutorial-part-7-managing-your-own-io
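something like this rough sketch - note that `load_source_records`, `transform_record`, `read_destination_state`, and `write_destination` are placeholders for your own source/destination code, and it assumes you can store a content hash per record alongside the destination:

```python
# Sketch of an asset that manages its own I/O and only transforms
# records whose contents changed since the last run.
# load_source_records / transform_record / read_destination_state /
# write_destination are hypothetical stand-ins for your own code.
import hashlib
import json

from dagster import asset


def _record_hash(record: dict) -> str:
    # Stable fingerprint of a record's contents, used to detect changes.
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()


@asset
def transformed_records() -> None:
    source = {r["id"]: r for r in load_source_records()}
    # {id: hash} map persisted by the previous run; empty on the first run.
    dest_hashes = read_destination_state()

    # New or changed records: hash differs from what the destination has.
    added_or_changed = {
        rid: rec
        for rid, rec in source.items()
        if dest_hashes.get(rid) != _record_hash(rec)
    }
    # Records that disappeared from the source since last run.
    removed = set(dest_hashes) - set(source)

    # Only run the (possibly expensive) transform on new/changed records.
    transformed = {
        rid: transform_record(rec) for rid, rec in added_or_changed.items()
    }

    # Upsert changed records, delete removed ones, and persist the new
    # hashes so the next run can diff against them. Returning None means
    # no IO manager is involved - the asset handles its own writes.
    write_destination(
        upserts=transformed,
        deletes=removed,
        new_hashes={rid: _record_hash(rec) for rid, rec in source.items()},
    )
```

the key idea is that each run diffs the current source against the hashes saved by the previous run, so unchanged records skip the transform entirely while the destination still converges to the right final state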
h
got it - thanks!