Bruno Grande 08/05/2022, 5:42 PM
in a dynamically partitioned non-asset job, but I lose the useful data lineage of SDAs.
owen 08/05/2022, 6:38 PM
that actually reads the set of partition keys from a database, i.e.
```python
from dagster import StaticPartitionsDefinition


def get_partitions_def():
    # Look up the current set of partition keys (e.g. filenames) in the database.
    all_filenames = call_to_database()
    return StaticPartitionsDefinition(all_filenames)


my_partitions = get_partitions_def()
```
You could have a separate job that updates the contents of the database so that it stays somewhat up to date. This isn't really a recommended pattern, because it means that every time you import this code, you'll need to make a call to the database, so you'd want to make sure that this is a pretty fast call (and probably cache the result).
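A minimal sketch of the caching owen mentions (an assumption, not code from the thread): memoize the lookup so repeated imports of the code don't re-query the database. `call_to_database()` is the same hypothetical placeholder as in the snippet above.
```python
from functools import lru_cache

from dagster import StaticPartitionsDefinition


@lru_cache(maxsize=1)
def get_partitions_def():
    # Only the first call in a given process hits the database; later calls
    # reuse the cached partitions definition.
    all_filenames = call_to_database()  # hypothetical placeholder
    return StaticPartitionsDefinition(all_filenames)


my_partitions = get_partitions_def()
```
The trade-off is that a long-lived process won't pick up newly added keys until it restarts or the cache is cleared.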
Bruno Grande 08/05/2022, 7:07 PM
owen 08/05/2022, 8:43 PM
Bruno Grande 08/05/2022, 9:08 PM
that downloads the file from the data repository
• An asset for the set of processed outputs from all manifest chunks
  ◦ This would be backed by a dynamic graph, which would handle the splitting of the manifest and the submission of remote processing jobs
  ◦ This asset would depend on the first one
I think this would help achieve what I’m looking for because if the manifest is updated, then I would want to re-materialize the second asset. Do you know if there’s an easy way to use the file checksum (e.g. MD5) to determine whether it’s “out-of-date”? Or does Dagster only currently determine “out-of-dateness” based on whether upstream assets have been re-materialized or not? I wonder if I could use the asset definition’s metadata for this. 🤔
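A rough sketch of the layout described above (my assumptions, not code from the thread): a manifest asset plus a graph-backed asset whose dynamic graph fans out over manifest chunks. All names, the chunking logic, and the per-chunk processing here are hypothetical placeholders.
```python
from typing import List

from dagster import AssetsDefinition, DynamicOut, DynamicOutput, asset, graph, op


@asset
def manifest() -> str:
    # Hypothetical: download the manifest from the data repository and return
    # its local path.
    ...


@op(out=DynamicOut())
def split_manifest(manifest_path: str):
    # Hypothetical chunking: one dynamic output per manifest line.
    with open(manifest_path) as f:
        for i, line in enumerate(f):
            yield DynamicOutput(line.strip(), mapping_key=str(i))


@op
def process_chunk(chunk: str) -> str:
    # Hypothetical: submit the remote processing job for one chunk and return
    # a reference to its output.
    return chunk


@op
def collect_outputs(results: List[str]) -> List[str]:
    # Gather the per-chunk results into a single output set.
    return results


@graph
def processed_outputs_graph(manifest):
    chunks = split_manifest(manifest)
    return collect_outputs(chunks.map(process_chunk).collect())


# Graph-backed asset; its "manifest" input maps to the manifest asset above,
# so it depends on that asset.
processed_outputs = AssetsDefinition.from_graph(processed_outputs_graph)
```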
owen 08/05/2022, 9:50 PM
). This method will add it to a particular materialization's metadata. You could then query this value in the next run of the asset with
. This event should have that metadata on it somewhere.
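A sketch of how those two pieces might fit together, under assumed names like "manifest" and "manifest_md5": record the file's MD5 as materialization metadata with `context.add_output_metadata`, then read back the most recent materialization event for the asset in a later run and compare checksums. The event-records query shown here may differ from the exact call referenced above.
```python
import hashlib

from dagster import AssetKey, DagsterEventType, EventRecordsFilter, asset


@asset
def manifest(context) -> str:
    # Hypothetical: download the manifest from the data repository.
    local_path = "/tmp/manifest.csv"  # placeholder path

    # Attach the file's checksum to this materialization's metadata.
    with open(local_path, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()
    context.add_output_metadata({"manifest_md5": md5})
    return local_path


def latest_manifest_materialization(context):
    # Fetch the most recent materialization event for the asset; the
    # "manifest_md5" metadata recorded above is attached to this event, so a
    # later run can compare it against the current file's checksum.
    records = context.instance.get_event_records(
        EventRecordsFilter(
            event_type=DagsterEventType.ASSET_MATERIALIZATION,
            asset_key=AssetKey("manifest"),
        ),
        limit=1,
    )
    return records[0] if records else None
```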
Ben Gatewood 08/06/2022, 5:33 AM