Philippe Laflamme
01/27/2023, 5:11 AM
`%Y/%m/%d.parquet` style for daily partitions and `%Y/%m/%d/%H.parquet` for hourly. From my understanding, I can use the `fmt` argument on `PartitionDefinition`, or I can configure a custom IO manager to do this.
If I use `fmt`, then users are exposed to how files are organized on disk (in `dagit`), which is not what I want: users should only care about “logical” partitions, not files on disk.
If I use a custom IO manager, then it’s somewhat tedious since I have to define a separate one for each style of file organization on disk, which defeats the purpose of having separate abstractions.
It’s also somewhat brittle since the IO manager implementation receives partitions as string keys, which are already formatted (using the `fmt` argument on `PartitionDefinition`). So the partition keys have to be parsed into `date`/`datetime` and then re-formatted into a different style.
Am I missing something here? What’s the approach for having finer control over how asset files are organized on disk without exposing this to `dagit` users?
Peter Davidson
01/27/2023, 8:03 AM
Philippe Laflamme
01/27/2023, 2:48 PM
`fmt`. So you have to do something like this:
`datetime.strptime(partition_key, context.asset_partitions_def.fmt)`
Which seems brittle to me.
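For illustration, here is a minimal sketch of that parse-and-reformat step using only the standard library. The helper name `partition_key_to_path` and the two key formats are assumptions made for the example, not Dagster API; inside an IO manager the key format would presumably come from `context.asset_partitions_def.fmt`, as in the call quoted here.

```python
from datetime import datetime


def partition_key_to_path(partition_key: str, key_fmt: str, path_fmt: str) -> str:
    """Hypothetical helper: map a formatted partition key to a relative file path.

    key_fmt is the format that produced the partition key; path_fmt is the
    desired on-disk layout. Both are strftime/strptime patterns.
    """
    # strptime raises ValueError if the key does not match key_fmt,
    # which is the brittleness this round trip introduces.
    moment = datetime.strptime(partition_key, key_fmt)
    return moment.strftime(path_fmt)


# Assumed key formats for a daily and an hourly partitions definition.
daily_path = partition_key_to_path("2023-01-27", "%Y-%m-%d", "%Y/%m/%d.parquet")
hourly_path = partition_key_to_path(
    "2023-01-27-05:00", "%Y-%m-%d-%H:%M", "%Y/%m/%d/%H.parquet"
)
print(daily_path)   # 2023/01/27.parquet
print(hourly_path)  # 2023/01/27/05.parquet
```

The round trip only works while the key format used for parsing matches the one that produced the key, so changing `fmt` on the partitions definition silently changes what the IO manager has to parse.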
I’ll take a look at that second option you’re suggesting. That seems like a better way to change the IO manager’s behaviour. Thanks!
`DbIOManager` deals with figuring out which column to use to update/delete partitions from a table, so that was a good source for copy-pasting some code. Thanks again!