#ask-community
Philippe Laflamme

01/27/2023, 5:11 AM
I’d like to have control over how assets are organized on disk. For example, I’d like a `%Y/%m/%d.parquet` style for daily partitions and `%Y/%m/%d/%H.parquet` for hourly. From my understanding, I can use the `fmt` argument on `PartitionDefinition`, or I can configure a custom IO manager to do this. If I use `fmt`, then users are exposed to how files are organized on disk (in `dagit`), which is not what I want: users should only care about “logical” partitions, not files on disk. If I use a custom IO manager, it’s somewhat tedious, since I have to define a separate one for each style of file organization on disk, which defeats the purpose of having separate abstractions. It’s also somewhat brittle, since the IO manager implementation receives partitions as string keys, which are already formatted (using the `fmt` argument on `PartitionDefinition`). So the partition keys have to be parsed into `date`/`datetime` and then reformatted into a different style. Am I missing something here? What’s the approach for having finer control over how asset files are organized on disk without exposing this to `dagit` users?

Peter Davidson

01/27/2023, 8:03 AM
can't you do this in a single IOManager, with the _get_path interpreting the partition key to know whether it is dealing with daily or hourly partitions? So it wouldn't be a separate IOManager per partition type, but a few extra lines of logic in the single IOManager?
you could also add a config schema to the assets which passes a parameter to the IOManager that defines which set of filepath logic to apply for that asset
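A minimal sketch of that second suggestion, written with the standard library only rather than Dagster's API: a per-asset parameter names a layout, and a single path helper looks up the matching `strftime` pattern. The names `PATH_LAYOUTS`, `build_path`, and the `layout` parameter are hypothetical; in a real IO manager the value would come from whatever config or metadata the asset passes in.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical layout names mapped to on-disk strftime patterns.
# The two patterns are the ones from the question above.
PATH_LAYOUTS = {
    "daily": "%Y/%m/%d.parquet",
    "hourly": "%Y/%m/%d/%H.parquet",
}


def build_path(base_dir: str, layout: str, partition_start: datetime) -> PurePosixPath:
    """Return the file path for one partition under the chosen layout."""
    try:
        pattern = PATH_LAYOUTS[layout]
    except KeyError:
        raise ValueError(f"unknown layout: {layout!r}") from None
    # The logical partition stays a datetime; only this helper knows
    # how it maps to a file on disk.
    return PurePosixPath(base_dir) / partition_start.strftime(pattern)


print(build_path("/data/events", "daily", datetime(2023, 1, 27)))
print(build_path("/data/events", "hourly", datetime(2023, 1, 27, 5)))
```

With this shape, one IO manager can serve both daily and hourly assets, and the partition keys shown to users do not have to mirror the file layout.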

Philippe Laflamme

01/27/2023, 2:48 PM
Right, I did that first solution you’re suggesting. But I find it brittle since it’s based on interpreting the partition key as a string which was formatted using `fmt`. So you have to do something like this:
datetime.strptime(partition_key, context.asset_partitions_def.fmt)
Which seems brittle to me. I’ll take a look at that second option you’re suggesting. That seems like a better way to change the IO manager’s behaviour. Thanks!
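For reference, the parse-then-reformat round trip described above can be sketched with the standard library alone. The helper name `reformat_partition_key` and the key formats used in the calls are assumptions for illustration; the actual key format is whatever `fmt` the partitions definition was created with.

```python
from datetime import datetime


def reformat_partition_key(partition_key: str, key_fmt: str, path_fmt: str) -> str:
    """Parse a formatted partition key and re-emit it as a file path."""
    # strptime raises ValueError if the key does not match key_fmt,
    # which is the coupling that makes this approach feel brittle.
    parsed = datetime.strptime(partition_key, key_fmt)
    return parsed.strftime(path_fmt)


# Assumed key formats, for illustration only.
print(reformat_partition_key("2023-01-27", "%Y-%m-%d", "%Y/%m/%d.parquet"))
print(reformat_partition_key("2023-01-27-05:00", "%Y-%m-%d-%H:%M", "%Y/%m/%d/%H.parquet"))
```

The path logic works only as long as `key_fmt` stays in sync with the format used to produce the keys, so a change to the partition key format silently changes, or breaks, the file layout.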
So, yeah, this turned out to be exactly what I needed. In fact, it’s also how the `DbIOManager` deals with figuring out which column to use to update / delete partitions from a table; so that was a good source for copy-pasting some code. Thanks again!