or should i just go with <https://docs.dagster.io/...
# ask-community
update: i’ll go with the second approach -- the gcs file handler seems way too experimental and not easy to work with
s
file managers are used to write data inside ops/assets-- like chris said above, they’re an old pattern. IO managers handle passing data between ops/assets. There is a gcs IO manager
PickledObjectGCSIOManager
which is probably what you want: https://github.com/dagster-io/dagster/blame/master/python_modules/libraries/dagster-gcp/dagster_gcp/gcs/io_manager.py
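Roughly, wiring that up looks something like the sketch below (untested; the asset, bucket, and prefix are placeholders -- `gcs_pickle_io_manager` is the resource-style entry point that builds the pickled-object GCS IO manager under the hood):

```python
from dagster import Definitions, asset
from dagster_gcp.gcs import gcs_pickle_io_manager, gcs_resource

@asset
def my_numbers():
    # the return value is pickled to GCS by the IO manager; no manual writes needed
    return [1, 2, 3]

defs = Definitions(
    assets=[my_numbers],
    resources={
        # bucket and prefix values are placeholders
        "io_manager": gcs_pickle_io_manager.configured(
            {"gcs_bucket": "my-bucket", "gcs_prefix": "dagster-io"}
        ),
        "gcs": gcs_resource,
    },
)
```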
b
So i’m actually reading in files that someone else is posting, and the pickled object gcs manager doesn’t cut it
correct me if i’m wrong
s
ah yes that’s right-- if you want to read arbitrary files I think you should use a custom IO manager-- you can use the file manager here if you like, or you can just use any other client API for GCS
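A rough sketch of what that custom IO manager could look like, using the plain `google-cloud-storage` client and keying blobs by asset key (the class/resource names and the bytes-in/bytes-out contract are just assumptions for illustration):

```python
from dagster import IOManager, io_manager
from google.cloud import storage

class GCSBytesIOManager(IOManager):
    """Reads/writes raw bytes in a GCS bucket, keyed by the asset key path."""

    def __init__(self, bucket: str):
        self._bucket = storage.Client().bucket(bucket)

    def _blob(self, context):
        return self._bucket.blob("/".join(context.asset_key.path))

    def handle_output(self, context, obj: bytes):
        self._blob(context).upload_from_string(obj)

    def load_input(self, context) -> bytes:
        return self._blob(context).download_as_bytes()

@io_manager(config_schema={"bucket": str})
def gcs_bytes_io_manager(init_context):
    return GCSBytesIOManager(bucket=init_context.resource_config["bucket"])
```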
b
Given that the format is also weird, i think i might just write a client that loads from gcs, since it doesn’t really fit the io manager workflow either
Would it be considered “bad practice” if i just used the client from the gcs resource and loaded the files from the bucket within an asset?
s
Plenty of people do I/O inside their assets, though we’re striving to develop our IO management layer to the point that that’s not necessary. What is it about your case that doesn’t fit IO managers? I would think you could model the inputs you want to load as source assets and use the GCS client inside the
load_input
of a custom IO manager.
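Concretely, that might look like the following (a sketch only -- it assumes the `gcs_bytes_io_manager` from the snippet above lives in the same module, and the asset/resource names are made up):

```python
from dagster import Definitions, SourceAsset, asset

# the file someone else drops in the bucket, modeled as a source asset whose
# bytes are fetched by load_input of the custom IO manager sketched above
vendor_file = SourceAsset(key="vendor_file", io_manager_key="gcs_loader")

@asset
def parsed_vendor_file(vendor_file: bytes):
    # placeholder parsing of the custom file format
    return vendor_file.decode("utf-8").splitlines()

defs = Definitions(
    assets=[vendor_file, parsed_vendor_file],
    resources={"gcs_loader": gcs_bytes_io_manager.configured({"bucket": "my-bucket"})},
)
```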
b
so the catch is that the files being posted to the bucket don’t have a name format i could know upfront. the use case is the following: at any point in the day, someone posts a file to the bucket with a timestamp from yesterday. I need to ingest that file, and since it’s a custom format i also need to parse things out of it.
I could technically write a custom io manager from scratch that works with partitions and figures out which file it needs to pull
the processed file needs to be posted into snowflake
s
I don’t see why you couldn’t do this with an IO manager. This pattern makes me think you should use a dynamically partitioned asset to represent the incoming files, with each file corresponding to an asset partition. You can use a sensor to detect and generate new partitions for each file. There is an example here: https://docs.dagster.io/concepts/partitions-schedules-sensors/partitions#dynamically-partitioned-assets Your IO manager would just load the file (with the filename given by the partition key), then you can do whatever you want with it and write to differently partitioned snowflake-based assets downstream.
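Something along these lines, based on the docs page linked above (a sketch with made-up names; the bucket is a placeholder, and here the asset downloads the blob itself rather than going through an IO manager):

```python
from dagster import (
    DynamicPartitionsDefinition,
    RunRequest,
    SensorResult,
    asset,
    define_asset_job,
    sensor,
)
from google.cloud import storage

# one partition per file that lands in the bucket
incoming_files = DynamicPartitionsDefinition(name="incoming_files")

@asset(partitions_def=incoming_files)
def raw_vendor_file(context) -> bytes:
    # the partition key is the blob name the sensor registered below
    blob = storage.Client().bucket("my-bucket").blob(context.partition_key)
    return blob.download_as_bytes()

ingest_job = define_asset_job("ingest_vendor_file", selection="raw_vendor_file")

@sensor(job=ingest_job)
def vendor_file_sensor(context):
    # compare the bucket contents against partitions we've already registered
    seen = set(context.instance.get_dynamic_partitions("incoming_files"))
    new = [
        blob.name
        for blob in storage.Client().list_blobs("my-bucket")
        if blob.name not in seen
    ]
    return SensorResult(
        run_requests=[RunRequest(partition_key=name) for name in new],
        dynamic_partitions_requests=[incoming_files.build_add_request(new)],
    )
```

The sensor adds the new blob names as partitions and requests a run per partition in the same SensorResult, so the snowflake-facing assets can hang off raw_vendor_file downstream with whatever partitioning they need.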
b
Gotten sidetracked with other stuff, i’ll start back on this tomorrow - it might just be that it’s my lack of knowledge that’s stopping me - i’ll give it a try
Thanks again Sean, Chris as well - you’re super helpful