Or should I just go with <https://docs.dagster.io/...
# ask-community
Update: I’ll go with the second approach; the GCS file handler seems way too experimental and not easy to work with.
File managers are used to write data inside ops/assets -- like Chris said above, they’re an old pattern. IO managers handle passing data between ops/assets. There is a GCS IO manager,
which is probably what you want: https://github.com/dagster-io/dagster/blame/master/python_modules/libraries/dagster-gcp/dagster_gcp/gcs/io_manager.py
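For context, a minimal sketch of wiring up that pickle-based GCS IO manager; the bucket and prefix values here are made up:

```python
from dagster import Definitions, asset
from dagster_gcp.gcs import gcs_pickle_io_manager, gcs_resource

@asset
def my_asset():
    # The return value is pickled and written to GCS by the IO manager.
    return {"hello": "world"}

defs = Definitions(
    assets=[my_asset],
    resources={
        "io_manager": gcs_pickle_io_manager.configured(
            {"gcs_bucket": "my-bucket", "gcs_prefix": "dagster"}  # hypothetical values
        ),
        # gcs_pickle_io_manager expects a "gcs" resource that supplies the client
        "gcs": gcs_resource,
    },
)
```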
So I’m actually reading in the files that someone else is posting, and the pickled-object GCS IO manager doesn’t cut it.
Correct me if I’m wrong.
Ah yes, that’s right -- if you want to read arbitrary files, I think you should use a custom IO manager. You can use the file manager here if you like, or you can just use any other client API for GCS.
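A rough sketch of such a custom IO manager, reading raw (non-pickled) bytes through the google-cloud-storage client -- the class name, config field, and key scheme are all assumptions:

```python
from dagster import ConfigurableIOManager, InputContext, OutputContext
from google.cloud import storage

class GCSRawFileIOManager(ConfigurableIOManager):
    """Loads raw file bytes from a GCS bucket instead of unpickling objects."""

    bucket: str  # hypothetical config field

    def load_input(self, context: InputContext) -> bytes:
        client = storage.Client()
        # Derive the blob name from the asset key; adapt to your naming scheme.
        blob_name = "/".join(context.asset_key.path)
        return client.bucket(self.bucket).blob(blob_name).download_as_bytes()

    def handle_output(self, context: OutputContext, obj) -> None:
        # This manager only reads upstream files; writing is out of scope here.
        raise NotImplementedError("read-only IO manager")
```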
Given that the format is also weird, I think I might just write a client that loads from GCS, since it doesn’t really fit the bill in the IO manager workflow either.
Would it be considered “bad practice” if I just used the client from the GCS resource and then did the loading of the files from the bucket within an asset?
Plenty of people do I/O inside their assets, though we’re striving to develop our IO management layer to the point that that’s not necessary. What is it about your case that doesn’t fit IO managers? I would think you could model the inputs you want to load as source assets and use the GCS client inside the `load_input` method of a custom IO manager.
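To illustrate the source-asset idea (reusing the hypothetical GCSRawFileIOManager sketched a few messages up; the keys and names are placeholders):

```python
from dagster import Definitions, SourceAsset, asset

# The upstream file in GCS, modeled as a source asset loaded by the custom IO manager.
incoming_file = SourceAsset(key="incoming_file", io_manager_key="gcs_raw_files")

@asset
def parsed_records(incoming_file: bytes) -> list[str]:
    # incoming_file arrives as raw bytes via GCSRawFileIOManager.load_input
    return incoming_file.decode("utf-8").splitlines()

defs = Definitions(
    assets=[incoming_file, parsed_records],
    # GCSRawFileIOManager is the custom IO manager sketched above
    resources={"gcs_raw_files": GCSRawFileIOManager(bucket="my-bucket")},
)
```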
So the catch is that the files being posted to the bucket don’t have a name format I could know upfront. The use case is the following: at any point in the day, someone posts a file to the bucket with a timestamp from yesterday. I need to ingest that file, and since it’s a custom format I also need to parse things out of it.
I could technically write a custom IO manager from scratch that works with partitions and figures out which file it needs to pull.
The processed file needs to be posted into Snowflake.
I don’t see why you couldn’t do this with an IO manager. This pattern makes me think you should use a dynamically partitioned asset to represent the incoming files, with each file corresponding to an asset partition. You can use a sensor to detect new files and generate a partition for each one. There is an example here: https://docs.dagster.io/concepts/partitions-schedules-sensors/partitions#dynamically-partitioned-assets Your IO manager would just load the file (with the filename given by the partition key); then you can do whatever you want with it and write to differently partitioned Snowflake-based assets downstream.
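Roughly what that could look like, following the docs pattern linked above -- the bucket, asset, and sensor names here are all hypothetical:

```python
from dagster import (
    AssetSelection,
    Definitions,
    DynamicPartitionsDefinition,
    RunRequest,
    SensorResult,
    asset,
    define_asset_job,
    sensor,
)
from google.cloud import storage

# One partition per file that lands in the bucket.
files_partitions = DynamicPartitionsDefinition(name="incoming_files")

@asset(partitions_def=files_partitions)
def incoming_file(context) -> bytes:
    # A partition-aware IO manager could do this load instead, keyed on
    # context.partition_key; doing it inline keeps the sketch short.
    client = storage.Client()
    return client.bucket("my-bucket").blob(context.partition_key).download_as_bytes()

file_job = define_asset_job("file_job", selection=AssetSelection.assets(incoming_file))

@sensor(job=file_job)
def new_file_sensor(context):
    client = storage.Client()
    seen = set(context.instance.get_dynamic_partitions("incoming_files"))
    new = [blob.name for blob in client.list_blobs("my-bucket") if blob.name not in seen]
    # Register each new file as a partition and kick off a run for it.
    return SensorResult(
        run_requests=[RunRequest(partition_key=name) for name in new],
        dynamic_partitions_requests=[files_partitions.build_add_request(new)],
    )

defs = Definitions(assets=[incoming_file], jobs=[file_job], sensors=[new_file_sensor])
```

A downstream, differently partitioned Snowflake-based asset would then depend on incoming_file.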
I’ve gotten sidetracked with other stuff; I’ll start back on this tomorrow. It might just be that it’s my lack of knowledge that’s stopping me. I’ll give it a try.
Thanks again Sean, and Chris as well -- you’re super helpful.