# announcements
a
👋 Wanted to get y'all's take on file cleanup and dealing with tempfiles in dagster. A toy example I am playing with: download a CSV (let's say the iris dataset from UCI), convert the CSV into a multiline JSON file, and eventually load that into a table in a SQLite db. I have everything working, but I am struggling to compose my pipeline in a way that deletes files once I am done with them. For example, I would like to delete my CSV file once I have transformed everything to JSON, so it doesn't waste space. Here is one way I composed the DAG (I am excluding the load-to-SQLite solid for simplicity):
@pipeline
def iris_ingestion_pipeline():
    convert_csv_to_json(
        download_csv_from_url_to_file()
    )
Currently, I am deleting the CSV file in my `convert_csv_to_json` solid just to get it working. This is not ideal, though, because it couples transformation and IO logic. One option I got working was having `convert_csv_to_json` return the file to be deleted, which then gets passed into a `delete_file` solid like so:
@pipeline
def iris_ingestion_pipeline():
    delete_file(
        convert_csv_to_json(
            download_csv_from_url_to_file()
        )
    )
This is also not ideal, because the semantics of `convert_csv_to_json` become brittle if you plan on reusing that solid in contexts where you don't want to delete the file.
I then tried to go the tempfile route, but the issue there is that I have no way of persisting a `tempfile_fp` across solids unless I hack the execution context. It is likely I am missing something obvious here with inputs, but I would love to get:
A: Your opinions on how to do this, just so I have a better intuition for how solids are meant to be composed.
B: Your opinions on support for tempfiles and tmpdirs.
a
One way to approach this problem would be to use the resource system to provide the directory where the files are written. Resources can be written in a context-manager style that yields once, which gives you an opportunity to clean up the directory on pipeline completion or error.
We actually have one of these written; it is demonstrated in the airline demo in the examples directory. Let me grab a link.
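To illustrate the yield-once idea without leaning on any particular dagster version, here is a minimal plain-Python sketch using only the standard library. It is not the airline demo's implementation: `scratch_dir`, `write_csv`, and `run_pipeline` are hypothetical names, and `write_csv` stands in for the download step so nothing touches the network. In dagster, a generator with this shape would be registered as a resource rather than called directly; that wiring is not shown here.

```python
import csv
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_dir():
    """Yield a temporary directory once, then delete it and its contents."""
    with tempfile.TemporaryDirectory() as path:
        # Everything after the yield (here, TemporaryDirectory's cleanup)
        # runs when the caller's block exits, on success or on error.
        yield path


def write_csv(directory):
    # Hypothetical stand-in for download_csv_from_url_to_file.
    csv_path = os.path.join(directory, "iris.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sepal_length", "species"])
        writer.writerow(["5.1", "setosa"])
    return csv_path


def convert_csv_to_json(csv_path, directory):
    # Pure transformation: one JSON object per line, no deletion logic.
    json_path = os.path.join(directory, "iris.json")
    with open(csv_path, newline="") as src, open(json_path, "w") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
    return json_path


def run_pipeline():
    # The directory owns file lifetime, so no step needs a delete_file call.
    with scratch_dir() as directory:
        json_path = convert_csv_to_json(write_csv(directory), directory)
        with open(json_path) as f:
            rows = [json.loads(line) for line in f]
    return rows, directory
```

The point of the design is that `convert_csv_to_json` stays reusable: it never deletes anything, and cleanup happens in one place when the directory goes away.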
a
Ahhhhh I see. I knew I was missing something. That makes a lot of sense! Thanks so much Alex!