# ask-community
h
I’m looking for a general approach for Dagster. I’ve got a lot of C# programs that interact with APIs or websites and download/generate CSV files. Some of these C# programs interact with websites in a complex way, and I’d rather not rewrite them in Python at the moment. The output files all follow a standardized naming pattern. How can I integrate this existing codebase and get the most out of Dagster? One change I could make easily: • Make each C# program return a JSON object listing the files it generated, plus any related metadata. I want Dagster to kick off the extract operation based on freshness, then have a downstream op load the files into the database, then run dbt.
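The JSON-manifest idea above could be consumed on the Dagster side with a small stdlib-only helper. This is a hedged sketch: the manifest shape (a top-level `"files"` key) and the helper name are assumptions, not anything the C# programs emit today.

```python
import json
from pathlib import Path


def parse_extract_manifest(manifest_text: str) -> list[Path]:
    """Parse the JSON manifest the C# extractor would print on exit.

    Assumes a shape like {"files": ["a.csv", "b.csv"]}; adjust the key
    to whatever the C# program actually emits.
    """
    manifest = json.loads(manifest_text)
    return [Path(p) for p in manifest["files"]]


# A downstream op could iterate these paths and COPY each file into
# the database before kicking off dbt.
paths = parse_extract_manifest('{"files": ["orders_2024.csv", "items_2024.csv"]}')
```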
s
Before you run the C# program, do you know what CSV files it's going to create? The most Dagster-y way to do this would be something like:
```python
import subprocess

from dagster import asset

@asset
def x_csv_file():
    # check=True raises if the C# wrapper script exits non-zero,
    # so the asset fails instead of silently "succeeding"
    subprocess.run(["run_my_c_sharp_program_that_generates_x_csv_file.sh"], check=True)
```
or it might be easier to just use one big "logical" asset to represent all the CSV files together
h
For some of my extracts I do know what CSV file they should generate. Others generate multiple files, since I usually save each page of API results to its own file. I guess I could merge them into one if that makes things easier. In your first example the x_csv_file asset is materialized once the op completes. So how is the logical asset used? Does the downstream op just know about the actual files, or do we trigger a sensor based on their file creation location?
s
> Does the downstream op just know about the actual files, or do we trigger a sensor based on their file creation location?
Either are possible, depending on your use case
If it would be helpful, I could find some time to chat about the different options here
h
I would actually love that. My real email is in my profile.