Uri Laserson
12/24/2022, 3:48 PM

sandy
12/27/2022, 4:52 PM
from dagster import In, Nothing, job, op

@op
def op1() -> None:
    write_file_x()

# A Nothing input creates an ordering dependency without passing data:
# op2 runs after op1, but no value flows between them.
@op(ins={"after": In(Nothing)})
def op2() -> None:
    read_file_x()

@job
def job1():
    op2(after=op1())
Uri Laserson
12/28/2022, 2:32 AM

sandy
12/28/2022, 6:10 PM
> But anyway, would configuring just the boundary nodes to use one IO manager and the inner ones to use another be a horrible mess in the dagster API?
It's fairly straightforward in Dagster to say "use IO manager X for these node outputs and use IO manager Y for these other node outputs." It's hairier to say "these nodes need to run on the same machine, but these other nodes can run on different machines."
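[For illustration, a minimal sketch of that per-output wiring, assuming dagster-aws is installed; the resource key "s3_io", the bucket name, and the op bodies are hypothetical:]

from dagster import Out, fs_io_manager, job, op
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource

# Boundary op: its output is handled by the S3 IO manager.
@op(out=Out(io_manager_key="s3_io"))
def boundary_op():
    return [1, 2, 3]  # placeholder output

# Inner op: no io_manager_key, so its output uses the default "io_manager".
@op
def inner_op(data):
    return sum(data)  # placeholder transformation

@job(
    resource_defs={
        # Default IO manager for outputs that don't name a key.
        "io_manager": fs_io_manager,
        # Bound to boundary_op's output via io_manager_key above.
        "s3_io": s3_pickle_io_manager.configured({"s3_bucket": "my-bucket"}),
        # s3_pickle_io_manager requires an "s3" resource for its client.
        "s3": s3_resource,
    }
)
def mixed_io_job():
    inner_op(boundary_op())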
> Is there anything functional that is lost in doing what you suggest, other than the "aesthetic" aspects of using the function args to define inputs and composing the ops in the job? (Though it would be sad to lose that)
Just that you're responsible for your own IO.
> Is there an IO manager that takes an S3 URI as input and output and "makes it available" to the op by actually writing it to a predictable location in the local filesystem?
This is something I've played around with in the past, but we don't have something out of the box. If you're trying to get something up and running as quickly as possible, I'd recommend against this route and instead just put download_from_s3() and upload_to_s3() at the beginning and end of your ops / assets (see the sketch below). I do think it is the ideal solution ultimately and would help you with it if you wanted to write one.
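[For illustration, a minimal sketch of that quick-start pattern using boto3; the bucket, object keys, and local paths are hypothetical:]

import boto3
from dagster import job, op

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # hypothetical

@op
def process_file():
    # Download the input from S3 to a predictable local path at the start.
    s3.download_file(BUCKET, "inputs/file_x", "/tmp/file_x")

    # Do the actual work against the local filesystem.
    with open("/tmp/file_x") as f:
        result = f.read().upper()  # placeholder transformation
    with open("/tmp/result", "w") as f:
        f.write(result)

    # Upload the output back to S3 at the end of the op.
    s3.upload_file("/tmp/result", BUCKET, "outputs/result")

@job
def process_job():
    process_file()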