Uri Laserson 12/24/2022, 3:48 PM
sandy 12/27/2022, 4:52 PM
from dagster import op, job, In
def op1() -> None:
    ...
def op2() -> None:
    ...
Uri Laserson 12/28/2022, 2:32 AM
sandy 12/28/2022, 6:10 PM
> But anyway, would configuring just the boundary nodes to use one IO manager and the inner ones to use another be a horrible mess in the Dagster API?
It's fairly straightforward in Dagster to say "use IO manager X for these node outputs and use IO manager Y for these other node outputs." It's hairier to say "these nodes need to run on the same machine, but these other nodes can run on different machines."
> Is there anything functional that is lost in doing what you suggest, other than the "aesthetic" aspects of using the function args to define inputs and composing the ops in the job? (Though it would be sad to lose that)
Just that you're responsible for your own IO.
> Is there an IO manager that takes an S3 URI as input and output and "makes it available" to the op by actually writing it to a predictable location in the local filesystem?
This is something I've played around with in the past, but we don't have something out of the box. If you're trying to get something up and running as quickly as possible, I'd recommend against this route and instead just put [the download and upload calls] at the beginning and end of your ops / assets. I do think [an IO manager like this] is the ideal solution ultimately and would help you with it if you wanted to write one.
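The quick route of downloading at the start of an op and uploading at the end could be sketched as below. This is an illustration under assumptions, not code from the thread: `run_on_local_copy` and `LocalDirClient` are hypothetical names, and the `s3` argument is assumed to expose boto3's S3 client methods `download_file(Bucket, Key, Filename)` and `upload_file(Filename, Bucket, Key)`. A directory-backed stand-in is used so the sketch runs without AWS access.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_on_local_copy(s3, bucket: str, in_key: str, out_key: str,
                      work: Callable[[Path, Path], None]) -> None:
    """Download the input, run `work` on local paths, then upload the result."""
    with tempfile.TemporaryDirectory() as tmp:
        local_in = Path(tmp) / "input"
        local_out = Path(tmp) / "output"
        s3.download_file(bucket, in_key, str(local_in))  # beginning of the op
        work(local_in, local_out)                        # the op's actual logic
        s3.upload_file(str(local_out), bucket, out_key)  # end of the op


class LocalDirClient:
    """Hypothetical stand-in that treats a directory as a bucket."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def download_file(self, Bucket: str, Key: str, Filename: str) -> None:
        shutil.copy(self.root / Bucket / Key, Filename)

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None:
        shutil.copy(Filename, self.root / Bucket / Key)


# Usage: a fake "bucket" on disk with one input object.
root = Path(tempfile.mkdtemp())
(root / "my-bucket").mkdir()
(root / "my-bucket" / "in.txt").write_text("hello")


def upper(src: Path, dst: Path) -> None:
    dst.write_text(src.read_text().upper())


run_on_local_copy(LocalDirClient(root), "my-bucket", "in.txt", "out.txt", upper)
```

With a real `boto3.client("s3")` passed in place of the stand-in, the same helper could be called from inside an op or asset body; a custom IO manager would move this logic out of the op entirely.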