Hi,
I am implementing an ETL pipeline in Dagster and want to use Kubernetes for job execution. Each ETL step needs to be packaged in its own image due to dependency conflicts, and each step needs to be executed in a separate pod or job. I'm wondering:
1. How do I pass inputs/outputs between steps? Is there an automatic way of handling this?
2. How do I trigger each step in a separate pod/job? Are there any docs available for this?
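For question 1, I came across IOManagers in the docs. My guess (untested) is that a pickle-based IO manager backed by S3 would persist each op's output so a downstream op running in a different pod can load it automatically; the bucket name below is just a placeholder:

```python
from dagster import job
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource

# My understanding: the IO manager pickles each op's return value to S3,
# so an op running in a different pod can read it back as its input.
@job(
    resource_defs={
        "io_manager": s3_pickle_io_manager,
        "s3": s3_resource,
    },
    config={
        "resources": {
            "io_manager": {"config": {"s3_bucket": "my-dagster-bucket"}}  # placeholder bucket
        }
    },
)
def etl_job():
    ...
```

Is this the mechanism that handles the handoff automatically, or do I need to write my own IOManager?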
At a very high level, I have the following structure in mind; is this the correct way to do this?
@op
def extract() -> pd.DataFrame:
    # Run a Kubernetes job, passing a file path as a reference
    ...
    # once finished, get the results
    return results

@op
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Run a Kubernetes job, passing df
    ...
    # once done, get the results
    return results

@op
def load(df: pd.DataFrame):
    # Run a Kubernetes job, passing df
    ...
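For question 2, instead of launching Kubernetes jobs manually inside each op like above, my reading of the dagster-k8s docs is that the k8s_job_executor runs every op in its own Kubernetes Job, and a per-op "dagster-k8s/config" tag can override the container image. A rough sketch of what I think that looks like (image names are my own placeholders):

```python
from dagster import job, op
from dagster_k8s import k8s_job_executor

# Per-op image override via the dagster-k8s/config tag, so each step
# can run from its own image (with its own dependencies) in its own K8s Job.
@op(
    tags={
        "dagster-k8s/config": {
            "container_config": {"image": "my-registry/transform-step:latest"}
        }
    }
)
def transform(df):
    ...

# The executor launches one Kubernetes Job per op instead of
# running them all in the run worker's process.
@job(executor_def=k8s_job_executor)
def etl_job():
    ...
```

Would this replace the manual job-launching code in my ops, or are the two approaches meant to be combined?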
Any hints/guidance would be really appreciated, since I'm just getting started with Dagster.
Thanks a lot