Jake Kagan
01/24/2023, 8:31 PM
reusable_op that queries gcp, then returns a dataframe.
sometimes i want that to be the first op to run, other times i want it to have dependencies.
but in both cases, i would like to pass arguments (run configuration) whenever i want, i.e. keep the op reusable.
i could have an io manager that converts a query into a dataframe. but what if i have a query that is dynamic, with a placeholder? how can i pass a placeholder into an io manager? hoping someone has some examples of component reusability.
yuhan
01/24/2023, 9:58 PM
Jake Kagan
01/25/2023, 3:36 AM
from dagster import Out, configured, op

# the io manager behind "io_bigq_to_df" turns the returned query string into a dataframe
@op(config_schema={"query": str, "placeholder": str}, out=Out(io_manager_key="io_bigq_to_df"))
def op_bf_to_df(context, upstream_op):
    query = context.op_config['query']
    placeholder = context.op_config['placeholder']
    # the upstream op's output is the value substituted for the placeholder
    replacement = upstream_op
    return query.replace(placeholder, replacement)

configged = configured(op_bf_to_df, name='configged')(
    {"query": QRY_BIGQ_INCREMENTAL,
     "placeholder": '$placeholder_latest_date$',
    }
)
it would be nice if i could reuse that op by passing something for upstream_op
for example:
configged = configured(op_bf_to_df(something_like_this), name='configged')(
    {"query": QRY_BIGQ_INCREMENTAL,
     "placeholder": '$placeholder_latest_date$',
    }
)