Daniel Galea
12/14/2022, 10:36 AM
@op(ins={"start": In(Nothing)})
but start is not a key that is available in ins of an asset. I have a pipeline that is like:
ingest_dataset1 -> ingest_dataset2 -> start_processing_data
in the form of:
asset -> asset -> op.
The assets do not depend on each other, I just want one to run after the other.
I could also use:
op -> op -> op
but it is my understanding that an op should be used when performing some type of transformation or logic, whereas an asset signifies something that is stored in persistent storage.
Vinnie
12/14/2022, 10:56 AM
from dagster import asset, AssetIn

@asset
def asset_one():
    pass

@asset(ins={"start": AssetIn("asset_one")})
def asset_two(start):
    pass
Vinnie
12/14/2022, 11:03 AM
multi_asset that will ensure one gets materialized after the other without having this dependency (which you may or may not want; I like the fact that the asset graph serves as a mental model for all dependencies in the data platform).
The latter would look something like this:
from dagster import AssetOut, Output, multi_asset

@multi_asset(
    outs={
        "asset_one": AssetOut(is_required=True),
        "asset_two": AssetOut(is_required=True),
    },
    can_subset=True,  # optional in case sometimes not all are/can be materialized
)
def my_assets(context):
    # process_asset_one / process_asset_two are placeholders for the real logic
    yield Output(value=process_asset_one(), output_name="asset_one")
    yield Output(value=process_asset_two(), output_name="asset_two")
Daniel Galea
12/14/2022, 3:29 PM
dagster._core.errors.DagsterInvalidDefinitionError: In @graph ingestion_pipeline, received a tuple of multiple outputs for input "start" (at position 0) in op invocation launch_emr_cluster. Must pass individual output, available from tuple: ('customer_transactions', 'customer_information')
My Op:
@op(
    ins={"start": In(Nothing)},
)
def launch_emr_cluster(context: OpExecutionContext) -> str:
    ...
Vinnie
12/14/2022, 3:33 PM
@op(
    ins={
        "customer_transactions": AssetIn("customer_transactions"),
        "customer_information": AssetIn("customer_information"),
    }
)
def my_op(context, customer_transactions, customer_information):
    ...
I also have to say, I’m not entirely sure if ops can depend on assets. The Dagster team is probably waking up around now and I’m sure they can help though 🙂
Daniel Galea
12/14/2022, 3:36 PM
Zach
12/14/2022, 6:26 PM
jamie
12/14/2022, 8:15 PM
non_argument_deps https://docs.dagster.io/concepts/assets/software-defined-assets#non-argument-dependencies to define order dependencies without data dependencies between assets
Daniel Galea
12/15/2022, 9:18 AM