# ask-community

leehuwuj

07/09/2022, 8:46 AM
Materializing a selected asset does not provide full OutputContext info. Hi everyone, I'm testing the pyspark-demo project but ran into an issue with the OutputContext in the IO manager. There are two assets: people -> people_over_50. If I materialize all assets at the same time everything works fine, but when I materialize only the downstream asset (people_over_50) it fails. I realized that the context provided to the IOManager is not the same when materializing all assets versus materializing just one. Materialize all:
```
{'_asset_info': AssetOutputInfo(key=AssetKey(['people']), partitions_fn=<function AssetLayer.from_graph_and_assets_node_mapping.<locals>.<lambda> at 0x134e733a0>, partitions_def=DailyPartitionsDefinition(schedule_type=<ScheduleType.DAILY: 'DAILY'>, start=datetime.datetime(2022, 7, 2, 0, 0), timezone='UTC', fmt='%Y-%m-%d', end_offset=0, minute_offset=0, hour_offset=0, day_offset=None), is_required=True),
 '_config': None,
 '_dagster_type': <dagster.core.types.dagster_type.TypeHintInferredDagsterType object at 0x134e359a0>,
 '_events': [],
 '_log': <DagsterLogManager dagster (NOTSET)>,
 '_mapping_key': None,
 '_metadata': {'owner': 'local@localhost'},
 '_metadata_entries': None,
 '_name': 'result',
 '_pipeline_name': '__ASSET_JOB_0',
 '_resource_config': {'output_type': 'PARQUET'},
 '_resources': _ScopedResources(pyspark=<dagster_pyspark.resources.PySparkResource object at 0x134f1e190>),
 '_resources_cm': None,
 '_run_id': 'd70f9280-0902-4d75-bf7b-25de5c911122',
 '_solid_def': <dagster.core.definitions.op_definition.OpDefinition object at 0x134e35a60>,
 '_step_context': <dagster.core.execution.context.system.StepExecutionContext object at 0x134fc1f10>,
 '_step_key': 'people',
 '_user_events': [],
 '_version': None,
 '_warn_on_step_context_use': True}
```
Materialize a downstream asset:
```
{'_asset_info': AssetOutputInfo(key=AssetKey(['people']), partitions_fn=<function AssetOutputInfo.__new__.<locals>.<lambda> at 0x137019c10>, partitions_def=None, is_required=True),
 '_config': None,
 '_dagster_type': None,
 '_events': [],
 '_log': None,
 '_mapping_key': None,
 '_metadata': {'owner': 'local@localhost'},
 '_metadata_entries': None,
 '_name': 'people',
 '_pipeline_name': None,
 '_resource_config': {'output_type': 'PARQUET'},
 '_resources': _ScopedResources(pyspark=<dagster_pyspark.resources.PySparkResource object at 0x13706b130>),
 '_resources_cm': None,
 '_run_id': None,
 '_solid_def': None,
 '_step_context': None,
 '_step_key': 'none',
 '_user_events': [],
 '_version': None,
 '_warn_on_step_context_use': False}
```
It seems that Dagster currently does not fill in some of this information (run_id, partitions_def, ...) when materializing a specific asset? Is there an alternative approach to implement this pattern, e.g. getting the run id or partition key to construct the output path?
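Not a definitive answer, but a minimal sketch of one workaround, assuming a dagster ~0.15-era API; the class name and the /tmp/dagster_data layout are hypothetical. It builds the path from the asset key and partition key that OutputContext and InputContext expose directly, and avoids run_id, which (as the second dump above shows) comes through as None when only the downstream asset is materialized:
```python
from typing import Optional

from dagster import AssetKey, IOManager, io_manager


class PartitionedParquetIOManager(IOManager):
    # Hypothetical base location; a real project would likely make
    # this configurable via resource config.
    base_dir = "/tmp/dagster_data"

    def _path(self, asset_key: AssetKey, partition_key: Optional[str]) -> str:
        # Key the path on asset key (+ partition), not run_id, so the
        # path is stable across runs.
        key = "/".join(asset_key.path)
        if partition_key is not None:
            return f"{self.base_dir}/{key}/{partition_key}.parquet"
        return f"{self.base_dir}/{key}.parquet"

    def handle_output(self, context, obj):
        # obj is assumed to be a pyspark DataFrame here.
        partition = context.asset_partition_key if context.has_asset_partitions else None
        obj.write.mode("overwrite").parquet(self._path(context.asset_key, partition))

    def load_input(self, context):
        # Build the read path from the InputContext itself rather than
        # from context.upstream_output, since the dump above shows the
        # upstream OutputContext can be missing fields in this mode.
        partition = context.asset_partition_key if context.has_asset_partitions else None
        return context.resources.pyspark.spark_session.read.parquet(
            self._path(context.asset_key, partition)
        )


@io_manager(required_resource_keys={"pyspark"})
def partitioned_parquet_io_manager(_):
    return PartitionedParquetIOManager()
```
Because the path is keyed on asset key and partition rather than run_id, a downstream-only materialization can still locate data written by an earlier run.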
Sorry, my bad, I dumped the asset info instead of the OutputContext object itself. I think everything I need is available in the execution_data and plan_data attributes.