https://dagster.io/ logo
#ask-ai
Title
# ask-ai
p

Peter Mburu

03/04/2024, 12:08 PM
@Scout Explain the use of Output
s

Scout

03/04/2024, 12:08 PM
The
Output
event in Dagster is a critical component for passing data between steps in a pipeline. When you define a computation or a step within a pipeline using the
@op
decorator, you can use the
Output
object to send the result of that computation to subsequent steps. Here's how you can use `Output`: 1. Returning Output Directly: For many use cases, you can return a value directly from an op, and Dagster will handle it as an output. This is the simplest form of creating an output.
Copy code
python @op def my_output_op():     return 5
2. Using Output Object: If you need additional functionality, such as specifying output metadata or conditional branching, you can use the
Output
object. This object can be either returned or yielded from an op, and it allows you to attach metadata to the output or to specify the output's name if there are multiple outputs.
Copy code
python from dagster import Output, op  @op def my_output_op() -> Output:     return Output("some_value", metadata={"some_metadata": "a_value"})
3. Parameterized Type Annotations: When using the
Output
object, you can also provide type annotations to ensure type safety and clarity in your pipeline's design.
Copy code
python from dagster import Output, op  @op def my_output_generic_op() -> Output[int]:     return Output(5, metadata={"some_metadata": "a_value"})
4. Yielding Multiple Outputs: If an op has multiple outputs, you can yield
Output
objects with specified output names. When yielding outputs, type annotations cannot be used directly, but you can specify type information using the
out
argument of the op decorator.
Copy code
python from dagster import Output, op, Out  @op(out={"out1": Out(str), "out2": Out(int)}) def my_op_yields():     yield Output("foo", output_name="out1")     yield Output(5, output_name="out2")
5. Attaching Metadata to Outputs: You can attach metadata to an output to log information specific to that output. This is done using the
metadata
parameter of the
Output
object. The
Output
event is a flexible and powerful way to manage data flow and metadata within Dagster pipelines, providing clear interfaces for single and multiple outputs, type safety, and additional metadata capabilities.
p

Peter Mburu

03/04/2024, 12:19 PM
@Scout ====================================================================== ERROR [0.867s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 243, in test_import_imf_data result = materialize_to_memory(assets) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 193, in materialize_to_memory return materialize( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 103, in materialize return check.not_none( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/job_definition.py", line 712, in execute_in_process return core_execute_in_process( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execute_in_process.py", line 76, in core_execute_in_process event_list = list(execute_run_iterable) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 875, in iter yield from self.iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 764, in job_execution_iterator for event in job_context.executor.execute(job_context, execution_plan): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/executor/in_process.py", line 55, in execute yield from iter( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 875, in iter yield from self.iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/executor/in_process.py", line 26, in inprocess_execution_iterator yield from inner_plan_execution_iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 121, in inner_plan_execution_iterator for step_event in check.generator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 385, in dagster_event_sequence_for_step raise error File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 286, in dagster_event_sequence_for_step for step_event in check.generator(step_events): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 519, in core_dagster_event_sequence_for_step for user_event in _step_output_error_checked_user_event_sequence( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 201, in _step_output_error_checked_user_event_sequence for user_event in user_event_sequence: File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 99, in _process_asset_results_to_events for user_event in user_event_sequence: File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/compute.py", line 214, in execute_core_compute for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/compute.py", line 183, in _yield_compute_results for event in iterate_with_context( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 467, in iterate_with_context next_output = next(iterator) File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/imf_ingestion.py", line 532, in remote_imf_data def fetch_and_yield_imf_data(metadata_df, document_type, indicators, out_name, period_date=context.partition_key): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/compute.py", line 1690, in partition_key return self.op_execution_context.partition_key File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/compute.py", line 306, in partition_key return self._step_execution_context.partition_key File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/system.py", line 391, in partition_key raise DagsterInvariantViolationError( dagster._core.errors.DagsterInvariantViolationError: Cannot access partition_key for a non-partitioned run ---------------------------------------------------------------------- Ran 1 test in 1.407s
s

Scout

03/04/2024, 12:19 PM
The error you're encountering indicates that there is an attempt to access the
partition_key
property within a non-partitioned run. The
partition_key
is only relevant when you are working with partitioned assets or runs, which are a way to slice your data processing into discrete chunks that can be processed independently, often corresponding to logical timeframes like days or months. Here's a breakdown of the error message: 1. The
partition_key
property is being accessed within the
remote_imf_data
function. 2. The
partition_key
is being accessed through the
context
object, which is provided to the function during execution. 3. The error is raised because the current run is not partitioned, and therefore, the
partition_key
property is not set and should not be accessed. To resolve this issue, you have a few options: - If the run is supposed to be partitioned, ensure that you are setting up the partitions correctly in your asset or pipeline definition and that you are initiating a partitioned run. - If the run is not supposed to be partitioned, you should modify the
remote_imf_data
function to avoid accessing the
partition_key
property, or handle the case where it is not available. Here's an example of how you might modify the
remote_imf_data
function to handle the absence of a `partition_key`:
Copy code
python @op def remote_imf_data(context, metadata_df, document_type, indicators, out_name):     if context.has_partition_key:         period_date = context.partition_key         # ... use period_date in your logic     else:         # ... handle the non-partitioned case
In this example,
context.has_partition_key
is a made-up property for illustration purposes. You would need to implement a check to determine whether the
partition_key
is available in your specific context. Remember to review your pipeline or asset definitions to ensure that partitioning is set up correctly if it is required for your use case. If you continue to face issues, you may want to look into the documentation or source code related to partitioning in Dagster or seek help from the community or support channels.
p

Peter Mburu

03/04/2024, 12:22 PM
@Scout I'm using materialize_to_memory for testing. How do i pass the partition_key?
s

Scout

03/04/2024, 12:22 PM
When using
materialize_to_memory
for testing partitioned assets in Dagster, you can pass the
partition_key
as an argument to specify which partition you want to materialize. This is useful when you want to test the behavior of assets under different partitioned conditions. Here's an example of how you might use
materialize_to_memory
with a `partition_key`:
Copy code
python from dagster import materialize_to_memory  result = materialize_to_memory(     assets=[your_asset],     partition_key="2023-01-01"  # Replace with the appropriate partition key )
In this example,
your_asset
is the asset you want to test, and
"2023-01-01"
is the partition key that corresponds to the partition you want to materialize. Replace
"2023-01-01"
with the actual partition key that is relevant to your test. If you encounter issues with passing the
partition_key
, ensure that: 1. The asset you are testing is indeed set up with a partitioning scheme. 2. The
partition_key
you are passing matches one of the defined partitions for the asset. 3. The asset's dependencies, if any, are also compatible with the specified
partition_key
. If you continue to face issues, it may be helpful to refer to the documentation for the most up-to-date usage patterns and examples related to partitioned assets and testing with
materialize_to_memory
. Additionally, you can check for any relevant discussions or issues in the Dagster community that might address similar testing scenarios.
p

Peter Mburu

03/04/2024, 12:24 PM
@Scout How do i test multi-asset in testing
s

Scout

03/04/2024, 12:24 PM
Testing a graph of assets in Dagster involves verifying that the assets work correctly together, both individually and as a whole. To test multiple assets, you can group them into a list and then pass that list to the
materialize
function during testing. This will execute the assets and allow you to inspect the results. Here's a step-by-step guide on how to test multiple assets: 1. Import the assets: Import all the assets you want to test into your test file. 2. Group the assets: Create a list that contains all the assets you want to test together. 3. Materialize the assets: Use the
materialize
function to execute the assets. This function returns an
ExecuteInProcessResult
object. 4. Assert the results: Use the
ExecuteInProcessResult
object to check the success of the execution and to retrieve the outputs of the assets for further assertions. Here's an example of how you might write a test for a group of assets:
Copy code
python from dagster import materialize from your_project.assets import asset1, asset2, asset3  def test_assets():     assets = [asset1, asset2, asset3]     result = materialize(assets)     assert result.success      # If you want to assert on the output of specific assets, you can do so like this:     output_asset1 = result.output_for_node("asset1")     output_asset2 = result.output_for_node("asset2")     # ... and so on for each asset you want to check.      # Perform your assertions on the outputs     assert output_asset1 is not None     assert output_asset2 == expected_value     # ... and so on for each asset you want to check.
In this example,
asset1
,
asset2
, and
asset3
are the assets you are testing. The
output_for_node
method is used to retrieve the output of each asset by its name. You can then perform assertions on these outputs to ensure they meet your expectations. Remember to replace
your_project.assets
with the actual module path where your assets are defined, and
asset1
,
asset2
,
asset3
with the actual names of your assets. To run the tests, you can use a test runner like
pytest
. Navigate to the directory containing your test file and run the test runner command (e.g.,
pytest test_assets.py
). Always refer to the documentation for the latest best practices and examples on testing assets in Dagster.
p

Peter Mburu

03/04/2024, 12:42 PM
@Scout Explain the code below: from dagster import AssetOut, Output, multi_asset @multi_asset( outs={ "a": AssetOut(is_required=False), "b": AssetOut(is_required=False), }, can_subset=True, ) def split_actions(context: AssetExecutionContext): if "a" in context.op_execution_context.selected_output_names: yield Output(value=123, output_name="a") if "b" in context.op_execution_context.selected_output_names: yield Output(value=456, output_name="b")
s

Scout

03/04/2024, 12:42 PM
The code snippet you've provided defines a
multi_asset
in Dagster, which is a function that can produce multiple assets as outputs. Let's break down the code: 1. Imports: The code imports
AssetOut
,
Output
, and
multi_asset
from the
dagster
module. These are used to define the outputs of the multi-asset and to yield values for those outputs. 2. Multi-Asset Definition: The
@multi_asset
decorator is used to define a multi-asset function named
split_actions
. This function can produce two separate assets, named "a" and "b". 3. Outputs Specification: - The
outs
parameter of the
@multi_asset
decorator is a dictionary that defines the outputs of the multi-asset. Each key in the dictionary corresponds to an output name, and the value is an
AssetOut
object. -
AssetOut(is_required=False)
indicates that the corresponding output asset is optional. This means that the multi-asset function can complete successfully without necessarily yielding a value for these outputs. 4. Subsetting: The
can_subset=True
parameter allows for subsetting, which means that when this multi-asset is executed, it can produce a subset of its defined outputs based on the selection criteria provided at runtime. 5. Multi-Asset Function: - The
split_actions
function takes an
AssetExecutionContext
object as its context parameter. This context provides system information such as resources, config, and logging, and it is specific to asset execution. - Inside the function, there are conditional checks to determine which outputs have been selected for execution using
context.op_execution_context.selected_output_names
. This attribute contains the names of the outputs that have been selected for materialization. - If "a" is in the selected output names, the function yields an
Output
with the value
123
and the
output_name
"a". Similarly, if "b" is in the selected output names, it yields an
Output
with the value
456
and the
output_name
"b". The
yield
statements are used to emit the values for each asset. When the function is executed, it will check which outputs have been selected and only produce those that are required. This allows for flexibility in asset generation and can be particularly useful when different outputs are needed under different circumstances or based on different configurations. In summary, this code defines a multi-asset in Dagster that can optionally produce two separate assets, "a" and "b", based on the outputs selected at runtime. The values of these assets are
123
and
456
, respectively.
p

Peter Mburu

03/04/2024, 12:46 PM
@Scout write sample code that has 3 multi-assets and use selected_output_names
s

Scout

03/04/2024, 12:46 PM
Certainly! Below is an example of how you might define three multi-assets using the
selected_output_names
to conditionally yield outputs. Each multi-asset will have two potential outputs, "output1" and "output2", and the yielding of these outputs will depend on whether they have been selected for execution.
Copy code
python from dagster import AssetOut, Output, multi_asset, AssetExecutionContext  # First multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_one(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_one", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_one", output_name="output2")  # Second multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_two(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_two", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_two", output_name="output2")  # Third multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_three(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_three", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_three", output_name="output2")
In this example, each multi-asset function checks the
selected_output_names
to determine which outputs have been selected for execution. If an output name is present in the
selected_output_names
, the function yields an
Output
with a corresponding value and output name. When you execute these multi-assets, you can specify which outputs you want to materialize by providing the selected output names. This allows you to subset the outputs and only materialize the ones you need. The
can_subset
parameter is set to
True
to enable this behavior. Please note that the actual values and output names are placeholders and should be replaced with values that make sense for your specific use case.
p

Peter Mburu

03/04/2024, 12:48 PM
@Scout Update so that Third multi-asset depends on second multi-asset which depend on ist multi-asset
s

Scout

03/04/2024, 12:48 PM
To create dependencies between multi-assets in Dagster, you can use the
AssetIn
and
AssetOut
classes to define the inputs and outputs of each asset. The following example demonstrates how to define three multi-assets where the third multi-asset depends on the second, and the second depends on the first:
Copy code
python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    ins={"input_from_asset_one": AssetIn(key="multi_asset_one")},
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext, input_from_asset_one):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output2")

# Third multi-asset, depending on the second multi-asset
@multi_asset(
    ins={"input_from_asset_two": AssetIn(key="multi_asset_two")},
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_three(context: AssetExecutionContext, input_from_asset_two):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output2")
In this updated example: -
multi_asset_one
remains unchanged and serves as the base asset. -
multi_asset_two
now has an input named
input_from_asset_one
, which depends on an output from
multi_asset_one
. The
AssetIn
object is used to define this dependency, and the
key
parameter specifies the asset key that
multi_asset_two
depends on. -
multi_asset_three
has an input named
input_from_asset_two
, which depends on an output from
multi_asset_two
. Similarly, the
AssetIn
object wit
h the
key
parameter is used to define this dependency. When you execute these assets, Dagster will ensure that
multi_asset_one
runs first, followed by
multi_asset_two
, and finally
multi_asset_three
, respecting the dependencies defined between them. The placeholder strings
processed_{input_from_asset_one}_asset_two
and
processed_{input_from_asset_two}_asset_three
are used to indicate that the output values of
multi_asset_two
and
multi_asset_three
are derived from their respective inputs. You should replace these placeholders with actual processing logic relevant to your use case.
p

Peter Mburu

03/04/2024, 1:01 PM
@Scout from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( ins={"input_from_asset_one": AssetIn(key="multi_asset_one")}, outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext, input_from_asset_one): if "output21" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output22") # Third multi-asset, depending on the second multi-asset @multi_asset( ins={"input_from_asset_two": AssetIn(key="multi_asset_two")}, outs={ "output31": AssetOut(is_required=False), "output32": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_three(context: AssetExecutionContext, input_from_asset_two): if "output31" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output31") if "output32" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output32") fails with: 2024-03-04 155813 +0300 - dagster.daemon.SchedulerDaemon - WARNING - Could not load location pipelines to check for schedules due to the following error: dagster._core.errors.DagsterInvalidDefinitionError: Input asset '["multi_asset_one"]' for asset '["output22"]' is not produced by any of the provided asset ops and is not one of the provided sources Stack Trace: File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 408, in init self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 18, in <module> defs = Definitions( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 433, in init self._created_pending_or_normal_repo = _create_repository_using_definitions_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 297, in _create_repository_using_definitions_args @repository( ^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/repository_decorator.py", line 163, in call else CachingRepositoryData.from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data.py", line 370, in from_list return build_caching_repository_data_from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data_builder.py", line 237, in build_caching_repository_data_from_list for job_def in get_base_asset_jobs( ^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 85, in get_base_asset_jobs build_assets_job( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 213, in build_assets_job resolved_asset_deps = ResolvedAssetDependencies(assets, resolved_source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 22, in init self._deps_by_assets_def_id = resolve_assets_def_deps(assets_defs, source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 105, in resolve_assets_def_deps raise DagsterInvalidDefinitionError( 2024-03-04 155813 +0300 - dagster.code_server - ERROR - Error while importing code Traceback (most recent call last): File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 408, in init self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 18, in <module> defs = Definitions( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 433, in init self._created_pending_or_normal_repo = _create_repository_using_definitions_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 297, in _create_repository_using_definitions_args @repository( ^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/repository_decorator.py", line 163, in call else CachingRepositoryData.from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data.py", line 370, in from_list return build_caching_repository_data_from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data_builder.py", line 237, in build_caching_repository_data_from_list for job_def in get_base_asset_jobs( ^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 85, in get_base_asset_jobs build_assets_job( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 213, in build_assets_job resolved_asset_deps = ResolvedAssetDependencies(assets, resolved_source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 22, in init self._deps_by_assets_def_id = resolve_assets_def_deps(assets_defs, source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 105, in resolve_assets_def_deps raise DagsterInvalidDefinitionError( dagster._core.errors.DagsterInvalidDefinitionError: Input asset '["multi_asset_one"]' for asset '["output21"]' is not produced by any of the provided asset ops and is not one of the provided sources /home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/context.py622 UserWarning: Error loading repository location pipelinesdagster. core.errors.DagsterInvalidDefinitionError Input asset '["multi_asset_one"]' for asset '["output21"]' is not produced by any of the provided asset ops and is not one of the provided sources
s

Scout

03/04/2024, 1:01 PM
The error message you're encountering indicates that there is a problem with the way the assets are defined and how they are supposed to be connected. Specifically, the error states that the input asset
["multi_asset_one"]
for asset
["output21"]
is not produced by any of the provided asset ops and is not one of the provided sources. This error typically occurs when there is a mismatch between the asset keys used in the
AssetIn
definitions and the actual asset keys that are produced by the multi-assets. In Dagster, asset keys are used to uniquely identify assets and to define dependencies between them. To resolve this issue, you need to ensure that the asset keys specified in the
AssetIn
definitions match the keys of the assets that are actually produced. Here are some steps to troubleshoot and fix the problem: 1. Check Asset Keys: Ensure that the asset keys you are using in the
AssetIn
definitions match the keys of the assets produced by the corresponding multi-assets. Asset keys are usually derived from the names of the assets, but they can also be explicitly specified. 2. Define Asset Keys Explicitly: If you haven't already, you may want to define asset keys explicitly using the
key
parameter of the
AssetOut
class. This can help avoid any ambiguity about which assets are being referred to. 3. Review Asset Definitions: Go through your asset definitions to make sure that each asset is correctly producing the outputs that other assets depend on. 4. Check Repository Definition: Ensure that all assets are included in the repository definition and that there are no missing assets that could be causing the error. Here's an example of how you might explicitly define asset keys to ensure that dependencies are correctly resolved: ```python from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext # First multi-asset @multi_asset( outs={ "output1": AssetOut(key="asset_one_output1", is_required=False), "output2": AssetOut(key="asset_one_output2", is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): # ... (same as before) # Second multi-asset, depending on the first multi-asset @multi_asset( ins={"input_from_asset_one": AssetIn(key="asset_one_output1")}, outs={ "output21": AssetOut(key="asset_two_output21", is_required=False), "output22": AssetOut(key="asset_two_output22", is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext, input_from_asset_one): # ... (same as before) # Third multi-asset, depending on the second multi-asset @multi_asset( ins={"input_from_asset_two": AssetIn(key="asset_two_output21")}, outs={ "output31": AssetOut(ke
y="asset_three_output31", is_required=False), "output32": AssetOut(key="asset_three_output32", is_required=False), }, can_subset=True, ) def multi_asset_three(context: AssetExecutionContext, input_from_asset_two): # ... (same as before) ``` In this example, each
AssetOut
is given an explicit
key
, which is then used in the
AssetIn
definitions to establish the dependencies. Make sure to replace the placeholder keys with the actual keys that make sense for your assets. If you continue to encounter issues, you may want to consult the Dagster documentation or seek help from the community for more specific guidance on defining and connecting multi-assets.
p

Peter Mburu

03/04/2024, 2:52 PM
@Scout how do i fix this code: from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, internal_asset_deps={ "output21": {AssetKey("output1")}, "output22": {AssetKey("output2")} }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: load the output of the first multi-asset yield Output(value=f"processed_{output1}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: load the output of the second multi-asset yield Output(value=f"processed_{output2}_asset_two", output_name="output22")
s

Scout

03/04/2024, 2:52 PM
To fix the code, you need to ensure that the dependencies between the multi-assets are correctly defined and that the outputs from the first multi-asset are being passed as inputs to the second multi-asset. Here's a revised version of the code with the necessary changes:
Copy code
python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(key="multi_asset_one_output1", is_required=False),
        "output2": AssetOut(key="multi_asset_one_output2", is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    ins={
        "input_from_output1": AssetIn(key="multi_asset_one_output1"),
        "input_from_output2": AssetIn(key="multi_asset_one_output2"),
    },
    outs={
        "output21": AssetOut(key="multi_asset_two_output21", is_required=False),
        "output22": AssetOut(key="multi_asset_two_output22", is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext, input_from_output1, input_from_output2):
    if "output21" in context.op_execution_context.selected_output_names:
        # Process the output of the first multi-asset
        processed_value = f"processed_{input_from_output1}_asset_two"
        yield Output(value=processed_value, output_name="output21")
    if "output22" in context.op_execution_context.selected_output_names:
        # Process the output of the second multi-asset
        processed_value = f"processed_{input_from_output2}_asset_two"
        yield Output(value=processed_value, output_name="output22")
Here are the changes made to the original code: 1. Explicit Asset Keys: Each
AssetOut
in
multi_asset_one
is given an explicit key using the
key
parameter. These keys are used to identify the assets produced by
multi_asset_one
. 2. Input Definitions: In
multi_asset_two
, the
ins
parameter is used to define the inputs that depend on the outputs of
multi_asset_one
. The keys used in
AssetIn
match the keys defined in the outputs of
multi_asset_one
. 3. Removed
internal_asset_deps
: The
internal_asset_deps
parameter is not needed in this case because we are using the
ins
parameter to define the dependencies. The
internal_asset_deps
parameter is used when you need to specify dependencies between outputs within the same multi-asset. 4. *Passing Inputs
*: The inputs
input_from_output1
and
input_from_output2
are passed as arguments to
multi_asset_two
and are used to process the values. 5. Processing Logic: The placeholders
load the output of the first multi-asset
and
load the output of the second multi-asset
have been replaced with actual processing logic that uses the inputs. With these changes, the dependencies between the multi-assets should be correctly defined, and the code should work as expected. Make sure to replace the processing logic with the actual logic you need for your use case.
p

Peter Mburu

03/04/2024, 2:56 PM
@Scout without passing inputs as arguments
multi_asset_two
s

Scout

03/04/2024, 2:56 PM
If you want to define dependencies between assets without passing inputs as arguments in
multi_asset_two
, you can use the
AssetKey
to specify dependencies. However, you will need to ensure that the assets are part of the same job or repository so that Dagster can resolve the dependencies correctly. Here's how you can define
multi_asset_two
without passing inputs as arguments:
Copy code
python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(key="multi_asset_one_output1", is_required=False),
        "output2": AssetOut(key="multi_asset_one_output2", is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    outs={
        "output21": AssetOut(key="multi_asset_two_output21", is_required=False),
        "output22": AssetOut(key="multi_asset_two_output22", is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext):
    if "output21" in context.op_execution_context.selected_output_names:
        # Assume we have a way to retrieve the value from the first multi-asset's output1
        input_from_output1 = context.get_asset(AssetKey("multi_asset_one_output1"))
        processed_value = f"processed_{input_from_output1}_asset_two"
        yield Output(value=processed_value, output_name="output21")
    if "output22" in context.op_execution_context.selected_output_names:
        # Assume we have a way to retrieve the value from the first multi-asset's output2
        input_from_output2 = context.get_asset(AssetKey("multi_asset_one_output2"))
        processed_value = f"processed_{input_from_output2}_asset_two"
        yield Output(value=processed_value, output_name="output22")
In this example,
multi_asset_two
does not take inputs as arguments. Instead, it assumes that there is a way to retrieve the values of the first multi-asset's outputs within the execution context. The
context.get_asset
method is a placeholder for the actual method you would use to retrieve the asset values. Dagster does not provide a
context.get_asset
method, so you would need to implement the logic to retrieve the asset values based on your storage and execution environment. Please note that the actual implementation of how you retrieve the asset values will depend on the specifics of your Dagster setup, such as the I/O ma
nager you are using and how your assets are stored. You may need to access the outputs of
multi_asset_one
from the storage system directly, or use an I/O manager that can load the required asset values.
p

Peter Mburu

03/04/2024, 3:30 PM
@Scout # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, internal_asset_deps={ "output21": {AssetKey("output1")}, "output22": {AssetKey("output2")} }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: output1 = context.get_asset(AssetKey("output1")) yield Output(value=f"processed_{output1}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: output2 = context.get_asset(AssetKey("output2")) yield Output(value=f"processed_{output2}_asset_two", output_name="output22") fails with: The above exception was caused by the following exception: dagster._check.CheckError: Invariant failed. Description: Invalid asset dependencies: {AssetKey(['output1'])} specified in
internal_asset_deps
argument for multi-asset 'multi_asset_two' on key 'output21'. Each specified asset key must be associated with an input to the asset or produced by this asset. Valid keys: [AssetKey(['output21']), AssetKey(['output22'])] File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/errors.py", line 287, in user_code_error_boundary yield File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 8, in <module> from .assets.antibiotic_resistance import multi_asset_one, multi_asset_two, multi_asset_three File "/home/pmburu/repo/msc/uti/pipelines/assets/antibiotic_resistance.py", line 118, in <module> @multi_asset( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/asset_decorator.py", line 719, in inner check.invariant( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_check/__init__.py", line 1612, in invariant raise CheckError(f"Invariant failed. Description: {desc}")
s

Scout

03/04/2024, 3:30 PM
The error message indicates that there is an issue with the
internal_asset_deps
argument in the
multi_asset_two
definition. Specifically, the error states that the asset keys specified in
internal_asset_deps
are not valid because they must be associated with an input to the asset or produced by the asset itself. In Dagster,
internal_asset_deps
is used to specify dependencies between outputs within the same multi-asset. However, in your case, you are trying to specify dependencies on assets produced by a different multi-asset (
multi_asset_one
). This is not the correct use of
internal_asset_deps
. To fix this issue, you should remove the
internal_asset_deps
argument and instead use the
ins
parameter to define inputs to
multi_asset_two
that depend on the outputs of
multi_asset_one
. However, since you mentioned that you do not want to pass inputs as arguments, you will need to ensure that the assets are part of the same job or repository so that Dagster can resolve the dependencies correctly. Here's how you can define
multi_asset_two
without using
internal_asset_deps
and without passing inputs as arguments: ```python from dagster import AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey # First multi-asset @multi_asset( outs={ "output1": AssetOut(key="multi_asset_one_output1", is_required=False), "output2": AssetOut(key="multi_asset_one_output2", is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(key="multi_asset_two_output21", is_required=False), "output22": AssetOut(key="multi_asset_two_output22", is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: # Logic to retrieve the value from the first multi-asset's output1 # This logic will depend on your specific setup and how you manage asset storage # For example, you might retrieve the value from a database or a file system output1_value = "retrieved_value1" # Placeholder for the actual retrieval logic yield Output(value=f"processed_{output1_value}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: # Logic to retrieve the value from the first multi-asset's output2 #
This logic will depend on your specific setup and how you manage asset storage output2_value = "retrieved_value2" # Placeholder for the actual retrieval logic yield Output(value=f"processed_{output2_value}_asset_two", output_name="output22") ``` In this revised example,
multi_asset_two
does not use
internal_asset_deps
and does not take inputs as arguments. Instead, it assumes that there is a way to retrieve the values of the first multi-asset's outputs within the execution context. The actual implementation of how you retrieve the asset values will depend on the specifics of your Dagster setup, such as the I/O manager you are using and how your assets are stored. You may need to access the outputs of
multi_asset_one
from the storage system directly, or use an I/O manager that can load the required asset values.
p

Peter Mburu

03/04/2024, 5:48 PM
@Scout def test_import_imf_data(self): with conditional_logging(): with patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data" ) as mock_process, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service" ) as mock_filter, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata" ) as mock_metadata: mock_process.return_value.ok = True mock_process.return_value = self.mock_data mock_filter.return_value.ok = True mock_filter.return_value = self.datasourcedocument mock_metadata.return_value.ok = True mock_metadata.return_value = self.mock_metadata_df assets = [ imf_ingestion.get_imf_datasource_documents, imf_ingestion.get_imf_data_series, imf_ingestion.imf_metadata, imf_ingestion.remote_imf_data, ] result = materialize(assets, partition_key="2022-01-01") assert result.success output_asset2 = result.output_for_node("remote_imf_data") ====================================================================== ERROR [0.886s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 245, in test_import_imf_data output_asset2 = result.output_for_node("remote_imf_data") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execute_in_process_result.py", line 120, in output_for_node return super(ExecuteInProcessResult, self).output_for_node( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execution_result.py", line 101, in output_for_node origin_output_def, origin_handle = target_node_def.resolve_output_to_origin( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/op_definition.py", line 329, in resolve_output_to_origin return self.output_def_named(output_name), handle File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/node_definition.py", line 152, in output_def_named raise DagsterInvariantViolationError(f"{self._name} has no output named {name}.") dagster._core.errors.DagsterInvariantViolationError: remote_imf_data has no output named result.
s

Scout

03/04/2024, 5:48 PM
The error message you're encountering indicates that the
output_for_node
method is being called with an output name (
"result"
) that does not exist for the node
"remote_imf_data"
. The
output_for_node
method is used to retrieve the output of a specific node (in this case, an asset) from the result of a pipeline run or asset materialization. Here's the relevant part of the error message:
Copy code
dagster._core.errors.DagsterInvariantViolationError: remote_imf_data has no output named result.
This error suggests that within the
remote_imf_data
asset, there is no output defined with the name
"result"
. To resolve this issue, you should: 1. Check the definition of the
remote_imf_data
asset to see what outputs it actually defines. Ensure that you are using the correct output name when calling
output_for_node
. 2. If you expect the
remote_imf_data
asset to have an output named
"result"
, you need to make sure that the asset is yielding an
Output
with
output_name="result"
. 3. If the
remote_imf_data
asset is supposed to yield multiple outputs, ensure that each
Output
is yielded with the correct
output_name
that matches what you are trying to retrieve with
output_for_node
. 4. If the
remote_imf_data
asset only yields a single output and does not specify an
output_name
, you can retrieve the output without specifying an output name, as Dagster will use the default output name. Here's an example of how you might modify the
remote_imf_data
asset to yield an output named `"result"`:
Copy code
python from dagster import Output, asset  @asset def remote_imf_data():     # ... your logic here ...     yield Output(value=some_value, output_name="result")
And here's how you would retrieve the output in your test:
Copy code
python output_asset2 = result.output_for_node("remote_imf_data", "result")
Make sure that the output name you use in
output_for_node
matches the
output_name
specified when yielding the
Output
in your asset definition. If the asset does not specify an output name, you can omit the output name in
output_for_node
, and it will default to the standard output name.
p

Peter Mburu

03/04/2024, 6:22 PM
@Scout Is there a way to access a single asset from multi-asset?
s

Scout

03/04/2024, 6:22 PM
Yes, you can access a single asset from a multi-asset in Dagster. When you define a multi-asset with the
@multi_asset
decorator, you specify multiple outputs, each potentially corresponding to a different asset. When you materialize or execute the multi-asset, you can access the individual assets by their output names. Here's a simplified example of how you might define a multi-asset and then access one of its assets:
Copy code
python from dagster import AssetOut, Output, multi_asset  @multi_asset(     outs={         "asset1": AssetOut(),         "asset2": AssetOut(),     } ) def my_multi_asset():     # Your logic to produce values for asset1 and asset2     yield Output(value="value_for_asset1", output_name="asset1")     yield Output(value="value_for_asset2", output_name="asset2")  # Later, when you materialize or execute the multi-asset, you can access the individual assets: result = materialize([my_multi_asset]) value_for_asset1 = result.output_for_node("my_multi_asset", "asset1") value_for_asset2 = result.output_for_node("my_multi_asset", "asset2")
In this example,
my_multi_asset
is a multi-asset that yields two outputs, each corresponding to a different asset. When you call
materialize([my_multi_asset])
, both assets are materialized, and you can access the output of each asset using the
output_for_node
method on the result object, specifying the multi-asset's name and the output name. If you only want to materialize a subset of the assets defined in a multi-asset, you can use the
AssetSelection
API to specify which assets you want to materialize. Here's an example of how you might do that:
Copy code
python from dagster import AssetSelection, materialize  # Assuming 'my_multi_asset' is defined as above  # Materialize only 'asset1' from the multi-asset result = materialize(     AssetSelection.assets(my_multi_asset).with_output("asset1") ) value_for_asset1 = result.output_for_node("my_multi_asset", "asset1")
In this example,
AssetSelection.assets(my_multi_asset).with_output("asset1")
specifies that only the output named
"asset1"
from the multi-asset
my_multi_asset
should be materialized. Remember that the actual implementation of accessing individual assets from a multi-asset will depend on your specific use case and the way you have set up your Dagster assets and materialization process.
p

Peter Mburu

03/04/2024, 6:30 PM
@Scout ERROR [0.004s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 244, in test_import_imf_data AssetSelection.assets(remote_imf_data).with_output("gpd_remote_data") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/pydantic/main.py", line 767, in getattr raise AttributeError(f'{type(self).name!r} object has no attribute {item!r}') AttributeError: 'KeysAssetSelection' object has no attribute 'with_output' ---------------------------------------------------------------------- Ran 1 test in 0.557s
s

Scout

03/04/2024, 6:30 PM
The error you're encountering suggests that there's an attempt to use a method
with_output
on an object of type
KeysAssetSelection
, which does not have this method. The
AssetSelection
API in Dagster is used to select specific assets for materialization, but the way it's being used in the code snippet from the error message seems to be incorrect. As of my knowledge cutoff date, Dagster does not have a
with_output
method on the
AssetSelection
class. Instead, you should use the
AssetSelection
API to select assets by their keys. If you want to materialize a specific asset or a subset of assets, you can use the
AssetSelection
to specify which assets to include. Here's an example of how you might correctly use
AssetSelection
to select a specific asset for materialization:
Copy code
python from dagster import AssetSelection, materialize  # Assuming 'remote_imf_data' is a multi-asset with an output asset named 'gpd_remote_data'  # Select the specific asset you want to materialize selected_assets = AssetSelection.keys("gpd_remote_data")  # Materialize the selected asset result = materialize(selected_assets)
In this example,
AssetSelection.keys("gpd_remote_data")
is used to create a selection of assets by their keys. You would replace
"gpd_remote_data"
with the actual key of the asset you want to materialize. If you're trying to materialize a specific output from a multi-asset, you would typically not need to use
AssetSelection
for that purpose. Instead, you would materialize the multi-asset and then access the specific output you're interested in from the result. If you continue to encounter issues, please ensure that you are using the latest version of Dagster and refer to the most recent documentation for any updates or changes to the API. If the issue persists, you may need to seek further assistance from the Dagster community or support channels.
p

Peter Mburu

03/04/2024, 6:48 PM
@Scout ERROR [0.004s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 252, in test_import_imf_data result = materialize(assets, partition_key="2022-01-01") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 88, in materialize assets = check.sequence_param(assets, "assets", of_type=(AssetsDefinition, SourceAsset)) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_check/__init__.py", line 1125, in sequence_param return _check_iterable_items(obj, of_type, "sequence") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_check/__init__.py", line 1768, in _check_iterable_items raise CheckError( dagster._check.CheckError: Member of sequence mismatches type. Expected (<class 'dagster._core.definitions.assets.AssetsDefinition'>, <class 'dagster._core.definitions.source_asset.SourceAsset'>). Got KeysAssetSelection(selected_keys=[AssetKey(['gpd_remote_data'])]) of type <class 'dagster._core.definitions.asset_selection.KeysAssetSelection'>. ---------------------------------------------------------------------- Ran 1 test in 0.534s FAILED (errors=1) Generating XML reports... Preserving test database for alias 'default'...
s

Scout

03/04/2024, 6:48 PM
The error message indicates that the
materialize
function is expecting a sequence of
AssetsDefinition
or
SourceAsset
objects, but it received an object of type
KeysAssetSelection
instead. The
materialize
function is used to materialize assets, and it requires a list of asset definitions or source assets as its argument. The
KeysAssetSelection
object is part of the
AssetSelection
API, which is used to select specific assets by their keys. However, it seems that the
KeysAssetSelection
object is being passed directly to the
materialize
function, which is not the correct usage. To resolve this issue, you need to pass the actual asset definitions that you want to materialize to the
materialize
function. If you have a list of asset keys and want to materialize those specific assets, you need to ensure that you are passing the corresponding asset definitions. Here's an example of how you might correctly use the
materialize
function with asset definitions:
Copy code
python from dagster import materialize  # Assuming you have asset definitions like the following: # asset1, asset2, ...  # You would pass the asset definitions directly to the materialize function result = materialize([asset1, asset2], partition_key="2022-01-01")
In this example,
asset1
,
asset2
, etc., are the asset definitions that you want to materialize. You would replace these with the actual asset definitions that correspond to the asset keys you are interested in. If you are trying to materialize assets based on a selection of keys, you need to convert those keys into the corresponding asset definitions before calling
materialize
. The
AssetSelection
API can be used to filter assets within a repository or job, but the
materialize
function requires the actual asset definitions. Please ensure that you are using the latest version of Dagster and refer to the most recent documentation for any updates or changes to the API. If the issue persists, you may need to seek further assistance from the Dagster community or support channels.
p

Peter Mburu

03/04/2024, 6:59 PM
@Scout how i get actual asset from KeysAssetSelection
s

Scout

03/04/2024, 6:59 PM
To get the actual assets from a
KeysAssetSelection
, you would typically be working within the context of a Dagster repository where your assets are defined. The
KeysAssetSelection
object is used to create a selection of assets based on their keys, but to materialize those assets or to work with their definitions, you need to resolve them to the actual
AssetsDefinition
objects within your repository. As of my knowledge cutoff date, the documentation does not provide a direct method to convert a
KeysAssetSelection
into a list of
AssetsDefinition
objects for use with the
materialize
function. However, if you have access to the repository definition, you may be able to retrieve the
AssetsDefinition
objects by their keys. Here's a conceptual example of how you might retrieve
AssetsDefinition
objects from a repository using the asset keys:
Copy code
python from dagster import AssetKey, RepositoryDefinition  # Assuming you have a repository definition repository: RepositoryDefinition = ...  # And you have a KeysAssetSelection with some asset keys keys_asset_selection = AssetSelection.keys(AssetKey("my_asset1"), AssetKey("my_asset2"))  # You would retrieve the actual asset definitions from the repository using the asset keys assets_defs = [     repository.get_asset_def(asset_key)     for asset_key in keys_asset_selection.selected_keys ]  # Now you can use the assets_defs with the materialize function result = materialize(assets_defs, partition_key="2022-01-01")
In this example,
repository.get_asset_def(asset_key)
is a placeholder for the method you would use to retrieve the
AssetsDefinition
object for a given
AssetKey
from your repository. You would need to replace this with the actual method provided by your repository or the context in which you are working. Please note that the actual implementation may vary based on the version of Dagster you are using and the structure of your repository. If you are unable to find a method to retrieve
AssetsDefinition
objects by their keys, you may need to manually define a list of the assets you want to materialize based on your knowledge of the repository's structure. If you need further assistance, please refer to the latest Dagster documentation or seek help from the Dagster community.
p

Peter Mburu

03/04/2024, 7:02 PM
@Scout fix the code below: def test_import_imf_data(self): with conditional_logging(): with patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data" ) as mock_process, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service" ) as mock_filter, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata" ) as mock_metadata: mock_process.return_value.ok = True mock_process.return_value = self.mock_data mock_filter.return_value.ok = True mock_filter.return_value = self.datasourcedocument mock_metadata.return_value.ok = True mock_metadata.return_value = self.mock_metadata_df assets = [ get_imf_datasource_documents, get_imf_data_series, imf_metadata, AssetSelection.assets(remote_imf_data).keys("gpd_remote_data") ] result = materialize(assets, partition_key="2022-01-01")
s

Scout

03/04/2024, 7:02 PM
The code snippet you've provided is attempting to include an
AssetSelection
object directly in the list of assets to be materialized. However, the
materialize
function expects a list of asset definitions, not an
AssetSelection
object. To fix the code, you need to ensure that you are passing a list of asset definitions to the
materialize
function. Here's a revised version of the test function:
Copy code
python
from dagster import materialize, AssetKey

def test_import_imf_data(self):
    with conditional_logging():
        with patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data"
        ) as mock_process, patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service"
        ) as mock_filter, patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata"
        ) as mock_metadata:
            mock_process.return_value.ok = True
            mock_process.return_value = self.mock_data
            mock_filter.return_value.ok = True
            mock_filter.return_value = self.datasourcedocument
            mock_metadata.return_value.ok = True
            mock_metadata.return_value = self.mock_metadata_df
            
            # Assuming get_imf_datasource_documents, get_imf_data_series, and imf_metadata
            # are asset definitions, and remote_imf_data is a multi_asset definition.
            # You need to pass the asset definitions directly to the materialize function.
            # If you want to materialize a specific asset from a multi_asset, you need to
            # get the asset definition for that specific asset.
            
            # Assuming 'gpd_remote_data' is the key for the asset you want to materialize
            # from the multi_asset 'remote_imf_data', you would retrieve the asset definition
            # from your repository. The following line is a placeholder for the actual retrieval:
            gpd_remote_data_asset = repository.get_asset_def(AssetKey("gpd_remote_data"))
            
            # Now you can include the asset definition in the list of assets to materialize
            assets = [
                get_imf_datasource_documents,
                get_imf_data_series,
                imf_metadata,
                gpd_remote_data_asset  # This should be the actual asset definition
            ]
            
            # Materialize the assets
            result = materialize(assets, partition_key="2022-01-01")
            assert result.success
In this revised code: - We assume that
get_imf_datasource_documents
,
get_imf_data_series
, and
imf_metadata
are asset definitions that you want to materialize. - We replace the
AssetSelection
object with the actual asset d
efinition for the asset keyed by
"gpd_remote_data"
. You need to retrieve this asset definition from your repository, which typically involves accessing the repository object and calling a method like
get_asset_def
with the
AssetKey
. - The placeholder
repository.get_asset_def(AssetKey("gpd_remote_data"))
should be replaced with the actual code to retrieve the asset definition from your repository. Please note that the actual method to retrieve an asset definition from a repository may vary based on your specific setup and the version of Dagster you are using. If you need further assistance, please refer to the latest Dagster documentation or seek help from the Dagster community.
12 Views