< Scout> Explain the use of Output dagster #ask-ai

Join Slack

<@U0667DNC02Y> Explain the use of Output

# ask-ai

Peter Mburu

03/04/2024, 12:08 PM

@Scout Explain the use of Output

Scout

03/04/2024, 12:08 PM

The

Output

event in Dagster is a critical component for passing data between steps in a pipeline. When you define a computation or a step within a pipeline using the

@op

decorator, you can use the

Output

object to send the result of that computation to subsequent steps. Here's how you can use `Output`: 1. Returning Output Directly: For many use cases, you can return a value directly from an op, and Dagster will handle it as an output. This is the simplest form of creating an output.

Copy code

python @op def my_output_op():     return 5

2. Using Output Object: If you need additional functionality, such as specifying output metadata or conditional branching, you can use the

Output

object. This object can be either returned or yielded from an op, and it allows you to attach metadata to the output or to specify the output's name if there are multiple outputs.

Copy code

python from dagster import Output, op  @op def my_output_op() -> Output:     return Output("some_value", metadata={"some_metadata": "a_value"})

3. Parameterized Type Annotations: When using the

Output

object, you can also provide type annotations to ensure type safety and clarity in your pipeline's design.

Copy code

python from dagster import Output, op  @op def my_output_generic_op() -> Output[int]:     return Output(5, metadata={"some_metadata": "a_value"})

4. Yielding Multiple Outputs: If an op has multiple outputs, you can yield

Output

objects with specified output names. When yielding outputs, type annotations cannot be used directly, but you can specify type information using the

out

argument of the op decorator.

Copy code

python from dagster import Output, op, Out  @op(out={"out1": Out(str), "out2": Out(int)}) def my_op_yields():     yield Output("foo", output_name="out1")     yield Output(5, output_name="out2")

5. Attaching Metadata to Outputs: You can attach metadata to an output to log information specific to that output. This is done using the

metadata

parameter of the

Output

object. The

Output

event is a flexible and powerful way to manage data flow and metadata within Dagster pipelines, providing clear interfaces for single and multiple outputs, type safety, and additional metadata capabilities.

Peter Mburu

03/04/2024, 12:19 PM

@Scout ====================================================================== ERROR [0.867s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 243, in test_import_imf_data result = materialize_to_memory(assets) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 193, in materialize_to_memory return materialize( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 103, in materialize return check.not_none( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/job_definition.py", line 712, in execute_in_process return core_execute_in_process( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execute_in_process.py", line 76, in core_execute_in_process event_list = list(execute_run_iterable) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 875, in iter yield from self.iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 764, in job_execution_iterator for event in job_context.executor.execute(job_context, execution_plan): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/executor/in_process.py", line 55, in execute yield from iter( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/api.py", line 875, in iter yield from self.iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/executor/in_process.py", line 26, in inprocess_execution_iterator yield from inner_plan_execution_iterator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 121, in inner_plan_execution_iterator for step_event in check.generator( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 385, in dagster_event_sequence_for_step raise error File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_plan.py", line 286, in dagster_event_sequence_for_step for step_event in check.generator(step_events): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 519, in core_dagster_event_sequence_for_step for user_event in _step_output_error_checked_user_event_sequence( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 201, in _step_output_error_checked_user_event_sequence for user_event in user_event_sequence: File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 99, in _process_asset_results_to_events for user_event in user_event_sequence: File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/compute.py", line 214, in execute_core_compute for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/plan/compute.py", line 183, in _yield_compute_results for event in iterate_with_context( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 467, in iterate_with_context next_output = next(iterator) File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/imf_ingestion.py", line 532, in remote_imf_data def fetch_and_yield_imf_data(metadata_df, document_type, indicators, out_name, period_date=context.partition_key): File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/compute.py", line 1690, in partition_key return self.op_execution_context.partition_key File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/compute.py", line 306, in partition_key return self._step_execution_context.partition_key File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/context/system.py", line 391, in partition_key raise DagsterInvariantViolationError( dagster._core.errors.DagsterInvariantViolationError: Cannot access partition_key for a non-partitioned run ---------------------------------------------------------------------- Ran 1 test in 1.407s

Scout

03/04/2024, 12:19 PM

The error you're encountering indicates that there is an attempt to access the

partition_key

property within a non-partitioned run. The

partition_key

is only relevant when you are working with partitioned assets or runs, which are a way to slice your data processing into discrete chunks that can be processed independently, often corresponding to logical timeframes like days or months. Here's a breakdown of the error message: 1. The

partition_key

property is being accessed within the

remote_imf_data

function. 2. The

partition_key

is being accessed through the

context

object, which is provided to the function during execution. 3. The error is raised because the current run is not partitioned, and therefore, the

partition_key

property is not set and should not be accessed. To resolve this issue, you have a few options: - If the run is supposed to be partitioned, ensure that you are setting up the partitions correctly in your asset or pipeline definition and that you are initiating a partitioned run. - If the run is not supposed to be partitioned, you should modify the

remote_imf_data

function to avoid accessing the

partition_key

property, or handle the case where it is not available. Here's an example of how you might modify the

remote_imf_data

function to handle the absence of a `partition_key`:

Copy code

python @op def remote_imf_data(context, metadata_df, document_type, indicators, out_name):     if context.has_partition_key:         period_date = context.partition_key         # ... use period_date in your logic     else:         # ... handle the non-partitioned case

In this example,

context.has_partition_key

is a made-up property for illustration purposes. You would need to implement a check to determine whether the

partition_key

is available in your specific context. Remember to review your pipeline or asset definitions to ensure that partitioning is set up correctly if it is required for your use case. If you continue to face issues, you may want to look into the documentation or source code related to partitioning in Dagster or seek help from the community or support channels.

Peter Mburu

03/04/2024, 12:22 PM

@Scout I'm using materialize_to_memory for testing. How do i pass the partition_key?

Scout

03/04/2024, 12:22 PM

When using

materialize_to_memory

for testing partitioned assets in Dagster, you can pass the

partition_key

as an argument to specify which partition you want to materialize. This is useful when you want to test the behavior of assets under different partitioned conditions. Here's an example of how you might use

materialize_to_memory

with a `partition_key`:

Copy code

python from dagster import materialize_to_memory  result = materialize_to_memory(     assets=[your_asset],     partition_key="2023-01-01"  # Replace with the appropriate partition key )

In this example,

your_asset

is the asset you want to test, and

"2023-01-01"

is the partition key that corresponds to the partition you want to materialize. Replace

"2023-01-01"

with the actual partition key that is relevant to your test. If you encounter issues with passing the

partition_key

, ensure that: 1. The asset you are testing is indeed set up with a partitioning scheme. 2. The

partition_key

you are passing matches one of the defined partitions for the asset. 3. The asset's dependencies, if any, are also compatible with the specified

partition_key

. If you continue to face issues, it may be helpful to refer to the documentation for the most up-to-date usage patterns and examples related to partitioned assets and testing with

materialize_to_memory

. Additionally, you can check for any relevant discussions or issues in the Dagster community that might address similar testing scenarios.

Peter Mburu

03/04/2024, 12:24 PM

@Scout How do i test multi-asset in testing

Scout

03/04/2024, 12:24 PM

Testing a graph of assets in Dagster involves verifying that the assets work correctly together, both individually and as a whole. To test multiple assets, you can group them into a list and then pass that list to the

materialize

function during testing. This will execute the assets and allow you to inspect the results. Here's a step-by-step guide on how to test multiple assets: 1. Import the assets: Import all the assets you want to test into your test file. 2. Group the assets: Create a list that contains all the assets you want to test together. 3. Materialize the assets: Use the

materialize

function to execute the assets. This function returns an

ExecuteInProcessResult

object. 4. Assert the results: Use the

ExecuteInProcessResult

object to check the success of the execution and to retrieve the outputs of the assets for further assertions. Here's an example of how you might write a test for a group of assets:

Copy code

python from dagster import materialize from your_project.assets import asset1, asset2, asset3  def test_assets():     assets = [asset1, asset2, asset3]     result = materialize(assets)     assert result.success      # If you want to assert on the output of specific assets, you can do so like this:     output_asset1 = result.output_for_node("asset1")     output_asset2 = result.output_for_node("asset2")     # ... and so on for each asset you want to check.      # Perform your assertions on the outputs     assert output_asset1 is not None     assert output_asset2 == expected_value     # ... and so on for each asset you want to check.

In this example,

asset1

asset2

, and

asset3

are the assets you are testing. The

output_for_node

method is used to retrieve the output of each asset by its name. You can then perform assertions on these outputs to ensure they meet your expectations. Remember to replace

your_project.assets

with the actual module path where your assets are defined, and

asset1

asset2

asset3

with the actual names of your assets. To run the tests, you can use a test runner like

pytest

. Navigate to the directory containing your test file and run the test runner command (e.g.,

pytest test_assets.py

). Always refer to the documentation for the latest best practices and examples on testing assets in Dagster.

Peter Mburu

03/04/2024, 12:42 PM

@Scout Explain the code below: from dagster import AssetOut, Output, multi_asset @multi_asset( outs={ "a": AssetOut(is_required=False), "b": AssetOut(is_required=False), }, can_subset=True, ) def split_actions(context: AssetExecutionContext): if "a" in context.op_execution_context.selected_output_names: yield Output(value=123, output_name="a") if "b" in context.op_execution_context.selected_output_names: yield Output(value=456, output_name="b")

Scout

03/04/2024, 12:42 PM

The code snippet you've provided defines a

multi_asset

in Dagster, which is a function that can produce multiple assets as outputs. Let's break down the code: 1. Imports: The code imports

AssetOut

Output

, and

multi_asset

from the

dagster

module. These are used to define the outputs of the multi-asset and to yield values for those outputs. 2. Multi-Asset Definition: The

@multi_asset

decorator is used to define a multi-asset function named

split_actions

. This function can produce two separate assets, named "a" and "b". 3. Outputs Specification: - The

outs

parameter of the

@multi_asset

decorator is a dictionary that defines the outputs of the multi-asset. Each key in the dictionary corresponds to an output name, and the value is an

AssetOut

object. -

AssetOut(is_required=False)

indicates that the corresponding output asset is optional. This means that the multi-asset function can complete successfully without necessarily yielding a value for these outputs. 4. Subsetting: The

can_subset=True

parameter allows for subsetting, which means that when this multi-asset is executed, it can produce a subset of its defined outputs based on the selection criteria provided at runtime. 5. Multi-Asset Function: - The

split_actions

function takes an

AssetExecutionContext

object as its context parameter. This context provides system information such as resources, config, and logging, and it is specific to asset execution. - Inside the function, there are conditional checks to determine which outputs have been selected for execution using

context.op_execution_context.selected_output_names

. This attribute contains the names of the outputs that have been selected for materialization. - If "a" is in the selected output names, the function yields an

Output

with the value

and the

output_name

"a". Similarly, if "b" is in the selected output names, it yields an

Output

with the value

and the

output_name

"b". The

yield

statements are used to emit the values for each asset. When the function is executed, it will check which outputs have been selected and only produce those that are required. This allows for flexibility in asset generation and can be particularly useful when different outputs are needed under different circumstances or based on different configurations. In summary, this code defines a multi-asset in Dagster that can optionally produce two separate assets, "a" and "b", based on the outputs selected at runtime. The values of these assets are

and

, respectively.

Peter Mburu

03/04/2024, 12:46 PM

@Scout write sample code that has 3 multi-assets and use selected_output_names

Scout

03/04/2024, 12:46 PM

Certainly! Below is an example of how you might define three multi-assets using the

selected_output_names

to conditionally yield outputs. Each multi-asset will have two potential outputs, "output1" and "output2", and the yielding of these outputs will depend on whether they have been selected for execution.

Copy code

python from dagster import AssetOut, Output, multi_asset, AssetExecutionContext  # First multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_one(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_one", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_one", output_name="output2")  # Second multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_two(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_two", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_two", output_name="output2")  # Third multi-asset @multi_asset(     outs={         "output1": AssetOut(is_required=False),         "output2": AssetOut(is_required=False),     },     can_subset=True, ) def multi_asset_three(context: AssetExecutionContext):     if "output1" in context.op_execution_context.selected_output_names:         yield Output(value="value1_asset_three", output_name="output1")     if "output2" in context.op_execution_context.selected_output_names:         yield Output(value="value2_asset_three", output_name="output2")

In this example, each multi-asset function checks the

selected_output_names

to determine which outputs have been selected for execution. If an output name is present in the

selected_output_names

, the function yields an

Output

with a corresponding value and output name. When you execute these multi-assets, you can specify which outputs you want to materialize by providing the selected output names. This allows you to subset the outputs and only materialize the ones you need. The

can_subset

parameter is set to

True

to enable this behavior. Please note that the actual values and output names are placeholders and should be replaced with values that make sense for your specific use case.

Peter Mburu

03/04/2024, 12:48 PM

@Scout Update so that Third multi-asset depends on second multi-asset which depend on ist multi-asset

Scout

03/04/2024, 12:48 PM

To create dependencies between multi-assets in Dagster, you can use the

AssetIn

and

AssetOut

classes to define the inputs and outputs of each asset. The following example demonstrates how to define three multi-assets where the third multi-asset depends on the second, and the second depends on the first:

Copy code

python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    ins={"input_from_asset_one": AssetIn(key="multi_asset_one")},
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext, input_from_asset_one):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output2")

# Third multi-asset, depending on the second multi-asset
@multi_asset(
    ins={"input_from_asset_two": AssetIn(key="multi_asset_two")},
    outs={
        "output1": AssetOut(is_required=False),
        "output2": AssetOut(is_required=False),
    },
    can_subset=True,
)
def multi_asset_three(context: AssetExecutionContext, input_from_asset_two):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output2")

In this updated example: -

multi_asset_one

remains unchanged and serves as the base asset. -

multi_asset_two

now has an input named

input_from_asset_one

, which depends on an output from

multi_asset_one

. The

AssetIn

object is used to define this dependency, and the

key

parameter specifies the asset key that

multi_asset_two

depends on. -

multi_asset_three

has an input named

input_from_asset_two

, which depends on an output from

multi_asset_two

. Similarly, the

AssetIn

object wit

Scout

03/04/2024, 12:49 PM

h the

key

parameter is used to define this dependency. When you execute these assets, Dagster will ensure that

multi_asset_one

runs first, followed by

multi_asset_two

, and finally

multi_asset_three

, respecting the dependencies defined between them. The placeholder strings

processed_{input_from_asset_one}_asset_two

and

processed_{input_from_asset_two}_asset_three

are used to indicate that the output values of

multi_asset_two

and

multi_asset_three

are derived from their respective inputs. You should replace these placeholders with actual processing logic relevant to your use case.

Peter Mburu

03/04/2024, 1:01 PM

@Scout from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( ins={"input_from_asset_one": AssetIn(key="multi_asset_one")}, outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext, input_from_asset_one): if "output21" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_one}_asset_two", output_name="output22") # Third multi-asset, depending on the second multi-asset @multi_asset( ins={"input_from_asset_two": AssetIn(key="multi_asset_two")}, outs={ "output31": AssetOut(is_required=False), "output32": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_three(context: AssetExecutionContext, input_from_asset_two): if "output31" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output31") if "output32" in context.op_execution_context.selected_output_names: yield Output(value=f"processed_{input_from_asset_two}_asset_three", output_name="output32") fails with: 2024-03-04 155813 +0300 - dagster.daemon.SchedulerDaemon - WARNING - Could not load location pipelines to check for schedules due to the following error: dagster._core.errors.DagsterInvalidDefinitionError: Input asset '["multi_asset_one"]' for asset '["output22"]' is not produced by any of the provided asset ops and is not one of the provided sources Stack Trace: File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 408, in init self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 18, in <module> defs = Definitions( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 433, in init self._created_pending_or_normal_repo = _create_repository_using_definitions_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 297, in _create_repository_using_definitions_args @repository( ^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/repository_decorator.py", line 163, in call else CachingRepositoryData.from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data.py", line 370, in from_list return build_caching_repository_data_from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data_builder.py", line 237, in build_caching_repository_data_from_list for job_def in get_base_asset_jobs( ^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 85, in get_base_asset_jobs build_assets_job( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 213, in build_assets_job resolved_asset_deps = ResolvedAssetDependencies(assets, resolved_source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 22, in init self._deps_by_assets_def_id = resolve_assets_def_deps(assets_defs, source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 105, in resolve_assets_def_deps raise DagsterInvalidDefinitionError( 2024-03-04 155813 +0300 - dagster.code_server - ERROR - Error while importing code Traceback (most recent call last): File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 408, in init self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 18, in <module> defs = Definitions( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 433, in init self._created_pending_or_normal_repo = _create_repository_using_definitions_args( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/definitions_class.py", line 297, in _create_repository_using_definitions_args @repository( ^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/repository_decorator.py", line 163, in call else CachingRepositoryData.from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data.py", line 370, in from_list return build_caching_repository_data_from_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/repository_definition/repository_data_builder.py", line 237, in build_caching_repository_data_from_list for job_def in get_base_asset_jobs( ^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 85, in get_base_asset_jobs build_assets_job( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/assets_job.py", line 213, in build_assets_job resolved_asset_deps = ResolvedAssetDependencies(assets, resolved_source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 22, in init self._deps_by_assets_def_id = resolve_assets_def_deps(assets_defs, source_assets) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/resolved_asset_deps.py", line 105, in resolve_assets_def_deps raise DagsterInvalidDefinitionError( dagster._core.errors.DagsterInvalidDefinitionError: Input asset '["multi_asset_one"]' for asset '["output21"]' is not produced by any of the provided asset ops and is not one of the provided sources /home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/context.py622 UserWarning: Error loading repository location pipelinesdagster. core.errors.DagsterInvalidDefinitionError Input asset '["multi_asset_one"]' for asset '["output21"]' is not produced by any of the provided asset ops and is not one of the provided sources

Scout

03/04/2024, 1:01 PM

The error message you're encountering indicates that there is a problem with the way the assets are defined and how they are supposed to be connected. Specifically, the error states that the input asset

["multi_asset_one"]

for asset

["output21"]

is not produced by any of the provided asset ops and is not one of the provided sources. This error typically occurs when there is a mismatch between the asset keys used in the

AssetIn

definitions and the actual asset keys that are produced by the multi-assets. In Dagster, asset keys are used to uniquely identify assets and to define dependencies between them. To resolve this issue, you need to ensure that the asset keys specified in the

AssetIn

definitions match the keys of the assets that are actually produced. Here are some steps to troubleshoot and fix the problem: 1. Check Asset Keys: Ensure that the asset keys you are using in the

AssetIn

definitions match the keys of the assets produced by the corresponding multi-assets. Asset keys are usually derived from the names of the assets, but they can also be explicitly specified. 2. Define Asset Keys Explicitly: If you haven't already, you may want to define asset keys explicitly using the

key

parameter of the

AssetOut

class. This can help avoid any ambiguity about which assets are being referred to. 3. Review Asset Definitions: Go through your asset definitions to make sure that each asset is correctly producing the outputs that other assets depend on. 4. Check Repository Definition: Ensure that all assets are included in the repository definition and that there are no missing assets that could be causing the error. Here's an example of how you might explicitly define asset keys to ensure that dependencies are correctly resolved: ```python from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext # First multi-asset @multi_asset( outs={ "output1": AssetOut(key="asset_one_output1", is_required=False), "output2": AssetOut(key="asset_one_output2", is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): # ... (same as before) # Second multi-asset, depending on the first multi-asset @multi_asset( ins={"input_from_asset_one": AssetIn(key="asset_one_output1")}, outs={ "output21": AssetOut(key="asset_two_output21", is_required=False), "output22": AssetOut(key="asset_two_output22", is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext, input_from_asset_one): # ... (same as before) # Third multi-asset, depending on the second multi-asset @multi_asset( ins={"input_from_asset_two": AssetIn(key="asset_two_output21")}, outs={ "output31": AssetOut(ke

Scout

03/04/2024, 1:02 PM

y="asset_three_output31", is_required=False), "output32": AssetOut(key="asset_three_output32", is_required=False), }, can_subset=True, ) def multi_asset_three(context: AssetExecutionContext, input_from_asset_two): # ... (same as before) ``` In this example, each

AssetOut

is given an explicit

key

, which is then used in the

AssetIn

definitions to establish the dependencies. Make sure to replace the placeholder keys with the actual keys that make sense for your assets. If you continue to encounter issues, you may want to consult the Dagster documentation or seek help from the community for more specific guidance on defining and connecting multi-assets.

Peter Mburu

03/04/2024, 2:52 PM

@Scout how do i fix this code: from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, internal_asset_deps={ "output21": {AssetKey("output1")}, "output22": {AssetKey("output2")} }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: load the output of the first multi-asset yield Output(value=f"processed_{output1}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: load the output of the second multi-asset yield Output(value=f"processed_{output2}_asset_two", output_name="output22")

Scout

03/04/2024, 2:52 PM

To fix the code, you need to ensure that the dependencies between the multi-assets are correctly defined and that the outputs from the first multi-asset are being passed as inputs to the second multi-asset. Here's a revised version of the code with the necessary changes:

Copy code

python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(key="multi_asset_one_output1", is_required=False),
        "output2": AssetOut(key="multi_asset_one_output2", is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    ins={
        "input_from_output1": AssetIn(key="multi_asset_one_output1"),
        "input_from_output2": AssetIn(key="multi_asset_one_output2"),
    },
    outs={
        "output21": AssetOut(key="multi_asset_two_output21", is_required=False),
        "output22": AssetOut(key="multi_asset_two_output22", is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext, input_from_output1, input_from_output2):
    if "output21" in context.op_execution_context.selected_output_names:
        # Process the output of the first multi-asset
        processed_value = f"processed_{input_from_output1}_asset_two"
        yield Output(value=processed_value, output_name="output21")
    if "output22" in context.op_execution_context.selected_output_names:
        # Process the output of the second multi-asset
        processed_value = f"processed_{input_from_output2}_asset_two"
        yield Output(value=processed_value, output_name="output22")

Here are the changes made to the original code: 1. Explicit Asset Keys: Each

AssetOut

multi_asset_one

is given an explicit key using the

key

parameter. These keys are used to identify the assets produced by

multi_asset_one

. 2. Input Definitions: In

multi_asset_two

, the

ins

parameter is used to define the inputs that depend on the outputs of

multi_asset_one

. The keys used in

AssetIn

match the keys defined in the outputs of

multi_asset_one

. 3. Removed
internal_asset_deps
: The

internal_asset_deps

parameter is not needed in this case because we are using the

ins

parameter to define the dependencies. The

internal_asset_deps

parameter is used when you need to specify dependencies between outputs within the same multi-asset. 4. *Passing Inputs

Scout

03/04/2024, 2:53 PM

*: The inputs

input_from_output1

and

input_from_output2

are passed as arguments to

multi_asset_two

and are used to process the values. 5. Processing Logic: The placeholders

load the output of the first multi-asset

and

load the output of the second multi-asset

have been replaced with actual processing logic that uses the inputs. With these changes, the dependencies between the multi-assets should be correctly defined, and the code should work as expected. Make sure to replace the processing logic with the actual logic you need for your use case.

Peter Mburu

03/04/2024, 2:56 PM

@Scout without passing inputs as arguments

multi_asset_two

Scout

03/04/2024, 2:56 PM

If you want to define dependencies between assets without passing inputs as arguments in

multi_asset_two

, you can use the

AssetKey

to specify dependencies. However, you will need to ensure that the assets are part of the same job or repository so that Dagster can resolve the dependencies correctly. Here's how you can define

multi_asset_two

without passing inputs as arguments:

Copy code

python
from dagster import AssetIn, AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey

# First multi-asset
@multi_asset(
    outs={
        "output1": AssetOut(key="multi_asset_one_output1", is_required=False),
        "output2": AssetOut(key="multi_asset_one_output2", is_required=False),
    },
    can_subset=True,
)
def multi_asset_one(context: AssetExecutionContext):
    if "output1" in context.op_execution_context.selected_output_names:
        yield Output(value="value1_asset_one", output_name="output1")
    if "output2" in context.op_execution_context.selected_output_names:
        yield Output(value="value2_asset_one", output_name="output2")

# Second multi-asset, depending on the first multi-asset
@multi_asset(
    outs={
        "output21": AssetOut(key="multi_asset_two_output21", is_required=False),
        "output22": AssetOut(key="multi_asset_two_output22", is_required=False),
    },
    can_subset=True,
)
def multi_asset_two(context: AssetExecutionContext):
    if "output21" in context.op_execution_context.selected_output_names:
        # Assume we have a way to retrieve the value from the first multi-asset's output1
        input_from_output1 = context.get_asset(AssetKey("multi_asset_one_output1"))
        processed_value = f"processed_{input_from_output1}_asset_two"
        yield Output(value=processed_value, output_name="output21")
    if "output22" in context.op_execution_context.selected_output_names:
        # Assume we have a way to retrieve the value from the first multi-asset's output2
        input_from_output2 = context.get_asset(AssetKey("multi_asset_one_output2"))
        processed_value = f"processed_{input_from_output2}_asset_two"
        yield Output(value=processed_value, output_name="output22")

In this example,

multi_asset_two

does not take inputs as arguments. Instead, it assumes that there is a way to retrieve the values of the first multi-asset's outputs within the execution context. The

context.get_asset

method is a placeholder for the actual method you would use to retrieve the asset values. Dagster does not provide a

context.get_asset

method, so you would need to implement the logic to retrieve the asset values based on your storage and execution environment. Please note that the actual implementation of how you retrieve the asset values will depend on the specifics of your Dagster setup, such as the I/O ma

Scout

03/04/2024, 2:56 PM

nager you are using and how your assets are stored. You may need to access the outputs of

multi_asset_one

from the storage system directly, or use an I/O manager that can load the required asset values.

Peter Mburu

03/04/2024, 3:30 PM

@Scout # First multi-asset @multi_asset( outs={ "output1": AssetOut(is_required=False), "output2": AssetOut(is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(is_required=False), "output22": AssetOut(is_required=False), }, internal_asset_deps={ "output21": {AssetKey("output1")}, "output22": {AssetKey("output2")} }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: output1 = context.get_asset(AssetKey("output1")) yield Output(value=f"processed_{output1}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: output2 = context.get_asset(AssetKey("output2")) yield Output(value=f"processed_{output2}_asset_two", output_name="output22") fails with: The above exception was caused by the following exception: dagster._check.CheckError: Invariant failed. Description: Invalid asset dependencies: {AssetKey(['output1'])} specified in

internal_asset_deps

argument for multi-asset 'multi_asset_two' on key 'output21'. Each specified asset key must be associated with an input to the asset or produced by this asset. Valid keys: [AssetKey(['output21']), AssetKey(['output22'])] File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/errors.py", line 287, in user_code_error_boundary yield File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/server.py", line 242, in init loadable_targets = get_loadable_targets( ^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets else loadable_targets_from_python_module(module_name, working_directory) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module module = load_python_module( ^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module return importlib.import_module(module_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<frozen importlib._bootstrap>", line 1204, in _gcd_import File "<frozen importlib._bootstrap>", line 1176, in _find_and_load File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 690, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 940, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/pmburu/repo/msc/uti/pipelines/__init__.py", line 8, in <module> from .assets.antibiotic_resistance import multi_asset_one, multi_asset_two, multi_asset_three File "/home/pmburu/repo/msc/uti/pipelines/assets/antibiotic_resistance.py", line 118, in <module> @multi_asset( ^^^^^^^^^^^^ File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_core/definitions/decorators/asset_decorator.py", line 719, in inner check.invariant( File "/home/pmburu/miniconda3/lib/python3.11/site-packages/dagster/_check/__init__.py", line 1612, in invariant raise CheckError(f"Invariant failed. Description: {desc}")

Scout

03/04/2024, 3:30 PM

The error message indicates that there is an issue with the

internal_asset_deps

argument in the

multi_asset_two

definition. Specifically, the error states that the asset keys specified in

internal_asset_deps

are not valid because they must be associated with an input to the asset or produced by the asset itself. In Dagster,

internal_asset_deps

is used to specify dependencies between outputs within the same multi-asset. However, in your case, you are trying to specify dependencies on assets produced by a different multi-asset (

multi_asset_one

). This is not the correct use of

internal_asset_deps

. To fix this issue, you should remove the

internal_asset_deps

argument and instead use the

ins

parameter to define inputs to

multi_asset_two

that depend on the outputs of

multi_asset_one

. However, since you mentioned that you do not want to pass inputs as arguments, you will need to ensure that the assets are part of the same job or repository so that Dagster can resolve the dependencies correctly. Here's how you can define

multi_asset_two

without using

internal_asset_deps

and without passing inputs as arguments: ```python from dagster import AssetOut, Output, multi_asset, AssetExecutionContext, AssetKey # First multi-asset @multi_asset( outs={ "output1": AssetOut(key="multi_asset_one_output1", is_required=False), "output2": AssetOut(key="multi_asset_one_output2", is_required=False), }, can_subset=True, ) def multi_asset_one(context: AssetExecutionContext): if "output1" in context.op_execution_context.selected_output_names: yield Output(value="value1_asset_one", output_name="output1") if "output2" in context.op_execution_context.selected_output_names: yield Output(value="value2_asset_one", output_name="output2") # Second multi-asset, depending on the first multi-asset @multi_asset( outs={ "output21": AssetOut(key="multi_asset_two_output21", is_required=False), "output22": AssetOut(key="multi_asset_two_output22", is_required=False), }, can_subset=True, ) def multi_asset_two(context: AssetExecutionContext): if "output21" in context.op_execution_context.selected_output_names: # Logic to retrieve the value from the first multi-asset's output1 # This logic will depend on your specific setup and how you manage asset storage # For example, you might retrieve the value from a database or a file system output1_value = "retrieved_value1" # Placeholder for the actual retrieval logic yield Output(value=f"processed_{output1_value}_asset_two", output_name="output21") if "output22" in context.op_execution_context.selected_output_names: # Logic to retrieve the value from the first multi-asset's output2 #

Scout

03/04/2024, 3:31 PM

This logic will depend on your specific setup and how you manage asset storage output2_value = "retrieved_value2" # Placeholder for the actual retrieval logic yield Output(value=f"processed_{output2_value}_asset_two", output_name="output22") ``` In this revised example,

multi_asset_two

does not use

internal_asset_deps

and does not take inputs as arguments. Instead, it assumes that there is a way to retrieve the values of the first multi-asset's outputs within the execution context. The actual implementation of how you retrieve the asset values will depend on the specifics of your Dagster setup, such as the I/O manager you are using and how your assets are stored. You may need to access the outputs of

multi_asset_one

from the storage system directly, or use an I/O manager that can load the required asset values.

Peter Mburu

03/04/2024, 5:48 PM

@Scout def test_import_imf_data(self): with conditional_logging(): with patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data" ) as mock_process, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service" ) as mock_filter, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata" ) as mock_metadata: mock_process.return_value.ok = True mock_process.return_value = self.mock_data mock_filter.return_value.ok = True mock_filter.return_value = self.datasourcedocument mock_metadata.return_value.ok = True mock_metadata.return_value = self.mock_metadata_df assets = [ imf_ingestion.get_imf_datasource_documents, imf_ingestion.get_imf_data_series, imf_ingestion.imf_metadata, imf_ingestion.remote_imf_data, ] result = materialize(assets, partition_key="2022-01-01") assert result.success output_asset2 = result.output_for_node("remote_imf_data") ====================================================================== ERROR [0.886s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 245, in test_import_imf_data output_asset2 = result.output_for_node("remote_imf_data") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execute_in_process_result.py", line 120, in output_for_node return super(ExecuteInProcessResult, self).output_for_node( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/execution/execution_result.py", line 101, in output_for_node origin_output_def, origin_handle = target_node_def.resolve_output_to_origin( File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/op_definition.py", line 329, in resolve_output_to_origin return self.output_def_named(output_name), handle File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/node_definition.py", line 152, in output_def_named raise DagsterInvariantViolationError(f"{self._name} has no output named {name}.") dagster._core.errors.DagsterInvariantViolationError: remote_imf_data has no output named result.

Scout

03/04/2024, 5:48 PM

The error message you're encountering indicates that the

output_for_node

method is being called with an output name (

"result"

) that does not exist for the node

"remote_imf_data"

. The

output_for_node

method is used to retrieve the output of a specific node (in this case, an asset) from the result of a pipeline run or asset materialization. Here's the relevant part of the error message:

Copy code

dagster._core.errors.DagsterInvariantViolationError: remote_imf_data has no output named result.

This error suggests that within the

remote_imf_data

asset, there is no output defined with the name

"result"

. To resolve this issue, you should: 1. Check the definition of the

remote_imf_data

asset to see what outputs it actually defines. Ensure that you are using the correct output name when calling

output_for_node

. 2. If you expect the

remote_imf_data

asset to have an output named

"result"

, you need to make sure that the asset is yielding an

Output

with

output_name="result"

. 3. If the

remote_imf_data

asset is supposed to yield multiple outputs, ensure that each

Output

is yielded with the correct

output_name

that matches what you are trying to retrieve with

output_for_node

. 4. If the

remote_imf_data

asset only yields a single output and does not specify an

output_name

, you can retrieve the output without specifying an output name, as Dagster will use the default output name. Here's an example of how you might modify the

remote_imf_data

asset to yield an output named `"result"`:

Copy code

python from dagster import Output, asset  @asset def remote_imf_data():     # ... your logic here ...     yield Output(value=some_value, output_name="result")

And here's how you would retrieve the output in your test:

Copy code

python output_asset2 = result.output_for_node("remote_imf_data", "result")

Make sure that the output name you use in

output_for_node

matches the

output_name

specified when yielding the

Output

in your asset definition. If the asset does not specify an output name, you can omit the output name in

output_for_node

, and it will default to the standard output name.

Peter Mburu

03/04/2024, 6:22 PM

@Scout Is there a way to access a single asset from multi-asset?

Scout

03/04/2024, 6:22 PM

Yes, you can access a single asset from a multi-asset in Dagster. When you define a multi-asset with the

@multi_asset

decorator, you specify multiple outputs, each potentially corresponding to a different asset. When you materialize or execute the multi-asset, you can access the individual assets by their output names. Here's a simplified example of how you might define a multi-asset and then access one of its assets:

Copy code

python from dagster import AssetOut, Output, multi_asset  @multi_asset(     outs={         "asset1": AssetOut(),         "asset2": AssetOut(),     } ) def my_multi_asset():     # Your logic to produce values for asset1 and asset2     yield Output(value="value_for_asset1", output_name="asset1")     yield Output(value="value_for_asset2", output_name="asset2")  # Later, when you materialize or execute the multi-asset, you can access the individual assets: result = materialize([my_multi_asset]) value_for_asset1 = result.output_for_node("my_multi_asset", "asset1") value_for_asset2 = result.output_for_node("my_multi_asset", "asset2")

In this example,

my_multi_asset

is a multi-asset that yields two outputs, each corresponding to a different asset. When you call

materialize([my_multi_asset])

, both assets are materialized, and you can access the output of each asset using the

output_for_node

method on the result object, specifying the multi-asset's name and the output name. If you only want to materialize a subset of the assets defined in a multi-asset, you can use the

AssetSelection

API to specify which assets you want to materialize. Here's an example of how you might do that:

Copy code

python from dagster import AssetSelection, materialize  # Assuming 'my_multi_asset' is defined as above  # Materialize only 'asset1' from the multi-asset result = materialize(     AssetSelection.assets(my_multi_asset).with_output("asset1") ) value_for_asset1 = result.output_for_node("my_multi_asset", "asset1")

In this example,

AssetSelection.assets(my_multi_asset).with_output("asset1")

specifies that only the output named

"asset1"

from the multi-asset

my_multi_asset

should be materialized. Remember that the actual implementation of accessing individual assets from a multi-asset will depend on your specific use case and the way you have set up your Dagster assets and materialization process.

Peter Mburu

03/04/2024, 6:30 PM

Scout

03/04/2024, 6:30 PM

The error you're encountering suggests that there's an attempt to use a method

with_output

on an object of type

KeysAssetSelection

, which does not have this method. The

AssetSelection

API in Dagster is used to select specific assets for materialization, but the way it's being used in the code snippet from the error message seems to be incorrect. As of my knowledge cutoff date, Dagster does not have a

with_output

method on the

AssetSelection

class. Instead, you should use the

AssetSelection

API to select assets by their keys. If you want to materialize a specific asset or a subset of assets, you can use the

AssetSelection

to specify which assets to include. Here's an example of how you might correctly use

AssetSelection

to select a specific asset for materialization:

Copy code

python from dagster import AssetSelection, materialize  # Assuming 'remote_imf_data' is a multi-asset with an output asset named 'gpd_remote_data'  # Select the specific asset you want to materialize selected_assets = AssetSelection.keys("gpd_remote_data")  # Materialize the selected asset result = materialize(selected_assets)

In this example,

AssetSelection.keys("gpd_remote_data")

is used to create a selection of assets by their keys. You would replace

"gpd_remote_data"

with the actual key of the asset you want to materialize. If you're trying to materialize a specific output from a multi-asset, you would typically not need to use

AssetSelection

for that purpose. Instead, you would materialize the multi-asset and then access the specific output you're interested in from the result. If you continue to encounter issues, please ensure that you are using the latest version of Dagster and refer to the most recent documentation for any updates or changes to the API. If the issue persists, you may need to seek further assistance from the Dagster community or support channels.

Peter Mburu

03/04/2024, 6:48 PM

@Scout ERROR [0.004s]: test_import_imf_data (dagster_pipelines.assets.ingestion.price.tests.test_pipelines.IMFIngestionPipelineTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pmburu/git/fdw/dagster_pipelines/assets/ingestion/price/tests/test_pipelines.py", line 252, in test_import_imf_data result = materialize(assets, partition_key="2022-01-01") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_core/definitions/materialize.py", line 88, in materialize assets = check.sequence_param(assets, "assets", of_type=(AssetsDefinition, SourceAsset)) File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_check/__init__.py", line 1125, in sequence_param return _check_iterable_items(obj, of_type, "sequence") File "/home/pmburu/.venvs/fntfdwlcl/lib/python3.8/site-packages/dagster/_check/__init__.py", line 1768, in _check_iterable_items raise CheckError( dagster._check.CheckError: Member of sequence mismatches type. Expected (<class 'dagster._core.definitions.assets.AssetsDefinition'>, <class 'dagster._core.definitions.source_asset.SourceAsset'>). Got KeysAssetSelection(selected_keys=[AssetKey(['gpd_remote_data'])]) of type <class 'dagster._core.definitions.asset_selection.KeysAssetSelection'>. ---------------------------------------------------------------------- Ran 1 test in 0.534s FAILED (errors=1) Generating XML reports... Preserving test database for alias 'default'...

Scout

03/04/2024, 6:48 PM

The error message indicates that the

materialize

function is expecting a sequence of

AssetsDefinition

SourceAsset

objects, but it received an object of type

KeysAssetSelection

instead. The

materialize

function is used to materialize assets, and it requires a list of asset definitions or source assets as its argument. The

KeysAssetSelection

object is part of the

AssetSelection

API, which is used to select specific assets by their keys. However, it seems that the

KeysAssetSelection

object is being passed directly to the

materialize

function, which is not the correct usage. To resolve this issue, you need to pass the actual asset definitions that you want to materialize to the

materialize

function. If you have a list of asset keys and want to materialize those specific assets, you need to ensure that you are passing the corresponding asset definitions. Here's an example of how you might correctly use the

materialize

function with asset definitions:

Copy code

python from dagster import materialize  # Assuming you have asset definitions like the following: # asset1, asset2, ...  # You would pass the asset definitions directly to the materialize function result = materialize([asset1, asset2], partition_key="2022-01-01")

In this example,

asset1

asset2

, etc., are the asset definitions that you want to materialize. You would replace these with the actual asset definitions that correspond to the asset keys you are interested in. If you are trying to materialize assets based on a selection of keys, you need to convert those keys into the corresponding asset definitions before calling

materialize

. The

AssetSelection

API can be used to filter assets within a repository or job, but the

materialize

function requires the actual asset definitions. Please ensure that you are using the latest version of Dagster and refer to the most recent documentation for any updates or changes to the API. If the issue persists, you may need to seek further assistance from the Dagster community or support channels.

Peter Mburu

03/04/2024, 6:59 PM

@Scout how i get actual asset from KeysAssetSelection

Scout

03/04/2024, 6:59 PM

To get the actual assets from a

KeysAssetSelection

, you would typically be working within the context of a Dagster repository where your assets are defined. The

KeysAssetSelection

object is used to create a selection of assets based on their keys, but to materialize those assets or to work with their definitions, you need to resolve them to the actual

AssetsDefinition

objects within your repository. As of my knowledge cutoff date, the documentation does not provide a direct method to convert a

KeysAssetSelection

into a list of

AssetsDefinition

objects for use with the

materialize

function. However, if you have access to the repository definition, you may be able to retrieve the

AssetsDefinition

objects by their keys. Here's a conceptual example of how you might retrieve

AssetsDefinition

objects from a repository using the asset keys:

Copy code

python from dagster import AssetKey, RepositoryDefinition  # Assuming you have a repository definition repository: RepositoryDefinition = ...  # And you have a KeysAssetSelection with some asset keys keys_asset_selection = AssetSelection.keys(AssetKey("my_asset1"), AssetKey("my_asset2"))  # You would retrieve the actual asset definitions from the repository using the asset keys assets_defs = [     repository.get_asset_def(asset_key)     for asset_key in keys_asset_selection.selected_keys ]  # Now you can use the assets_defs with the materialize function result = materialize(assets_defs, partition_key="2022-01-01")

In this example,

repository.get_asset_def(asset_key)

is a placeholder for the method you would use to retrieve the

AssetsDefinition

object for a given

AssetKey

from your repository. You would need to replace this with the actual method provided by your repository or the context in which you are working. Please note that the actual implementation may vary based on the version of Dagster you are using and the structure of your repository. If you are unable to find a method to retrieve

AssetsDefinition

objects by their keys, you may need to manually define a list of the assets you want to materialize based on your knowledge of the repository's structure. If you need further assistance, please refer to the latest Dagster documentation or seek help from the Dagster community.

Peter Mburu

03/04/2024, 7:02 PM

@Scout fix the code below: def test_import_imf_data(self): with conditional_logging(): with patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data" ) as mock_process, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service" ) as mock_filter, patch( "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata" ) as mock_metadata: mock_process.return_value.ok = True mock_process.return_value = self.mock_data mock_filter.return_value.ok = True mock_filter.return_value = self.datasourcedocument mock_metadata.return_value.ok = True mock_metadata.return_value = self.mock_metadata_df assets = [ get_imf_datasource_documents, get_imf_data_series, imf_metadata, AssetSelection.assets(remote_imf_data).keys("gpd_remote_data") ] result = materialize(assets, partition_key="2022-01-01")

Scout

03/04/2024, 7:02 PM

The code snippet you've provided is attempting to include an

AssetSelection

object directly in the list of assets to be materialized. However, the

materialize

function expects a list of asset definitions, not an

AssetSelection

object. To fix the code, you need to ensure that you are passing a list of asset definitions to the

materialize

function. Here's a revised version of the test function:

Copy code

python
from dagster import materialize, AssetKey

def test_import_imf_data(self):
    with conditional_logging():
        with patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_remote_imf_data"
        ) as mock_process, patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.get_data_source_documents_for_web_service"
        ) as mock_filter, patch(
            "dagster_pipelines.assets.ingestion.price.imf_ingestion.fetch_imf_metadata"
        ) as mock_metadata:
            mock_process.return_value.ok = True
            mock_process.return_value = self.mock_data
            mock_filter.return_value.ok = True
            mock_filter.return_value = self.datasourcedocument
            mock_metadata.return_value.ok = True
            mock_metadata.return_value = self.mock_metadata_df
            
            # Assuming get_imf_datasource_documents, get_imf_data_series, and imf_metadata
            # are asset definitions, and remote_imf_data is a multi_asset definition.
            # You need to pass the asset definitions directly to the materialize function.
            # If you want to materialize a specific asset from a multi_asset, you need to
            # get the asset definition for that specific asset.
            
            # Assuming 'gpd_remote_data' is the key for the asset you want to materialize
            # from the multi_asset 'remote_imf_data', you would retrieve the asset definition
            # from your repository. The following line is a placeholder for the actual retrieval:
            gpd_remote_data_asset = repository.get_asset_def(AssetKey("gpd_remote_data"))
            
            # Now you can include the asset definition in the list of assets to materialize
            assets = [
                get_imf_datasource_documents,
                get_imf_data_series,
                imf_metadata,
                gpd_remote_data_asset  # This should be the actual asset definition
            ]
            
            # Materialize the assets
            result = materialize(assets, partition_key="2022-01-01")
            assert result.success

In this revised code: - We assume that

get_imf_datasource_documents

get_imf_data_series

, and

imf_metadata

are asset definitions that you want to materialize. - We replace the

AssetSelection

object with the actual asset d

Scout

03/04/2024, 7:03 PM

efinition for the asset keyed by

"gpd_remote_data"

. You need to retrieve this asset definition from your repository, which typically involves accessing the repository object and calling a method like

get_asset_def

with the

AssetKey

. - The placeholder

repository.get_asset_def(AssetKey("gpd_remote_data"))

should be replaced with the actual code to retrieve the asset definition from your repository. Please note that the actual method to retrieve an asset definition from a repository may vary based on your specific setup and the version of Dagster you are using. If you need further assistance, please refer to the latest Dagster documentation or seek help from the Dagster community.

43 Views

Open in Slack

Previous Next