Hello, I’m designing a pipeline and I’m facing a p...
# ask-community
a
Hello, I’m designing a pipeline and I’m facing a circular dependency problem. The goal is to extract the information from all our products’ reviews. The pipeline:
1. Generate the parameters for extracting the first page of reviews for every product. (It is started by a scheduler every Monday.)
2. Trigger the extraction.
3. Process the new information.
4. Check (against the reviews we already have in our database) whether we need to keep extracting the next page, or whether we should stop there because we already have all the information. Prepare the new parameters for the next extraction.
5. Back to 2.
The pipeline is designed to be triggered in a loop until we have all the new information. Any ideas? I was closing the loop with the asset below, which takes 3 other assets as inputs in order to materialize.
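import logging

from dagster import AutoMaterializePolicy, asset

# `extraction` is a project-specific module and is assumed to be imported elsewhere.
@asset(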
    description="Create collections with the requests and return collections IDs",
    auto_materialize_policy=AutoMaterializePolicy.eager(),
)
def generate_collections(
    context,
    generate_parameters_extraction,
    dbt_parameters_extraction,
    check_collections_deleted,
):
    if check_collections_deleted is True:
        my_logger = logging.getLogger("root")
        my_logger.info("Collections are being generated")
        if dbt_parameters_extraction is not None:
            return extraction.creation_of_collections(dbt_parameters_extraction)
    else:
        return extraction.creation_of_collections(generate_parameters_extraction)
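For reference, the control flow of steps 1 to 5 can be written as a plain Python loop, independent of Dagster. This is only a sketch: `run_review_loop`, `fetch_page`, `FAKE_SITE`, and the stop rule (stop paging a product as soon as a page is empty or contains an already-known review) are assumptions made for illustration, not part of the original pipeline.

```python
from typing import Callable, Iterable, List, Set, Tuple

Request = Tuple[str, int]  # (product_id, page_number)

# Hypothetical stand-in for the review site: pages of review IDs per product.
FAKE_SITE = {"p1": [["r5", "r4"], ["r3", "r2"], ["r1"]]}


def fetch_page(product_id: str, page: int) -> List[str]:
    """Return the review IDs on one page, or an empty list past the last page."""
    pages = FAKE_SITE.get(product_id, [])
    return pages[page - 1] if 1 <= page <= len(pages) else []


def run_review_loop(
    first_requests: Iterable[Request],
    fetch: Callable[[str, int], List[str]],
    known_ids: Set[str],
    max_rounds: int = 100,
) -> int:
    """Run steps 2-5 until no follow-up requests remain; return the round count."""
    requests = list(first_requests)  # step 1: first page of every product
    rounds = 0
    while requests and rounds < max_rounds:
        next_requests: List[Request] = []
        for product_id, page in requests:
            reviews = fetch(product_id, page)  # step 2: extract
            new_ids = [r for r in reviews if r not in known_ids]
            known_ids.update(new_ids)  # step 3: process / store
            # Step 4 (assumed rule): request the next page only if this page
            # was non-empty and every review on it was new.
            if reviews and len(new_ids) == len(reviews):
                next_requests.append((product_id, page + 1))
        requests = next_requests  # step 5: back to step 2
        rounds += 1
    return rounds
```

With `known_ids = {"r1", "r2"}`, the loop reads page 1 (all new), reads page 2 (hits the known `r2`), and stops after two rounds. The difficulty discussed in this thread is expressing the "back to 2" edge in an asset graph, which must stay acyclic.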
o
hi @Adriana Jiménez Ambel! This is an interesting setup -- my first instinct would be to draw that connection from 4 -> 2 using an asset sensor. So you could set up an asset sensor that listens for materializations of 4, and whenever it materializes, 2 is kicked off with those new parameters. 4 can be set up to only emit an output in the case that there is new data to process.
a
Makes sense, but the problem is that I need the asset generate_collections to depend initially on asset 1, and then to get its input from 4 until the result coming from 4 is null. I also need to wait until 3 is completed before triggering the materialization of 4.
@owen
o
I'm imagining something like:
2 depends on: 1 and 4
3 depends on: 2
4 depends on: 3
When 1 is updated, 2 will be updated, then 3, then 4. An asset sensor listens for materializations of 4 and, in response, kicks off materializations of 2.
a
After rebuilding the pipeline to include the sensor, I’m facing the same problem as before (image attached). So, knowing that assets cannot have a circular relationship, how should I approach this issue? Is it mandatory in Dagster that the dependencies between assets are linear? @owen
o
Ah, I made a mistake in my suggestion above (circular dependencies aren't allowed). Instead, 2 should only explicitly depend on asset 1, but can load the contents of asset 4 inside its body using load_asset_value. This discussion covers using that function for a slightly different case, but is mostly still relevant: https://github.com/dagster-io/dagster/discussions/14805