Hi Folks I used to use asset reconciliation sensor on dagste dagster #ask-community

Hi Folks. I used to use asset_reconciliation_senso...

Jeferson Machado Santos

08/24/2023, 5:41 PM

Hi Folks. I used to use asset_reconciliation_sensor on dagster 1.3.4 and now I am upgrading to 1.4.5 and trying to use AutoMaterializationPolicy. It happens that automaterializations is launching runs for assets which are being updated on previous runs still not finished. For example, on screenshot below, there are 6 assets launched in a second run but they are still being updated on the previous run of 385 assets. Does anyone know why that happens?

Jeferson Machado Santos

08/24/2023, 5:42 PM

Here an asset with 2 runs at the same time

owen

08/24/2023, 8:57 PM

hi @Jeferson Machado Santos -- what information is there in the

Auto-materialize history

tab ?

Jeferson Machado Santos

08/24/2023, 9:08 PM

Hi owen. It is not showing because I still did not made "dagster instance migrate", need to backup DB. Maybe this could be the issue?

owen

08/24/2023, 9:09 PM

Running

dagster instance migrate

shouldn't impact the functionality, but it would definitely help provide clues to what might be going on here

Jeferson Machado Santos

08/24/2023, 9:14 PM

Ok, will work on that and return here. Thanks

Jeferson Machado Santos

08/28/2023, 4:45 PM

Hi owen. Now I have the information on Auto-materialize History. This asset has two Runs at the same time. The second was triggered while the first was still running

Jeferson Machado Santos

08/28/2023, 4:45 PM

Auto-materialize policy:

Jeferson Machado Santos

08/28/2023, 4:45 PM

image.png

Jeferson Machado Santos

08/28/2023, 4:46 PM

as you can see, both evaluations have the same reason

Jeferson Machado Santos

08/28/2023, 4:46 PM

any idea what happened here?

Jeferson Machado Santos

08/28/2023, 7:17 PM

@owen, another case that is more clear to me. This asset is being updated on run a807757d. The run did not finish, but the asset is already materialized because I can check on my bigquery. However, run 01a05ea5 has been launched even if it has passed like 3 minutes from its update on bigquery and previous run is still running. I believe the descendants assets are being updated on the first run and are requiring fresher data, and dagster did not recognize the latest materialization because the first run still did not finish. Does that make sense? It seems like a wrong behaviour for me. Any ideas on what is happening here?

Jeferson Machado Santos

08/30/2023, 4:59 PM

Hi @owen, any update or idea on this issue?

owen

08/30/2023, 11:49 PM

hi @Jeferson Machado Santos — currently on vacation at the moment but i’ll forward this to someone on the team and have them look into it

claire

08/31/2023, 11:25 PM

Hi Jeferson, I can partially explain the behavior you're seeing here. Looks like you're using dbt assets--these assets run together in a single step, causing them to display as "in-progress" until the step is complete, even if the asset has already materialized. As long as the upstream asset is materialized, downstream assets can execute in a different run even if the upstream asset's step hasn't completed. It is surprising that you're seeing the upstream asset execute multiple times though--would you mind sharing your assets definitions and the graph structure?

Jeferson Machado Santos

09/01/2023, 12:36 AM

Hi @claire, yes, I can share

Jeferson Machado Santos

09/01/2023, 12:36 AM

Here are de assets definition

Jeferson Machado Santos

09/01/2023, 12:36 AM

Copy code

def get_sources(selected_assets, manifest):
    nodes = manifest["nodes"]
    sources = []
    for a in selected_assets:
        if f"model.xepelin_dw.{a}" in nodes.keys():
            node_info = nodes[f"model.xepelin_dw.{a}"]
            if len(node_info["sources"]) > 0:
                for s in node_info["sources"]:
                    sources.append("source:" + ".".join(s))

    return sources


@dbt_assets(manifest=manifest, select="*",exclude = "resource_type:snapshot")
def my_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    #try:
    selected_assets = [a[0][-1] for a in context.selected_asset_keys]
    sources = get_sources(selected_assets, manifest)
    #dbt.cli(["run"], context=context)
    if len(sources)>0:
        yield from dbt.cli(["test", "--select", " ".join(sources)], manifest=manifest).stream()
    else:
        <http://context.log.info|context.log.info>("No sources to test")
    yield from dbt.cli(["run"], context=context).stream()
    yield from dbt.cli(["test"], context=context).stream()

Jeferson Machado Santos

09/01/2023, 12:36 AM

the idea is to run tests on source, run, and tests on model for each "update / run"

Jeferson Machado Santos

09/01/2023, 12:39 AM

About the asset graph, the full asset graph is giant for the whole dbt project. There might be almos 1.000 assets

Jeferson Machado Santos

09/01/2023, 12:41 AM

but, here is one case. A run with 700+ assets was launched, and before it finished, a new run started for assets pre_Agreement_cl and stg_Agreement_cl. This stg_Agreemtn_cl has others downstream not shown in this picture.

Jeferson Machado Santos

09/01/2023, 12:42 AM

So, they had 2 materializations on both runs

claire

09/07/2023, 12:55 AM

Apologies about the late response, I was also on vacation--what are the freshness policies defined on the assets?

Jeferson Machado Santos

09/07/2023, 1:46 PM

Most of them are like

FreshnessPolicy(maximum_lag_minutes=60.0, cron_schedule='0 * * * *', cron_schedule_timezone=None)

claire

09/13/2023, 4:47 PM

Thanks for the detail here -- we're looking into why this is happening. For now, I've opened up an issue here: https://github.com/dagster-io/dagster/issues/16478

2 Views

Open in Slack

Previous Next