https://dagster.io/ logo
#ask-community
Title
# ask-community
j

Jeferson Machado Santos

08/24/2023, 5:41 PM
Hi Folks. I used to use asset_reconciliation_sensor on dagster 1.3.4 and now I am upgrading to 1.4.5 and trying to use AutoMaterializationPolicy. It happens that automaterializations is launching runs for assets which are being updated on previous runs still not finished. For example, on screenshot below, there are 6 assets launched in a second run but they are still being updated on the previous run of 385 assets. Does anyone know why that happens?
Here an asset with 2 runs at the same time
o

owen

08/24/2023, 8:57 PM
hi @Jeferson Machado Santos -- what information is there in the
Auto-materialize history
tab ?
j

Jeferson Machado Santos

08/24/2023, 9:08 PM
Hi owen. It is not showing because I still did not made "dagster instance migrate", need to backup DB. Maybe this could be the issue?
o

owen

08/24/2023, 9:09 PM
Running
dagster instance migrate
shouldn't impact the functionality, but it would definitely help provide clues to what might be going on here
j

Jeferson Machado Santos

08/24/2023, 9:14 PM
Ok, will work on that and return here. Thanks
Hi owen. Now I have the information on Auto-materialize History. This asset has two Runs at the same time. The second was triggered while the first was still running
Auto-materialize policy:
image.png
as you can see, both evaluations have the same reason
any idea what happened here?
@owen, another case that is more clear to me. This asset is being updated on run a807757d. The run did not finish, but the asset is already materialized because I can check on my bigquery. However, run 01a05ea5 has been launched even if it has passed like 3 minutes from its update on bigquery and previous run is still running. I believe the descendants assets are being updated on the first run and are requiring fresher data, and dagster did not recognize the latest materialization because the first run still did not finish. Does that make sense? It seems like a wrong behaviour for me. Any ideas on what is happening here?
Hi @owen, any update or idea on this issue?
o

owen

08/30/2023, 11:49 PM
hi @Jeferson Machado Santos — currently on vacation at the moment but i’ll forward this to someone on the team and have them look into it
c

claire

08/31/2023, 11:25 PM
Hi Jeferson, I can partially explain the behavior you're seeing here. Looks like you're using dbt assets--these assets run together in a single step, causing them to display as "in-progress" until the step is complete, even if the asset has already materialized. As long as the upstream asset is materialized, downstream assets can execute in a different run even if the upstream asset's step hasn't completed. It is surprising that you're seeing the upstream asset execute multiple times though--would you mind sharing your assets definitions and the graph structure?
j

Jeferson Machado Santos

09/01/2023, 12:36 AM
Hi @claire, yes, I can share
Here are de assets definition
Copy code
def get_sources(selected_assets, manifest):
    nodes = manifest["nodes"]
    sources = []
    for a in selected_assets:
        if f"model.xepelin_dw.{a}" in nodes.keys():
            node_info = nodes[f"model.xepelin_dw.{a}"]
            if len(node_info["sources"]) > 0:
                for s in node_info["sources"]:
                    sources.append("source:" + ".".join(s))

    return sources


@dbt_assets(manifest=manifest, select="*",exclude = "resource_type:snapshot")
def my_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    #try:
    selected_assets = [a[0][-1] for a in context.selected_asset_keys]
    sources = get_sources(selected_assets, manifest)
    #dbt.cli(["run"], context=context)
    if len(sources)>0:
        yield from dbt.cli(["test", "--select", " ".join(sources)], manifest=manifest).stream()
    else:
        <http://context.log.info|context.log.info>("No sources to test")
    yield from dbt.cli(["run"], context=context).stream()
    yield from dbt.cli(["test"], context=context).stream()
the idea is to run tests on source, run, and tests on model for each "update / run"
About the asset graph, the full asset graph is giant for the whole dbt project. There might be almos 1.000 assets
but, here is one case. A run with 700+ assets was launched, and before it finished, a new run started for assets pre_Agreement_cl and stg_Agreement_cl. This stg_Agreemtn_cl has others downstream not shown in this picture.
So, they had 2 materializations on both runs
c

claire

09/07/2023, 12:55 AM
Apologies about the late response, I was also on vacation--what are the freshness policies defined on the assets?
j

Jeferson Machado Santos

09/07/2023, 1:46 PM
Most of them are like
FreshnessPolicy(maximum_lag_minutes=60.0, cron_schedule='0 * * * *', cron_schedule_timezone=None)
c

claire

09/13/2023, 4:47 PM
Thanks for the detail here -- we're looking into why this is happening. For now, I've opened up an issue here: https://github.com/dagster-io/dagster/issues/16478
2 Views