
Auster Cid

11/05/2021, 6:49 PM
My team ran into a strange bug when launching backfills today. We have some pipelines that are built out of a YAML file by a pipeline factory, much like in this example. We also have other factories that generate Run Config Functions and Date Partition Sets for each of these pipelines. Launching partition runs from the playground works as expected, but when running a backfill, all launched runs get the config of the last partition selected for the backfill. Any idea why?

daniel

11/05/2021, 6:53 PM
hi Auster - is it possible to share your partition set code?

Auster Cid

11/05/2021, 7:07 PM
sure:
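import datetime

from dagster import PartitionSetDefinition
from dagster.utils.partitions import date_partition_range

def return_partition_set_pipe_factory(name):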
    return PartitionSetDefinition(
        name=f"{name}_dates",
        pipeline_name=name,
        partition_fn=date_partition_range(
            start=datetime.datetime(2020, 1, 1), inclusive=True
        ),
        run_config_fn_for_partition=run_config_pipe_factory(name),
    )

def partitions_set_pipe_factory():
    # yaml_data: pipeline definitions loaded from the YAML file
    return [return_partition_set_pipe_factory(name) for name in yaml_data]

daniel

11/05/2021, 7:09 PM
how about run_config_pipe_factory?

Auster Cid

11/05/2021, 7:10 PM
I had to redact some sensitive stuff, but it's essentially something like:
def run_config_pipe_factory(name):
    def run_config(partition):
        # read yaml data and populate the run config dict (redacted)
        run_config_dict = ...
        return run_config_dict

    return run_config
I've done some debugging and can confirm that run_config is executed once for each partition in the backfill, but the runs still receive the wrong config.
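A pure-Python sketch of how "called once per partition" and "every run gets the last partition's config" can both be true: if each call returns the same dict object, and all the configs are built before any run reads them, every run ends up seeing the final mutation. The template contents, `buggy_run_config`, and `collect_configs` are hypothetical, and the build-everything-first ordering is an assumption about the backfill path, not something confirmed from Dagster's source. It would be consistent with single launches from the playground looking fine, since there the config is used right after it is built.

```python
# Hypothetical stand-in for a template parsed from YAML once, at module level.
TEMPLATE = {"solids": {"extract": {"config": {"date": None}}}}


def buggy_run_config(partition):
    """Mutates and returns the shared template instead of a fresh dict."""
    config = TEMPLATE  # plain assignment: no copy, same object every call
    config["solids"]["extract"]["config"]["date"] = partition
    return config


def collect_configs(partitions, run_config_fn):
    """Hypothetical backfill step: build every partition's config up front.

    Assumed ordering for illustration only; runs would be launched later
    from the returned list.
    """
    return [run_config_fn(p) for p in partitions]


partitions = ["2021-11-01", "2021-11-02", "2021-11-03"]
configs = collect_configs(partitions, buggy_run_config)

# buggy_run_config ran three times, but all three entries are TEMPLATE,
# so each one shows the date written by the last call.
print([c["solids"]["extract"]["config"]["date"] for c in configs])
# -> ['2021-11-03', '2021-11-03', '2021-11-03']
```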

daniel

11/05/2021, 7:16 PM
Got it. One possibility I'll throw out: your function might be written in such a way that it returns the same object even when it's called multiple times. That might point to something we could fix on our side, but in the meantime you could try wrapping the return value of your run_config function in copy.deepcopy(). That could explain these symptoms: if it's a shared dict somehow, the state would end up inadvertently shared across all the partitions.
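A minimal sketch of the "deepcopy on the end" idea as a decorator. `isolate_run_config` is a hypothetical helper, not a Dagster API; it deep-copies whatever the wrapped run-config function returns, so each partition gets an independent snapshot. Only `copy.deepcopy` and `functools.wraps` from the standard library are used, and the template contents are made up.

```python
import copy
import functools


def isolate_run_config(run_config_fn):
    """Wrap a run-config function so each call returns an independent deep copy."""

    @functools.wraps(run_config_fn)
    def wrapper(partition):
        return copy.deepcopy(run_config_fn(partition))

    return wrapper


# Hypothetical shared template, still mutated by the function below.
TEMPLATE = {"solids": {"extract": {"config": {"date": None}}}}


@isolate_run_config
def run_config(partition):
    config = TEMPLATE  # still mutates the shared template...
    config["solids"]["extract"]["config"]["date"] = partition
    return config  # ...but the caller receives a snapshot taken right now


configs = [run_config(p) for p in ["2021-11-01", "2021-11-02"]]
print([c["solids"]["extract"]["config"]["date"] for c in configs])
# -> ['2021-11-01', '2021-11-02']
```

Copying on the way out fixes the aliasing, but the shared template is still mutated on every call, so a key set for one partition and not overwritten for the next would leak across partitions. Copying the template at the start of the function avoids that as well.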

Auster Cid

11/05/2021, 7:17 PM
hmm, makes sense, I'll give it a shot

daniel

11/05/2021, 7:17 PM
For example, if your implementation was:
SHARED_RUN_CONFIG = {}  # template shared at module level

def my_run_config(partition):
    my_config = SHARED_RUN_CONFIG  # not a copy: the same dict on every call
    my_config["partition_specific_stuff"] = "your_stuff"
    return my_config
I could see this happening in that case.
That said, if that's what it is, that's something we should be able to fix on our side.

Auster Cid

11/05/2021, 7:21 PM
yep, that was it
the dictionary read from the YAML file was outside the function scope, so the changes made to it were being shared by all the run_config calls
tyvm for your help @daniel
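A sketch of the corrected factory, assuming the YAML has been parsed once into a module-level dict. `YAML_DATA` is a hypothetical stand-in, and the `solids`/`extract`/`table` keys are made up. The template is only read; each call starts from its own `copy.deepcopy`. For simplicity the partition is a plain string here; in the real `run_config_fn_for_partition` hook Dagster passes its partition object instead.

```python
import copy

# Hypothetical stand-in for the dict parsed from the YAML file at import time
# (e.g. via PyYAML's yaml.safe_load); one template per pipeline name.
YAML_DATA = {
    "orders": {"solids": {"extract": {"config": {"table": "orders", "date": None}}}},
}


def run_config_pipe_factory(name):
    template = YAML_DATA[name]  # shared and read-only: never mutated below

    def run_config(partition):
        # Start from a private deep copy so per-partition edits stay local.
        config = copy.deepcopy(template)
        config["solids"]["extract"]["config"]["date"] = partition
        return config

    return run_config


orders_config_fn = run_config_pipe_factory("orders")
first = orders_config_fn("2021-11-01")
second = orders_config_fn("2021-11-02")
print(first["solids"]["extract"]["config"]["date"])   # -> 2021-11-01
print(second["solids"]["extract"]["config"]["date"])  # -> 2021-11-02
```

Copying at the top of the function, rather than on return, means no partition can ever observe another partition's edits, even for keys it does not set itself.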

daniel

11/05/2021, 7:54 PM
np!