# ask-community
Lucas
Hello everyone, has anyone faced failures from large backfills (100+ partitions)? I'm still testing some configuration using a local/dev deployment (with `dagster dev` and local code locations) and have tried launching 2 large backfills, which failed with the following errors:
```
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1675190530.356000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,"referenced_errors":[{"created":"@1675190530.356000000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
```
Do you know what could be causing this issue?
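[Editor's note: for readers reconstructing the setup, a local code location with a partitioned asset of the kind being backfilled here might look roughly like the sketch below. The asset name and the daily partition scheme are placeholders, not details from the thread.]

```python
# definitions.py -- minimal sketch of a local code location with a
# partitioned asset; `my_partitioned_asset` and the daily partition
# scheme are hypothetical placeholders.
from dagster import DailyPartitionsDefinition, Definitions, asset

daily = DailyPartitionsDefinition(start_date="2022-01-01")

@asset(partitions_def=daily)
def my_partitioned_asset(context):
    # Each partition materializes in its own run, so a backfill over
    # 100+ partitions launches 100+ runs against the local gRPC server.
    context.log.info(f"materializing partition {context.partition_key}")

defs = Definitions(assets=[my_partitioned_asset])
```

[Serving this file with `dagster dev -f definitions.py` exposes it over gRPC, which is the channel the error above reports as unavailable.]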
Balázs
I'm facing a similar issue. I have about 1400 partitions, and after I launch a backfill, it runs until about 300 partitions have completed. Then the execution stops and I get the same error as you do. Did you manage to find a solution?
Lucas
Hello Balázs, unfortunately I didn't find a solution. Since this was a one-shot operation (a full refresh), the workaround at the time was to split it into smaller backfills, each over a smaller set of partitions. After the full refresh, all of our assets were materialized through schedules or sensors, so we never needed to backfill a large number of partitions again. I'm not sure whether newer versions of Dagster have solved this issue; if not, I would advise you to raise an issue on their GitHub.
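[Editor's note: the splitting workaround Lucas describes can also be scripted. Below is a minimal sketch of the idea, assuming a daily-partitioned asset; the asset name, partition scheme, and batch size are hypothetical. It works through the partition set in small batches of sequential in-process runs instead of one large backfill.]

```python
from dagster import DailyPartitionsDefinition, asset, materialize

daily = DailyPartitionsDefinition(start_date="2022-01-01")

@asset(partitions_def=daily)
def my_partitioned_asset(context):
    context.log.info(f"materializing partition {context.partition_key}")

# Split the full partition set into small batches, one run per
# partition key, instead of submitting a single 100+ partition backfill.
BATCH_SIZE = 50  # hypothetical; tune to what your deployment tolerates
partition_keys = daily.get_partition_keys()

for start in range(0, len(partition_keys), BATCH_SIZE):
    batch = partition_keys[start : start + BATCH_SIZE]
    for key in batch:
        # materialize() executes in-process, so each call is an
        # independent run that bypasses the daemon's backfill machinery.
        result = materialize([my_partitioned_asset], partition_key=key)
        assert result.success
    # Natural checkpoint between batches: inspect results before continuing.
```

[Because `materialize()` runs in-process rather than through the run launcher, this behaves differently from a UI-launched backfill; the batching mainly provides checkpoints between groups of runs.]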
Balázs
Hi Lucas, thanks for responding. Yeah, that's basically what I'm doing right now: selecting smaller subsets of partitions and running those, because I don't know what else to do. I'm on a fairly recent version of Dagster, so I'll raise an issue.
I upgraded to version 1.2.7 (from 1.2.3) and this issue seems to be fixed.