# ask-community
Hello everyone, has anyone seen failures on large backfills (100+ partitions)? I'm still testing some configuration on a local/dev deployment (with `dagster dev` and local code locations) and have tried launching 2 large backfills, both of which failed with the following error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1675190530.356000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3261,"referenced_errors":[{"created":"@1675190530.356000000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
Do you know what could be causing this issue?
I'm facing a similar issue. I have about 1400 partitions, and after I launch a backfill it runs until roughly 300 partitions have completed. Then the execution stops and I get the same error as you do. Did you manage to find a solution?
Hello Balázs, unfortunately I didn't find a solution. Since this was a one-shot operation (a full refresh), the "workaround" at the time was to split it into smaller backfills (each over a smaller set of partitions). After the full refresh, all of our assets were materialized through schedules or sensors, so we no longer needed to backfill a large number of partitions. I'm not sure whether newer versions of Dagster have solved this issue. If not, I would advise you to raise an issue on their GitHub.
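For reference, the splitting itself is just batching the partition key list. Here's a minimal sketch; `launch_backfill` is a hypothetical placeholder for however you actually submit each batch (UI selection, GraphQL, or the CLI), since the thread doesn't specify:

```python
def chunk_partitions(partition_keys, batch_size):
    """Yield successive batches of partition keys of at most batch_size."""
    for start in range(0, len(partition_keys), batch_size):
        yield partition_keys[start:start + batch_size]


def launch_backfill(batch):
    # Hypothetical submission step: replace with your actual mechanism
    # (e.g. selecting this partition range in the Dagster UI).
    print(f"backfill {batch[0]} .. {batch[-1]} ({len(batch)} partitions)")


if __name__ == "__main__":
    # e.g. ~1400 partitions, submitted 100 at a time instead of all at once
    keys = [f"partition-{i:04d}" for i in range(1400)]
    for batch in chunk_partitions(keys, 100):
        launch_backfill(batch)
```

Submitting 100 at a time (and waiting for each batch to finish) kept each backfill small enough to avoid the gRPC failure in our case.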
Hi Lucas, thanks for responding. Yeah, that's basically what I'm doing right now, selecting smaller subsets and running those, because I don't know what else to do. I'm using a fairly recent Dagster version, so I'll raise an issue.
I upgraded to version 1.2.7 (from 1.2.3) and this issue seems to be fixed.