#announcements

Chris Roth

04/16/2020, 7:17 PM
I'm noticing something mysterious. I have 4 Celery workers set up with Redis, and I'm running a pipeline with about 6 serial solids in it. I wrote the solids before setting up Celery, so they read and write a JSON file to and from disk instead of passing it as an intermediate. But somehow the downstream solids are succeeding even on Celery. How is this possible? It seems like all of the solids in a pipeline are always getting run on the same worker.
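The pattern being described presumably looks something like the following. This is a hypothetical reconstruction (the solid names and scratch path are made up, not from the thread), using the `@solid` / `@pipeline` API of that era:

```python
import json

from dagster import pipeline, solid

SCRATCH_PATH = "/tmp/step_output.json"  # local disk, not shared storage

@solid
def write_step(context):
    # Persist output to a local JSON file instead of returning it as an
    # intermediate for Dagster to manage.
    with open(SCRATCH_PATH, "w") as f:
        json.dump({"value": 1}, f)
    return SCRATCH_PATH  # only the path crosses the solid boundary

@solid
def read_step(context, path):
    # This read only works if it executes on the same machine that wrote
    # the file.
    with open(path) as f:
        data = json.load(f)
    context.log.info("read {}".format(data))

@pipeline
def serial_pipeline():
    read_step(write_step())
```

With Celery workers on different machines, `read_step` would be expected to fail whenever it lands on a worker that doesn't have the file, which is what makes the reported success surprising.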

nate

04/16/2020, 8:05 PM
hmm what does your pipeline config look like? do you have the `execution:` key set to configure the pipeline to run on Celery? if so, might be worth looking at flower to confirm whether this is indeed happening w/ tasks getting scheduled on the same worker
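For reference, the Celery executor is enabled through the `execution:` key in the run config. A minimal sketch, assuming the `dagster-celery` executor from around the 0.7.x releases (the broker/backend URLs are placeholders):

```yaml
execution:
  celery:
    config:
      broker: "redis://localhost:6379/0"
      backend: "redis://localhost:6379/0"
```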

Chris Roth

04/16/2020, 8:07 PM
flower is showing a pretty even distribution across workers
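For anyone following along, flower can be pointed at the same Redis broker to inspect how tasks are being distributed across workers; a sketch, with the broker URL as a placeholder:

```shell
celery flower --broker=redis://localhost:6379/0
```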

nate

04/16/2020, 8:09 PM
and the reads/writes of this JSON file are from local disk?

Chris Roth

04/16/2020, 8:09 PM
yup

nate

04/16/2020, 8:10 PM
that is indeed strange… and you’re able to confirm that the downstream solids are successfully reading that JSON file off disk, despite being run on a different worker?

Chris Roth

04/16/2020, 8:11 PM
my suspicion is that they are somehow always getting run on the same worker, but yeah it seems that the downstream solids are always able to read the file
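One way to test that suspicion is to log the host and process each solid actually runs on and compare across steps in a single run. A minimal sketch, not from the thread (in practice the logging line would be dropped into the existing solids rather than a new one):

```python
import os
import socket

from dagster import solid

@solid
def which_worker(context):
    # Compare these values across solids in one run: identical hostnames
    # and PIDs across steps would support the "same worker" hypothesis.
    context.log.info(
        "running on host={} pid={}".format(socket.gethostname(), os.getpid())
    )
```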

nate

04/16/2020, 8:12 PM
yeah, either way it’s surprising behavior

Chris Roth

04/16/2020, 8:12 PM
yup
it sort of reminds me of the issue I was having with SQS, where it kept creating a new queue for each run... maybe something along those lines is causing all of the solids to get run on the same worker
I also wonder if it could simply be a quirk in how tasks are getting distributed on Celery