# announcements
j
I've got a simple pipeline up and running, deployed with Dask on SGE, but I'm not seeing the run progress bars turn green in dagit anymore
a
alright, let's see
where are the dask workers running?
for context: the way we have things set up currently, the source of truth for the “event stream” is the event storage as configured on the dagster instance. The worker nodes are expected to have the same instance configuration so that they can write the event stream to the same place
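A minimal sketch of what that shared instance config could look like, assuming a Postgres backend (the host and credentials are placeholders), with the same file present at `$DAGSTER_HOME/dagster.yaml` on every node:

```yaml
# $DAGSTER_HOME/dagster.yaml - identical on every node, so all workers
# write the event stream to the same place (hypothetical host/credentials)
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: "postgresql://user:password@shared-db-host:5432/dagster"
```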
j
I read that, but I mostly focused on the Dask-specific page, which was really helpful
I am using `dagster_dask` and pointing to a scheduler I started in the config
So the workers are running in AWS along with the scheduler on a master node, connected through Dask
I guess based on your explanation I should verify that the workers have S3 access too, which I haven't checked because most of my stuff is just on a shared EBS volume
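As a rough sketch, that kind of run config might look like the following; the scheduler address and storage path are placeholders, and the exact schema depends on the dagster / dagster-dask version:

```yaml
# Hypothetical run config (e.g. entered in the dagit playground):
# point the Dask executor at the running scheduler, and use storage
# that every worker can reach (a shared volume here; S3 also works)
execution:
  dask:
    config:
      address: "tcp://scheduler-host:8786"
storage:
  filesystem:
    config:
      base_dir: "/shared/dagster-storage"
```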
a
do you have the instance configured where you are initiating the run?
can you hover over the version number in dagit?
j
I entered the config in the dagit playground, but I also have a YAML file next to my module
When I hover over the version, a big tooltip keeps flashing, but it doesn't stay up long enough for me to read
a
huh - havent seen the flashing bug before
how about `cat $DAGSTER_HOME/dagster.yaml`?
j
a
ah ok, yeah - you haven't set up your instance yet, so the “source of truth” is a temp directory created when you launched dagit
if you go through that first link I sent you
j
Gotcha, thanks for all your help!
a
and set up, for example, an RDS database in AWS, and set your `run_storage` and `event_storage` to point at that - you should get everything showing up in dagit
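Extending the event storage fragment above, a sketch of that RDS-backed `dagster.yaml` (the endpoint and credentials are placeholders; note the actual event log key is spelled `event_log_storage`):

```yaml
# $DAGSTER_HOME/dagster.yaml - run and event storage pointing at one
# RDS Postgres database (hypothetical endpoint and credentials)
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: "postgresql://user:password@my-db.abc123.us-east-1.rds.amazonaws.com:5432/dagster"

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: "postgresql://user:password@my-db.abc123.us-east-1.rds.amazonaws.com:5432/dagster"
```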
j
That makes more sense
a
thanks for working through this all - you are the first person to show up in Slack who has tried the Dask integration, so I'm excited to see how it all works once you get it set up!
j
🎉
For future reference, I had to get `DAGSTER_HOME` propagated to all dask-workers for this to work. It can be configured in `jobqueue.yaml` on the Dask side:
```yaml
# in ~/.config/dask/jobqueue.yaml
jobqueue:
  sge:
    job-extra: ['-v DAGSTER_HOME=/shared/dagster']
```
Here `/shared` is a volume that all workers have access to. Also, you have to make sure to start new jobs if you update your source code; otherwise the dask-workers will be running stale pipelines
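Depending on the dask-jobqueue version, `env-extra` is an alternative sketch that prepends export lines to the generated job script instead of passing `-v` to SGE (the path is a placeholder):

```yaml
# in ~/.config/dask/jobqueue.yaml - same effect via the job script
jobqueue:
  sge:
    env-extra: ['export DAGSTER_HOME=/shared/dagster']
```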
a
would you be interested in sending a PR for the dagster-dask README with these notes and any others you have?
j
done
a
the fix from above is out now in `0.7.5`