# ask-community
Reetika Gour
Hello, dagster team! I am trying to populate `executor_def` with a Slurm cluster when defining the job, using the snippet below:
```
from dagster import job
from dagster_dask import dask_executor

executor = dask_executor.configured(
    {"cluster": {"slurm": {"cores": 2, "memory": "24GB"}}}
)

@job(executor_def=executor)
def do_something():
    pass
```
Now when I launch a run, it starts a process but never submits the Slurm job, and the dagster logs look fine, saying that "the run has been started". What could be the possible reason?
dagster team member
hi @Reetika Gour! what ops exist in your job? does the run say that it's completed or does it just hang indefinitely?
if the job you're executing does not contain any ops (as shown in that snippet), then it's possible that no job is being submitted as there is no work to do
Reetika Gour
Thanks for the prompt response! There are 4 ops under this job, like below:
```
@job(executor_def=executor)
def do_something():
    output_op1 = op1()
    output_op2 = op2(output_op1)
    output_op3 = op3(output_op1)
    output_op4 = op4(output_op1)
```
Now when I launch a run, it starts a process and then hangs indefinitely. Please see the attached screenshot.
dagster team member
hm interesting -- are you able to manually create a slurm cluster without dagster? (i.e. https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html) my feeling is that this might be something more specific to the configuration you're passing to generate the cluster, rather than something specific to dagster
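for reference, a rough sketch of that manual check (untested, reusing the cores/memory values from your config) would look something like:
```
# create the SLURMCluster directly with dask_jobqueue, outside of dagster,
# to verify the cluster configuration works on its own
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=2, memory="24GB")
cluster.scale(jobs=1)  # this is the step that actually submits via sbatch
client = Client(cluster)
print(client)
```
if that hangs or errors in the same way, the issue is in the cluster config or environment rather than in dagster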
Reetika Gour
Yes, I am able to create a Slurm cluster manually as shown in the link you shared, but the same thing does not work when using `dask_executor`; the problem remains the same as described earlier. Also, please help me with materializing the assets: each of the ops above generates some output (mostly csv or txt files) which users want to have available in the UI.
dagster team member
for the metadata, you can do something along the lines of:
```
from dagster import MarkdownMetadataValue

context.add_output_metadata(
    {"output": MarkdownMetadataValue(<markdown-formatted string of your csv/txt>)}
)
```
inside the body of your ops/assets. This will allow the output to be viewable in the UI. I'd recommend avoiding this route if the csv output gets particularly large, as each of these metadata entries does need to be stored in the database (so you probably wouldn't want to have 10s of megabytes of markdown in there).
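putting that together, a rough sketch of a full op (the dataframe here is just illustrative -- in your case it would be whatever produces your csv/txt content):
```
import pandas as pd
from dagster import MarkdownMetadataValue, op

@op
def op1(context):
    # illustrative small result standing in for your real csv output
    result = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    context.add_output_metadata(
        {"output": MarkdownMetadataValue(result.to_markdown())}
    )
    return result
```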
I was able to replicate the issue you were describing above, and I believe it's related to the fact that the default `n_workers` for a SLURMCluster is 0 (so no workers were available to do any tasks).
I was able to get beyond that by configuring `"n_workers": 1`, but ended up with an error about no `sbatch` file being found. This seems like it's more in the domain of Slurm-specific stuff.
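i.e. something along these lines (assuming the keys under `slurm` are forwarded to the SLURMCluster constructor as keyword arguments, which is my reading of the config schema):
```
executor = dask_executor.configured(
    {"cluster": {"slurm": {"n_workers": 1, "cores": 2, "memory": "24GB"}}}
)
```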
Reetika Gour
I tried configuring `"n_workers": 1` but no luck. It just hangs indefinitely. I don't see Slurm-specific logs in the dagster logs. Any idea?
dagster team member
If you look in the terminal where you run `dagster dev` or `dagit`, do you see that `sbatch` error? My understanding here is that this SLURMCluster will only work if you're on an actual Slurm cluster when running it, which is likely not the case if you're running locally, although I'll admit again that I'm not familiar with Slurm.
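one quick way to sanity-check that (a sketch -- `dask_jobqueue` shells out to `sbatch` when it submits jobs):
```
import shutil

# None means the sbatch binary isn't on PATH, i.e. this machine doesn't
# have the Slurm client tools installed
print(shutil.which("sbatch"))
```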
Reetika Gour
No, I don't see any `sbatch` error in the terminal.