# ask-community
Reetika Gour
Hello, dagster team! I am trying to populate `executor_def` with a Slurm cluster when defining the job, using the snippet below:
```
from dagster import job
from dagster_dask import dask_executor

executor = dask_executor.configured(
    {"cluster": {"slurm": {"cores": 2, "memory": "24GB"}}}
)

@job(executor_def=executor)
def do_something():
    pass
```
Now when I launch a run, it starts a process but never submits the Slurm job, and the dagster logs look fine, saying that "the run has been started". What could be the possible reason?
dagster team member
hi @Reetika Gour! what ops exist in your job? does the run say that it's completed or does it just hang indefinitely?
if the job you're executing does not contain any ops (as shown in that snippet), then it's possible that no job is being submitted as there is no work to do
Reetika Gour
Thanks for the prompt response! There are 4 ops under this job, like below:
```
@job(executor_def=executor)
def do_something():
    output_op1 = op1()
    output_op2 = op2(output_op1)
    output_op3 = op3(output_op1)
    output_op4 = op4(output_op1)
```
Now when I launch a run, it starts a process and then hangs indefinitely. Please see the attached screenshot.
dagster team member
hm interesting -- are you able to manually create a slurm cluster without dagster? (i.e. https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html) my feeling is that this might be something more specific to the configuration you're passing to generate the cluster, rather than something specific to dagster
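for reference, a rough sketch of that manual check (untested, reusing the cores/memory values from your config) would look something like:
```
# create the SLURMCluster directly with dask_jobqueue, outside of dagster,
# to verify the cluster configuration works on its own
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=2, memory="24GB")
cluster.scale(jobs=1)  # this is the step that actually submits via sbatch
client = Client(cluster)
print(client)
```
if that hangs or errors in the same way, the issue is in the cluster config or environment rather than in dagster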
Reetika Gour
Yes, I am able to create a Slurm cluster manually as shown in the link you shared, but the same thing does not work when using `dask_executor`; the problem remains the same as described earlier. Also, please help me with materializing the assets: each of the ops above generates some output (mostly csv or txt files) which users want to have available in the UI.
dagster team member
for the metadata, you can do something along the lines of:
```
from dagster import MarkdownMetadataValue

context.add_output_metadata(
    {"output": MarkdownMetadataValue(<markdown-formatted string of your csv/txt>)}
)
```
inside the body of your ops/assets. This will allow the output to be viewable in the UI. I'd recommend avoiding this route if the csv output gets particularly large, as each of these metadata entries does need to be stored in the database (so you probably wouldn't want to have 10s of megabytes of markdown in there).
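putting that together, a rough sketch of a full op (the dataframe here is just illustrative -- in your case it would be whatever produces your csv/txt content):
```
import pandas as pd
from dagster import MarkdownMetadataValue, op

@op
def op1(context):
    # illustrative small result standing in for your real csv output
    result = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    context.add_output_metadata(
        {"output": MarkdownMetadataValue(result.to_markdown())}
    )
    return result
```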
I was able to replicate the issue you were describing above, and I believe it's related to the fact that the default `n_workers` for a SLURMCluster is 0 (so no workers were available to do any tasks).
I was able to get beyond that by configuring `"n_workers": 1`, but ended up with an error about no `sbatch` file being found. This seems like it's more in the domain of Slurm-specific stuff.
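i.e. something along these lines (assuming the keys under `slurm` are forwarded to the SLURMCluster constructor as keyword arguments, which is my reading of the config schema):
```
executor = dask_executor.configured(
    {"cluster": {"slurm": {"n_workers": 1, "cores": 2, "memory": "24GB"}}}
)
```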
Reetika Gour
I tried configuring `"n_workers": 1` but no luck. It just hangs indefinitely. I don't see Slurm-specific logs in the dagster logs. Any idea?
dagster team member
If you look in the terminal where you run `dagster dev` or `dagit`, do you see that `sbatch` error? My understanding here is that this SLURMCluster will only work if you're on an actual Slurm cluster when running it, which is likely not the case if you're running locally, although I'll admit again that I'm not familiar with Slurm.
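one quick way to sanity-check that (a sketch -- `dask_jobqueue` shells out to `sbatch` when it submits jobs):
```
import shutil

# None means the sbatch binary isn't on PATH, i.e. this machine doesn't
# have the Slurm client tools installed
print(shutil.which("sbatch"))
```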
Reetika Gour
No, I don't see any `sbatch` error in the terminal.