# ask-community
d
is it possible to launch a run in a docker container, and then have each fanned out op step be done in_process or multi_process? I'm not sure how the DockerRunLauncher interacts with the executor for each op step.
d
Hi Danny - doing it multi_process is actually the default behavior if you use the DockerRunLauncher - it still uses the default executor which is described here: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor
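To make that concrete, here's a sketch of the run config the default executor accepts (per that docs page), written as Python dicts; the variable names are just placeholders:
```python
# Sketch: run config for the default job executor, per the linked docs.

# Multiprocess (the default), with an explicit cap on concurrent subprocesses:
multiprocess_run_config = {
    "execution": {"config": {"multiprocess": {"max_concurrent": 4}}}
}

# In-process: every step runs serially inside the run worker's own process:
in_process_run_config = {
    "execution": {"config": {"in_process": {}}}
}
```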
d
we have an `Engine_Event` saying `Starting execution with step handler DockerStepHandler.`
Does that launch a new container for each step? How do I configure it not to do that for each step?
d
I would only expect that to show up if you've explicitly enabled the docker_executor in your job
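Explicitly opting in looks roughly like this (`my_op` and `my_docker_job` are placeholder names):
```python
from dagster import job, op
from dagster_docker import docker_executor


@op
def my_op():
    ...


# Placeholder job that explicitly opts into the docker_executor; with this,
# each step is launched in its own container, which is when you'd see the
# DockerStepHandler engine event.
@job(executor_def=docker_executor)
def my_docker_job():
    my_op()
```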
d
I have a job defined on a sensor that specifies that executor, but does the UI use that job for the asset as well?
d
I wouldn't expect it to if it's solely defined on the sensor - any chance you can share the code?
d
here's the graph of ops and the asset definition built from the graph:
```python
from dagster import AssetKey, AssetsDefinition, graph, mem_io_manager

@graph
def full_scored_data_set(recruiter_teams_to_score, trained_model):
    """chunk the recruiters to score and score them in batches"""
    # result = (
    #     chunking_op(recruiter_teams_to_score)
    #     .map(lambda batch: generate_score_for_ids(batch, trained_model))
    #     .collect()
    # )
    result = (
        key_to_score(recruiter_teams_to_score)
        .map(
            lambda key: score_data_set(
                data_to_score_for_key(recruiter_in_batch=key), trained_model
            )
        )
        .collect()
    )
    return append_scores(result)


graph_asset = AssetsDefinition.from_graph(
    full_scored_data_set,
    keys_by_input_name={
        "recruiter_teams_to_score": AssetKey("recruiter_teams_to_score"),
        "trained_model": AssetKey("trained_model"),
    },
    resource_defs={"mem": mem_io_manager},
)
```
here's the sensor definition:
```python
distributed_scoring_sensor(
    define_asset_job(
        "distributed_scoring_job",
        selection=AssetSelection.groups("distributed_scoring"),
        executor_def=docker_executor,
    )
),
```
d
If you're launching a job called "distributed_scoring_job" in dagit, I would expect it to use that executor config
d
I'm launching it via the "Materialize" button in the UI
d
And what's "it" here, precisely? The "distributed_scoring_job" job?
d
the `full_scored_data_set` asset
d
Hmm, I'll ask around about this with somebody who will know this for sure
d
thanks!
our end goal is to have the run spun up in a Docker container and have `data_to_score_for_key` and `score_data_set` run in-process so we can use the in-memory IO manager (we don't want to persist the data from `data_to_score_for_key` as it's a lot of data to persist to disk). If that's not possible, I'm guessing I would need a custom IO manager that could store the data in a file and then clean up the file after the downstream op ingests it?
d
is there also a Definitions object?
(just double-checking that you didn't define a default executor there)
d
ah yep, I have it in the Definitions object
d
ah that makes more sense
d
would removing that still have the run launched in a Docker container?
and then each step would use the default multiprocess executor?
d
yeah, that's controlled by the DockerRunLauncher
d
excellent! thanks for the help!
ah, so now the executor is only spinning up 4 processes at a time. Is it possible to raise that default concurrency limit for every run launched from the UI?
d
Yeah take a look at the examples with max_concurrent on that page: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor
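e.g., for a single run from the Launchpad, something along these lines (written as a Python dict here; the Launchpad takes the equivalent YAML):
```python
# Roughly: raise the default executor's step concurrency for one run
run_config = {
    "execution": {"config": {"multiprocess": {"max_concurrent": 8}}}
}
```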
d
right, but can I pass that config to the Definitions object somehow? Or would it be loaded in via the dagster.yaml config?
d
Yeah, you can set default configuration by doing e.g.
```python
from dagster import Definitions, multiprocess_executor

defs = Definitions(
    ...,
    executor=multiprocess_executor.configured({"max_concurrent": 8}),
)
```
d
ah, perfect. thanks so much!
so I take it it's not possible to have the `data_to_score_for_key` op run under the multiprocess executor but have the `score_data_set` op run in-process? I was hoping to do that so that the output of `data_to_score_for_key` could be persisted only in memory until the `score_data_set` op is finished.
if it's not possible, would a custom IO manager that cleans up its own files be the better solution?
d
Splitting out executors per op isn't possible yet, but it's something that's on our radar for a future improvement
What you're describing does sound more like an IO manager
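A rough sketch of that idea, for illustration: a hypothetical `ephemeral_file_io_manager` that pickles each output to a temp file and deletes the file as soon as the downstream op loads it (this assumes exactly one downstream consumer per output and no retries that need to re-read it):
```python
import os
import pickle
import tempfile

from dagster import IOManager, InputContext, OutputContext, io_manager


class EphemeralFileIOManager(IOManager):
    """Spill each output to disk, then delete it as soon as it's consumed."""

    def _path(self, context) -> str:
        # get_identifier() is unique per run/step/output, so files don't collide
        return os.path.join(
            tempfile.gettempdir(), "_".join(context.get_identifier()) + ".pkl"
        )

    def handle_output(self, context: OutputContext, obj) -> None:
        with open(self._path(context), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context: InputContext):
        path = self._path(context.upstream_output)
        with open(path, "rb") as f:
            obj = pickle.load(f)
        # Clean up immediately; this is unsafe if more than one op consumes
        # the output, or if a retry needs to re-read it.
        os.remove(path)
        return obj


@io_manager
def ephemeral_file_io_manager(_init_context):
    return EphemeralFileIOManager()
```
You'd wire it into the graph-backed asset through `resource_defs`, the same way the `mem` key is attached above.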
d
Hm this is a bit more out of my wheelhouse, could you maybe make a new post for the IOManager piece and our OSS oncall who knows these things better can weigh in?
d
absolutely, thanks for your help with all of this! Absolutely loving Dagster so far