When running spark (in local mode) inside the dock...
# ask-community
g
When running spark (in local mode) inside the docker container of the dagster job/workspace how can I expose the port (4040) for its UI when using the DockerRunner of dagster?
(and also map it so it is available at the host)
j
I believe this should be possible using container_kwargs
g
How? I think it would fail if 2 runs are started in parallel, because both would try to map to the same host port.
https://docker-py.readthedocs.io/en/stable/containers.html
According to the docs, the host port can be None, "to assign a random host port. For example, {'2222/tcp': None}." So this might work:
ports:
  - 4040: None
I hope this will work - let's see
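If a random host port is assigned, the port actually chosen has to be looked up once the container is running. A minimal sketch, assuming the port mapping has the shape that docker inspect reports under NetworkSettings.Ports (docker-py exposes the same data as container.ports). The helper name host_port_for and the sample data are hypothetical, for illustration only:

```python
from typing import Optional


def host_port_for(ports: dict, container_port: str) -> Optional[int]:
    """Return the first host port bound to container_port, or None if unbound.

    `ports` is assumed to look like docker inspect's NetworkSettings.Ports,
    e.g. {"4040/tcp": [{"HostIp": "0.0.0.0", "HostPort": "49153"}]}.
    """
    # Exposed but unpublished ports show up with a None value.
    bindings = ports.get(container_port) or []
    for binding in bindings:
        host_port = binding.get("HostPort")
        if host_port:
            return int(host_port)
    return None


# Illustrative data only; real values come from the running container.
sample_ports = {
    "4040/tcp": [{"HostIp": "0.0.0.0", "HostPort": "49153"}],
    "7077/tcp": None,
}
spark_ui_port = host_port_for(sample_ports, "4040/tcp")
```

With two parallel runs, each container would get its own random host port, so each run's Spark UI would be found under a different port.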
j
Ah got it
Any luck?
g
I am still working on some other steps / issues. Will update you once I know more.
👍 1
Not fully there yet. But I already notice that when the job consists of multiple steps, more than one Spark session is instantiated. How can I prevent this and use a single shared SparkSession (for multiple steps/ops/assets in a job)?
s
If you use the in-process executor and the pyspark resource, that sharing should happen automatically. If you're using any of the other executors, this isn't possible, because they execute each step in its own process and a constraint of spark itself is that spark sessions can't be shared across multiple processes
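The process-boundary point can be illustrated without Spark or Dagster. This is only a stand-in sketch: FakeSession, get_session, step_one and step_two are hypothetical names, and the cached factory merely mimics the way SparkSession.builder.getOrCreate() hands back the existing session within one process:

```python
import functools
import os


class FakeSession:
    """Stand-in for a SparkSession; records which process created it."""

    def __init__(self):
        self.pid = os.getpid()


@functools.lru_cache(maxsize=None)
def get_session() -> FakeSession:
    # The cache lives in this process's memory, so the object is created
    # once per process, not once per machine or per job.
    return FakeSession()


def step_one() -> FakeSession:
    return get_session()


def step_two() -> FakeSession:
    return get_session()


# Both steps run in the same process here, so they see the same object.
shared = step_one() is step_two()
```

If each step ran in its own process instead, each process would start with an empty cache and build its own session, which is the behaviour described for the non-in-process executors.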
g
I am using the docker_executor. Is it possible to use the in_process one inside the docker container?
s
I can't think of any reason why those wouldn't work together, but @johann would know better than me
j
DockerRunLauncher and in_process are compatible
g
how can I pass in_process_executor/executor_def to define_asset_job?
s
Right now, the recommendation is to supply it via the repository:
from dagster import in_process_executor, repository

@repository(default_executor_def=in_process_executor)
def repo():
    ...
But if you anticipate wanting different executors for different jobs or assets in the same repository, let us know, and it shouldn't be difficult for us to make that possible. cc @chris
g
Normally I would want the more parallel multiprocess executor (in the docker run launcher), and the in-process one only for specific jobs.
But I still get as many Spark sessions instantiated as there are parallel tasks, even when using the in_process executor - that is strange.
@johann Using dagster 0.15.7 and the docker run launcher I observe a weird error message:
AttributeError: 'list' object has no attribute 'items'
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1729, in launch_run
    self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 152, in launch_run
    self._launch_container_with_command(run, docker_image, command)
  File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 100, in _launch_container_with_command
    container = client.containers.create(
  File "/opt/conda/lib/python3.9/site-packages/docker/models/containers.py", line 877, in create
    create_kwargs = _create_container_args(kwargs)
  File "/opt/conda/lib/python3.9/site-packages/docker/models/containers.py", line 1095, in _create_container_args
    create_kwargs['host_config'] = HostConfig(**host_config_kwargs)
  File "/opt/conda/lib/python3.9/site-packages/docker/types/containers.py", line 392, in __init__
    self['PortBindings'] = convert_port_bindings(port_bindings)
  File "/opt/conda/lib/python3.9/site-packages/docker/utils/utils.py", line 100, in convert_port_bindings
    for k, v in iter(port_bindings.items()):
When:
ports:
        - 4040: None
is passed as its configuration
@johann Same goes for:
- 4040: 4040
j
Looking at https://docker-py.readthedocs.io/en/stable/containers.html, it seems like the ports arg expects a dict, not a list
I still get number of parallel tasks spark sessions instantiated even when using the in_process executor
Yeah, this is unexpected. @chris any guesses?
g
https://dagster.slack.com/archives/C01U954MEER/p1658757616851729?thread_ts=1657300824.939669&cid=C01U954MEER is fixed now with a recent release of dagster, where define_asset_job accepts the executor
👍 1
https://dagster.slack.com/archives/C01U954MEER/p1658755967333989?thread_ts=1657300824.939669&cid=C01U954MEER do you have an example how to pass it as a dict instead of as a list? This failed before:
ports:  # Error: 'list' object has no attribute 'items'
  - 4040: 4040
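A sketch of the difference between the two shapes, written in Python because docker-py receives container_kwargs as keyword arguments. The helper name ports_as_dict is hypothetical, and this has not been checked against dagster's own config validation. In YAML, the dict form is a mapping under ports with no leading dash; also note that YAML spells a null value as null, whereas None would be read as the string "None":

```python
def ports_as_dict(ports):
    """Hypothetical helper: merge a list of one-entry mappings into one dict."""
    if isinstance(ports, dict):
        return dict(ports)
    merged = {}
    for entry in ports:
        merged.update(entry)
    return merged


# What the failing YAML ("- 4040: 4040") parses to: a list, which has no .items().
failing = [{4040: 4040}]
fixed = ports_as_dict(failing)

# The shape docker-py documents for `ports`: container port -> host port.
# YAML equivalent (a mapping, no dash):
#   ports:
#     4040/tcp: null    # null -> None -> random host port
container_kwargs = {"ports": {"4040/tcp": None}}
```

The key point is that docker-py calls .items() on the ports value, so anything that parses to a list fails with the AttributeError shown earlier in the thread.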