dagster queued run mount duckDB! mount warehouse_...
# ask-community
g
dagster queued run mount duckDB! mount warehouse_location
I am trying to get my dagster pipeline to work inside docker. For this I am following along with:
- https://github.com/dehume/big-data-madison-dagster
- https://github.com/dagster-io/dagster/tree/master/examples/deploy_docker
In particular, https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/dagster.yaml#L11 suggests using `DockerRunLauncher`. For both `dagit` and `dagster-daemon` I have enabled docker-in-docker by mounting `/var/run/docker.sock:/var/run/docker.sock` (see https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/docker-compose.yml#L61).
But I only get:
DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))

 File "/opt/conda/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1698, in launch_run
    self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 152, in launch_run
    self._launch_container_with_command(run, docker_image, command)
  File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 97, in _launch_container_with_command
    client = self._get_client(container_context)
  File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 72, in _get_client
    client = docker.client.from_env()
  File "/opt/conda/lib/python3.9/site-packages/docker/client.py", line 96, in from_env
    return cls(
  File "/opt/conda/lib/python3.9/site-packages/docker/client.py", line 45, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
  File "/opt/conda/lib/python3.9/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
when executing:
docker compose --profile dagster up --build
I am running Docker for Mac. How can I get dagster to work nicely in this setup?
d
Hi Georg - do you get this same error if you try to build and run the deploy_docker example with no changes? If so, I use Docker for Mac and it works for me, so there must be some difference between our Docker setups.
g
When I docker-compose up for https://github.com/dagster-io/dagster/tree/master/examples/deploy_docker I can execute the DIND stuff from dagster - interestingly.
But for me in my setup - it fails. Even though I have mapped the docker socket in the exact same way. Do you have any idea what is causing the problem here?
d
That's very perplexing - the best recommendation I have is to triple check that that volume is actually mounted on both the dagit and daemon containers, because that should be all that you need for it to work
g
I can see the docker.sock being mounted in dagit
d
what about in the daemon?
g
same /var/run/docker.sock is available
Though:
USER dagster:dagster
the user in the dockerfile is not root
does the user inside the dockerfile need some special permissions?
I have changed the user - but still get a permission denied
Interestingly I also see failures like:
WARNING:root:Retrying failed database connection: (psycopg2.OperationalError) connection to server at "postgresql" (172.31.0.3), port 5432 failed: Connection refused
one step further: ImageNotFound: 404 Client Error for http+docker://localhost/v1.41/images/create?tag=latest&fromImage=other: Not Found ("pull access denied for other, repository does not exist or may require 'docker login': denied: requested access to the resource is denied")
but locally the client (on the mac side ) is logged in
ok - the docker stuff is fixed now
but I am stuck at: `dagster.core.errors.DagsterInstanceSchemaOutdated: Raised an exception that may indicate that the Dagster database needs to be be migrated. Database is at revision None, head is b601eb913efa. To migrate, run dagster instance migrate.`
when spinning up an empty container - I would expect that the migration is run automatically
and in fact, this works for the default example - so what is the difference in my code?
and:
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "pg_class_relname_nsp_index"
DETAIL:  Key (relname, relnamespace)=(secondary_indexes_id_seq, 2200) already exists.
[SQL:
CREATE TABLE secondary_indexes (
id SERIAL NOT NULL,
name VARCHAR(512),
create_timestamp TIMESTAMP WITHOUT TIME ZONE DEFAULT CURRENT_TIMESTAMP,
migration_completed TIMESTAMP WITHOUT TIME ZONE,
PRIMARY KEY (id),
UNIQUE (name)
)
is found in the logs
postgres is also showing me several: ERROR: duplicate key value violates unique constraint "instigators_selector_id_key" errors
Strangely, the SFTP connection to the container name `sftp` fails: unable to connect to port 2222 on 192.168.48.5 (but works nicely for localhost)
furthermore: NotFound: 404 Client Error for http+docker://localhost/v1.41/containers/ffd4684f2076cc41d455b8928ba1126cdf0e605d0f39c4ce38316bf91450fc7c/start: Not Found ("network docker_example_network not found") the docker error is now again quite irritating
p
This is from a brand new DB getting initialized?
Wondering if this is some initialization race condition between `dagit` and `dagster-daemon`
g
yes
I think I fixed a couple of mistakes - nonetheless I still cannot get the DIND queued docker launcher to work. Meanwhile:
1) It no longer fails instantly but is stuck (without any further logs) when trying to spin up the container
2) dagster daemon fails to instantiate the SFTP connection
3) postgres is throwing duplicate key constraint violation errors
d
Can you try removing your postgres container so that it brings it back up again from scratch?
docker rm <postgres container ID>
possibly with a -f
and try re-deploying?
g
tried that already a couple of times - also deleted the mapped volume directory
to replicate:
git clone https://github.com/geoHeil/dagster-ssh-demo.git

cd dagster-ssh-demo

docker compose --profile dagster up --build
go to: http://localhost:3000/workspace/deploy_docker_repository@other/jobs/my_job/playground and try to launch
(2) is fixed now
but now all runs (manual dummy and sensor-initiated) are stuck in the startup phase.
10 runs are currently in progress. Maximum is 10, won't launch more.
And postgres keeps showing logs like:
ERROR: duplicate key value violates unique constraint "instigators_selector_id_key"
postgresql | 2022-04-22 17:22:16.774 UTC [1140] DETAIL: Key (selector_id)=(40ae1d09616324124ae8fab93494603eee744f81) already exists.
postgresql | 2022-04-22 17:22:16.774 UTC [1140] STATEMENT: INSERT INTO instigators (selector_id, repository_selector_id, status, instigator_type, instigator_body) VALUES ('40ae1d09616324124ae8fab93494603eee744f81', '22ed74ad3bf3c735dd23193f2387c48c5a8cc556', 'AUTOMATICALLY_RUNNING', 'SENSOR', '{"__class__": "InstigatorState", "job_specific_data": {"__class__": "SensorInstigatorData", "cursor": null, "last_run_key": null, "last_tick_timestamp": 1650648132.730657, "min_interval": 30}, "job_type": {"__enum__": "InstigatorType.SENSOR"}, "origin": {"__class__": "ExternalJobOrigin", "external_repository_origin": {"__class__": "ExternalRepositoryOrigin", "repository_location_origin": {"__class__": "GrpcServerRepositoryLocationOrigin", "host": "ssh-demo", "location_name": "ssh-demo", "port": 4000, "socket": null}, "repository_name": "SSH_DEMO"}, "job_name": "foo_scd2_asset_sensor"}, "status": {"__enum__": "InstigatorStatus.AUTOMATICALLY_RUNNING"}}') RETURNING instigators.id
d
I would hope this wouldn't matter, but is it possible that postgres:11 vs. postgres:14.2 could make a difference?
the example that is working for you has the former
i'm a little confused why they aren't waiting for the postgres container to spin up even though they have depends_on: set
actually taking this line out is making postgres behave better for me:
-    volumes:
-      - ./postgres-dagster:/var/lib/postgresql/data
the broad advice that I have is to start with the example that is working and then incrementally add things that are different until it stops working - i think it will be a lot easier to isolate problems to a specific change that way
p
I was also able to get the example working (taking out the line that @daniel mentioned as well as some `warehouse_location` volume mounts).
g
nonetheless: I just commented out the 2 volumes (postgresql and warehouse_location) but for me everything is still stuck in the queued-up startup phase.
also postgres 11 is showing the duplicate key warnings/errors
Are you able to actually get runs to finish running (and not have them stuck in the getting started phase)?
p
Ah, I guess the runs didn’t successfully launch due to a docker permissions issue in the run coordinator:
pull access denied for ssh_demo_other, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
but I am not running into the same DB consistency checks that you are running into as the daemon is running
g
no -> this is fixed now if you pull the latest version. This was due to using `DAGSTER_CURRENT_IMAGE: "ssh-demo"` instead of `DAGSTER_CURRENT_IMAGE: "ssh_demo_ssh-demo"`
p
I’m hitting the same error as before (`pull access denied for ssh_demo_other`). Switching line 151 to `ssh_demo_ssh-demo` generates the same error also:
docker.errors.ImageNotFound: 404 Client Error for http+docker://localhost/v1.41/images/create?tag=latest&fromImage=ssh_demo_ssh-demo: Not Found ("pull access denied for ssh_demo_ssh-demo, repository does not exist or may require 'docker login': denied: requested access to the resource is denied")
g
I cloned into a fresh folder. I guess the name must be identical to the one listed by `docker images`, because somehow in this new folder the name needed to change to `dagster-ssh-demo_ssh_demo`. Then I do not get the pull problem.
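One way to avoid depending on the folder name, sketched under assumptions: pin the image name explicitly on the service that builds the user code and reuse the same string in `DAGSTER_CURRENT_IMAGE`. The service name `ssh-demo` comes from this thread; the image name `ssh_demo_user_code_image` and the build context are hypothetical.

```yaml
# docker-compose.yml (fragment; names marked below are hypothetical)
services:
  ssh-demo:
    build:
      context: .   # assumption: adjust to the real build context
    # Without an explicit image name, compose derives one from the project
    # (folder) name and the service name, which would explain why the name
    # changed after cloning into a different folder.
    image: ssh_demo_user_code_image
    environment:
      # Must name an image that already exists locally; otherwise Docker
      # tries to pull it from a registry and fails with "pull access denied".
      DAGSTER_CURRENT_IMAGE: "ssh_demo_user_code_image"
```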
However, runs are still stuck launching: dagster.daemon.QueuedRunCoordinatorDaemon - INFO - Launched 2 runs. but the logs show that they clearly have been received
When commenting out this block:
# run_launcher:
#   module: dagster_docker
#   class: DockerRunLauncher
#   config:
#     env_vars:
#       - DAGSTER_POSTGRES_USER
#       - DAGSTER_POSTGRES_PASSWORD
#       - DAGSTER_POSTGRES_DB
#     network: dagster_network
#     container_kwargs:
#       auto_remove: true
the tasks start to execute.
(in dagster.yaml )
This seems strange though:
dagster-daemon    | 2022-04-23 10:19:16 +0000 - dagster.daemon.QueuedRunCoordinatorDaemon - INFO - Retrieved 3 queued runs, checking limits.
dagster-daemon    | 2022-04-23 10:19:19 +0000 - dagster.daemon.QueuedRunCoordinatorDaemon - INFO - Launched 3 runs.
dagster-daemon    | INFO  [dagster.daemon.QueuedRunCoordinatorDaemon] Launched 3 runs.
dagster-daemon    | DEBUG [dagster.daemon.SchedulerDaemon] Not checking for any runs since no schedules have been started.
dagster-daemon    | DEBUG [dagster.daemon.QueuedRunCoordinatorDaemon] Poll returned no queued runs.
d
Do the runs show up in dagit? Usually the event log will say what container it tried to spin up, and often there are clues for why it didn't start in the logs for that container
g
yes - but are stuck in startup
unfortunately, so far I did not find any clues yet
I can only see: [DockerRunLauncher] Launching run in a new container 337b6b03bc2dc28bda38318700100de882ababb349388c587acb318b829c6cc3 with image dagster-ssh-demo_other
d
Run ‘docker logs’ with that container ID, any clues there?
g
dagster.core.errors.DagsterInvalidConfigError: Errors whilst loading configuration for {'postgres_url': Field(<dagster.config.source.StringSourceType object at 0x7fbe5fad2f10>, default=@, is_required=False), 'postgres_db': Field(<dagster.config.field_utils.Shape object at 0x7fbe5ae89be0>, default=@, is_required=False), 'should_autocreate_tables': Field(<dagster.config.config_type.Bool object at 0x7fbe60783340>, default=True, is_required=False)}.
Error 1: Post processing at path root:postgres_db:hostname of original value {'env': 'DAGSTER_POSTGRES_HOSTNAME'} failed: dagster.config.errors.PostProcessingError: You have attempted to fetch the environment variable "DAGSTER_POSTGRES_HOSTNAME" which is not set. In order for this execution to succeed it must be set in this environment.
though I set these here:
x-app-vars: &default-app-vars
  DAGSTER_POSTGRES_HOSTNAME: "postgresql"
(and pass it to all the containers)
d
That looks like some additional config is needed on the run launcher to include some env vars, check the example for reference
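A minimal sketch of what that additional config could look like, assuming the fix is to forward the variable named in the error to the launched run containers. The block mirrors the `run_launcher` section quoted earlier in this thread; only the `DAGSTER_POSTGRES_HOSTNAME` entry is new, and whether other variables need forwarding depends on what dagster.yaml reads from the environment.

```yaml
# dagster.yaml (sketch, not the verified final config from the repo)
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    # A bare name in this list is looked up in the environment of the process
    # that launches the run and passed on to the new run container.
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
      - DAGSTER_POSTGRES_HOSTNAME  # assumption: added because the error names it
    network: dagster_network
    container_kwargs:
      auto_remove: true
```

Variables set in docker-compose only reach the containers that compose itself starts. Run containers are created by the launcher, so they do not inherit the compose `environment` entries automatically.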
g
Interestingly: https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/docker-compose.yml (the working example for the docker launcher) is not setting the DAGSTER_POSTGRES_HOSTNAME variable. But: https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/dagster.yaml#L24 -- let me double check the dagster.yaml file
But why does the link you sent not include the hostname?
(and is working)
d
It may be using the default that postgres sets, I'm not positive
this fixes the run launcher
let me re-enable the other services successively and then work on mounting the volumes.
But I am curious - you mentioned that dropping the volume mount makes your postgres behave better - how did this have any influence here at all?
d
I'm not sure, I was just removing things that were different from the example until it started working again :)
It seemed like having it in the volume was keeping the postgres container from starting up correctly
g
Interestingly - this did not work for me. Anyways - let me check the next steps.
I am on to the next step now: The pyspark resource (in local mode) is not coming up:
RuntimeError: Java gateway process exited before sending its port number
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/errors.py", line 184, in user_code_error_boundary
    yield
  File "/opt/conda/lib/python3.9/site-packages/dagster/core/execution/resources_init.py", line 298, in single_resource_event_generator
    resource_def.resource_fn(context)
  File "/opt/conda/lib/python3.9/site-packages/dagster_pyspark/resources.py", line 53, in pyspark_resource
    return PySparkResource(init_context.resource_config["spark_conf"])
  File "/opt/conda/lib/python3.9/site-packages/dagster_pyspark/resources.py", line 20, in __init__
    self._spark_session = spark_session_from_config(spark_conf)
  File "/opt/conda/lib/python3.9/site-packages/dagster_pyspark/resources.py", line 15, in spark_session_from_config
    return builder.getOrCreate()
  File "/opt/conda/lib/python3.9/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/conda/lib/python3.9/site-packages/pyspark/context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/conda/lib/python3.9/site-packages/pyspark/context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/conda/lib/python3.9/site-packages/pyspark/context.py", line 339, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/conda/lib/python3.9/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")
But so far I did not see anything suspicious in the logs of the container.
this looks like a "JAVA_HOME is not set" error (but the logs do not show this directly)
This looks like https://github.com/geoHeil/dagster-ssh-demo/blob/master/environment.yml#L12 (conda pyspark) is not bringing in the downstream JDK dependency and not setting java home
spark is starting now
Is the postgres error/warning: ERROR: duplicate key value violates unique constraint "instigators_selector_id_key" anything I should worry about?
I can confirm that a lot of the dummy example jobs are working now! However the state_handling for the mapped volumes is not yet working as expected
Also strange: sometimes dagit & the dagster-daemon fail to start with a "too many retries for DB connection" error, although they are set to depend on (wait for) the DB container
@daniel is my suspicion correct that the ``DockerRunLauncher`` is not using the volume mappings defined in docker-compose? How can I set up the volume mappings for the run launcher so they are applied when starting the container?
d
g
thanks. Interesting: `repo.py:/opt/dagster/app/` from the link looks like a relative path. Though for me, when passing ./warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location, I get an error message that relative paths are not allowed
Even when mounting an absolute path:
/path/to/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location
no files are written to this directory
d
I'm not at my keyboard currently but we’ll be able to take a closer look at this on Monday
g
@daniel did you already find the time to take a look at the volume mount problem?
d
I did not, but I have a bit of time now to investigate
is the github repo that you posted earlier up to date with the code that you're using that repros this?
what I think is happening with the postgres volume thing is that mounting that data as a volume on startup causes postgres to take a lot longer to start up, so other services that depend on it start spewing some errors while they are waiting for postgres to be ready (adding `depends_on` in your docker-compose file just makes the containers wait for the postgres container to start, it doesn't make them wait for it to be fully ready). For me the daemon and dagit eventually reach an OK place and are able to run correctly - there's just some spew at the beginning while they wait to be able to connect to postgres
there are some tips in the docker docs for controlling service order more explicitly if you want to make the other services specifically wait for postgres to be ready: https://docs.docker.com/compose/startup-order/
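One way to do that, sketched under assumptions: give the postgres service a healthcheck and make the Dagster services wait until it reports healthy. The service names `postgresql`, `dagit`, and `dagster-daemon` are taken from the logs in this thread; the healthcheck timings are arbitrary, and `condition: service_healthy` needs a Compose version that supports the long `depends_on` form.

```yaml
# docker-compose.yml (fragment; merge into the existing service definitions)
services:
  postgresql:
    healthcheck:
      # pg_isready ships with the official postgres image. "$$" defers variable
      # expansion to the container shell, which assumes POSTGRES_USER and
      # POSTGRES_DB are set in this service's environment.
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 5s
      timeout: 5s
      retries: 10

  dagit:
    depends_on:
      postgresql:
        condition: service_healthy  # wait for the healthcheck, not just container start

  dagster-daemon:
    depends_on:
      postgresql:
        condition: service_healthy
```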
The "duplicate key value violates unique constraint" errors are actually expected currently (it adds the row if it doesn't exist, then updates it if it does exist and that constraint fires) but we'll see what we can do to make that less spew-y
that leaves the issue you described about mounting volumes in the run launcher not working - can you share more details about what exactly I should do to reproduce that? Which job i should run, what the exact expected behavior is vs. what you're seeing, and what code is supposed to be writing to the volume?
g
yes the code repository is up-to-date
In particular the IO managers are writing to this location: https://github.com/geoHeil/dagster-ssh-demo/blob/master/SSH_DEMO/resources/parquet_io_manager.py#L99
The ingest assets (https://github.com/geoHeil/dagster-ssh-demo/blob/master/SSH_DEMO/assets/ingest_assets.py) read from the SFTP docker container and store the data into the warehouse. But they store it locally in the container, which gets deleted, as the volume mappings applied a) from docker-compose and b) from the launch configuration in dagster.yaml for the docker-based executor somehow seem to work in a different way
You do not need to run any job.
git clone https://github.com/geoHeil/dagster-ssh-demo.git
cd dagster-ssh-demo

make start 
# or alternatively without make
docker compose --profile dagster up --build
is all that is needed - the sensors start to automatically poll the SFTP resource for ingestable files
d
I think when specifying volumes using container_kwargs, the key has to be an absolute path, not a relative one - that's actually a docker restriction: https://docker-py.readthedocs.io/en/stable/containers.html (and is different than docker-compose)
i.e. if you changed it from
volumes:
  - warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location
to something like (this is my absolute path, yours is probably different):
volumes:
  - /Users/dgibson/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location
I think it would be more likely to work. I tried that and am still getting an error in your job, but i think that may be logic in your modified IO manager now? I'd hope that the volume would work as expected now
when I ran `docker inspect <container ID>` on a launched container, I saw
"Mounts": [
            {
                "Type": "bind",
                "Source": "/Users/dgibson/dagster-ssh-demo/warehouse_location_dagster",
                "Destination": "/opt/dagster/dagster_home/warehouse_location",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
which matched the path on the gRPC server
I see how needing to specify the absolute path is annoying - it's tricky because dagster.yaml is getting loaded and interpreted inside the Docker container, where it doesn't have any way to know what base directory to use for a relative path that's referring to the filesystem outside of Docker
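For reference, a sketch of how an absolute host path fits into the run launcher config. The host path below is a placeholder and has to be replaced with the real absolute location of the checkout on the Docker host; the remaining keys follow the `run_launcher` block shown earlier in the thread.

```yaml
# dagster.yaml (sketch; /absolute/path/to is a placeholder)
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    # env_vars omitted here for brevity
    network: dagster_network
    container_kwargs:
      auto_remove: true
      volumes:
        # <absolute host path>:<path inside the launched run container>
        - /absolute/path/to/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location
```

The host side of the mapping is resolved on the Docker host, not inside the container that loads dagster.yaml, so a `./relative` path has nothing to be relative to here.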
g
I experimented locally with an absolute path and had the same problem
let me re try/check with docker inspect
I get:
"Mounts": [ { "Type": "bind", "Source": "/Users/geoheil/Downloads/fooo/dagster-ssh-demo/warehouse_location_dagster", "Destination": "/opt/dagster/dagster_home/warehouse_location", "Mode": "", "RW": true, "Propagation": "rprivate" } ]
for:
- /Users/geoheil/Downloads/fooo/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/warehouse_location
But still get: Path does not exist: file:/opt/dagster/dagster_home/src/warehouse_location/foo_asset.
I think I need to adapt the path mapping to:
/Users/geoheil/Downloads/fooo/dagster-ssh-demo/warehouse_location_dagster:/opt/dagster/dagster_home/src/warehouse_location
(include the src)
Interestingly:
Error: No arguments given and workspace.yaml not found.
I get this error then from dagster daemon (did not have that one before this change)
Cool - this error indeed is solved with the absolute path and including the `src` in the mapping (see latest commit)
Except for the combined_assset_sensor everything else works fine. This sensor however does not fire from within docker.