Trying to connect a multi-container Dagster deploy...
# ask-community
n
Trying to connect a multi-container Dagster deployment running on GCP to Cloud SQL with SSL enabled, but unable to find the documentation. This channel covers how-to for K8s, but not a simpler docker-compose deployment: https://dagster.slack.com/archives/C01U954MEER/p1651005567014399?thread_ts=1651005022.510339&cid=C01U954MEER It's a fairly straightforward proposition, but I'm not finding documentation for setting cert paths for Postgres storage in dagster.yaml (https://docs.dagster.io/deployment/dagster-instance#postgres-storage) or in the dagster_postgres source (https://docs.dagster.io/_modules/dagster_postgres/event_log/event_log#PostgresEventLogStorage). Many thanks for any help!
I'm seeing now that dagster.yaml accepts a Postgres connection string
m
@Nicolas May Are you connecting to the Cloud SQL instance via a private ip, public ip, or using the Cloud SQL proxy? I am currently using the private ip (see code here), but the ideal pattern would be to run Cloud SQL Proxy in a sidecar pattern and have Dagster connect to 127.0.0.1. That would create an encrypted connection between the two. If you wanted to team up on communicating that in a GitHub issue somewhere perhaps we can help the Dagster team add support for it?
n
Private IP
m
@sean @yuhan @sandy Would it be helpful for me to document somewhere Google’s recommendation for how a resource in GKE should connect to a database in Cloud SQL? It involves running their Cloud SQL Proxy image in a sidecar pattern (example) and I do not believe that is possible with the Helm chart today. That can help @Nicolas May and I determine the best practice here. If sidecar pattern with Cloud SQL Proxy is not possible then we have to figure out SSL with a private IP.
@Nicolas May I hope I am not hijacking this thread from your original ask!
n
@marcos I guess I'm not seeing where you're configuring the connection to the private IP
n
I should rephrase... I guess I'm not seeing where you're configuring the SSL-enabled connection to the private IP
What I'm piecing together is that we have to pass a connection string in dagster.yaml like...
storage:
  postgres:
    postgres_url: postgresql://user:pass@10.0.0.1:5432/mydatabase?sslmode=verify-ca&sslrootcert=server-ca.pem&sslcert=client-cert.pem&sslkey=client-key.pem
... but obvi pulling the conn url from env:
.env
PG_DB_CONN_STRING=postgresql://user:pass@10.0.0.1:5432/mydatabase?sslmode=verify-ca&sslrootcert=server-ca.pem&sslcert=client-cert.pem&sslkey=client-key.pem
dagster.yaml
storage:
  postgres:
    postgres_url:
      env: PG_DB_CONN_STRING
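A minimal sketch (not from the thread) of assembling a connection string of this shape in Python, in case the password contains characters such as @ or / that would otherwise break the URL. The function name build_postgres_url and the cert_dir parameter are hypothetical; the cert file names simply follow the example above.

```python
from urllib.parse import quote, urlencode


def build_postgres_url(user, password, host, port, database, cert_dir,
                       sslmode="verify-ca"):
    """Assemble a libpq-style connection URL with SSL parameters.

    Hypothetical helper: cert_dir is assumed to hold server-ca.pem,
    client-cert.pem and client-key.pem, as in the example above.
    """
    ssl_params = {
        "sslmode": sslmode,
        "sslrootcert": f"{cert_dir}/server-ca.pem",
        "sslcert": f"{cert_dir}/client-cert.pem",
        "sslkey": f"{cert_dir}/client-key.pem",
    }
    # Percent-encode the credentials so '@', '/' or ':' in them
    # cannot be mistaken for URL delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # Keep '/' readable in the certificate paths.
    query = urlencode(ssl_params, safe="/")
    return f"postgresql://{userinfo}@{host}:{port}/{database}?{query}"


if __name__ == "__main__":
    # Prints a line that could be pasted into .env
    print("PG_DB_CONN_STRING=" + build_postgres_url(
        "user", "pass", "10.0.0.1", 5432, "mydatabase", "/container/path/to/certs"))
```

The output is meant to be copied into .env by hand; nothing here talks to the database.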
m
Today I am using a private ip without SSL. I don’t love it so if you figure out how to add SSL to that, please do let me know. I am advocating for Cloud SQL Auth proxy over that if we can get it to work (based on this documentation)
n
Same here... SSL is disabled... Security here at work no like 😁 I'll let you know if the pg conn string approach works, and ya, happy to collab on finding any working solution and implementing a best-practice solution
🙏🏽 1
Thanks for sharing what you've got
@marcos This Postgres connection url string works
m
@Nicolas May Amazing! OK. I will be moving my deployments to use the private IP with SSL then. Did you have to download a client certificate in Cloud SQL and store that in GKE somewhere or was it really just updating the yaml file to use the connection string format with those arguments?
y
Hi @marcos it’d definitely be helpful! Please feel free to start a github discussion - it might be the most lightweight for you to start with and can also collaborate with us or other community members.
👍🏽 1
n
@marcos @yuhan For our deployment (again, a multi-container via docker compose), there were a few things to update to make this work:
(1) Download the certs from Cloud SQL. The certs need to be accessible from dagit and dagster-daemon services. We'll figure out how to improve this down the road, but for now the certs are on the VM and the relevant VM docker-compose.yaml services have bind mounts to the certs directory. For example:
docker_dagit:
    image: us-central1-docker.pkg.dev/project-foo/foo/dagster_multi_container_docker_dagit
    container_name: docker_dagit
    entrypoint:
      - dagit
      - -h
      - "0.0.0.0"
      - -p
      - "3000"
      - -w
      - workspace.yaml
    expose:
      - "3000"
    ports:
      - "3000:3000"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/io_manager_storage:/tmp/io_manager_storage
      - type: bind                                       ### <-- HERE
        source: /host/path/to/certs                      ### <-- HERE
        target: /container/path/to/certs/                ### <-- HERE
    environment:
      DAGSTER_POSTGRES_HOST: "${DAGSTER_POSTGRES_HOST}"
      DAGSTER_POSTGRES_DB: "${DAGSTER_POSTGRES_DB}"
      DAGSTER_POSTGRES_USER: "${DAGSTER_POSTGRES_USER}"
      DAGSTER_POSTGRES_PASSWORD: "${DAGSTER_POSTGRES_PASSWORD}"
      PG_DB_CONN_STRING: "${PG_DB_CONN_STRING}"          ### <-- HERE
    networks:
      - docker_network
    depends_on:
      - docker_example_user_code
^^^ Also do the above for the dagster-daemon service.
(2) Also notice that the docker-compose service above has a new env var PG_DB_CONN_STRING, which is passed from the .env file. All the services have this env var defined as PG_DB_CONN_STRING: "${PG_DB_CONN_STRING}". I still have to clean up the environment variables here.
(3a) dagster.yaml - run_launcher needs two things: (a) the PG_DB_CONN_STRING passed to env_vars config, (b) a volume map to the dir that contains the certs, like /host/path/to/certs/:/container/path/to/certs. If this isn't updated, the containerized runs just freeze... no errors as far as I could tell. For example:
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      #- DAGSTER_POSTGRES_HOST
      #- DAGSTER_POSTGRES_DB
      #- DAGSTER_POSTGRES_USER
      #- DAGSTER_POSTGRES_PASSWORD
      - PG_DB_CONN_STRING                ### <-- HERE
    network: docker_network
    container_kwargs:
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
        - /tmp/io_manager_storage:/tmp/io_manager_storage
        - /host/path/to/certs:/container/path/to/certs  ### <-- HERE
(3b) dagster.yaml - storage needs to be updated. You can get rid of all the DAGSTER_POSTGRES_* stuff.
storage:
  postgres:
    postgres_url:
      env: PG_DB_CONN_STRING
(4) .env on the VM needs to be updated to include PG_DB_CONN_STRING
PG_DB_CONN_STRING=postgresql://user:pass@10.0.0.1:5432/mydatabase?sslmode=verify-ca&sslrootcert=/container/path/to/certs/server-ca.pem&sslcert=/container/path/to/certs/client-cert.pem&sslkey=/container/path/to/certs/client-key.pem
I believe that's all of it. Keep me posted if you're so inclined.
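Since a missing volume mount made the runs freeze without an obvious error, a quick check run inside each container (dagit, dagster-daemon, and a launched run container) may save some time. This is only a sketch under assumptions: the function name missing_cert_files is hypothetical, and it only confirms that the files named in PG_DB_CONN_STRING exist at those paths. It does not open a connection or validate the certificates.

```python
import os
from urllib.parse import parse_qs, urlsplit

# Connection-string parameters that point at files on disk.
CERT_PARAMS = ("sslrootcert", "sslcert", "sslkey")


def missing_cert_files(conn_url):
    """Return {param: path} for each SSL file parameter whose path is not a file.

    Hypothetical helper, not part of Dagster or dagster_postgres.
    """
    query = parse_qs(urlsplit(conn_url).query)
    missing = {}
    for param in CERT_PARAMS:
        for path in query.get(param, []):
            if not os.path.isfile(path):
                missing[param] = path
    return missing


if __name__ == "__main__":
    # Reads the same env var that dagster.yaml references.
    problems = missing_cert_files(os.environ.get("PG_DB_CONN_STRING", ""))
    for param, path in problems.items():
        print(f"{param}: file not found at {path}")
```

If this prints anything inside a container, the bind mount or the path in the connection string is the first place to look.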