I'm trying to deploy a repository.py which lives i...
# ask-community
w
I'm trying to deploy a repository.py which lives in a different directory than where DAGSTER_HOME is pointing to. My first thought is to specify it in the workspace.yaml something like this:
load_from:
    - python_file: /home/user/project/project_a/repository.py
But dagster-daemon errors out with "Cannot load all address". Unfortunately I didn't copy the whole error, but suffice it to say, it's having issues. What is the correct way to do this? This is obviously needed for multi-project repositories.
j
Hey @Will Gunadi I think the syntax you're looking for is
load_from:
   - python_file: /home/user/project/project_a/repository.py
This docs page also specifies other ways you can add repositories (by python module, package) and how to add multiple repositories to a workspace.yaml https://docs.dagster.io/concepts/repositories-workspaces/workspaces#defining-a-workspace
w
I got the syntax correct, and it works for any repository in the same folder as dagster. But it breaks down when I point it to anything outside.
j
When you run dagit, are you in the directory containing workspace.yaml or in DAGSTER_HOME?
w
I think I ran dagit from a folder called dagster, which is a sibling of the folder project_a, which contains the .yaml files and the repository.py. And DAGSTER_HOME was set to project_a.
j
ok! When you run dagit you'll need to either run it in the same folder as workspace.yaml or provide the CLI arg dagit -w /path/to/workspace.yaml
For DAGSTER_HOME, you don't need to set it to project_a in order for dagster to work. DAGSTER_HOME should point to a directory where you maintain any dagster instance level configuration and want compute logs to be stored. Typically this is a directory separate from any dagster code (ops, jobs, etc). For example, mine's set to $HOME/dagster_home. Here's a bit more info on dagster instances and DAGSTER_HOME (it gets a bit in the weeds, but the beginning of the page may help provide some context on what's going on) https://docs.dagster.io/deployment/dagster-instance#dagster-instance
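The layout described here can be sanity-checked before starting dagit or the daemon. The following is a minimal sketch, not a Dagster API: check_layout is a hypothetical helper that encodes the two points above (workspace.yaml is looked up in the current directory unless -w is passed, and DAGSTER_HOME should point to a directory). The absolute-path check is an assumption about how Dagster validates DAGSTER_HOME.

```python
import os
from pathlib import Path


def check_layout(cwd, environ, workspace=None):
    """Return a list of problems; an empty list means the layout looks fine.

    Hypothetical helper for illustration, not part of Dagster.
    """
    problems = []

    # dagit looks for workspace.yaml in the current directory unless -w is given.
    workspace_path = Path(workspace) if workspace else Path(cwd) / "workspace.yaml"
    if not workspace_path.is_file():
        problems.append(f"no workspace file at {workspace_path}")

    # DAGSTER_HOME holds instance-level config and compute logs, not user code.
    home = environ.get("DAGSTER_HOME")
    if not home:
        problems.append("DAGSTER_HOME is not set")
    elif not os.path.isabs(home):
        # Assumption: Dagster expects an absolute path here.
        problems.append(f"DAGSTER_HOME is not an absolute path: {home}")
    elif not os.path.isdir(home):
        problems.append(f"DAGSTER_HOME is not an existing directory: {home}")

    return problems
```

Calling check_layout(os.getcwd(), os.environ) from the folder where dagit is started lists anything that looks off. Passing workspace="/path/to/workspace.yaml" mirrors the -w option.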
w
I looked closer at the error, it is actually a grpc error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1648141611.056687082","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3150,"referenced_errors":[{"created":"@1648141611.056686622","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":165,"grpc_status":14}]}"
>
@jamie
d
Hi Will - just want to nail down the exact repro steps here. When you have your workspace.yaml set in a certain way, you reliably get that failed to connect to all addresses issue? Do you have a full stack trace / possible to post or DM the full output of your dagster-daemon process while this is happening?
w
Stack Trace:
  File "/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/lib/python3.8/site-packages/dagster/core/host_representation/grpc_server_registry.py", line 206, in _get_grpc_endpoint
    server_process = GrpcServerProcess(
  File "/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/lib/python3.8/site-packages/dagster/grpc/server.py", line 1095, in __init__
    self.server_process = open_server_process(
  File "/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/lib/python3.8/site-packages/dagster/grpc/server.py", line 1008, in open_server_process
    wait_for_grpc_server(server_process, client, subprocess_args, timeout=startup_timeout)
  File "/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/lib/python3.8/site-packages/dagster/grpc/server.py", line 943, in wait_for_grpc_server
    raise Exception(
@daniel
That's the only stack trace I have
d
Usually there's a message like "Timed out waiting for gRPC server" or "gRPC server exited with return code" - do you see anything like that?
w
2022-03-24 17:24:14 +0000 - dagster.daemon.SchedulerDaemon - WARNING - Could not load location f45_repository.py to check for schedules due to the following error: Exception: Timed out waiting for gRPC server to start with arguments: "/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/bin/python3 -m dagster api grpc --lazy-load-user-code --socket /tmp/tmpvsfehyw4 --heartbeat --heartbeat-timeout 120 --fixed-server-id 714f272c-65e8-482e-a2ad-98f2dfe6aabb --log-level WARNING --use-python-environment-entry-point -f /home/etl/f45_v.1.2_0.14.1/f45_repository.py". Most recent connection error: dagster.core.errors.DagsterUserCodeUnreachableError: Could not reach user code server
d
Try running that command yourself in the same folder as the daemon, maybe that will give us a clue why it's not starting up
/home/etl/f45_v.1.2_0.14.1/v_f45_v.1.2_0.14.1/bin/python3 -m dagster api grpc --lazy-load-user-code --socket /tmp/tmpvsfehyw4 --heartbeat --heartbeat-timeout 120 --fixed-server-id 714f272c-65e8-482e-a2ad-98f2dfe6aabb --log-level WARNING --use-python-environment-entry-point -f /home/etl/f45_v.1.2_0.14.1/f45_repository.py
The expected behavior is that it outputs something like
2022-03-24 09:17:36 -0500 - dagster.code_server - INFO - Started Dagster code server for module dagster_test.toys.repo on port 4000 in process 2048
but that error message indicates that it's hanging for some reason or taking an unreasonable amount of time to start up
w
Does port 4000 have to be open to the public? I'm running on an EC2 instance, and I have no access to the AWS console for it.
And yes, that command just hangs indefinitely. But I also notice that the /tmp/ socket file doesn't exist
d
Got it - you could swap the --socket argument out for --port <some number> just to rule it out. Is there anything in f45_repository.py that might explain why it would be hanging or taking a long time when dagster tries to import it? Like a side-effect that tries to do something slow?
e.g. if you just run python /home/etl/f45_v.1.2_0.14.1/f45_repository.py does that return cleanly? Or also hang?
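The check suggested here (run the repository file directly and see whether it returns) can also be scripted with a timeout, so a hang shows up as a clear result instead of a stuck terminal. This is a minimal sketch using only the standard library. loads_cleanly is a hypothetical helper name and the 30-second default is an arbitrary assumption, not a Dagster setting.

```python
import subprocess
import sys


def loads_cleanly(path, timeout_seconds=30.0):
    """Run `python <path>` in a child process and report whether it finished in time.

    Hypothetical helper for illustration, not part of Dagster.
    Returns a tuple (ok, detail).
    """
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child was still running: something at import time is blocking.
        return False, f"still running after {timeout_seconds}s"
    if result.returncode != 0:
        # The file raised at import time; stderr carries the traceback.
        return False, result.stderr.strip()
    return True, "ok"
```

For the file in this thread, the call would be loads_cleanly("/home/etl/f45_v.1.2_0.14.1/f45_repository.py"), run with the same Python interpreter the daemon uses so the same packages are importable.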
w
Wait... it does hang. Let me check what's causing it.
@daniel you are a lifesaver! 😄
d
nice - what did it end up being? Would be nice if we could make the error message here less cryptic
w
That trick to check if the repository.py actually loads is very useful. It's hanging because it's trying to connect to the wrong PG database that this EC2 instance has no access to
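When an import hangs on a database connection like this, a quick TCP probe can confirm whether the host is reachable from the machine at all. The sketch below is standard-library Python, not Dagster or PostgreSQL client code. can_reach is a hypothetical helper, and it only tests that a TCP connection opens, not that the credentials or database name are right.

```python
import socket


def can_reach(host, port, timeout_seconds=3.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    Hypothetical helper for illustration; checks network reachability only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

For example, can_reach("db.example.internal", 5432) probes PostgreSQL's default port. The host name here is a placeholder, not one taken from this thread.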
d
ah, got it