Receiving `Error: pg_config executable not found.` error in GitHub Actions
# dagster-serverless

Todd de Quincey:
Hi, I am trying to use the baked-in GitHub Actions to deploy to Dagster Serverless, but we are getting the above error in the `Build and deploy to Dagster Cloud serverless` step. See the log output below. This is similar to this thread; however, there doesn't appear to be a resolution there. Can someone please assist?
#10 68.47 Collecting dbt-core
#10 68.50   Downloading dbt-core-0.14.4.tar.gz (540 kB)
#10 68.51      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 540.3/540.3 kB 53.2 MB/s eta 0:00:00
#10 68.59   Preparing metadata (setup.py): started
#10 68.90   Preparing metadata (setup.py): finished with status 'done'
#10 68.93 Collecting dbt-postgres==0.14.4
#10 68.94   Downloading dbt-postgres-0.14.4.tar.gz (7.6 kB)
#10 68.95   Preparing metadata (setup.py): started
#10 69.21   Preparing metadata (setup.py): finished with status 'done'
#10 69.38 Collecting psycopg2<2.8,>=2.7
#10 69.45   Downloading psycopg2-2.7.7.tar.gz (427 kB)
#10 69.46      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 427.5/427.5 kB 38.3 MB/s eta 0:00:00
#10 69.54   Preparing metadata (setup.py): started
#10 69.84   Preparing metadata (setup.py): finished with status 'error'
#10 69.85   error: subprocess-exited-with-error
#10 69.85   
#10 69.85   × python setup.py egg_info did not run successfully.
#10 69.85   │ exit code: 1
#10 69.85   ╰─> [23 lines of output]
#10 69.85       running egg_info
#10 69.85       creating /tmp/pip-pip-egg-info-9l56y9cr/psycopg2.egg-info
#10 69.85       writing /tmp/pip-pip-egg-info-9l56y9cr/psycopg2.egg-info/PKG-INFO
#10 69.85       writing dependency_links to /tmp/pip-pip-egg-info-9l56y9cr/psycopg2.egg-info/dependency_links.txt
#10 69.85       writing top-level names to /tmp/pip-pip-egg-info-9l56y9cr/psycopg2.egg-info/top_level.txt
#10 69.85       writing manifest file '/tmp/pip-pip-egg-info-9l56y9cr/psycopg2.egg-info/SOURCES.txt'
#10 69.85       
#10 69.85       Error: pg_config executable not found.
#10 69.85       
#10 69.85       pg_config is required to build psycopg2 from source.  Please add the directory
#10 69.85       containing pg_config to the $PATH or specify the full executable path with the
#10 69.85       option:
#10 69.85       
#10 69.85           python setup.py build_ext --pg-config /path/to/pg_config build ...
#10 69.85       
#10 69.85       or with the pg_config option in 'setup.cfg'.
#10 69.85       
#10 69.85       If you prefer to avoid building psycopg2 from source, please install the PyPI
#10 69.85       'psycopg2-binary' package instead.
#10 69.85       
#10 69.85       For further information please check the 'doc/src/install.rst' file (also at
#10 69.85       <http://initd.org/psycopg/docs/install.html>).
#10 69.85       
#10 69.85       [end of output]
#10 69.85   
#10 69.85   note: This error originates from a subprocess, and is likely not a problem with pip.
#10 69.86 error: metadata-generation-failed
#10 69.86 
#10 69.86 × Encountered error while generating package metadata.
#10 69.86 ╰─> See above for output.
#10 69.86 
#10 69.86 note: This is an issue with the package mentioned above, not pip.
#10 69.86 hint: See above for details.
#10 70.05 
#10 70.05 [notice] A new release of pip is available: 23.0.1 -> 23.2.1
#10 70.05 [notice] To update, run: pip install --upgrade pip
#10 ERROR: process "/bin/sh -c if [ -f \"setup.py\" ]; then         pip install .;     fi" did not complete successfully: exit code: 1
------
 > [4/9] RUN if [ -f "setup.py" ]; then         pip install .;     fi:
69.86 error: metadata-generation-failed
69.86 
69.86 × Encountered error while generating package metadata.
69.86 ╰─> See above for output.
69.86 
69.86 note: This is an issue with the package mentioned above, not pip.
69.86 hint: See above for details.
70.05 
Notice: 70.05 [notice] A new release of pip is available: 23.0.1 -> 23.2.1
Notice: 70.05 [notice] To update, run: pip install --upgrade pip
------
Dockerfile:13
--------------------
  12 |     # copying all other files
  13 | >>> RUN if [ -f "setup.py" ]; then \
  14 | >>>         pip install .; \
  15 | >>>     fi
  16 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ -f \"setup.py\" ]; then         pip install .;     fi" did not complete successfully: exit code: 1
Error: buildx failed with: ERROR: failed to solve: process "/bin/sh -c if [ -f \"setup.py\" ]; then         pip install .;     fi" did not complete successfully: exit code: 1
Our `setup.py` file looks like this:
Joe:
Hey @Todd de Quincey, you might need to set a different Python version: https://docs.dagster.io/dagster-cloud/deployment/serverless#using-a-different-python-version. By default, Dagster Serverless uses 3.8.
Rerunning the GitHub Action with debug logging enabled might also produce a better error.
Todd:
Thanks, @Joe. I've already updated the image to run 3.9.
I'll try with debug, but I think the error is sufficiently clear to see where it occurs.
Given the time differences, do you have any other ideas? I really need to have this up and running by tomorrow.
Joe:
Understood. I think using PEX-based deploys might be one way to avoid this issue: https://docs.dagster.io/dagster-cloud/deployment/serverless#disabling-pex-based-deploys. I'm kinda surprised you're not already using those, via `ENABLE_FAST_DEPLOYS: 'true'` in your GitHub Action envs.
If you can get an error other than an opaque "failed at pip install", you might be able to work around the failure with lifecycle hook scripts (https://docs.dagster.io/dagster-cloud/deployment/serverless#using-a-different-base-image-or-using-native-dependencies), or switch to using your own base image entirely.
I think PEX will be the easiest route.
@Todd de Quincey, sorry for not looking at your original stack trace in more detail. It looks like
#10 69.85       If you prefer to avoid building psycopg2 from source, please install the PyPI
#10 69.85       'psycopg2-binary' package instead.
might be the real issue, so maybe try adding `psycopg2-binary` to your setup.py, or use the lifecycle hook scripts to install Postgres.
Todd:
Hi @Joe. Unfortunately, I tried both of these paths (I probably should have said so already)! Originally I had `ENABLE_FAST_DEPLOYS: 'true'`, but that caused the first step in the build to fail (see the attached). So I set it to false, and everything worked fine until I hit the above error. I tried just installing the binary, but no love either. I'll have a play around again today and see where I get.
Ahhhhh, I see why the Dagster Serverless step was failing: a missing `dagster_cloud` dependency. I will add that today and see if that fixes it.
Thanks for the breadcrumbs. This might have solved it 🤞
Hi @Joe. So fixing / using fast deploys did the trick. Thanks again for the assistance.
Although, I need to go back to Docker deploys to run a post script. But now I have this issue 😞 We're slooooowly getting there though 🙂
Joe:
@Todd de Quincey, you might need to modify the file permissions to make it executable:
chmod +x ./dagster_cloud_pre_install.sh
Todd:
@Joe Shouldn't the base image already provide access to this file, given it's copying it over and it's a "standard Dagster file"?
Otherwise, unless I'm misunderstanding, I'll need to create my own image, etc., which I'm trying to avoid at this stage since this is a PoC.
Joe:
Can you share the contents of the pre-install script?
The Dockerfile doesn't modify any file permissions, so if you haven't allowed execution of the file, I don't think it will run successfully.
When you modify the file permissions, you should be able to commit and push with that changed file metadata.
Todd:
Thanks, @Joe. I am trying to pre-install the Postgres binary to resolve the original issue from this thread (i.e. `Error: pg_config executable not found.`). Your original suggestion to simply use the PEX deployment worked, but unfortunately I need the `dagster_cloud_post_install` hook to forward the local ports via our SSH server, so I have been forced back to the Docker deploy. But obviously I need to resolve this issue first. As you can see, I've gone around in a few loops trying to get this to work so far. Per my quick scour of the web, installing the binary in the pre-hook like so should fix my original issue (if not, there are a few other ways to do this, but this is the first port of call):
#!/bin/sh
python -m pip install psycopg2-binary
Joe:
Oof, thanks for working through this stuff. I'm going to make sure we update our docs / GH Actions to help future users avoid this. Did you try the chmod change?
`python3.8 -m` might also be the issue (or whatever version you have configured).
Actually, I think dropping `python -m` should work.
Todd:
Trying the chmod atm. I presume you meant updating the git index (i.e. `git update-index --chmod=+x ./dagster_cloud_pre_install.sh`)? I didn't know you could do this with git until today.
Joe:
No, just running `chmod +x ./dagster_cloud_pre_install.sh` and then `git add ./dagster_cloud_pre_install.sh`.
There will be no changes to commit if the file is already executable, which you can also verify with:
ls -l ./dagster_cloud_pre_install.sh
Todd:
We're cooking with gas. The build now progresses and runs the pre-script. I still get the same `Error: pg_config executable not found.` error, but I should be able to work through that from here. TIL a few things! Thanks for your patience and assistance, Joe. Much appreciated.
Joe:
No problem! Happy to help get this stuff set up, and thanks for the feedback.
Todd:
Tell the boss that I owe you a beer… Just put it onto our Cloud bill
🍻 1
Hi again @Joe, sorry to bother you again. We are reeeeeeealy struggling to get the SSH port forwarding to work correctly on Dagster Cloud; we have no problem getting it working locally. The post-deployment script in CD appears to run without a problem, yet when we run the tasks in Dagster Cloud, we get the same old connection error, indicating that the ports aren't forwarded (screenshot attached).

I am presuming that, post-deployment, the below script is actually run on the Dagster Cloud servers / in our Docker containers? So the port forwarding (for which the test in the script succeeded in CD) is applied to the actual instance / Serverless workers? I am thinking it might be easier to switch to Hybrid, or potentially self-host, so that this sits inside our VPC.

A final note: I have ruled out an IP whitelisting issue, as we temporarily allowed all IPv4 addresses. Any breadcrumbs would be most welcome.
REDACTED
Joe:
> I am presuming that, post-deployment, the below script is actually run on the Dagster Cloud servers / in our Docker containers?
The script is only run at build time in GitHub Actions, not on startup in Dagster Cloud.
Setting up SSH tunnels is something you can do in your Dagster resource code. https://dagster.slack.com/archives/C01U954MEER/p1688624749164789?thread_ts=1688577139.518379&cid=C01U954MEER is an example of a user doing the same thing as you, I think, and dagster-ssh can help, for reference.
Todd:
Hmmm, OK. Thanks, Joe. I'm getting a little bit of conflicting info, as that was the approach I first started to take. But in another thread (I'll go find it), it was suggested that I set up the port forwarding in the Dagster post-install script.
Joe:
Oh :( Owen might have been recommending modifying iptables or other static config that is then used when the image starts.
Todd:
Right, I see the source of confusion.
So, long story short, I need to create an SSH resource that forwards the ports when the dbt commands are invoked?
If so, I'll head back down that path.
Joe:
Yeah, I'd go that route first so you can keep using PEX.
Todd:
Thanks mate. I'll see how I get on.
For the life of me, I cannot get this to work. This complains that `AttributeError: 'ResourceDefinition' object has no attribute 'get_tunnel'`, but if I drop the `get_tunnel`, how/where do I define the tunnel details?
assets = with_resources(
    load_assets_from_dbt_project(
        project_dir=DBT_PROJECT_PATH,
        profiles_dir=DBT_PROFILES,
        use_build_command=False,  # Delete when ready to launch to run tests
        display_raw_sql=True,
    ),
    {
        "dbt": dbt_cli_resource.configured(
            {
                "project_dir": DBT_PROJECT_PATH,
                "profiles_dir": DBT_PROFILES,
            },
        ),
        "ssh": ssh_resource.configured(
            {
                "remote_host": os.getenv("SSH_HOST"),
                "remote_port": 22,
                "username": os.getenv("SSH_USER"),
                "key_file": "~/.ssh/dagster_rsa",
            }
        ).get_tunnel(remote_port=5439, remote_host="localhost", local_port=5439),
    },
)
Joe:
See https://docs.dagster.io/concepts/resources#using-ops-and-jobs: in the op/asset definition you should be able to access an instance of the SSH resource and call `get_tunnel` there.
Todd:
Yeah, I gathered that after reading the docs. But the bit I don't get is how to do this when we are just blindly loading the assets with `load_assets_from_dbt_project`. I have asked this in the main support channel.
Joe:
Oh, good point. Yeah, I'm not sure what the best way to do that would be.