# integration-bigquery
j
Hello! I might have missed something, but I'm trying to build an asset that queries a table on BigQuery, transforms the data, and then pushes it back to BigQuery. To query, I'm using the brand new `BigQueryResource` (and it works like a charm 🎉), but if I then add an io_manager linked to BigQuery on the same asset, I get the following error:
Resource config error: gcp_credentials config for BigQuery resource cannot be used if GOOGLE_APPLICATION_CREDENTIALS environment variable is set.
My asset is declared this way:
```python
@asset(io_manager_key="bigquery_io_manager")
def apiBatch(bigquery: BigQueryResource):
    ...
```
And my definition is:
```python
from dagster import Definitions, EnvVar
from dagster_gcp import BigQueryResource
from dagster_gcp_pandas import BigQueryPandasIOManager

defs = Definitions(
    assets=all_assets,
    resources={
        "bigquery": BigQueryResource(
            project="PROJECT",
            location="EU",
            gcp_credentials=EnvVar("GCP_CREDS"),
        ),
        "bigquery_io_manager": BigQueryPandasIOManager(
            project="PROJECT",
            location="EU",
            dataset="mydataset",
            gcp_credentials=EnvVar("GCP_CREDS"),
            timeout=15.0,
        ),
    },
)
```
I don't have this issue if I set the io_manager_key on another asset instead; is this normal?
q
I think it's because you already have the environment variable GOOGLE_APPLICATION_CREDENTIALS set, so you cannot have the line `gcp_credentials=EnvVar("GCP_CREDS")` in the BigQueryPandasIOManager configuration. What happens when you remove that line?
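Roughly like this, keeping credentials only on the resource (a sketch based on your snippet, imports as above):
```python
# Sketch: credentials stay on the resource; the io manager would pick up
# GOOGLE_APPLICATION_CREDENTIALS from the environment instead.
defs = Definitions(
    assets=all_assets,
    resources={
        "bigquery": BigQueryResource(
            project="PROJECT",
            location="EU",
            gcp_credentials=EnvVar("GCP_CREDS"),
        ),
        "bigquery_io_manager": BigQueryPandasIOManager(
            project="PROJECT",
            location="EU",
            dataset="mydataset",
            timeout=15.0,  # note: no gcp_credentials here
        ),
    },
)
```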
j
Aaahhh, I thought we had to declare it on each resource, never thought it applied across resources. You're totally right, thank you for your help!
q
Now that I think about it, it would be nice to have this `gcp_credentials` argument override `GOOGLE_APPLICATION_CREDENTIALS` when it is set. cc @jamie
j
hey Justin, thanks for bringing this up. this is actually a funky edge case: the BQ resource sets GOOGLE_APPLICATION_CREDENTIALS, but then the io manager also tries to set it. you can get around it by only setting creds on the resource, but the problem there is that if the resource gets garbage collected, the env var gets unset, and the io manager no longer has creds. let me think about the best way to solve this
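for intuition, the pattern is roughly this (an illustrative sketch, not Dagster's actual code; the helper name and details are assumptions):
```python
# Illustrative sketch of the set-then-unset pattern described above;
# gcp_creds_from_env is a made-up helper, not a Dagster internal.
import base64
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def gcp_creds_from_env(base64_creds: str):
    """Write decoded creds to a temp file and point the env var at it."""
    with tempfile.NamedTemporaryFile("w", suffix=".json") as f:
        f.write(base64.b64decode(base64_creds).decode())
        f.flush()
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = f.name
        try:
            yield
        finally:
            # cleanup removes the env var, so anything still relying on it
            # (e.g. the io manager) suddenly has no credentials
            del os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
```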
one option would be to not fail and instead print a warning, but that could lead to silent issues if someone doesn’t realize the env var has already been set.
q
@jamie Is the environment variable set by Dagster or the Google Cloud SDK?
j
it’s set by Dagster
i think there are better options for doing the auth by using some of Google's Python clients, but i need to research those and see what they offer
q
I used Google's Python client with a service account and never had to set any environment variable.
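Something like this (a sketch; the key file path is a placeholder):
```python
# Sketch: authenticate the BigQuery client with a service account directly,
# no GOOGLE_APPLICATION_CREDENTIALS involved. The file path is hypothetical.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json"  # placeholder path
)
client = bigquery.Client(project="PROJECT", credentials=credentials)
```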
j
Thx for both your answers! Btw, I tried to run my code end to end after removing the `gcp_credentials` on `bigquery_io_manager`, and I no longer get the error at initialization, but I get a new one when it's supposed to load data to BigQuery:
google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
Seems like this way it's not finding the credentials 🤔
q
Are you using a service account?
j
Yes, and I don't have this issue if I split the logic into two different assets (tried on another project with the same definitions)
q
If you are developing locally, I'd suggest you set the GOOGLE_APPLICATION_CREDENTIALS environment variable, with its value being the path to the service account key file. Without a service account, download the google-cloud-sdk and just do `gcloud auth application-default login`. This way you don't have to provide gcp_credentials on the io_manager or the resource.
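A quick way to check whether Application Default Credentials resolve on your machine (a minimal sketch using the google-auth package):
```python
# Minimal ADC check: google.auth.default() resolves credentials from
# GOOGLE_APPLICATION_CREDENTIALS, the gcloud ADC file, or other default
# sources, and raises the DefaultCredentialsError seen above if none exist.
import google.auth

credentials, project = google.auth.default()
print("Resolved ADC for project:", project)
```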
j
Unfortunately I can't dev like that because of the way our CI/CD works; I need to keep the base64-encoded GCP creds in the environment variable.
For example, this works:
```python
import pandas as pd
from dagster import asset, get_dagster_logger
from dagster_gcp import BigQueryResource

logger = get_dagster_logger()

@asset
def getBigQuery(bigquery: BigQueryResource) -> pd.DataFrame:
    with bigquery.get_client() as client:
        df = client.query("SELECT * FROM testTable").to_dataframe()
        logger.info(df.head())
        return df

@asset(io_manager_key="bigquery_io_manager")
def testExportDagster(getBigQuery: pd.DataFrame):
    return getBigQuery.dropna().drop_duplicates()
```
But the following code gives me the error `Resource config error: gcp_credentials config for BigQuery resource cannot be used if GOOGLE_APPLICATION_CREDENTIALS environment variable is set.` if I declare gcp_credentials on both bigquery and bigquery_io_manager, OR
`google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.`
if I remove the gcp_credentials from "bigquery_io_manager" as you proposed:
```python
@asset(io_manager_key="bigquery_io_manager")
def testExportDagster(bigquery: BigQueryResource):
    with bigquery.get_client() as client:
        df = client.query("SELECT * FROM testTable").to_dataframe()
        logger.info(df.head())
    return df.dropna().drop_duplicates()
```
j
hey Justin, so i think the “your default credentials were not found” error is due to the second issue i mentioned before where on resource cleanup, the env var is unset, causing the io manager to not find credentials
it’s sort of a catch-22 situation that we need to fix on our end. I’m working on a fix for it
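in the meantime, one possible workaround (an untested sketch, not an official fix) could be to decode the creds yourself at process start, before any Dagster resource runs, and drop gcp_credentials from both configs so nothing ever unsets the env var:
```python
# Untested workaround sketch: decode the base64 creds from the CI/CD env var
# once at import time and point GOOGLE_APPLICATION_CREDENTIALS at the result.
# GCP_CREDS matches the env var name used earlier in this thread.
import base64
import os
import tempfile

if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ:
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        f.write(base64.b64decode(os.environ["GCP_CREDS"]).decode())
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = f.name
```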
j
Thx for your answer Jamie, it does look like what you said last Friday! Good luck with the fix! 🙂
h
Hi Jamie, any updates on this?