Running into an error when trying to utilize envir...
# ask-community
t
Running into an error when trying to utilize environment variables. I've tried exporting the variables, setting them in /etc/environment, and also tried using a .env file, and none of them works. My Dagster op just doesn't see them to pull them in. Just wondering if anyone else has seen this and what you did to fix it?
o
hi @Timothy Elder! can you share how you're trying to access the env variables within your op? and just to double check, have you seen this guide: https://docs.dagster.io/guides/dagster/using-environment-variables-and-secrets#declaring-environment-variables?
t
Using os.getenv in an op, I've already gone through the documentation and things are set up as mentioned in the documentation.
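Roughly this pattern (a minimal sketch; `lookup_from_environment` and `MY_SECRET` are placeholder names for illustration, not from our actual code):

```python
import os

# Placeholder helper mirroring what the op body does: read a variable
# from this process's environment and fail loudly if it is unset.
def lookup_from_environment(name: str) -> str:
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"{name} is not set in this process's environment")
    return value
```

If this raises inside the op, the variable isn't present in the process that actually executes the run, wherever that happens to be.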
o
gotcha -- and for your local development, are you using
dagster dev
/ launching runs through the UI?
t
Not sure, we have a different group that does the dev. This is from the Dagit UI, where this is being run.
o
would this be the dagit UI in your deployed version of dagster then? (i.e. not your local machine)
t
Correct
o
in that case, where are those runs executing? you'll need to make sure those environment variables are available there (i.e. if you're using the k8s executor / run launcher, you'll need to set those secrets in k8s). There are a few infrastructure-specific guides under this tab: https://docs.dagster.io/guides/dagster/using-environment-variables-and-secrets#declaring-environment-variables
t
It's an EC2 instance and the variables are already there.
o
are the secrets tagged with the "dagster" tag? the run launcher will only pull in variables with this tag into the containers that it launches for each run
t
Not sure you're fully understanding, it doesn't matter where the environment variables are stored or how they are tagged, Dagster doesn't pull them in.
d
Hey Timothy - when you say 'the variables are already there', can you share more detail about how you're setting them in your EC2 instance and how you're verifying that they are there? Maybe a set of steps we could follow to try to reproduce the problem that you're seeing ourselves?
t
I've tried setting them as environment variables at the OS level, as well as using a .env file, and also pulling them in via AWS Secrets Manager. The Dagster logs don't say whether they are pulled from the local OS using os.getenv, and when using Secrets Manager the logs say they are being pulled in, but there isn't any real output to show that they are in fact being pulled in.
d
Where exactly are you setting them as environment variables? Is it in the same place where your runs are happening - what run launcher are you using?
What are you doing to check that they're pulled in?
t
I'm checking the logs, which show they are being pulled in, with the following line: Apr 27 17:29:20 produsa-dagster-01 dagit[359367]: 2023-04-27 17:29:20 +0000 - dagster - DEBUG - secrets_job - c6f32238-21e3-4d44-9689-0ce6f4a8cb6d - 359367 - secretsmanager_secrets_aws_key - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager". We aren't using a run launcher, per other correspondence in the Kubernetes channel, as we are using the K8s executor, per our code in the gist below. As for how we are setting the environment variables, I've covered that a few times now in this thread. All sensitive info in the code at the gist below has been redacted: https://gist.github.com/telderfts/fa3e52415545580d137aa646ccb985c4
d
The code here is extremely helpful, thanks (I think it would be basically impossible to answer this question without it, now that I see what the problem is). This line doesn't look quite right to me, due to more of a Python problem than a Dagster problem:
f"AWS_ACCESS_KEY_ID={secretsmanager_secrets_aws_key}",
secretsmanager_secrets_aws_key
is an op, which is a function, not a string - it looks to me like you're hoping to take the output of that op and read it in as an input to extract_tableau? You'll need to define that value as an input to the op function, rather than referencing the value of another op function directly
Like in the example here from the docs that shows how to pass values between ops: https://docs.dagster.io/concepts/ops-jobs-graphs/jobs#using-the-job-decorator
from dagster import job, op


@op
def return_five():
    return 5


@op
def add_one(arg):
    return arg + 1


@job
def do_stuff():
    add_one(return_five())
It's
@op
def add_one(arg):
    return arg + 1
not
@op
def add_one():
    return return_five + 1
even though the value of arg is the output of the return_five op
I think what I would suggest is replacing secretsmanager_secrets_aws_key in that second job with
os.getenv("aws_access_key_id")
and reference the value directly - that's probably simplest
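i.e. something along these lines (a sketch; `build_env_vars` is just a helper name for illustration, and it assumes AWS_ACCESS_KEY_ID is set in the environment of the process running the op):

```python
import os

# Sketch: build the env_vars list passed to execute_k8s_job by reading
# the value directly from this process's environment, instead of
# referencing the other op function.
def build_env_vars() -> list[str]:
    return [
        f"AWS_ACCESS_KEY_ID={os.getenv('AWS_ACCESS_KEY_ID', '')}",
    ]
```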
👍 1
t
Thanks Daniel. It also looks like the IO manager (we are just using the standard built-in one) is storing the values it pulls in memory rather than on disk? Would it be worthwhile to look at possibly having it store those values to S3 or a different mechanism?
d
I believe the default one writes to the local filesystem - the s3 one will help if you want to store it somewhere more permanent, yeah
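Wiring that in looks roughly like this (a sketch against dagster-aws; `persist_to_s3_job`, `emit_value`, and the bucket name are placeholders, and actually running it needs AWS credentials):

```python
from dagster import job, op
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource


@op
def emit_value():
    return 5


# Swap the default filesystem IO manager for the S3 one so op outputs
# persist in a bucket; "my-io-bucket" is a placeholder bucket name.
@job(
    resource_defs={
        "io_manager": s3_pickle_io_manager.configured({"s3_bucket": "my-io-bucket"}),
        "s3": s3_resource,
    }
)
def persist_to_s3_job():
    emit_value()
```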
👍 1
t
Tried what you mentioned and it doesn't work... I did notice this, though; could that be the issue, per the code in the gist above? Not sure where I would add that if needed.
Note that your ops must also declare that they require this resource with
    `required_resource_keys`, or it will not be initialized for the execution of their compute
    functions.
d
can you post your updated code?
d
is your intention to take the values out of secretsmanager in the first job and set them as environment variables, so that they're available in the second job?
how about taking out the first job and doing it all in the second job? starting as simple as possible, with it all happening in a single function/op, and then going from there?
That code also doesn't seem to be using the secretsmanager resource the way it is used in the docs: https://docs.dagster.io/_apidocs/libraries/dagster-aws#dagster_aws.secretsmanager.secretsmanager_resource You need to reference a specific secret ARN there. Here's what your op might look like after it's using the secretsmanager resource correctly and doing it all in a single job instead of trying to pass data between two different jobs:
@op(required_resource_keys={'secrets'})
def extract_tableau(context, orgs):
    secrets = context.resources.secrets
    aws_access_key_id = secrets.get_secret_value(
        SecretId='arn:aws:secretsmanager:region:aws_account_id:secret:appauthexample-AbCdEf'
    )['SecretString']
    
    execute_k8s_job(
        ...,
        env_vars=[
            f"AWS_ACCESS_KEY_ID={aws_access_key_id}",
            ...
        ]
        ...
    )
t
@daniel We were going with the second option provided in the documentation so that we didn't have to list specific ARNs for every secret that needed to be pulled. It should be pulling in the secrets with the specific tag that we give it. I'll give it a shot with everything in one job, though, and see if that doesn't help.
d
Oh I see - you're using
secretsmanager_secrets_resource
, not
secretsmanager_resource
(confusing names)
OK yeah ignore that part then, my mistake - but the single job is still a good idea to start I think