Bug(?) with `AWS_REGION` env var being overridden ...
# dagster-serverless
t
Bug(?) with `AWS_REGION` env var being overridden in Serverless
Hi all, I have a very strange one which had me tripped up for hours. I’ll try to be concise, but explain all of the details:

1. We have an asset which loads a dataframe into S3 using AWS Wrangler and writes to a Glue database and table.
2. As part of 1, we pass in a boto3 session, which has the region set to `eu-west-1` via an env var of `AWS_REGION`.
3. Running the asset locally, the data is correctly loaded into the S3 bucket / Glue database and table.
4. However, on Serverless, this fails, stating that the database could not be found…
5. So to help debug, I inserted `wr.catalog.create_database(name=database, boto3_session=self.session)` to see if it could create the database first, and it DID create the database. In theory this should have errored out, as the database already exists.
6. Digging through our AWS account, I can see that a new database has in fact been created, but in `us-west-2`, which is coincidentally the same region that Dagster Serverless runs in. We do not use `us-west-2` for any services at all in AWS.
7. I dropped into the debugger, and the session passed into AWS Wrangler does have the region set to `eu-west-1`.
8. Logging the AWS region in Dagster Cloud does show that it is using `eu-west-1`.
9. To ensure this isn’t a misconfigured env var, I downloaded the environment variables from Dagster Cloud and used them locally. Again though, everything worked fine.

This task runs correctly in our existing orchestrator, and locally. So I can only assume something funky is happening on the Dagster side, but that would seem very odd. Any thoughts at all?
j
Hey todd, thanks for the detailed repro steps. What's your Dagster Cloud org name? How are you setting the `AWS_REGION` env var in Serverless?
t
Via an Env Var
midnite
d
todd when you say "has the region set to `eu-west-1` via an env var of `AWS_REGION`" - by this do you mean you are setting the env var and then the intention is that boto will read that env var when figuring out the region? Or are you doing something like `boto3.Session(region_name=os.getenv("AWS_REGION"))` where you are the one passing in the region?
t
We pull the env var in.
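```python
import os

import boto3
import botocore.session
from botocore.credentials import (
    AssumeRoleCredentialFetcher,
    DeferredRefreshableCredentials,
)


def get_boto3_session(assume_role_arn=None):
    # Base session: credentials and region are read explicitly from env vars.
    session = boto3.Session(
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        region_name=os.getenv("AWS_REGION"),
    )
    if not assume_role_arn:
        return session

    # _get_client_creator is a helper defined elsewhere (not shown in this thread).
    fetcher = AssumeRoleCredentialFetcher(
        client_creator=_get_client_creator(session),
        source_credentials=session.get_credentials(),
        role_arn=assume_role_arn,
    )
    botocore_session = botocore.session.Session()
    # The role is only assumed when credentials are first needed.
    botocore_session._credentials = DeferredRefreshableCredentials(
        method="assume-role", refresh_using=fetcher.fetch_credentials
    )

    return boto3.Session(botocore_session=botocore_session)
```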
As shown in the screenshot, when I then drop into the debugger, I can see that the region on the session is correct
d
got it - and you're sure that in serverless you're hitting that top path (`if not assume_role_arn`)? asking because i see that after that it is creating sessions without passing in the region (e.g. `botocore.session.Session()` also takes in a `region_name` parameter)
er wait, double checking that
t
I can check, but we should be. We do locally and we have the same `AWS_ROLE_ARN` env var set which we pass into the function
d
my mistake - botocore may not take a region, the last return line there doesn't have a region_name though
j
@Todd de Quincey there might be some other place where a boto session is being created/used that isn't using your `AWS_REGION` env var. boto doesn't use that value by default, instead preferring `AWS_DEFAULT_REGION`, which will be set to `us-west-2` in serverless unless you override it: https://docs.aws.amazon.com/sdkref/latest/guide/feature-region.html#feature-region-sdk-compat
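To make that lookup order concrete, here is a minimal pure-Python sketch of how a boto3 session is generally understood to settle on a region: an explicitly passed `region_name` wins, then the `AWS_DEFAULT_REGION` env var, then the `region` entry of the active profile in the shared config file. The function `resolve_region` and its parameters are hypothetical names made up for illustration; this is a simplified model of the behaviour, not boto3's actual code. Note that `AWS_REGION` is never consulted in this chain.

```python
from typing import Mapping, Optional


def resolve_region(
    explicit_region: Optional[str],
    env: Mapping[str, str],
    profile_config: Mapping[str, str],
) -> Optional[str]:
    """Simplified model of the region lookup order (hypothetical helper)."""
    # 1. A region passed directly to the session or client wins.
    if explicit_region is not None:
        return explicit_region
    # 2. Otherwise AWS_DEFAULT_REGION is read; AWS_REGION is ignored here.
    if "AWS_DEFAULT_REGION" in env:
        return env["AWS_DEFAULT_REGION"]
    # 3. Finally, fall back to the profile's `region` in the shared config file.
    return profile_config.get("region")


# Serverless-like environment: the user sets AWS_REGION, the platform sets
# AWS_DEFAULT_REGION, and there is no shared config file.
serverless_env = {"AWS_REGION": "eu-west-1", "AWS_DEFAULT_REGION": "us-west-2"}
print(resolve_region(None, serverless_env, {}))         # us-west-2
print(resolve_region("eu-west-1", serverless_env, {}))  # eu-west-1
```

Under this model, any session or client created without an explicit region in Serverless would land in `us-west-2`, regardless of what `AWS_REGION` is set to.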
t
Not sure where we’d be creating another session. As a first step, I’ll set an environment variable for `AWS_DEFAULT_REGION` in Dagster Cloud and set it to our desired region to see if that solves the issue
There shouldn’t be any other sessions running in this asset. It is a very simple FX rates API --> dataframe --> AWS flow
This didn’t solve the issue unfortunately
j
Gotcha, we’re still digging to see if there’s a way we’re overriding that env
did this start happening recently or are you running into this on new work?
t
In theory, since the session has already been created (and I can see the region is correct), I am not sure how or why this would get overridden. Makes me think it has to be on my side, but then again, this works fine locally
This is part of our PoC migration to Dagster
We ran this locally for a while, and all was fine. But we are bumping into this in Serverless
j
maybe `boto3.setup_default_session(region_name="")` might help, my assumption is that there is still some place that's pulling a session that's not using your preferred region
t
Hunt no more. It’s definitely something on my side
My guess is that locally, it is picking up the region from my config file (hence why it works locally)
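One way to check a guess like this is to look at what the shared AWS config file says for the active profile and compare it with the env vars. Below is a minimal sketch using only the standard library. `config_file_region` is a hypothetical helper, and it assumes the usual INI layout of `~/.aws/config` (a `[default]` section, plus `[profile NAME]` sections for named profiles).

```python
import configparser
import os
from typing import Optional


def config_file_region(path: Optional[str] = None, profile: str = "default") -> Optional[str]:
    """Return the `region` set for `profile` in an AWS shared config file, if any."""
    # AWS_CONFIG_FILE overrides the default location of the shared config file.
    path = path or os.environ.get("AWS_CONFIG_FILE") or os.path.expanduser("~/.aws/config")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    # Named profiles live under "[profile NAME]"; the default is just "[default]".
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)
    return None


if __name__ == "__main__":
    # Print the region-related env vars and the config-file value side by side.
    for name in ("AWS_REGION", "AWS_DEFAULT_REGION", "AWS_PROFILE"):
        print(name, "=", os.environ.get(name))
    active_profile = os.environ.get("AWS_PROFILE", "default")
    print("config file region =", config_file_region(profile=active_profile))
```

Running this both locally and inside the Serverless run would show whether the two environments differ in their env vars or in the presence of a config file.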
d
gotta love the 7 different ways boto has to pick configuration
t
100%. It’s still very weird. The session is definitely using the env var value (I just updated it to `us-west-1`). But when AWS Wrangler fires, it’s pointing to `eu-west-1`, which has to be in my config… Apologies for the false alarm - it was just pure coincidence that our default region (since our account must be pre-2017) is the same as Dagster’s!