# ask-community

Leo Qin (09/12/2022, 2:30 PM)
hello - I am running AWS Athena on Dagster Serverless (via dbt-athena, which uses pyathena) and getting the following error when running any query:

```
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: The S3 location provided to save your query results is invalid. Please check your S3 location is correct and is in the same region and try again. If you continue to see the issue, contact customer support for further assistance.
```

if I run the image locally this error doesn't happen - any ideas what is happening?

daniel (09/12/2022, 2:35 PM)
Hi Leo - do you have example code that's triggering this? How are you specifying an s3 bucket / aws credentials?

Leo Qin (09/12/2022, 2:39 PM)
it's generated by dbt, so it'll be a CTAS-like query. I am specifying credentials and the s3 output bucket using environment variables passed during deploy; in particular, `AWS_ATHENA_OUTPUT_BUCKET` is set to `s3://{bucket}/dbt/`
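The plumbing described here (env vars set at deploy time, read when building the Athena connection) can be sketched as follows. The variable names match the ones above, but `athena_settings_from_env` is a hypothetical helper written for this thread - it is not part of dbt-athena or pyathena:

```python
import os

def athena_settings_from_env(env=None):
    """Hypothetical sketch: pull the Athena connection settings out of the
    environment the way this deploy wires them up. Not library code."""
    env = os.environ if env is None else env
    staging_dir = env.get("AWS_ATHENA_OUTPUT_BUCKET", "")
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not staging_dir.startswith("s3://"):
        raise ValueError("s3_staging_dir must be an s3:// URI")
    return {"s3_staging_dir": staging_dir, "region_name": region}
```

pyathena's `connect()` does take `s3_staging_dir` and `region_name` keyword arguments, which is roughly where values like these end up.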

daniel (09/12/2022, 2:42 PM)
What region is your S3 bucket in?

Leo Qin (09/12/2022, 2:44 PM)
us-east-1, but i also have `AWS_DEFAULT_REGION` and `AWS_REGION` set to us-east-1

daniel (09/12/2022, 2:47 PM)
Yeah I was just wondering about that "Please check your S3 location is correct and is in the same region and try again" error message - i'm not an athena expert, so I'm not sure if that means "the same region as your credentials" or something else?
You're sure that you're running your query in us-east-1? Is there any way to verify that?
If you have a link to a run where this is failing, we can check our logs
playing around with different environmental setups too

daniel (09/12/2022, 2:52 PM)
If it's possible to share the op that you're using that triggers this that would be helpful
(over DM is fine too if you don't want to share it publicly)
sorry, the op or the asset

Leo Qin (09/12/2022, 2:55 PM)
this would be a dagster-dbt op, so I think it's probably `dagster_dbt.dbt_cloud_run_op`

daniel (09/12/2022, 3:00 PM)
Got it - er I think it may be executing with dbt locally not dbt cloud? but point taken that it's dagster-dbt. And which env vars are you setting to tell it what AWS credentials to use?

Leo Qin (09/12/2022, 3:01 PM)
I am setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` during deploy

daniel (09/12/2022, 3:01 PM)
as well as AWS_DEFAULT_REGION and AWS_REGION

Leo Qin (09/12/2022, 3:01 PM)
yes

daniel (09/12/2022, 3:03 PM)
When you say "if I run the image and run it locally this error doesn't happen" - what command are you running exactly to run it locally?

Leo Qin (09/12/2022, 3:04 PM)
`docker run -it <tag> sh`, then `dbt build xyz`
also - to clarify, we are not using dbt cloud; we need to use a community adapter, so we have to run the dbt cli

daniel (09/12/2022, 3:11 PM)
Serverless tasks currently run in AWS in a fargate task that isn't in us-east-1, but they shouldn't override any of those AWS_ variables that you set, so it's surprising that whatever dbt is using to authenticate with aws is behaving differently...

Leo Qin (09/12/2022, 3:13 PM)
from what i can tell it's a mix of profile-based and environment-based auth, so I'd like to try setting an aws default profile in the image... i notice that there's a `dagster_cloud_pre_install.sh` that gets copied - would that be a good place to do that?

daniel (09/12/2022, 3:20 PM)
checking with the team about that to double-check and will get back to you - that would be set at image build time though. just looking through https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html - I would expect the two env vars you set to work based on the priority order they give there...
although they don't mention the region there...
Just double-checking, did you also set the region name here? https://github.com/Tomme/dbt-athena#configuring-your-profile (I'm neither a dbt nor an Athena expert, so thanks for bearing with me)
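For reference, the boto3 guide linked above describes a lookup order roughly like this: an explicit `region_name` argument wins, then the `AWS_REGION` / `AWS_DEFAULT_REGION` environment variables, then the region in the shared config file. A simplified sketch of that order (not botocore's actual code):

```python
import os

def resolve_region(explicit=None, env=None, profile_region=None):
    """Simplified sketch of botocore's region lookup order."""
    env = os.environ if env is None else env
    if explicit:
        return explicit  # an explicit region_name argument wins
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        if env.get(var):
            return env[var]  # environment variables come next
    return profile_region  # finally, the shared config file
```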

Leo Qin (09/12/2022, 3:24 PM)
yes, we point it to the `AWS_REGION` variable (and then default it to `us-east-1`)

daniel (09/12/2022, 3:30 PM)
which service is expecting AWS_ATHENA_OUTPUT_BUCKET ? is that what you're pointing s3_staging_dir at in the profile?

Leo Qin (09/12/2022, 3:30 PM)
yes, we map s3_staging_dir to AWS_ATHENA_OUTPUT_BUCKET

prha (09/12/2022, 3:43 PM)
For the earlier question, you could put your config setup either in `dagster_cloud_pre_install.sh` or `dagster_cloud_post_install.sh`, depending on how you want the image layer caching to work (whether or not it depends on the code being copied over first).
Leo, are you setting the env vars during deploy via the github integration, using github secrets? Or using the `--env` arguments with the CLI? Can you confirm they're getting set correctly in the image?

Leo Qin (09/12/2022, 5:46 PM)
we are using gitlab, so they are being set as `--env` arguments; I can see them in the env when I run the image
i did also try a profile-based authentication and got the same error

daniel (09/12/2022, 6:15 PM)
and just to confirm, you're not including an AWS profile or anything like that when you run it locally, right? By mounting volumes, etc.

Leo Qin (09/12/2022, 6:47 PM)
update - i was running the job from the image using the dbt cli, but i tried out the dagster cli (`dagster job execute`) and it succeeded
(and yes - i can see from the image that the only credentials are the ones I passed into the deploy)

Joe (09/13/2022, 2:23 PM)
hey @Leo Qin can you try providing `--env AWS_REGION=<your-region>` to the dagster-cloud deploy command?

Leo Qin (09/13/2022, 2:24 PM)
hey joe - that is already part of the command

Joe (09/13/2022, 3:01 PM)
@Leo Qin are you sure the user associated with the AWS secrets has the correct permissions to run athena queries? Were you using those same creds when running locally?

Leo Qin (09/13/2022, 3:01 PM)
the user has Administrator Access to the aws account

Joe (09/13/2022, 3:20 PM)
@Leo Qin can you look at cloudtrail events? There should be entries for the failed StartQueryExecution requests; if you could share the specifics of the request (userIdentity, region, source, userAgent) that'd be useful. We've verified that your config is being set properly on the run and are a little stumped atm.

Leo Qin (09/13/2022, 3:23 PM)
i can see them - any particular event/request id to look for?

Joe (09/13/2022, 3:26 PM)
I don't have any event/request ids available unfortunately - you could maybe log the failed request ids?
It's possible the athena console will also have your failed queries in its ui; you could get a queryId from that

Leo Qin (09/13/2022, 3:29 PM)
interesting, i'm actually not seeing ANY events with the access key i shipped in the past 12 hours (other than a few ListUserPolicies and GetCallerIdentity that I did)

Joe (09/13/2022, 3:30 PM)
oh that is interesting

Leo Qin (09/13/2022, 3:31 PM)
access key status is active... plus i feel like you'd get a different error if it weren't

Joe (09/13/2022, 3:33 PM)
do you see any events that would've happened when you ran locally (using that access key)?

Leo Qin (09/13/2022, 3:34 PM)
yes, i do see a few `StartQueryExecution` events from my source ip address

Joe (09/13/2022, 3:46 PM)
@Leo Qin just for a sanity check, your aws account doesn't have any exotic deny iam policies that are blocking based on things like ip/region/etc?

Leo Qin (09/13/2022, 3:47 PM)
don't think so, but I can ask around... any particular IPs or regions?

Joe (09/13/2022, 3:48 PM)
tbh I'm not sure of specifics but dagster cloud does run in us-west-2
if we could get a requestID from the failed athena calls that'd be useful as well.
have you tried making a more basic boto3 call into your account?

Leo Qin (09/13/2022, 4:00 PM)
yeah, basic awscli calls from the docker image seem to work

Joe (09/13/2022, 4:06 PM)
when it's running in dagster serverless, or locally?

Leo Qin (09/13/2022, 4:06 PM)
locally
oh, i see - let me try doing some from serverless

Joe (09/13/2022, 4:08 PM)
`sts.get_caller_identity()` would be useful, as would anything else that gets logged to cloudtrail https://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html
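The identity check Joe suggests could look like the sketch below. `describe_caller` is a hypothetical helper, not an AWS or Dagster API; the client is passed in so the logic can be exercised without credentials - in a real op you would pass `boto3.client("sts")`:

```python
def describe_caller(sts_client):
    """Return a short summary of who AWS thinks we are.

    STS GetCallerIdentity requires no special IAM permissions and is
    logged to CloudTrail, which makes it a handy identity probe.
    """
    ident = sts_client.get_caller_identity()
    return "account={} arn={}".format(ident["Account"], ident["Arn"])

# Usage with real credentials (assumes boto3 is installed):
#   import boto3
#   print(describe_caller(boto3.client("sts")))
```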

Leo Qin (09/13/2022, 4:52 PM)
alright, so some simple boto calls worked! Was able to do sts.GetCallerIdentity and athena.ListDatabases and saw userAgent `Boto3/1.24.63 Python/3.8.14 Linux/4.14.287-215.504.amzn2.x86_64 exec-env/AWS_ECS_FARGATE Botocore/1.27.71`

Joe (09/13/2022, 5:09 PM)
nice! was caller identity what you expected? did you see the event in your cloudtrail?

Leo Qin (09/13/2022, 5:12 PM)
yes, the access key is as expected
trying to get some support from the maintainers of dbt-athena now

Joe (09/13/2022, 5:12 PM)
gotcha, so there's some disconnect between your dagster code and dbt

Leo Qin (09/15/2022, 5:00 PM)
update: i turned on debugging and found that the problematic query is as follows:

```sql
-- /* {"app": "dbt", "dbt_version": "1.1.2", "profile_name": "data_infra_poc", "target_name": "dev", "connection_name": "list_awsdatacatalog"} */
select distinct schema_name
from awsdatacatalog.INFORMATION_SCHEMA.schemata
```

Joe (09/15/2022, 5:01 PM)
ahhh were you missing glue iam permissions?

Leo Qin (09/15/2022, 5:02 PM)
the credentials have administrator access, so I don't think that is the issue

Joe (09/15/2022, 5:02 PM)
do other queries work?

Leo Qin (09/15/2022, 5:03 PM)
yes - locally, from dbt and dagit

Joe (09/15/2022, 5:03 PM)
what about from serverless?

Leo Qin (09/15/2022, 5:04 PM)
i was able to do a list databases from serverless using boto, yeah

Joe (09/15/2022, 5:05 PM)
hmm but no luck getting any dbt-athena operations to work?

Leo Qin (09/15/2022, 5:05 PM)
not yet, but i'm tracing right now...

Joe (09/15/2022, 7:18 PM)
@Leo Qin i wonder if the aws region for the dbt profile is getting set correctly https://github.com/Tomme/dbt-athena#configuring-your-profile - if it were unset, it might be what's throwing that error?

Leo Qin (09/15/2022, 7:20 PM)
i just added a log to see what region was being passed to boto by pyathena, and on cloud it's us-west-2, but in the docker image i have:

```
AWS_REGION=us-east-1
AWS_DEFAULT_REGION=us-east-1
```
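This is consistent with the runtime injecting its own `AWS_REGION` on top of what the image baked in: whichever layer sets the variable last is what pyathena sees. A toy model of that clobbering (hypothetical function, purely to illustrate the precedence):

```python
def region_seen_by_process(image_env, runtime_env):
    """Runtime-injected variables (e.g. set by the task launcher)
    override whatever the image baked in at build time."""
    merged = dict(image_env)
    merged.update(runtime_env)  # runtime wins on conflicts
    return merged.get("AWS_REGION") or merged.get("AWS_DEFAULT_REGION")
```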

daniel (09/15/2022, 7:22 PM)
Do you have a callsite in pyathena by any chance? Like something we could point to in their github repo?
looking into how dbt-athena determines the region name now

daniel (09/15/2022, 7:26 PM)
I was under the impression that it came from the region_name in the profile here: https://github.com/Tomme/dbt-athena#configuring-your-profile

Leo Qin (09/15/2022, 7:27 PM)
and i have mine configured to `us-east-1` in my project (specifically using the `AWS_REGION` environment variable)
oh... so something is overwriting that variable

daniel (09/15/2022, 7:27 PM)
and where did you log where it was saying us-west-2?
you could try hardcoding it to us-east-1 instead of AWS_REGION and see if that resolves it
hmmm I wonder if ECS fargate automatically sets AWS_REGION in any tasks that it spins up?

Leo Qin

yeah - let me try hardcoding it
alright, it works if i hardcode the region!

daniel (09/15/2022, 7:41 PM)
hooray!

Leo Qin (09/15/2022, 7:41 PM)
I guess something is clobbering `AWS_REGION`
hardcoding is not a blocker for us at all

Joe (09/15/2022, 7:41 PM)
wonderful 🎉

daniel (09/15/2022, 7:41 PM)
OK, we're pretty sure that this is because ECS Fargate (where serverless runs the tasks) automatically overrides those env vars. We'll make that much clearer in the docs / build script
Thanks for powering through that with us
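Besides hardcoding `region_name` in the dbt profile (the fix that worked above), another workaround in the same spirit would be to re-pin the region env vars at the top of the op, before dbt runs. This is a hypothetical sketch - `DESIRED_REGION` and `pin_region` are made up for illustration, not a Dagster or dbt API:

```python
import os

DESIRED_REGION = "us-east-1"  # assumption: the region your Athena data lives in

def pin_region(env=None):
    """Overwrite any platform-injected region vars before invoking dbt."""
    env = os.environ if env is None else env
    env["AWS_REGION"] = DESIRED_REGION
    env["AWS_DEFAULT_REGION"] = DESIRED_REGION
    return env["AWS_REGION"]
```

Hardcoding in the profile is simpler if you only have one region; a guard like this would matter only if several tools read the env vars directly.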

Leo Qin (09/15/2022, 7:42 PM)
and thanks for help in troubleshooting!
there might be other env vars that get overwritten too..