# ask-community
c
I'm running the `quickstart_snowflake` example (https://github.com/dagster-io/dagster/tree/master/examples/quickstart_snowflake) but the database and schema specified in my connection are ignored by pandas. Presumably Dagster handles creation of the table after `hackernews_topstory_ids()` returns a pandas DataFrame. Is there a way to force Dagster to make several SQL calls (`USE DATABASE ...` and `USE SCHEMA ...`) before trying to write the materialized dataframe to Snowflake? Alternatively, any suggestion on how to handle Snowflake ignoring the database and schema connection parameters?
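For context, here's roughly how I have the example wired up (a sketch from memory: the env var names are my own, the toy asset stands in for the real one, and I'm actually using key-pair auth rather than the password shown):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager


@asset
def hackernews_topstory_ids() -> pd.DataFrame:
    # stand-in for the real asset in quickstart_snowflake
    return pd.DataFrame({"item_ids": [1, 2, 3]})


defs = Definitions(
    assets=[hackernews_topstory_ids],
    resources={
        "io_manager": snowflake_pandas_io_manager.configured(
            {
                "account": {"env": "SNOWFLAKE_ACCOUNT"},
                "user": {"env": "SNOWFLAKE_USER"},
                "password": {"env": "SNOWFLAKE_PASSWORD"},  # i actually use a private key
                "warehouse": {"env": "SNOWFLAKE_WAREHOUSE"},
                "database": {"env": "SNOWFLAKE_DATABASE"},  # this seems to be ignored
                "schema": {"env": "SNOWFLAKE_SCHEMA"},  # ...and this
            }
        ),
    },
)
```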
j
Hey @clay this is definitely unintended behavior, and something i haven’t run into before when testing the snowflake io manager. Can you elaborate a bit more on what you’re seeing? are the assets being written to the wrong database and schema?
c
Hi @jamie! No, the connection/session simply is ignoring the DATABASE and SCHEMA parameters, for some reason
maybe because i'm authenticating with a private key instead of a password?
I was using the `{"env": "SNOWFLAKE_DATABASE"}` pattern and tried doing it just with a string specifying the database name, but that's ignored, too
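i.e. both shapes of the `database` entry behave the same for me (values here are illustrative):
```python
# The two variants I tried for the database entry in the io manager config.
database_via_env = {"env": "SNOWFLAKE_DATABASE"}  # resolved from an env var at runtime
database_as_literal = "SANDBOX"  # plain string; ignored the same way
```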
j
ok this is a super basic question but for “ignoring the database and schema”, what behavior are you seeing that’s telling you it’s being ignored?
c
one sec and i'll share an image of the error -- Slack's not allowed on my work computer, so I have to take a photo of it 😛
j
ok no worries!
c
Here are my settings (I’ve tried both):
Can you read this?
This is a docker deploy, fwiw
Seems like it would be fine with `CREATE TABLE MYDATABASE.MYSCHEMA.hackernews_topstory_ids....`
j
yeah i can read it! ok so a couple super basic questions before getting into more complicated stuff: 1. does the SANDBOX db already exist? 2. does the account you’re authenticating with have permission to create tables in SANDBOX?
c
Yes, I use SANDBOX with these credentials all the time
👍 1
j
so the problem with an explicit CREATE TABLE is that we'd then need to manually figure out the names and types of all of the columns. if we use pandas to write the df, pandas figures it out for us, but i don't think we can do the explicit table path
i will confirm that’s the case though
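for context, the write path under the hood is basically pandas' `to_sql` with snowflake's `pd_writer`; a minimal standalone sketch (connection details here are illustrative):
```python
import os

import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

# to_sql infers column names/types and emits the CREATE TABLE itself;
# pd_writer then does the actual upload into the created table.
engine = create_engine(
    URL(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        database="SANDBOX",  # without a current database, the CREATE TABLE fails
        schema="MYSCHEMA",
    )
)

df = pd.DataFrame({"item_ids": [1, 2, 3]})
df.to_sql("hackernews_topstory_ids", engine, index=False, method=pd_writer)
```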
c
FWIW, we have 77 Snowflake databases with thousands of schemas and 98K tables, so I need to be able to specify different databases and schemas in queries anyhow
j
ty for checking on the silly stuff - it’s been the issue way too many times, so best to check early
❤️ 1
c
and this is our small data warehouse 🥲
j
ok! good to know that too. so a couple things - the snowflake io manager currently can only be set to a single database. you’d need to instantiate a different io manager for each database you want to use. you can specify the schema on the dagster asset itself, but i don’t think that’ll fix the issue at hand
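sort of like this (a rough sketch; the second ANALYTICS database and the credential config are illustrative, and iirc the asset's `key_prefix` is what sets the schema):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager

# One configured IO manager per database (credentials illustrative).
base_config = {
    "account": {"env": "SNOWFLAKE_ACCOUNT"},
    "user": {"env": "SNOWFLAKE_USER"},
    "password": {"env": "SNOWFLAKE_PASSWORD"},
}

sandbox_io = snowflake_pandas_io_manager.configured({**base_config, "database": "SANDBOX"})
analytics_io = snowflake_pandas_io_manager.configured({**base_config, "database": "ANALYTICS"})


# key_prefix sets the schema the asset is written to; io_manager_key picks the database.
@asset(key_prefix=["MYSCHEMA"], io_manager_key="sandbox_io")
def sandbox_table() -> pd.DataFrame:
    return pd.DataFrame({"x": [1]})


@asset(key_prefix=["OTHERSCHEMA"], io_manager_key="analytics_io")
def analytics_table() -> pd.DataFrame:
    return pd.DataFrame({"x": [2]})


defs = Definitions(
    assets=[sandbox_table, analytics_table],
    resources={"sandbox_io": sandbox_io, "analytics_io": analytics_io},
)
```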
c
Hmm. Ok. Well, I'm probably not going to be able to use dagster if I can't specify those inline with the code, unfortunately. But that's ok -- that's why I was trying it out.
j
i think it seems reasonable enough to be able to change the database for an asset the same way you can specify a schema though. i’m about to do a bunch of work on the snowflake io manager (it’s my todo list next week). so i’ll look into doing that as well
❤️ 1
c
fwiw, we use fully qualified names/calls with all of our sql in our environment. It's the only way that people know what we're accessing when they read our code
Just for feedback, it would be awesome, in Ubisoft's environment, to be able to either have a default for a particular module or to specify the role, database, and schema when needed. We also have to frequently shift roles b/c only the primary snowflake role can write to schemas for which it has access, and our access controls are role based. So, I have several roles and have to switch between them when I'm writing data to different schemas for different purposes. I understand this is more complicated than most people's setup, most likely, but the flexibility would be great.
our code has a lot of:
```sql
USE ROLE SOME_PRIMARY_ROLE;
USE SECONDARY ROLES ALL;
```
This also needs to be done when we call stored procedures that are in different schemas, etc.
j
yeah that does sound pretty complicated. another option would be to use the snowflake resource and manually read/write the assets within the asset functions (and then return None from the function). it’d be more cumbersome because you have to write the reading and writing yourself, but if you have to be super customized like that, it might be the better option
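a rough sketch of that option (role and table names borrowed from your USE ROLE example; the SQL itself is illustrative):
```python
from dagster import Definitions, asset
from dagster_snowflake import snowflake_resource

# Connection config is illustrative; same env vars as the io manager.
snowflake = snowflake_resource.configured(
    {
        "account": {"env": "SNOWFLAKE_ACCOUNT"},
        "user": {"env": "SNOWFLAKE_USER"},
        "password": {"env": "SNOWFLAKE_PASSWORD"},
    }
)


@asset(required_resource_keys={"snowflake"})
def fully_qualified_table(context) -> None:
    # All statements run on one connection, so USE ROLE applies to the write.
    context.resources.snowflake.execute_queries(
        [
            "USE ROLE SOME_PRIMARY_ROLE",
            "USE SECONDARY ROLES ALL",
            "CREATE OR REPLACE TABLE MYDATABASE.MYSCHEMA.MY_TABLE AS "
            "SELECT * FROM OTHERDB.OTHERSCHEMA.SOURCE_TABLE",
        ]
    )
    # Returning None: no IO manager handles the output, the write is done above.


defs = Definitions(assets=[fully_qualified_table], resources={"snowflake": snowflake})
```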
also can you send the bottom half of the error message?
c
yeah give me a minute
ty
:spinny: 1
```
dagster._core.errors.DagsterExecutionHandleOutputError: Error occurred while handling output "result" of step "hackernews_topstory_ids":

  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 265, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 388, in core_dagster_event_sequence_for_step
    for evt in _type_check_and_store_output(step_context, user_event, input_lineage):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 444, in _type_check_and_store_output
    for evt in _store_output(step_context, step_output_handle, output, input_lineage):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 639, in _store_output
    handle_output_gen,
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 459, in iterate_with_context
    return
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in _exit
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 90, in op_execution_error_boundary
    ) from e

The above exception was caused by the following exception:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 090105 (22000): Cannot perform CREATE TABLE. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.
[SQL: CREATE TABLE hackernews_topstory_ids ( "ITEM_IDS" BIGINT ) ]
(Background on this error at: https://sqlalche.me/e/14/f405)

  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 55, in op_execution_error_boundary
    yield
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 457, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 621, in _gen_fn
    gen_output = output_manager.handle_output(output_context, output.value)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/db_io_manager.py", line 114, in handle_output
    self._handlers_by_type[obj_type].handle_output(context, table_slice, obj) or {}
  File "/usr/local/lib/python3.7/site-packages/dagster_snowflake_pandas/snowflake_pandas_type_handler.py", line 90, in handle_output
    method=pd_writer,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2882, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 728, in to_sql
    **engine_kwargs,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1758, in to_sql
    dtype=dtype,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1650, in prep_table
    table.create()
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 867, in create
    self._execute_create()
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 853, in _execute_create
    self.table.create()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 962, in create
    bind._run_ddl_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 3228, in _run_ddl_visitor
    conn._run_ddl_visitor(visitorcallable, element, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2211, in _run_ddl_visitor
    visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 524, in traverse_single
    return meth(obj, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 900, in visit_table
    include_foreign_key_constraints,  # noqa
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 81, in _execute_on_connection
    self, multiparams, params, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1478, in _execute_ddl
    compiled,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1944, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2125, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 827, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 279, in errorhandler_wrapper
    error_value,
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 330, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 216, in default_errorhandler
    cursor=cursor,

The above exception was caused by the following exception:
snowflake.connector.errors.ProgrammingError: 090105 (22000): Cannot perform CREATE TABLE. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.

  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 827, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 279, in errorhandler_wrapper
    error_value,
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 330, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 216, in default_errorhandler
    cursor=cursor,
```
I think that's the entire thing
In the meantime, if you know why it's not using the database and schema connection params, that would be awesome to know. I can always have pandas execute sql to do reads from other parts of snowflake and keep the materialized assets in the specified schema
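e.g. for the reads, something like this (a sketch; the table names are illustrative and I'd actually be using key-pair auth):
```python
import os

import pandas as pd
import snowflake.connector

# Plain connector session; fully qualified names mean the session's current
# database/schema don't matter for reads.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],  # I'd actually use key-pair auth
)
df = pd.read_sql("SELECT * FROM OTHERDB.OTHERSCHEMA.SOME_TABLE", conn)
```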
j
yeah i’m trying to think of why it wouldn’t be getting those parameters. i vaguely remember having some errors like this when i was doing dev work on the io manager, but it was usually a permissions error or i had misspelled something. i think i’ll probably need to run the quickstart example myself to see if i can replicate the issue and do some better debugging
c
Let me know if there's anything I can do to help you debug it
FWIW, DBeaver seemed to stop respecting connection parameters recently, so maybe it's a change on the Snowflake side?
j
what version of `dagster`, `dagster-snowflake`, `dagster-snowflake-pandas` and `snowflake-python-connector` are installed?
c
```
dagster = 1.1.14
dagster-snowflake = 0.17.14
dagster-snowflake-pandas = 0.17.14
snowflake-connector-python = 2.9.0
snowflake-sqlalchemy = 1.4.4
SQLAlchemy = 1.4.46
```
j
ok ty! i’ll try to replicate tomorrow and let you know where i end up
c
I'll try falling back to password auth to see if it helps.
This is similar-ish... not sure if it will help at all. https://github.com/snowflakedb/snowflake-connector-python/issues/865
Hi @jamie! Thanks for taking the time to help yesterday. I contacted our Snowflake admins about my account. They simply did an edit/save on my default role, namespace, and warehouse. Those had been set several years ago. Simply re-saving them seems to have addressed the issue with my connection. I guess something changed in Snowflake that didn't catch my account in a refresh. The good news is that I'm able to run the Hacker News example now! Next up, I'm going to write my own simple example and see if I can figure out how to work with the different roles, databases, schemas, etc.
j
that’s so strange
👍 1
thanks for checking! snowflake never ceases to surprise and frustrate
c
Yeah, it's still ignoring my connection uri parameters when i create a new session in DBeaver. Who knows...
j
def let me know if you run into issues or need help thinking through the different role/schema stuff. the #dagster-snowflake channel is a good place to reach me
❤️ 1
c
ok! Thanks!
j
in case you haven’t seen this yet, here’s the getting started guide for snowflake and a more advanced reference page
c
Thanks! I'll mainly need to figure out how to change my primary role and use my secondary roles between DAG steps, I think.
j
ok! right now your best bet will probably be instantiating a snowflake io manager for each role and then setting an `io_manager_key` for each step so that it uses the right io manager, sort of like this but instead of having an s3 io manager you would have a second snowflake io manager
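so roughly this shape (a sketch; i'm assuming the io manager config accepts a `role` field the way the snowflake resource does, which i'd want to double-check):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager

base_config = {
    "account": {"env": "SNOWFLAKE_ACCOUNT"},
    "user": {"env": "SNOWFLAKE_USER"},
    "password": {"env": "SNOWFLAKE_PASSWORD"},
    "database": "SANDBOX",
}

# One io manager per role; the "role" key is an assumption to verify against
# the io manager's config schema (the snowflake resource definitely takes it).
writer_a_io = snowflake_pandas_io_manager.configured({**base_config, "role": "ROLE_A"})
writer_b_io = snowflake_pandas_io_manager.configured({**base_config, "role": "ROLE_B"})


@asset(io_manager_key="writer_a_io")
def table_owned_by_role_a() -> pd.DataFrame:
    return pd.DataFrame({"x": [1]})


@asset(io_manager_key="writer_b_io")
def table_owned_by_role_b() -> pd.DataFrame:
    return pd.DataFrame({"x": [2]})


defs = Definitions(
    assets=[table_owned_by_role_a, table_owned_by_role_b],
    resources={"writer_a_io": writer_a_io, "writer_b_io": writer_b_io},
)
```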
❤️ 1