# ask-community
c
I'm running the `quickstart_snowflake` example (https://github.com/dagster-io/dagster/tree/master/examples/quickstart_snowflake) but the database and schema specified in my connection are ignored by pandas. Presumably Dagster handles creation of the table after `hackernews_topstory_ids()` returns a pandas DataFrame. Is there a way to force Dagster to make several SQL calls (`USE DATABASE ...` and `USE SCHEMA ...`) before trying to write the materialized dataframe to Snowflake? Alternatively, any suggestion on how to handle Snowflake ignoring the database and schema connection parameters?
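For context, here's roughly how I have the example wired up (a sketch from memory: the env var names are my own, the toy asset stands in for the real one, and I'm actually using key-pair auth rather than the password shown):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager


@asset
def hackernews_topstory_ids() -> pd.DataFrame:
    # stand-in for the real asset in quickstart_snowflake
    return pd.DataFrame({"item_ids": [1, 2, 3]})


defs = Definitions(
    assets=[hackernews_topstory_ids],
    resources={
        "io_manager": snowflake_pandas_io_manager.configured(
            {
                "account": {"env": "SNOWFLAKE_ACCOUNT"},
                "user": {"env": "SNOWFLAKE_USER"},
                "password": {"env": "SNOWFLAKE_PASSWORD"},  # i actually use a private key
                "warehouse": {"env": "SNOWFLAKE_WAREHOUSE"},
                "database": {"env": "SNOWFLAKE_DATABASE"},  # this seems to be ignored
                "schema": {"env": "SNOWFLAKE_SCHEMA"},  # ...and this
            }
        ),
    },
)
```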
j
Hey @clay this is definitely unintended behavior, and something i haven’t run into before when testing the snowflake io manager. Can you elaborate a bit more on what you’re seeing? are the assets being written to the wrong database and schema?
c
Hi @jamie! No, the connection/session simply is ignoring the DATABASE and SCHEMA parameters, for some reason
maybe because i'm authenticating with a private key instead of a password?
I was using the `{"env": "SNOWFLAKE_DATABASE"}` pattern and tried doing it just with a string specifying the database name, but that's ignored, too
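i.e. both shapes of the `database` entry behave the same for me (values here are illustrative):
```python
# The two variants I tried for the database entry in the io manager config.
database_via_env = {"env": "SNOWFLAKE_DATABASE"}  # resolved from an env var at runtime
database_as_literal = "SANDBOX"  # plain string; ignored the same way
```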
j
ok this is a super basic question but for “ignoring the database and schema”, what behavior are you seeing that’s telling you it’s being ignored?
c
one sec and i'll share an image of the error -- Slack's not allowed on my work computer, so I have to take a photo of it 😛
j
ok no worries!
c
Here are my settings (I’ve tried both):
Can you read this?
This is a docker deploy, fwiw
Seems like it would be fine with `CREATE TABLE MYDATABASE.MYSCHEMA.hackernews_topstory_ids....`
j
yeah i can read it! ok so a couple super basic questions before getting into more complicated stuff: 1. does the SANDBOX db already exist? 2. does the account you’re authenticating with have permission to create tables in SANDBOX?
c
Yes, I use SANDBOX with these credentials all the time
👍 1
j
so the problem with an explicit CREATE TABLE is that we'd then need to manually figure out the names and types of all of the columns. if we use pandas to write the df, pandas figures it out for us, but i don't think we can do the explicit table path
i will confirm that’s the case though
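for context, the write path under the hood is basically pandas' `to_sql` with snowflake's `pd_writer`; a minimal standalone sketch (connection details here are illustrative):
```python
import os

import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

# to_sql infers column names/types and emits the CREATE TABLE itself;
# pd_writer then does the actual upload into the created table.
engine = create_engine(
    URL(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        database="SANDBOX",  # without a current database, the CREATE TABLE fails
        schema="MYSCHEMA",
    )
)

df = pd.DataFrame({"item_ids": [1, 2, 3]})
df.to_sql("hackernews_topstory_ids", engine, index=False, method=pd_writer)
```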
c
FWIW, we have 77 Snowflake databases with thousands of schemas and 98K tables, so I need to be able to specify different databases and schemas in queries anyhow
j
ty for checking on the silly stuff - it’s been the issue way too many times, so best to check early
❤️ 1
c
and this is our small data warehouse 🥲
j
ok! good to know that too. so a couple things - the snowflake io manager currently can only be set to a single database. you’d need to instantiate a different io manager for each database you want to use. you can specify the schema on the dagster asset itself, but i don’t think that’ll fix the issue at hand
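sort of like this (a rough sketch; the second ANALYTICS database and the credential config are illustrative, and iirc the asset's `key_prefix` is what sets the schema):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager

# One configured IO manager per database (credentials illustrative).
base_config = {
    "account": {"env": "SNOWFLAKE_ACCOUNT"},
    "user": {"env": "SNOWFLAKE_USER"},
    "password": {"env": "SNOWFLAKE_PASSWORD"},
}

sandbox_io = snowflake_pandas_io_manager.configured({**base_config, "database": "SANDBOX"})
analytics_io = snowflake_pandas_io_manager.configured({**base_config, "database": "ANALYTICS"})


# key_prefix sets the schema the asset is written to; io_manager_key picks the database.
@asset(key_prefix=["MYSCHEMA"], io_manager_key="sandbox_io")
def sandbox_table() -> pd.DataFrame:
    return pd.DataFrame({"x": [1]})


@asset(key_prefix=["OTHERSCHEMA"], io_manager_key="analytics_io")
def analytics_table() -> pd.DataFrame:
    return pd.DataFrame({"x": [2]})


defs = Definitions(
    assets=[sandbox_table, analytics_table],
    resources={"sandbox_io": sandbox_io, "analytics_io": analytics_io},
)
```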
c
Hmm. Ok. Well, I'm probably not going to be able to use dagster if I can't specify those inline with the code, unfortunately. But that's ok -- that's why I was trying it out.
j
i think it seems reasonable enough to be able to change the database for an asset the same way you can specify a schema though. i’m about to do a bunch of work on the snowflake io manager (it’s my todo list next week). so i’ll look into doing that as well
❤️ 1
c
fwiw, we use fully qualified names/calls with all of our sql in our environment. It's the only way that people know what we're accessing when they read our code
Just for feedback, it would be awesome, in Ubisoft's environment, to be able to either have a default for a particular module or to specify the role, database, and schema when needed. We also have to frequently shift roles b/c only the primary snowflake role can write to schemas for which it has access, and our access controls are role based. So, I have several roles and have to switch between them when I'm writing data to different schemas for different purposes. I understand this is more complicated than most people's setup, most likely, but the flexibility would be great.
our code has a lot of:
```sql
USE ROLE SOME_PRIMARY_ROLE;
USE SECONDARY ROLES ALL;
```
This also needs to be done when we call stored procedures that are in different schemas, etc.
j
yeah that does sound pretty complicated. another option would be to use the snowflake resource and manually read/write the assets within the asset functions (and then return None from the function). it’d be more cumbersome because you have to write the reading and writing yourself, but if you have to be super customized like that, it might be the better option
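a rough sketch of that option (role and table names borrowed from your USE ROLE example; the SQL itself is illustrative):
```python
from dagster import Definitions, asset
from dagster_snowflake import snowflake_resource

# Connection config is illustrative; same env vars as the io manager.
snowflake = snowflake_resource.configured(
    {
        "account": {"env": "SNOWFLAKE_ACCOUNT"},
        "user": {"env": "SNOWFLAKE_USER"},
        "password": {"env": "SNOWFLAKE_PASSWORD"},
    }
)


@asset(required_resource_keys={"snowflake"})
def fully_qualified_table(context) -> None:
    # All statements run on one connection, so USE ROLE applies to the write.
    context.resources.snowflake.execute_queries(
        [
            "USE ROLE SOME_PRIMARY_ROLE",
            "USE SECONDARY ROLES ALL",
            "CREATE OR REPLACE TABLE MYDATABASE.MYSCHEMA.MY_TABLE AS "
            "SELECT * FROM OTHERDB.OTHERSCHEMA.SOURCE_TABLE",
        ]
    )
    # Returning None: no IO manager handles the output, the write is done above.


defs = Definitions(assets=[fully_qualified_table], resources={"snowflake": snowflake})
```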
also can you send the bottom half of the error message?
c
yeah give me a minute
ty
:spinny: 1
```
dagster._core.errors.DagsterExecutionHandleOutputError: Error occurred while handling output "result" of step "hackernews_topstory_ids":

  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_plan.py", line 265, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 388, in core_dagster_event_sequence_for_step
    for evt in _type_check_and_store_output(step_context, user_event, input_lineage):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 444, in _type_check_and_store_output
    for evt in _store_output(step_context, step_output_handle, output, input_lineage):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 639, in _store_output
    handle_output_gen,
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 459, in iterate_with_context
    return
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in _exit
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 90, in op_execution_error_boundary
    ) from e

The above exception was caused by the following exception:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 090105 (22000): Cannot perform CREATE TABLE. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.
[SQL: CREATE TABLE hackernews_topstory_ids ( "ITEM_IDS" BIGINT ) ]
(Background on this error at: https://sqlalche.me/e/14/f405)

  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/utils.py", line 55, in op_execution_error_boundary
    yield
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 457, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/execution/plan/execute_step.py", line 621, in _gen_fn
    gen_output = output_manager.handle_output(output_context, output.value)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/db_io_manager.py", line 114, in handle_output
    self._handlers_by_type[obj_type].handle_output(context, table_slice, obj) or {}
  File "/usr/local/lib/python3.7/site-packages/dagster_snowflake_pandas/snowflake_pandas_type_handler.py", line 90, in handle_output
    method=pd_writer,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 2882, in to_sql
    method=method,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 728, in to_sql
    **engine_kwargs,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1758, in to_sql
    dtype=dtype,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 1650, in prep_table
    table.create()
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 867, in create
    self._execute_create()
  File "/usr/local/lib/python3.7/site-packages/pandas/io/sql.py", line 853, in _execute_create
    self.table.create()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 962, in create
    bind._run_ddl_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 3228, in _run_ddl_visitor
    conn._run_ddl_visitor(visitorcallable, element, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2211, in _run_ddl_visitor
    visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py", line 524, in traverse_single
    return meth(obj, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 900, in visit_table
    include_foreign_key_constraints,  # noqa
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py", line 81, in _execute_on_connection
    self, multiparams, params, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1478, in _execute_ddl
    compiled,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1944, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2125, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 827, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 279, in errorhandler_wrapper
    error_value,
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 330, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 216, in default_errorhandler
    cursor=cursor,

The above exception was caused by the following exception:
snowflake.connector.errors.ProgrammingError: 090105 (22000): Cannot perform CREATE TABLE. This session does not have a current database. Call 'USE DATABASE', or use a qualified name.

  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 827, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 279, in errorhandler_wrapper
    error_value,
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 330, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 216, in default_errorhandler
    cursor=cursor,
```
I think that's the entire thing
In the meantime, if you know why it's not using the database and schema connection params, that would be awesome to know. I can always have pandas execute sql to do reads from other parts of snowflake and keep the materialized assets in the specified schema
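e.g. for the reads, something like this (a sketch; the table names are illustrative and I'd actually be using key-pair auth):
```python
import os

import pandas as pd
import snowflake.connector

# Plain connector session; fully qualified names mean the session's current
# database/schema don't matter for reads.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],  # I'd actually use key-pair auth
)
df = pd.read_sql("SELECT * FROM OTHERDB.OTHERSCHEMA.SOME_TABLE", conn)
```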
j
yeah i’m trying to think of why it wouldn’t be getting those parameters. i vaguely remember having some errors like this when i was doing dev work on the io manager, but it was usually a permissions error or i had misspelled something. i think i’ll probably need to run the quickstart example myself to see if i can replicate the issue and do some better debugging
c
Let me know if there's anything I can do to help you debug it
FWIW, DBeaver seemed to stop respecting connection parameters recently, so maybe it's a change on the Snowflake side?
j
what version of `dagster`, `dagster-snowflake`, `dagster-snowflake-pandas` and `snowflake-python-connector` are installed?
c
```
dagster = 1.1.14
dagster-snowflake = 0.17.14
dagster-snowflake-pandas = 0.17.14
snowflake-connector-python = 2.9.0
snowflake-sqlalchemy = 1.4.4
SQLAlchemy = 1.4.46
```
j
ok ty! i’ll try to replicate tomorrow and let you know where i end up
c
I'll try falling back to password auth to see if it helps.
This is similar-ish... not sure if it will help at all. https://github.com/snowflakedb/snowflake-connector-python/issues/865
Hi @jamie! Thanks for taking the time to help yesterday. I contacted our Snowflake admins about my account. They simply did an edit/save on my default role, namespace, and warehouse. Those had been set several years ago. Simply re-saving them seems to have addressed the issue with my connection. I guess something changed in Snowflake that didn't catch my account in a refresh. The good news is that I'm able to run the Hacker News example now! Next up, I'm going to write my own simple example and see if I can figure out how to work with the different roles, databases, schemas, etc.
j
that’s so strange
👍 1
thanks for checking! snowflake never ceases to surprise and frustrate
c
Yeah, it's still ignoring my connection uri parameters when i create a new session in DBeaver. Who knows...
j
def let me know if you run into issues or need help thinking through the different role/schema stuff. the #dagster-snowflake channel is a good place to reach me
❤️ 1
c
ok! Thanks!
j
in case you haven’t seen this yet, here’s the getting started guide for snowflake and a more advanced reference page
c
Thanks! I'll mainly need to figure out how to change my primary role and use my secondary roles between DAG steps, I think.
j
ok! right now your best bet will probably be instantiating a snowflake io manager for each role and then setting an `io_manager_key` for each step so that it uses the right io manager, sort of like this but instead of having an s3 io manager you would have a second snowflake io manager
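so roughly this shape (a sketch; i'm assuming the io manager config accepts a `role` field the way the snowflake resource does, which i'd want to double-check):
```python
import pandas as pd
from dagster import Definitions, asset
from dagster_snowflake_pandas import snowflake_pandas_io_manager

base_config = {
    "account": {"env": "SNOWFLAKE_ACCOUNT"},
    "user": {"env": "SNOWFLAKE_USER"},
    "password": {"env": "SNOWFLAKE_PASSWORD"},
    "database": "SANDBOX",
}

# One io manager per role; the "role" key is an assumption to verify against
# the io manager's config schema (the snowflake resource definitely takes it).
writer_a_io = snowflake_pandas_io_manager.configured({**base_config, "role": "ROLE_A"})
writer_b_io = snowflake_pandas_io_manager.configured({**base_config, "role": "ROLE_B"})


@asset(io_manager_key="writer_a_io")
def table_owned_by_role_a() -> pd.DataFrame:
    return pd.DataFrame({"x": [1]})


@asset(io_manager_key="writer_b_io")
def table_owned_by_role_b() -> pd.DataFrame:
    return pd.DataFrame({"x": [2]})


defs = Definitions(
    assets=[table_owned_by_role_a, table_owned_by_role_b],
    resources={"writer_a_io": writer_a_io, "writer_b_io": writer_b_io},
)
```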
❤️ 1