
Eric

01/24/2020, 8:56 PM
Hi all, I'm looking to create a sql_solid similar to the one in the airline demo: essentially just a solid that executes arbitrary SQL statements. The tutorial does a great job explaining how I can have an input parameter and specify it in the config YAML file. Further into the tutorial, however, it covers hydrating inputs. My understanding is that I could make a custom data type, for example SqlDataFrame, that uses a SQLAlchemy engine to execute a SQL statement (e.g., "select * from datawarehouse_table") and wraps a pandas DataFrame. With this, couldn't I avoid creating a "sql_solid" and instead just have a solid take a parameter of type SqlDataFrame?
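A minimal sketch of that idea using only the standard library: `sqlite3` stands in for the SQLAlchemy engine, and a small hypothetical `SqlDataFrame` dataclass stands in for the pandas-backed type. The hypothetical `hydrate_sql_dataframe` function plays the role an input hydration function would: it receives the input's config value (a dict with an assumed `sql` key) and returns loaded data, so the solid body never runs SQL itself. None of these names are Dagster APIs.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SqlDataFrame:
    """Hypothetical stand-in for a pandas DataFrame built from a SQL query."""
    columns: List[str]
    rows: List[Tuple[Any, ...]]


def hydrate_sql_dataframe(conn: sqlite3.Connection, config: Dict[str, Any]) -> SqlDataFrame:
    """Run the query from the input's config and wrap the result.

    The `sql` config key is an assumption for illustration only.
    """
    cursor = conn.execute(config["sql"])
    # cursor.description has one 7-tuple per result column; item 0 is the name.
    columns = [desc[0] for desc in cursor.description]
    return SqlDataFrame(columns=columns, rows=cursor.fetchall())


def count_rows(df: SqlDataFrame) -> int:
    """A 'solid' body that only sees data, never SQL."""
    return len(df.rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datawarehouse_table (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO datawarehouse_table VALUES (?, ?)", [(1, "a"), (2, "b")]
    )
    df = hydrate_sql_dataframe(conn, {"sql": "select * from datawarehouse_table"})
    print(df.columns, count_rows(df))
```

The design point is the split: the loader owns the connection and the query, while the solid only declares that it needs a SqlDataFrame.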
As a follow-up: is it possible to define the input hydration for a specific data type on a single solid, instead of defining a new custom data type (SqlDataFrame)? For example, I would like to use the standard pandas DataFrame but have its input hydration defined differently per solid, because each solid might use the DataFrame differently.

alex

01/24/2020, 9:02 PM
I think you are right. Note that the input_hydration_config is only used when hydrating the type from config.
Your options are to create different types or to move that variance into the configuration part of the input_hydration_config.
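A rough sketch of the second option, keeping one shared DataFrame-like type and moving the per-solid differences into each solid's config. It assumes the same stand-ins (`sqlite3` for the engine, a list of dicts for the DataFrame); `load_rows`, `SOLID_INPUT_CONFIG`, and the config keys `sql`, `columns`, and `limit` are all hypothetical, not a Dagster config schema.

```python
import sqlite3
from typing import Any, Dict, List, Optional


def load_rows(conn: sqlite3.Connection, config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One shared loader; per-solid behaviour comes entirely from `config`."""
    cursor = conn.execute(config["sql"])
    names = [desc[0] for desc in cursor.description]
    rows = [dict(zip(names, row)) for row in cursor.fetchall()]

    # Optional: keep only the columns this particular solid cares about.
    keep: Optional[List[str]] = config.get("columns")
    if keep is not None:
        rows = [{k: row[k] for k in keep} for row in rows]

    # Optional: cap the number of rows (e.g. for a quick preview run).
    limit: Optional[int] = config.get("limit")
    if limit is not None:
        rows = rows[:limit]
    return rows


# Hypothetical per-solid input config: same loader, different options.
SOLID_INPUT_CONFIG = {
    "summarize_orders": {"sql": "select * from orders"},
    "preview_orders": {"sql": "select * from orders", "columns": ["id"], "limit": 1},
}
```

The type stays the same for both solids; only the config entry differs, which is the trade-off against defining a new type per use.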
Having custom types for each table does have its own benefits, such as allowing you to enforce expectations on columns, so there are upsides to that approach.
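As a rough illustration of the per-table approach: each table gets its own type declaring the columns it requires, and a check rejects data that does not match before any solid uses it. The `TableType` class, its `check` method, and the `orders`/`customers` tables are hypothetical and written in plain Python; this is not Dagster's type-check API.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class TableType:
    """Hypothetical per-table type: a name plus the columns it must have."""
    name: str
    required_columns: FrozenSet[str]

    def check(self, rows: List[Dict[str, Any]]) -> List[str]:
        """Return a list of problems; an empty list means the data passes."""
        problems = []
        for i, row in enumerate(rows):
            # Iterating a dict yields its keys, so this is "required minus present".
            missing = self.required_columns.difference(row)
            if missing:
                problems.append(f"{self.name} row {i}: missing {sorted(missing)}")
        return problems


OrdersTable = TableType("orders", frozenset({"id", "amount"}))
CustomersTable = TableType("customers", frozenset({"id", "email"}))
```

Because each table has its own type, the column expectations travel with the type rather than being repeated in every solid that consumes that table.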

Eric

01/24/2020, 9:11 PM
I see. I think that's the answer I was looking for.
Thanks, Alex!
I have my hello_world_sql_pipeline all set up and ready to test. Is there a way to use dagit to help create the YAML file? When I try to run dagit I get a DagsterInvalidDefinitionError. This is a bit of a catch-22, since I want to use dagit to create the config but can't run it without already specifying it?

alex

01/27/2020, 4:02 PM
Can you share more details about the error?