Hi Team I am try to add data validations and creating user d dagster #announcements

achintamiri

09/03/2019, 3:39 PM

Hi Team, I am trying to add data validations and create a user-defined data type. I want to check that the file read in one node is a CSV and that the next node also receives a CSV. I am getting this error:

dagster.core.errors.DagsterInvalidDefinitionError: Input "pdd" in solid "Input1" is not connected to any outputs and can not be hydrated from configuration, creating an impossible to execute pipeline

Sample code:

from dagster import execute_pipeline, pipeline, solid, as_dagster_type, lambda_solid, input_hydration_config
import pandas as pd

DataFrame = as_dagster_type(
    pd.DataFrame,
    name='PandasDataFrame',
    description='''Two-dimensional size-mutable, potentially heterogeneous
    tabular data structure with labeled axes (rows and columns).
    See <http://pandas.pydata.org/>''',
)

@lambda_solid
def Input1(pdd: DataFrame) -> DataFrame:
    r = pdd.read_csv('file1.csv')
    return r

@lambda_solid
def Merge(r: DataFrame, r2: DataFrame, pdd: DataFrame) -> DataFrame:
    r3 = pdd.concat([r, r2], axis=1)
    return

@lambda_solid
def Input2(pdd: DataFrame) -> DataFrame:
    r2 = pdd.read_csv('file2.csv')
    return r2

@lambda_solid
def Result_output(y: DataFrame) -> DataFrame:
    y3 = y
    return

@pipeline
def actual_dag_pipeline():
    y = Merge(Input1(), Input2())
    Result_output(y)

alex

09/03/2019, 3:44 PM

The pdd input to Input1/2 is the source of the problem. Since you are reading a CSV from a known path, I think you want to create a new data frame inside the solid instead of defining an input.

alex

09/03/2019, 3:45 PM

The error is there because the pipeline has inputs that are not wired up to anything, so the only way to satisfy them would be to provide values from config. The issue is that your custom input type does not yet define how to do that (input_hydration_config).
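To make the failure mode concrete, here is a minimal sketch in plain Python of the kind of check described above. It is not Dagster's actual implementation; the names find_unsatisfied_inputs, solid_inputs, connections, and loadable_types are hypothetical and exist only for illustration. An input is treated as satisfiable if it is wired to an upstream output or if its type knows how to load a value from config.

```python
def find_unsatisfied_inputs(solid_inputs, connections, loadable_types):
    """Return (solid, input) pairs that nothing can supply a value for.

    solid_inputs:   {solid_name: {input_name: type_name}}
    connections:    set of (solid_name, input_name) pairs wired to an upstream output
    loadable_types: set of type names that can be loaded from config
    """
    unsatisfied = []
    for solid_name, inputs in solid_inputs.items():
        for input_name, type_name in inputs.items():
            wired = (solid_name, input_name) in connections
            from_config = type_name in loadable_types
            if not wired and not from_config:
                unsatisfied.append((solid_name, input_name))
    return unsatisfied


# Mirrors the pipeline in the question: Merge(Input1(), Input2()) wires only r and r2,
# and Result_output(y) wires y. Every pdd input is left dangling.
solid_inputs = {
    "Input1": {"pdd": "PandasDataFrame"},
    "Input2": {"pdd": "PandasDataFrame"},
    "Merge": {"r": "PandasDataFrame", "r2": "PandasDataFrame", "pdd": "PandasDataFrame"},
    "Result_output": {"y": "PandasDataFrame"},
}
connections = {("Merge", "r"), ("Merge", "r2"), ("Result_output", "y")}

# The custom type defines no config loader, so no type name is loadable.
problems = find_unsatisfied_inputs(solid_inputs, connections, loadable_types=set())
```

In this model the three pdd inputs are reported as unsatisfied. Removing the pdd parameters, or making the type loadable from config, empties the list.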

alex

09/03/2019, 3:47 PM

i.e. I think you want:

@lambda_solid
def Input1() -> DataFrame:
    r = pd.read_csv('file1.csv')
    return r

achintamiri

09/03/2019, 3:58 PM

Thanks alex for your response. Do you mean I should also add input_hydration_config?

alex

09/03/2019, 4:02 PM

you should do that if you want to be able to provide for example the csv path as config and load the dataframe that way
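As a rough illustration of what "provide the csv path as config and load the data that way" means, here is a sketch in plain Python using only the standard library. It does not use Dagster's input_hydration_config API; the function load_rows_from_config and the config shape {"csv": {"path": ...}} are assumptions made for this example, and it returns a list of row dicts rather than a pandas DataFrame.

```python
import csv


def load_rows_from_config(config):
    """Load tabular data as described by a config dict.

    Hypothetical config shape: {"csv": {"path": "file1.csv"}}.
    Returns a list of dicts, one per CSV row, keyed by column header.
    """
    if "csv" not in config:
        raise ValueError("this sketch only supports a 'csv' source")
    path = config["csv"]["path"]
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

The point of the design is that the file location lives in configuration rather than in the function body, so the same loading logic can serve file1.csv, file2.csv, or any other path without code changes.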

alex

09/03/2019, 4:04 PM

you could also consider using the DataFrame type from dagster-pandas, which already has this set up

alex

09/03/2019, 4:04 PM

https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-pandas/dagster_pandas/data_frame.py

achintamiri

09/03/2019, 5:06 PM

I am continuing with the existing code that I shared, and I made the changes you suggested:

@lambda_solid
def Input1() -> DataFrame:
    r = pd.read_csv('file1.csv')
    return r

The pipeline execution is now failing with dagster.core.definitions.events.Failure: Value None should be of type DataFrame. This is for the Merge node in my code.
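The message "Value None should be of type DataFrame" is consistent with a function that computes a result but ends in a bare return, as Merge does in the original sample. The sketch below reproduces that behavior in plain Python, with lists standing in for data frames. The helper check_output is a hypothetical stand-in for a runtime output type check, not Dagster's own code.

```python
def merge_without_value(left, right):
    combined = left + right  # the result is computed...
    return  # ...but a bare return hands back None


def merge_with_value(left, right):
    combined = left + right
    return combined  # the computed result is actually returned


def check_output(value, expected_type):
    """Hypothetical stand-in for a runtime check of a declared output type."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"Value {value!r} should be of type {expected_type.__name__}"
        )
    return value
```

Passing the result of merge_without_value to check_output raises, because None is not an instance of the declared type, while the result of merge_with_value passes the check.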

achintamiri

09/03/2019, 5:08 PM

r and r2 are coming through as type Any, whereas I am expecting DataFrame.

achintamiri

09/03/2019, 5:09 PM

And sorry, I did not understand how to use the DataFrame type from dagster-pandas in my current Python code file.

achintamiri

09/04/2019, 12:48 PM

Hi, please ignore this issue.

achintamiri

09/04/2019, 12:48 PM

it is solved

achintamiri

09/04/2019, 12:49 PM

I used

def Merge(r:DataFrame,r2:DataFrame) -> DataFrame:
