#ask-community

Michael Cowling

12/09/2022, 3:28 PM
Hello, I've inherited a Dagster pipeline and am still getting to grips with it. We're doing some refactoring. The original pipeline has some pandas DataFrame objects which go through some ML model training. The minor change I'm trying to implement is some filtering on the dataframe, e.g. df = df[df['my_column'] == 0]. But this runs into a type issue, e.g.
AttributeError: 'InputMappingNode' object has no attribute 'copy'
Obviously this object should be a dataframe, but some interaction seems to change its type. Does anyone know how to interact with dataframe objects in the 'normal' way? I'm struggling to find the right information in the Dagster docs or on Google.

Auster Cid

12/09/2022, 4:47 PM
From that error I'd assume you're trying to do said filtering inside a job/pipeline definition.
Job/pipeline definitions only really define dependencies between ops/solids. Your filtering should be done inside one of these ops/solids.

Michael Cowling

12/09/2022, 4:49 PM
it’s inside a @graph at runtime, and the graph is inside a job. So I need to make an op to do this filtering, and place the op inside the graph?

Auster Cid

12/09/2022, 4:49 PM
yep, a graph is also only defining dependencies between ops, forgot to include it, sorry

Michael Cowling

12/09/2022, 4:50 PM
great, thank you, will try that

Auster Cid

12/09/2022, 4:50 PM
yeah, or doing it inside one of the existing ops if it makes sense there