Michael Lynton
03/22/2021, 9:45 PM
read_csv concept from the tutorials. Since I wanted to use two csvs and pass each value into that solid, I created a composite solid (not sure if it is needed here - is that the right approach?). I did this so that I could pass each url/csv into the read_csv function and get a dataframe output to work with downstream.
I was able to run the pipeline just with my composite solid, which I felt was a win. I get some basic logging with number of lines in each dataframe. Pretty cool.
My composite solid looks like this:
@composite_solid
def read_csv_multi():
    read_intl = read_csv.alias("read_intl")
    read_us = read_csv.alias("read_us")
    return (read_intl(), read_us())
Now, where I am stuck is, I’m not sure if this is the right approach because, like I said, I want to use each dataframe (Intl & US) in other solids downstream, but probably separate solids because they are different structures and need to be processed differently.
Do I need to do something with yielding output definitions on the composite solid, so that I can use the output as input in each of my “rename/prep” solids? I am pretty sure that I tried to do something like yield Output(read_us(), "us_data") inside of the composite solid above, but I got a weird message like:
@composite_solid read_csv_multi returned problematic value of type <class 'generator'>. Expected return value from invoked solid or dict mapping output name to return values from invoked solids
Kinda feels like I’m spinning my wheels all to avoid duplicating my read_csv solid. Sure, I could duplicate it, but I feel like I don’t really need to, right? Thanks for any pointers
alex
03/22/2021, 9:52 PM
composite_solid piece for now - that should help simplify things.
The specific problem you are running into is that composite_solid + multiple outputs is cumbersome / unintuitive. To have multiple outputs from a composite solid, it would look like this:
@composite_solid(output_defs=[OutputDefinition(name='us'), OutputDefinition(name='intl')])
def read_csv_multi():
    read_intl = read_csv.alias("read_intl")
    read_us = read_csv.alias("read_us")
    return {'intl': read_intl(), 'us': read_us()}
Dagster Bot
03/22/2021, 9:54 PM
Michael Lynton
03/22/2021, 10:55 PM