Hi - I am really new at this but looking for some ...
# announcements
m
Hi - I am really new at this but looking for some direction. I went through the tutorials and I know just enough to be dangerous. I’m trying to rebuild an existing Python script which takes a couple of URLs separately, downloads them (CSV), then does some renaming and deduping, and then combines the dataframes and loads them to a db. (The existing script doesn’t leverage any functional programming either.) I started with a basic solid, repurposing the `read_csv` concept from the tutorials. Since I wanted to use two CSVs and pass each value into that solid, I created a composite solid (not sure if that is unneeded here - is that the right approach?). I did this so that I could pass each URL/CSV into the `read_csv` solid and get a dataframe output to work with downstream. I was able to run the pipeline just with my composite solid, which I felt was a win. I get some basic logging with the number of lines in each dataframe. Pretty cool. My composite solid looks like this:
@composite_solid
def read_csv_multi():
    read_intl = read_csv.alias("read_intl")
    read_us = read_csv.alias("read_us")
    return (read_intl(), read_us())
Now, where I am stuck: I’m not sure this is the right approach because, like I said, I want to use each dataframe (Intl & US) in other solids downstream, but probably in separate solids because they have different structures and need to be processed differently. Do I need to do something with yielding output definitions on the composite solid, so that I can use the outputs as inputs to each of my “rename/prep” solids? I am pretty sure that I tried to do something like `yield Output(read_us(), "us_data")` inside of the composite solid above, but I got a weird message like:
@composite_solid read_csv_multi returned problematic value of type <class 'generator'>. Expected return value from invoked solid or dict mapping output name to return values from invoked solids
Kinda feels like I’m spinning my wheels all to avoid duplicating my read_csv solid. Sure, I could duplicate it but I feel like I don’t really need to, right? Thanks for any pointers
a
I think I would recommend skipping the `composite_solid` piece for now - that should help simplify things. The problem you are running into specifically is that `composite_solid` + multiple outputs is cumbersome / unintuitive - to have multiple outputs from a composite solid, it would look like this:
@composite_solid(output_defs=[OutputDefinition(name='us'), OutputDefinition(name='intl')])
def read_csv_multi():
    read_intl = read_csv.alias("read_intl")
    read_us = read_csv.alias("read_us")
    return {'intl': read_intl(), 'us': read_us()}
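For context on the `<class 'generator'>` error quoted earlier, here is a minimal plain-Python sketch (no Dagster involved; the function names `returns_value` and `yields_value` are made up for illustration). Any function whose body contains `yield` becomes a generator function, so calling it hands back a generator object rather than whatever the body would have produced. That is presumably what the composite solid saw in place of the invoked-solid results or the dict it expects:

```python
import inspect


def returns_value():
    # Ordinary function: calling it runs the body and returns the value.
    return "handle"


def yields_value():
    # The `yield` makes this a generator function: calling it runs none of
    # the body yet and simply returns a generator object.
    yield "handle"


plain_result = returns_value()
generator_result = yields_value()

print(type(plain_result))
print(type(generator_result))
print(inspect.isgenerator(generator_result))
```

Returning the dict of invoked solids, as in the snippet above, avoids this because the function body contains no `yield`.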
@Dagster Bot docs composite_solid multiple outputs
d
m
@alex thanks for the feedback!