# announcements
e
Hi all, I have two hopefully quick questions. 1. Is there a way to leverage parallel processing in Dagster? 2. What's the recommended way to daemonize dagit so it acts as a constant portal for all pipelines? (See the comments on this thread for more detail on question 1.)
Here is a brief summary of question 1. Say we have a large (~15GB) Excel file that we want to load into a SQL database. The Dagster documentation, in the section "Multiple and Conditional outputs", describes how a single solid can have multiple outputs. My first thought is to create a solid that reads the Excel file and yields each chunk. Each chunk would then be fed into another solid that inserts it into the database. However, the documentation declares specific, named outputs for the solid with multiple outputs ("hot_cereals", "cold_cereals"). Is there a way for a solid to have multiple outputs that are not named?
a
We do not yet have any support for a mapping type operation in the dagster space. You can use List types and do a manual mapping process as seen in this example: https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/toys/sleepy.py
This combined with using the multiprocess executor would give some fixed amount of parallelism: https://dagster.readthedocs.io/en/0.6.6/sections/deploying/deploying.html#execution
Another option would be to do that parallel processing within a single solid and defer to a system designed to solve that type of problem.
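For the last option (parallel work inside a single solid), here is a minimal plain-Python sketch of what the body of such a solid might do. It uses only the standard library and no Dagster API. The names `iter_chunks`, `insert_chunk`, and `load_in_parallel` are hypothetical, and `insert_chunk` is a stand-in for a real database insert. Threads are assumed because inserts are typically I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def insert_chunk(chunk):
    """Hypothetical stand-in for a database insert; returns the rows handled."""
    return len(chunk)


def load_in_parallel(rows, chunk_size=1000, max_workers=4):
    """Insert chunks concurrently and return the total number of rows handled."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every chunk up front, not lazily.
        return sum(pool.map(insert_chunk, iter_chunks(rows, chunk_size)))
```

Note that `Executor.map` collects its input immediately, so with a very large file all chunks could end up queued in memory at once. For a ~15GB input you would likely want to bound the number of in-flight chunks or hand the work to a system built for this.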
As for daemonizing dagit - I would probably recommend using a detached screen session as a simple place to start.
e
I see. I think that would work, this is great. Thanks Alex!
One quick follow-up: the sleepy example still has named outputs. If I create a solid that is a generator, will each output still be piped to the next solid (i.e., will each chunk be processed sequentially)?
Something along the lines of this structure.
a
No - this is that mapping functionality I alluded to, which we have not yet implemented.
e
Got it. Thanks Alex!
b
How far down the roadmap is mapping?
Is there an issue on GitHub that we can follow?
b
Thanks. Wanted to make sure I was following the right one. Appreciate it, Alex.