# announcements
j
Question: how do I reduce latency between short-duration solids? Background: I'm not using any executor, just testing on my own Windows PC. (I'm translating a pipeline I used in Airflow to Dagster; so far Dagster is much faster.) I have a composite solid that calls 3 subsolids: it reads a list from a web service, then writes that list to a folder, and then publishes that same list over Kafka. The latency between each solid is about 1/3 of a second. If this is all done in one solid, the operations occur almost immediately. Is there a way to reduce that latency via configuration?
a
I think at this time just putting it into one solid is your best bet.
what leads you to want them to be individual solids?
j
In my particular use case, the solids/subsolids represent an activity or activities we want to report on, getting the latest status in real time. So for this particular customer, traceability of the process is just as important as doing the process. An analogue would be a purchase order from Amazon: it's nice to see the logistical status of how that order is going (order received, warehouse identified, shipped from AZ to CA, out for delivery, ...). So I'm using the pipelines and solids as boundaries for those kinds of reports.
a
cool, thanks for providing that perspective
There is some overhead cost to each solid: the events emitted, type checks, data persistence / checkpointing. 300ms is a bit much, but depending on the data being passed it isn't outside of reason. If you do some profiling you may reveal some performance issues in the library for your workflow. I am personally a big fan of py-spy with the speedscope format.
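A rough per-step timing can help confirm where the ~300ms goes before reaching for a sampling profiler. The sketch below is plain Python with only the standard library; it is not py-spy and does not use any Dagster API. The step functions (`read_list`, `write_to_folder`, `publish`) are hypothetical stand-ins for the three operations described in the question. For the profiler itself, an invocation along the lines of `py-spy record --format speedscope -o profile.json -- python your_script.py` should produce a file that speedscope can open (the script name here is a placeholder).

```python
import time
from contextlib import contextmanager

# Collected (label, seconds) pairs, in the order the steps finished.
timings = []


@contextmanager
def timed(label):
    """Record wall-clock time spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((label, time.perf_counter() - start))


# Hypothetical stand-ins for the three steps described in the question.
def read_list():
    return ["a", "b", "c"]


def write_to_folder(items):
    return len(items)


def publish(items):
    return len(items)


def run_once():
    """Run the three steps back to back, timing each one separately."""
    with timed("read"):
        items = read_list()
    with timed("write"):
        write_to_folder(items)
    with timed("publish"):
        publish(items)
    return items


if __name__ == "__main__":
    overall_start = time.perf_counter()
    run_once()
    total = time.perf_counter() - overall_start
    for label, seconds in timings:
        print(f"{label}: {seconds * 1000:.3f} ms")
    print(f"total: {total * 1000:.3f} ms")
```

If the sum of the per-step times is far below the total observed when the same steps run as separate solids, the difference is a reasonable estimate of the framework overhead between them.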
j
thanks. will give those a shot.
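One possible way to reconcile the single-solid suggestion with the per-activity status reporting described above is to emit a status event around each step inside one unit of work. This is only a sketch in plain Python under stated assumptions: `StatusEvent`, `run_tracked`, and the three step functions are hypothetical names, and nothing here is Dagster API. Inside a solid, the `report` callback could presumably wrap the solid's logger, but that wiring is an assumption and is not shown.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StatusEvent:
    step: str
    state: str  # "started" or "finished"
    timestamp: float = field(default_factory=time.time)


def run_tracked(steps, report, value=None):
    """Run (name, fn) pairs in order, passing each result to the next step.

    `report` is called with a StatusEvent before and after every step, so
    progress stays visible even though everything runs in one unit of work.
    """
    for name, fn in steps:
        report(StatusEvent(name, "started"))
        value = fn(value)
        report(StatusEvent(name, "finished"))
    return value


# Hypothetical stand-ins for the three steps described in the question.
def read_list(_):
    return ["a", "b", "c"]


def write_to_folder(items):
    return items  # a real step would write the list to disk here


def publish(items):
    return items  # a real step would publish the list to a topic here


STEPS = [("read", read_list), ("write", write_to_folder), ("publish", publish)]

if __name__ == "__main__":
    events = []
    run_tracked(STEPS, events.append)
    for event in events:
        print(f"{event.timestamp:.3f} {event.step} {event.state}")
```

The trade-off is that the framework no longer sees three separate solids, so any per-solid features (retries, checkpointing) apply to the combined unit as a whole.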