Sandeep Aggarwal  [01/18/2022, 1:06 PM]
I'm using the `execute_in_process` API to process a graph with ~15 ops. I'm seeing a significant performance drop after switching to a persistent Dagster instance with SQLite/Postgres-based run and event log storage: execution time increases to about 4 seconds, where it previously took around 250ms. The executor is still the in-process one, so I suspect it's the DB writes that are causing this overhead. Is that expected? Below are screenshots of the execution times.

sandy  [01/18/2022, 4:26 PM]
alex  [01/18/2022, 7:19 PM]
You could use `py-spy` to determine where exactly the slowdown is.
> SQLite/Postgres
The details here will have a big impact. Is it SQLite or Postgres? If Postgres, where is the DB running?
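[Editor's note] `py-spy` attaches to a running process from outside and needs no code changes (e.g. `py-spy record -o profile.svg -- python run_job.py`). As a self-contained illustration of the same diagnostic step, finding which call the time goes to, here is a stdlib `cProfile` sketch with a hypothetical stand-in for the event-logging call:

```python
import cProfile
import io
import pstats
import time


def log_event():
    # Hypothetical stand-in for a per-event storage write; in the real
    # thread this would be Dagster's log_dagster_event hitting the DB.
    time.sleep(0.005)


def run_graph():
    # ~15 ops, each emitting at least one event
    for _ in range(15):
        log_event()


profiler = cProfile.Profile()
profiler.enable()
run_graph()
profiler.disable()

# Print only the rows matching "log_event" to see where the time went.
stats = io.StringIO()
pstats.Stats(profiler, stream=stats).sort_stats("cumulative").print_stats("log_event")
report = stats.getvalue()
print(report)
```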
Sandeep Aggarwal  [01/19/2022, 10:20 AM]
`log_dagster_event` is taking significantly more time: around 20ms with the in-memory instance compared to 800ms-1200ms with persistent storage. You might have more insights. I am attaching the files for your reference. Can you please take a look?

alex  [01/19/2022, 3:27 PM]
Typical `op` runtimes are on the order of minutes or even hours, so these hundreds-of-milliseconds overheads per event have not yet been a focus.

Sandeep Aggarwal  [01/19/2022, 6:13 PM]

alex  [01/19/2022, 6:22 PM]
> is it possible to use multiprocess_executor or dask_executor with an ephemeral Dagster instance?
Not easily, and I expect you would hit further latency issues from the per-process overhead. I believe https://github.com/dagster-io/dagster/issues/4041 is what you would need for your use case.
Sandeep Aggarwal  [01/19/2022, 6:32 PM]

alex  [01/19/2022, 6:38 PM]