# ask-community
m
Hi team! I'd like to ask if it is a good idea to use DynamicOutput for big data sets (~10k records). I'm seeing very large overhead (even with `mem_io_manager`).
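For context, the shape of the job is roughly this (a minimal sketch; the op names and the record source are placeholders, the real fan-out comes from an HTTP API):

```python
from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def fan_out():
    # placeholder: in reality this yields ~10k records pulled from the API
    for i in range(10_000):
        yield DynamicOutput(i, mapping_key=str(i))

@op
def process_record(record):
    # placeholder for the per-record work
    return record

@job
def traverse_api_job():
    fan_out().map(process_record)
```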
z
I've run jobs of similar size, but we kept max_concurrent_runs relatively low (~30 running in parallel at one time). I didn't have any real problems, aside from needing to make sure logging wasn't too verbose; I found that Dagit slowed down a lot after the logs piled up (although recent release notes suggest that may have improved).
m
@Zach which `executor_def` and `io_manager` did you use? And which type of `runLauncher`?
z
I used the MultiprocessExecutor, with an EcsRunLauncher on 4 vCPU / 8 GB ECS tasks. IO went through a custom tool for writing data to Delta tables. One important thing about my setup was that most of the actual compute took place in Databricks, using a custom version of the databricks_pyspark_step_launcher from dagster-databricks.
What kind of overhead are you seeing? All the steps are going to run on a single Dagster worker, so if you're not sending most of your compute to another provider like Databricks or EMR, you'll probably need to limit the parallelism and increase the resources for the worker. The `mem_io_manager` might actually make things worse for large fan-outs, since all the outputs have to be held in memory.
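Something like this is what I mean, as a sketch (the executor cap and the `fs_io_manager` swap are illustrative; 30 is just an example value, not a recommendation):

```python
from dagster import fs_io_manager, job, multiprocess_executor, op

@op
def do_work():
    return 1  # stand-in for a real step

# cap how many steps run at once within a single run, and write step outputs
# to disk instead of holding all of them in memory like mem_io_manager does
@job(
    executor_def=multiprocess_executor.configured({"max_concurrent": 30}),
    resource_defs={"io_manager": fs_io_manager},
)
def capped_job():
    do_work()
```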
m
My job just traverses an HTTP API collecting hierarchical data, so there's no heavy computation, just waiting for responses. I run it locally, so memory shouldn't be a problem, but we run it in K8s in prod, so maybe we'll need some tweaks there. The amount of logging is probably problem #1, since it runs in debug mode locally.
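For the local runs I'll try turning the console logger down from debug, something like this (sketch; `api_job` is a stand-in for my real job):

```python
from dagster import job, op

@op
def fetch():
    return 1  # stand-in for the HTTP calls

@job
def api_job():
    fetch()

if __name__ == "__main__":
    # quiet the built-in console logger for local runs
    api_job.execute_in_process(
        run_config={"loggers": {"console": {"config": {"log_level": "INFO"}}}}
    )
```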