Chris Histe
03/16/2023, 3:11 PM
gcs_pickle_io_manager since it’s required by the multiprocess executor. Unfortunately, this makes one of our pipeline runs extremely slow compared to local development, where data is shared in memory.
Nodes in our graph spend almost the whole time in “preparing”, as you can see in the screenshot. This pipeline is scheduled every hour and takes almost one hour, so it is at risk of not completing in time.
Can you explain what happens in that “preparing” phase that makes it take so long? And are there ways to improve performance?
I’m thinking of using the single-process executor for that job. Is it common to have some jobs run in multiprocess and some in single process? Would you recommend against that? If so, why?
I can’t increase run concurrency because I would risk hitting BigQuery’s concurrency limit.
Chris Histe
03/16/2023, 3:14 PM
Zach P
03/16/2023, 8:42 PM
Chris Histe
03/16/2023, 8:57 PM
Chris Histe
03/16/2023, 8:57 PM
Zach P
03/16/2023, 9:01 PM
Chris Histe
03/16/2023, 9:10 PM
Zach P
03/16/2023, 9:15 PM
dagster dev command to run it locally.
We also have debug profiles set up to do this in VS Code, and have resources parameterized so that we can control which settings/resources are used via environment variables (e.g., local in-memory testing by default, staging/‘live’ tests when the variable is set to another value, and final prod for yet another value).
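The environment-variable switching described above could look roughly like the sketch below. It is not the actual setup: the `PROFILES` table, its keys, and the `select_profile` helper are hypothetical names made up for illustration, and the only thing taken from this thread is the `DAGSTER_DEPLOYMENT` variable.

```python
import os

# Hypothetical profile table: the profile names and settings below are
# illustrative assumptions, not the real configuration from this thread.
PROFILES = {
    "local": {"io_manager": "in_memory"},
    "staging": {"io_manager": "gcs_pickle"},
    "prod": {"io_manager": "gcs_pickle"},
}


def select_profile(environ=None):
    """Return (name, settings) for the deployment named in DAGSTER_DEPLOYMENT.

    Falls back to the "local" profile when the variable is unset, and raises
    ValueError for an unknown deployment name instead of guessing.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("DAGSTER_DEPLOYMENT", "local")
    if name not in PROFILES:
        raise ValueError(f"Unknown DAGSTER_DEPLOYMENT value: {name!r}")
    return name, PROFILES[name]


if __name__ == "__main__":
    profile_name, settings = select_profile()
    print(profile_name, settings)
```

The returned settings would then be used to decide which resources to hand to the job definitions, so the same code runs in memory locally and against real storage in staging or prod.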
Using breakpoints alone, however, it may be a bit challenging to figure out exactly where it’s taking time. (I mean, you could just click ‘next’ until it works.)
Chris Histe
03/16/2023, 9:22 PM
Chris Histe
03/16/2023, 9:23 PM
Zach P
03/16/2023, 9:23 PM
Zach P
03/16/2023, 9:24 PM
{
    "name": "STAGING - Debug Dagit",
    "type": "python",
    "request": "launch",
    "module": "dagit",
    "cwd": "${workspaceFolder}",
    "args": [
        "-w",
        "workspace.yaml"
    ],
    "console": "integratedTerminal",
    "justMyCode": false,
    "env": {
        "DAGSTER_DEPLOYMENT": "staging",
        "DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT": "1",
        "DAGSTER_CLOUD_PULL_REQUEST_ID": "0"
    }
},
(Also note, this uses dagit instead of dagster dev; I’ve been procrastinating on updating them 😬)
Chris Histe
03/16/2023, 9:27 PM
Chris Histe
03/16/2023, 9:28 PM
Jordan
03/16/2023, 10:06 PM
Chris Histe
03/17/2023, 2:45 PM