I hooked up OpenTelemetry trace context propagation in Dagster so that we can get traces of a job's `@op`s across multiple threads / CPUs. I'm considering putting in the effort to open-source it. Would folks be interested in that?
For example, you get traces like the one attached: each op gets a span (a line in the waterfall), ops within a subgraph are grouped, and arbitrary traced functions inside the computation can get spans too. Relative to the Dagster job waterfall, you get more tracing detail, plus the performance- and query-oriented UI of whichever OTel app you choose. Also attached is a heatmap from Honeycomb showing the max resident set size of ops in our production pipeline.
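For anyone unfamiliar with what "trace context propagation" involves, here is a rough, stdlib-only sketch of the general idea. It is not the integration described above and uses neither Dagster nor the OpenTelemetry SDK. It assumes the W3C `traceparent` format (`00-<trace-id>-<span-id>-<flags>`) as the wire format, and every name in it (`SpanContext`, `inject`, `extract`, `run_op`, `demo`) is hypothetical, chosen only for illustration.

```python
import re
import secrets
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

# W3C traceparent, version 00: 32-hex trace id, 16-hex span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for a tracer's span context."""
    trace_id: str
    span_id: str
    sampled: bool = True


def new_root() -> SpanContext:
    """Start a new trace (for example, one per job run)."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace id as the parent, fresh span id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Serialize the context into a plain dict that can cross a thread or process boundary."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: dict) -> Optional[SpanContext]:
    """Rebuild the parent context on the receiving side; None if missing or invalid."""
    match = TRACEPARENT_RE.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are invalid per the W3C spec.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 1))


def run_op(name: str, carrier: dict) -> dict:
    """Simulated op body running in a worker: continue the caller's trace."""
    parent = extract(carrier)
    span = child_of(parent) if parent else new_root()
    return {
        "op": name,
        "trace_id": span.trace_id,
        "span_id": span.span_id,
        "parent_id": parent.span_id if parent else None,
    }


def demo():
    """Run two simulated ops on worker threads under one job-level span."""
    job_span = new_root()
    carrier: dict = {}
    inject(job_span, carrier)
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(lambda n: run_op(n, carrier), ["load", "transform"]))
    return job_span, results
```

Calling `demo()` returns the job-level span plus one record per simulated op. Both records carry the job's trace id and list the job span as their parent, which is what lets a tracing UI draw them in a single waterfall. Because the carrier is a plain dict of strings, the same approach should work when the boundary is a process rather than a thread.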
10/17/2022, 11:56 PM
Yes! We're heavy OTel users across other services, but I haven't sat down to add any to our Dagster stuff, at least partially because you already get reasonably good perf breakdowns directly from Dagit. It's still on my radar, though.
10/24/2022, 7:26 PM
also super interested in this, would love to see it open-sourced