:wave::daggy-celebrate::wave: We'll be hosting our next Dagster... (dagster #announcements)

Odette Harary

05/23/2023, 6:56 PM

👋🎉👋 We'll be hosting our next Dagster community call on June 13th at 12 PM ET. We see lots of developers doing interesting work with Dagster across data engineering, machine learning, and analytics, and we want to give you the opportunity to showcase some of those projects to the broader community. If this sounds exciting to you or someone on your team, reply to this thread 🧵 to present on the next call. Looking forward to seeing what you've been working on!

Andrew Grigorev

05/26/2023, 1:31 PM

Hi Odette! I could talk about our use case on a community call (maybe not this one; two weeks is a pretty tight deadline for me to prepare a talk). In summary: we have some date-partitioned assets for ClickHouse data processing with PySpark, involving a custom Spark/S3 IOManager and a MetadataIOManager (a hacky thing that stores just an S3 path value in the AssetMaterialization event metadata). We also have some assets in a single partition but with a complex graph, ending with dagstermill assets that give us data monitoring alerts instrumented by notebooks with precomputed tables and charts. We use a customized, copy-pasted asset_reconciliation_sensor to schedule runs, but we are working on migrating to the mainline asset auto-materialization (and maybe it would be better to talk about our experience with auto-materialization after we finish this migration). Other interesting things I could talk about: our Docker image with our Python environment, including a custom Spark build with updated and extra jars, and the ECS RunLauncher and ECS cluster autoscaling config (though that is pretty much the default, and the ECS deployment itself should probably be a separate topic).
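The MetadataIOManager is only mentioned in passing. As a rough illustration of the general pattern it describes (persist the data somewhere else, keep only its location in the materialization's metadata), here is a framework-free Python toy model. It is a sketch under stated assumptions, not Dagster's `IOManager` API and not the implementation discussed in this thread: the class name `PathInMetadataIOManager`, the dict standing in for the event log, the local directory standing in for S3, and the `run_id` in the path are all hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple

# Hypothetical stand-in for an event log (NOT a Dagster API):
# (asset_key, partition_key) -> metadata recorded on the latest materialization.
MaterializationLog = Dict[Tuple[str, str], Dict[str, str]]


class PathInMetadataIOManager:
    """Toy IO manager: writes each partition to a run-specific file and records
    only the file path as materialization metadata (framework-free sketch)."""

    def __init__(self, base_dir: str, log: MaterializationLog) -> None:
        self.base_dir = base_dir  # stands in for an S3 bucket/prefix (assumption)
        self.log = log

    def handle_output(self, asset_key: str, partition_key: str, run_id: str, obj: Any) -> str:
        # The hypothetical run_id in the path means a reader cannot recompute
        # the location from the asset and partition keys alone.
        path = os.path.join(self.base_dir, asset_key, partition_key, f"{run_id}.json")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(obj, f)
        # Record just the path, analogous to putting an S3 URI in event metadata.
        self.log[(asset_key, partition_key)] = {"path": path}
        return path

    def load_input(self, asset_key: str, partition_key: str) -> Any:
        # Resolve the location from the latest recorded metadata, then read the data.
        path = self.log[(asset_key, partition_key)]["path"]
        with open(path) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manager = PathInMetadataIOManager(tmp, {})
        manager.handle_output("daily_events", "2023-05-26", "run-1", {"rows": 10})
        manager.handle_output("daily_events", "2023-05-26", "run-2", {"rows": 12})
        # The newest materialization wins because its path replaced the old metadata.
        print(manager.load_input("daily_events", "2023-05-26"))  # {'rows': 12}
```

Because the path contains a run-specific component, downstream readers depend on the recorded metadata rather than on a naming convention. In real Dagster code, the "record the path" step would go through something like `OutputContext.add_output_metadata`, and the lookup would go through the instance's event log; this toy model deliberately avoids both.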

Odette Harary

05/26/2023, 3:39 PM

Great! Will DM you.
