#random

Simon Späti

03/18/2021, 7:38 AM
I published a new blog post in which dagster plays a major role. It’s very hands-on and the code is openly available, so maybe it helps some of you who are getting started with dagster: https://sspaeti.medium.com/building-a-data-engineering-project-in-20-minutes-85c37cad4d87 It covers tools like beautifulsoup, S3, MinIO, spark, delta lake, jupyter notebooks, druid, and superset. And everything is managed together with dagster, of course! 😉 Plus, everything runs on top of #kubernetes and can be deployed in any cloud or locally on your machine. Hope you enjoy it. Let me know what you think; I always appreciate comments and feedback, as I’m doing this in my free time 😅.
🙌🏻 1
👀 3
🙌 17
🙌🏼 1
Just in case you don’t have a paid Medium account, you can also read everything on my blog: https://www.sspaeti.com/blog/data-engineering-project-in-twenty-minutes/
🙌 1

Luke S

03/19/2021, 10:24 PM
@Simon Späti this is a great write-up. I bookmarked it a few days before you posted it here and have been referencing it ever since.

Simon Späti

03/20/2021, 10:03 AM
Thanks @Luke S, very happy to hear 😀👍🏻

Luke S

04/20/2021, 12:20 AM
@Simon Späti I'm working on a pipeline implementing delta tables. I've integrated my delta lake with Athena, which has JDBC / ODBC endpoints I can use to connect to BI tools. I'm wondering if you've found other ways to query delta tables without provisioning a SQL server. It'd be nice if Databricks would release a serverless version of their SQL Engine.
From what I can tell, you have to have a cluster running to use the Databricks SQL engine... which I assume means you're billed for the time the cluster is up, not just for usage (as you are with Athena).
Delta + Presto integration guide: https://docs.delta.io/0.7.0/presto-integration.html
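For anyone following along, here is a rough sketch of the manifest-based approach that guide describes: Delta writes a symlink manifest, and Presto or Athena reads the table through an external table that points at it. The helper names (`generate_manifest`, `manifest_table_ddl`), the table name, the columns, and the `s3://my-bucket/...` path are all made up for illustration, so treat this as a starting point rather than a tested setup.

```python
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (column name, SQL type), e.g. ("id", "bigint")


def generate_manifest(spark, delta_path: str) -> None:
    """Ask Delta Lake to write a symlink manifest for the table at delta_path.

    Assumes a SparkSession with the delta-core package available; the import
    is done lazily so the DDL helper below works without Spark installed.
    """
    from delta.tables import DeltaTable

    DeltaTable.forPath(spark, delta_path).generate("symlink_format_manifest")


def manifest_table_ddl(
    table_name: str,
    columns: List[Column],
    delta_path: str,
    partition_columns: Optional[List[Column]] = None,
) -> str:
    """Build the CREATE EXTERNAL TABLE statement to run in Presto or Athena."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    partition_clause = ""
    if partition_columns:
        parts = ", ".join(f"{name} {sql_type}" for name, sql_type in partition_columns)
        # Partitioned tables also need `MSCK REPAIR TABLE <name>` after creation.
        partition_clause = f"PARTITIONED BY ({parts})\n"
    # The manifest lives in a fixed subdirectory of the Delta table root.
    location = delta_path.rstrip("/") + "/_symlink_format_manifest/"
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n  {cols}\n)\n"
        f"{partition_clause}"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table and bucket, for illustration only.
example_ddl = manifest_table_ddl(
    "events",
    [("id", "bigint"), ("payload", "string")],
    "s3://my-bucket/delta/events",
    partition_columns=[("day", "string")],
)
```

Printing `example_ddl` gives the statement to paste into the Athena or Presto console. The manifest has to be regenerated whenever the Delta table changes, otherwise queries keep reading the old snapshot.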