Simon Späti

10/19/2022, 12:35 PM
Has any of you played with Rust already? I think it will play a significant role in data engineering in the future; still early, but some signs (ideal for DE, most loved languages, etc.) are there. I wrote a small post if anyone has an opinion or is curious. Ideas, comments, and statements are most welcome. I guess working with dagster, someone would wrap the rust library into a python package for now. Or do you have any other thoughts about it? 🙂

manny schneck

10/19/2022, 2:58 PM
My experience with "let's stop using python because it's too dynamic/slow/flaky etc" was that when I went back to python, I was immediately much more productive because I had tight feedback loops available to experiment with the data in pandas/ipython.
It's probably true that the smartest people who can write the code correctly the first time and don't need feedback loops/experimentation can achieve better outcomes with rust, but I've given up on making it into that set.
👍 1

Zach P

10/19/2022, 8:38 PM
I don’t see python going anywhere, but I’m also a big fan of rust. I think it’ll probably slowly take over as a “Data Engineering Systems” language as time goes on, dethroning things like C++, C, Java, and Scala. While I agree the feedback loops possible in python are really great and I love them, I also find that a huge part of my time goes into fixing things that would be impossible to compile in rust. One thing I’m also curious about is that we see python evolving to support more complex typing and improving runtime performance. There are even tools such as beartype that allow runtime checking, and dagster has its own built-in runtime checking. Part of me wonders if rust will take over, or if it will instead cause a paradigm shift in tooling on other platforms that lessens its unique usefulness. (E.g.: Will python 3.14 some day be 80% as fast as rust with most of the safety features as optional? Will new C++ versions enable a “safe mode” similar to rust?)
👍 2
:dagster: 1
☝️ 1

manny schneck

10/19/2022, 8:54 PM
I found pandera to be really great at bringing static types to relational programming.

Zach P

10/19/2022, 8:58 PM
Oh interesting, it looks quite cool 🙂 May use it in the next pipeline I write. How was the integration with dagster? The data synthesis and pyspark typing look especially cool!

manny schneck

10/19/2022, 9:26 PM
I never got to find out about the integration with Dagster--I was trying to pivot a startup's stack at the time, and burned out in a pretty hard way. What I really liked about it is that I could exit the type system to do aggregations, add temp columns, and then run a validator function that would take my dataframe, check that it matched the schema, and give me a typed value back.
OTOH, when I tried to build the project to try contributing, I got stuck in python version/packaging hell (I was on arch linux at the time, so not on the happy path)--which kind of reinforces the anti-python point.
Although this article doesn't seem to have been written by anyone with deep experience in either: "Rust makes it easy to integrate and communicate with other languages through a so-called _foreign function interface (FFI)_."
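
To make the "exit the type system, then validate back in" workflow described above concrete, here is a minimal stdlib-only sketch of the pattern. It does not use pandera's API or pandas; the `Order` schema, the column names, and the `validate_orders` helper are all hypothetical stand-ins, with plain dict rows playing the role of the dataframe.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Order:
    # Hypothetical schema: field names are illustrative only.
    order_id: int
    amount: float


def validate_orders(rows: list[dict[str, Any]]) -> list[Order]:
    """Check untyped rows against the Order schema and return typed values."""
    typed = []
    for i, row in enumerate(rows):
        try:
            order_id, amount = row["order_id"], row["amount"]
        except KeyError as exc:
            raise ValueError(f"row {i}: missing column {exc}") from exc
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(order_id, bool) or not isinstance(order_id, int):
            raise ValueError(f"row {i}: order_id must be an int")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise ValueError(f"row {i}: amount must be a number")
        typed.append(Order(order_id, float(amount)))
    return typed


# Untyped zone: free-form work, including a derived column.
raw = [
    {"order_id": 1, "net": 10.0, "tax": 2.0},
    {"order_id": 2, "net": 5.0, "tax": 1.0},
]
for row in raw:
    row["amount"] = row["net"] + row["tax"]

# Back into the typed world: extra columns are ignored, bad rows raise.
orders = validate_orders(raw)
```

After the call, `orders` is a `list[Order]` that a type checker can follow, while everything before it stayed as loose as interactive exploration needs.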

Zach P

10/19/2022, 9:45 PM
I suppose that part’s relative. “Easy” maybe not, but it’s a path becoming quite well trodden & battle tested. At the end of the day, I don’t think rust can become the new “lingua franca” for data engineering. Python is already a little too hard to learn for many data users (hence many people still using excel, sql, etc.). And despite its many benefits, rust is probably even harder than things like scala, which, while foundational, never reached “lingua franca” status due to their complexity imo.
👍 2

Chris Comeau

10/20/2022, 12:24 AM
Following the "functional core, imperative shell" pattern... I see python sticking around for the shell, then rust in the core. Stability of arrow-odbc over turbodbc is a good example for moving to rust over c++.
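
A minimal sketch of the "functional core, imperative shell" split mentioned above, written entirely in Python. The function names and the CSV columns (`customer`, `amount`) are hypothetical; in the scenario described, the pure core is the part that could later be swapped for a Rust-backed implementation while the shell stays in Python.

```python
import csv
import io
from typing import Iterable, TextIO


def total_by_customer(rows: Iterable[dict[str, str]]) -> dict[str, float]:
    """Functional core: a pure function with no I/O or side effects."""
    totals: dict[str, float] = {}
    for row in rows:
        name = row["customer"]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


def run(source: TextIO, sink: TextIO) -> None:
    """Imperative shell: reads input, calls the core, writes output."""
    totals = total_by_customer(csv.DictReader(source))
    writer = csv.writer(sink)
    writer.writerow(["customer", "total"])
    for customer, total in sorted(totals.items()):
        writer.writerow([customer, total])


# In-memory streams stand in for real files or database connections.
source = io.StringIO("customer,amount\na,1.5\nb,3\na,2\n")
sink = io.StringIO()
run(source, sink)
```

Because the core takes and returns plain values, it can be tested without any files, and the shell contains nothing worth unit testing beyond wiring.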

manny schneck

10/20/2022, 2:12 PM

Nicolas Parot Alvarez

10/26/2022, 3:36 PM
I think Python in data is a bit like SQL, an easier API to use more complex tools that have been coded with more performant languages. Rust will probably replace those more performant languages, not the easy API Python provides. So it will impact the backend engineers who build the data engineering tools (such as Dagster people), but not the data engineers who only use those data engineering tools. Gotta say that I enjoy a lot that I can easily dive into Dagster's source code when I need to understand some feature or hack some use case, I probably wouldn't be able to do that if it was in Rust. Btw, if you're a bit strict with your Python type hints as I like to be, a good IDE will catch part of your mistakes at coding time too.
👍 1