Q: Why are solids called “solids”? After learning...
# announcements
m
Q: Why are solids called “solids”? After learning (and falling in love with) dagster over the last few weeks; every concept I’ve come across has been named in a highly intuitive way that instantly makes sense. Except solids. Wouldn’t it be better to call these “units_of_work” or “compute_units”? Where did the name “solids” come from?
s
Hey David thanks for the kind words! Re: the name solid. We wanted to differentiate the core abstraction from other systems. In particular, using the term “task” was an anti-goal because the solid is actually a different level of abstraction from a task in a system like Airflow and similar systems. In those systems, a task is a deployed unit of work bound to a particular configuration and environment. Our logical graphs can be constructed prior to computation in an environment-agnostic fashion and meant to be reusable and executable in many different contexts. When we actually start execution the logical graph is compiled into an execution plan, which is comprised of steps. Steps are much more analogous to tasks. It is the execution plan that is bound to a particular configuration or environment. The name solids stems from the term “software-structured data”. Nearly all managed data in modern systems is produced by computation. It is all software-structured data. Dagster links data to computation and ideally every produced data asset if associated with one of our leaf nodes of compute. Therefore our leaf nodes are units of software-structured data. SSD for short. SSD already has a meaning (solid state drive) so we just made the name “solid.” The system has diverged a bit since that initial formulation (indeed the entire system was going to be called “Solid” rather than Dagster but Tim Berners-Lee already had a project called Solid) so it makes a little less sense now and can be a bit off-putting since it is a bit obtuse. I’ve considered alternative names (e.g. “op”) but they have their own problems and there would be a massive switching cost.
m
Thanks for the background!
Idea: Maybe the explanation above could be added in the docs somewhere; for future curious users like myself.
👍 1
@Xu Zhang’s answer that its because its a "solid" name could be the tldr; 😉
j
I liked the name solid. btw, I just came across dagster today. I spent two weeks evaluating airflow and reading about dagster today has been very comforting. I found the name solid intutive, because, it kind of is the re-usable and less changing aspect compared to the data and the execution environment which could change more frequently. Low entropy is solid.
👍 2