# announcements
Yeah, to reiterate: this type of software abstraction can’t magically fix everything. This is the way we think about it:
Principle: Errors that can be caught by unit tests should be caught by unit tests.
Corollary: Do not attempt to unit test for errors that cannot be caught by unit tests.
E.g. you’ll still want to run Spark against large data sets in a staging environment, because there are bugs that can only be caught at that stage. Our goal is to structure this so that we can catch more bugs earlier in the process. E.g. we shouldn’t be encountering any config parsing errors or Python syntax errors in prod or in a late-stage integration test. Then, once you trust those earlier-stage processes more, you can start, for example, doing refactoring tasks with confidence that you will very likely not break prod. Our eventual goal is to kickstart an ecosystem based on these new abstractions, where folks build things like fakes for common cases that are in fact fake-able. I would bucket (ha ha) S3 in this category, as an example.
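To make the "catch config parsing errors early" point concrete, here is a minimal sketch in plain Python. It does not use dagster's API; `JobConfig`, `parse_config`, and `ConfigError` are hypothetical names made up for illustration. The idea is only that config validation is a pure function, so a unit test can reject a bad config long before anything reaches staging.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class ConfigError(ValueError):
    """Raised when a raw config dict is malformed (hypothetical name)."""


@dataclass(frozen=True)
class JobConfig:
    # Hypothetical fields, chosen only for the example.
    input_path: str
    num_partitions: int


def parse_config(raw: Mapping[str, Any]) -> JobConfig:
    """Validate a raw config mapping and return a typed config object."""
    missing = {"input_path", "num_partitions"} - set(raw)
    if missing:
        raise ConfigError(f"missing keys: {sorted(missing)}")
    path = raw["input_path"]
    parts = raw["num_partitions"]
    if not isinstance(path, str):
        raise ConfigError("input_path must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(parts, bool) or not isinstance(parts, int) or parts < 1:
        raise ConfigError("num_partitions must be a positive integer")
    return JobConfig(input_path=path, num_partitions=parts)


def test_rejects_bad_partition_count():
    # A typo like a quoted number is caught here, not in prod.
    try:
        parse_config({"input_path": "data/in", "num_partitions": "8"})
    except ConfigError:
        return
    raise AssertionError("expected ConfigError")
```

A test like this runs in milliseconds and needs no cluster, which is the kind of error the principle above says belongs in unit tests.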
It's definitely still quite a bit of work to write this stuff in a testable fashion, but it was actually pretty fun and satisfying to write these. I was able to get a lot of S3 stuff working without actually touching S3.
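For anyone wondering what "without actually touching S3" can look like, here is a rough sketch of the general pattern, not the actual dagster code. `ObjectStore`, `FakeObjectStore`, and `uppercase_object` are hypothetical names: the business logic depends on a small interface, and the test swaps in an in-memory fake instead of a real S3 client.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """The small slice of object-store behavior the logic needs (assumed)."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    def get(self, bucket: str, key: str) -> bytes: ...


class FakeObjectStore:
    """In-memory stand-in for S3; nothing here talks to AWS."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        try:
            return self._objects[(bucket, key)]
        except KeyError:
            raise KeyError(f"no such object: s3://{bucket}/{key}") from None


def uppercase_object(store: ObjectStore, bucket: str, src: str, dst: str) -> None:
    """Business logic under test: read an object, uppercase it, write it back."""
    text = store.get(bucket, src).decode("utf-8")
    store.put(bucket, dst, text.upper().encode("utf-8"))


def test_uppercase_object():
    store = FakeObjectStore()
    store.put("my-bucket", "in.txt", b"hello")
    uppercase_object(store, "my-bucket", "in.txt", "out.txt")
    assert store.get("my-bucket", "out.txt") == b"HELLO"
```

In production you would pass in a thin wrapper around a real S3 client that satisfies the same interface. The fake only covers behavior the logic relies on, so things like permissions, throttling, and very large objects still need the staging run mentioned earlier.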
The reason for my question (probably) has its roots in the same problems you are trying to address with dagster, i.e. the fact that these things are notoriously difficult to solve in a real-world scenario. On a high level, it is often easy to understand how the business logic should work, but making that actually testable, when the full pipeline is a combo of every format, protocol and language under the sun, is hard. In a perfect world I would prefer to be able to construct unit tests that are tech-agnostic, but in practice the real world boils down to complex integration chains and huge datasets that divert attention away from the business logic to the lower-level tech. Like I said, I'm still very new to this framework, so I will continue looking at the tutorials/examples, and I probably don't yet understand all the parts that well. So please bear that in mind while reading this 🙂