07/12/2019, 6:03 PM
Just heard about this a few days ago and it looks very promising. I'm still a novice and need to digest the docs more, but one thing I find unclear is how Dagster addresses unit and integration testing of complex transformations that, in production, have to be scaled horizontally (e.g. with Spark) because of data size. Are the unit tests in practice integration tests, where you define a very limited set of inputs (say 10 to 100), run them through the full cluster, and then assert that they produce the expected outputs? If inputs and outputs get serialized and deserialized, I can see potential for serious resource consumption. Has any thought been given to integrating with e.g. Apache Arrow to deal with these kinds of issues? Again, I apologize if I'm asking questions with obvious answers.