# announcements
Hi, where can I read more about how solids send data to each other depending on the runtime engine used? We recently finished a data importer for our use case, which we are considering implementing in Dagster, but I am concerned about performance because the data is huge. Also, is there any information on how to control batches of data and error handling? For example, if I have one file that is split into 100 batches, each with 3 steps, how can I control what happens if batch 56 fails at step 2? We currently implement this in code and DB state, and we have complete flexibility over what to do (cancel only the failed batch but accept the import, cancel the import, retry the batch, accept the import but log the batch failure and revert its changes, etc.). We could easily convert this code to Dagster as it is, keeping the DB state of import progress. However, I have a feeling that the state we maintain about batches, stages, and imports could come as a freebie if we implemented it in Dagster in a stateless way, but then again I am afraid of losing the flexibility in error handling I mentioned above. Any advice? Thanks.
There’s a lot in this question 🙂 At a high level, Dagster has no special handling of big data. Dagster is meant to orchestrate computation, not to provide much management for the computations themselves. Big data orchestrated with Dagster is meant to be processed in frameworks like Spark or in systems like a data warehouse (e.g. Snowflake, BigQuery, Presto, or Redshift).
Thank you for the clarification. We already looked at both Spark and Redshift, but no ready-made solution satisfies our requirements for data error handling and response.
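For readers following the thread, the per-batch failure policies described in the question (skip the failed batch, cancel the import, retry the batch) can be sketched roughly as below. This is plain Python and does not use any Dagster API. The names `OnFailure`, `run_batch`, and `run_import` are hypothetical and chosen for illustration only, and reverting a failed batch's changes is left out.

```python
from enum import Enum


class OnFailure(Enum):
    """Hypothetical policies for what to do when a batch fails."""
    SKIP_BATCH = "skip_batch"        # record the failure, keep importing
    CANCEL_IMPORT = "cancel_import"  # stop the whole import at the first failure
    RETRY_BATCH = "retry_batch"      # re-run the failed batch a few times


def run_batch(batch, steps):
    """Run every step on one batch; return None on success or failure details."""
    data = batch
    for step_no, step in enumerate(steps, start=1):
        try:
            data = step(data)
        except Exception as exc:
            return {"step": step_no, "error": repr(exc)}
    return None


def run_import(batches, steps, policy, max_retries=2):
    """Process batches in order and apply the chosen failure policy."""
    report = {"succeeded": [], "failed": [], "cancelled": False}
    # Only the retry policy gets extra attempts; a retry restarts at step 1.
    attempts = 1 + (max_retries if policy is OnFailure.RETRY_BATCH else 0)
    for batch_no, batch in enumerate(batches, start=1):
        failure = None
        for _ in range(attempts):
            failure = run_batch(batch, steps)
            if failure is None:
                break
        if failure is None:
            report["succeeded"].append(batch_no)
            continue
        # The report says which batch failed and at which step.
        report["failed"].append({"batch": batch_no, **failure})
        if policy is OnFailure.CANCEL_IMPORT:
            report["cancelled"] = True
            break
    return report
```

Calling `run_import(batches, [parse, validate, load], OnFailure.SKIP_BATCH)` with three hypothetical step functions returns a report listing the batches that succeeded and, for each failure, the batch number and the step at which it failed. An orchestrator could persist that report instead of a hand-maintained DB state, though whether that fits a given setup is an open question here.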