# announcements
Hey Dagster team, very inspirational Medium post. We're a new ML startup, and it discussed many of our primary pain points. Looking for advice/examples to get up and running. Our needs:
• Primary data: Postgres + Spark
• Secondary data flow: ingestion of a large corpus of scanned text pages (on the order of tens of millions). We need to OCR, extract, de-identify, and store them as unstructured data (evaluating Spark Tesseract and GCP DLP)
• Evaluating SparkNLP for further text preprocessing / reduction
• Spark SQL DataFrames for the aggregation pipeline (on the order of millions or tens of millions of aggregate rows generated, offline batch)
• We have not selected a "learning framework" yet, but likely TFX or Torch. TFX runs on Airflow, so it could be a fit.
our most substantive multi-tech example like this is the “airline demo”
we still need to write some prose describing what is going on
but if you check out the repo
install dagster and dagit
then cd to examples
and run dagit
here are screenshots from dagit
and the code lives in examples/dagster_examples/airline_demo