Has anyone come up with a way to facilitate near-real-time tabular previews of materialized Pandas/PySpark assets using DuckDB or something else?

I tried adding a step to my IO manager, after the Parquet export, that also creates the tables in DuckDB. That works, but I didn't realize DuckDB's concurrency model only supports one connection at a time if you're doing any writes. I created an <https://github.com/duckdb/duckdb/discussions/7666|issue for that here>; the relevant friction point is:

> I currently have another window open in the DuckDB shell/DBeaver to inspect the result tables, but if I leave that window open, I can't make any changes [or rebuild an asset in Dagster], and I need to manually disconnect, build and then reconnect.
The ideal preview workflow would be:

• Make a code change to one of my assets
• Re-materialize in Dagster
• Switch tabs or windows (if I need to refresh, that's fine) and see an updated preview of my dataset with >100 rows, in some kind of scrollable tabular/spreadsheet preview which is higher-fidelity than a Markdown or JSON metadata preview
The next attempt I can imagine is dumping everything to Postgres or another DB and keeping a SQL client open in another window, but I'm keen to hear if anyone has engineered a more integrated solution.