Has anyone come up with a way to facilitate near-r...
# ask-community
Has anyone come up with a way to facilitate near-real-time tabular previews of materialized Pandas/PySpark assets using DuckDB or something else? I tried adding a step to my IO manager, after the parquet export, that additionally creates tables within DuckDB. That works, but I didn't realize the concurrency model only allows one connection at a time when any writes are involved. I created an issue for that here; the relevant friction point is:
I currently have another window open in the DuckDB shell/DBeaver to inspect the result tables, but if I leave that window open, I can't make any changes [or rebuild an asset in Dagster], and I need to manually disconnect, build and then reconnect.
The ideal preview workflow would be:
• Make a code change to one of my assets
• Re-materialize in Dagster
• Switch tabs or windows (if I need to refresh, that's fine) and see an updated preview of my dataset with >100 rows, in some kind of scrollable tabular/spreadsheet view that is higher-fidelity than a Markdown or JSON metadata preview
The next attempt I can imagine is dumping everything to Postgres or another DB and keeping a SQL client open in another window, but I'm keen to hear if anyone has engineered a more integrated solution.
Hi folks, any ideas here?