Umar Hussain
01/05/2023, 8:35 PM
Fraser Marlow
01/11/2023, 6:47 PM
Vinnie
02/09/2023, 5:08 PM
David Fernández Calle
02/14/2023, 9:12 AM
Adebayo Adejare
02/20/2023, 8:20 AM
Fraser Marlow
02/22/2023, 10:30 PM
Fraser Marlow
03/06/2023, 8:14 PM
Fraser Marlow
03/08/2023, 3:18 AM
How we deploy faster with warm Docker containers
Fraser Marlow
03/08/2023, 4:58 PM
Jake Kagan
03/09/2023, 5:01 PM
Fraser Marlow
03/16/2023, 6:44 PM
Fraser Marlow
03/18/2023, 12:49 PM
Fraser Marlow
03/20/2023, 2:36 PM
Fraser Marlow
03/20/2023, 3:35 PM
Fraser Marlow
03/21/2023, 3:00 PM
Nicolas Galland
03/21/2023, 4:18 PM
Dario De Stefano
03/22/2023, 9:24 AM
Vinnie
03/23/2023, 4:13 PM
Bennett Norman
03/31/2023, 6:49 PM
Fraser Marlow
04/10/2023, 2:39 PM
Fraser Marlow
04/11/2023, 1:05 AM
Andrea
04/12/2023, 12:49 PM
• TrinoQuery type handler that lets the user push down storage and compute of assets to Trino without taking the data out, by passing a reference to the Trino table. (The Dagster DbTypeHandler system is really amazing and the library uses it heavily. Super thumbs up to whoever had the idea!)
• A set of type handlers that "side-load" Trino data: when using a Trino catalog with a Hive metastore, Dagster can automatically find the underlying Parquet data and load it directly from S3/GCS/HDFS/etc., which is a lot faster than going through the Trino client, especially for larger data (I have a benchmark in the example folder showing a 10x read speedup even for a small-ish 300MB dataframe).
• The type handlers are composable, so a Parquet file type handler is used to build an Arrow Table handler, which is used to build a Pandas handler, and so on. This makes it very simple to build custom type handlers. In the example folder I have an example showing a custom Polars type handler in just a couple of lines of code (it only converts from/to Arrow instead of going all the way to Trino).
• I adapted the Dagster dbt jaffle shop example to work with dagster-trino instead of DuckDB. Next I plan to add an example showing the use of Ibis (a Python dataframe library that can be used with Trino) and of distributed systems such as Spark/Dask/Ray, with distributed reads of Trino data through the Parquet type handler.
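The composable design described in the third bullet can be sketched roughly as follows. This is a minimal illustration in plain Python, not the dagster-trino or Dagster API: `InMemoryRowsHandler`, `ComposedHandler`, and the two conversion functions are all hypothetical names, and a list of row dicts stands in for an Arrow table. The point is only that a new handler needs nothing more than a pair of conversions to and from the layer beneath it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]      # stand-in for an Arrow table (assumption)
Columns = Dict[str, List[Any]]   # stand-in for a dataframe-like type (assumption)


class InMemoryRowsHandler:
    """Hypothetical base layer: stores and loads row lists by table name."""

    def __init__(self) -> None:
        self._tables: Dict[str, Rows] = {}

    def handle_output(self, table: str, obj: Rows) -> None:
        # Copy rows so later mutation by the caller does not leak in.
        self._tables[table] = [dict(row) for row in obj]

    def load_input(self, table: str) -> Rows:
        return [dict(row) for row in self._tables[table]]


@dataclass
class ComposedHandler:
    """Hypothetical wrapper: builds a handler for a new type on top of `inner`."""

    inner: Any
    to_inner: Callable[[Any], Any]     # new type -> inner handler's type
    from_inner: Callable[[Any], Any]   # inner handler's type -> new type

    def handle_output(self, table: str, obj: Any) -> None:
        self.inner.handle_output(table, self.to_inner(obj))

    def load_input(self, table: str) -> Any:
        return self.from_inner(self.inner.load_input(table))


def columns_to_rows(cols: Columns) -> Rows:
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]


def rows_to_columns(rows: Rows) -> Columns:
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


# A "custom" handler is just the base handler plus two conversion functions.
rows_handler = InMemoryRowsHandler()
columns_handler = ComposedHandler(rows_handler, columns_to_rows, rows_to_columns)
columns_handler.handle_output("orders", {"id": [1, 2], "amount": [9.5, 3.0]})
```

Stacking another `ComposedHandler` on top of `columns_handler` would add a further type the same way, which mirrors the Parquet to Arrow to Pandas chain in spirit only.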
The library is still rough around the edges, but if anyone here is interested in having a look and giving me some feedback, it would make my day!
Arun Kumar
04/12/2023, 10:44 PM
Adebayo Adejare
04/24/2023, 6:48 PM
Fraser Marlow
04/24/2023, 7:44 PM
Sanidhya Singh
05/02/2023, 4:16 AM
Roei Jacobovich
05/10/2023, 10:42 AM
Fraser Marlow
06/02/2023, 2:56 AM
Alfie Johnson
06/02/2023, 1:04 PM