# ask-community
d
Hi everyone! I'm a bit new to Dagster and I need some advice. I'm building an ETL orchestrator tool and don't know which abstraction is best for each piece. What we have:
• Templates of DataSources (SQL, S3, etc.) - resources?
• DataSources (connection string) - io_manager?
• Dataset (table, file) - asset?
• Transformation (SQL, dataframe, etc.) - op?
Basic task: get a file from S3, transform it, and save it to SQL. DataSources and transformations should be parametrized. Maybe you know of good examples for a similar use case. Thanks in advance!
z
Just a clarification: the transformation occurs within an asset, so an asset represents both a dataset and the transformation required to create that dataset from its dependencies (other datasets). Here's a pretty cool example a user from the community put together using mostly SQL-based transformations.
d
Thanks, looks pretty similar.