# introductions
Hi! I work at Hugging Face, and we are on our way to enhancing the Hub dataset pages. We already show some details about the datasets, such as the first 100 rows, and we want to precompute more insights to display on the dataset page. The pre-processing steps are currently managed with ad-hoc code. We are investigating Dagster to manage the jobs/dependencies/storage/etc., which would help us focus more on the data processing itself. The project is open source, btw. Dagster is a very nice tool, and after reading the docs and doing some tests, I still have conceptual questions, so... possibly I'll be looking for help. Thanks for your patience!
👋 9
I wouldn't count them out -- personally, I think assets are the most compelling reason to use Dagster, and operating from an asset-first mindset keeps things much simpler in the long run. I started off going bananas on dynamic graphs and ops, and have moved to using assets for nearly everything. You can still use assets to generate changing outputs -- for example, you could have an `email_publisher` asset that gets materialized with different configurations. But there are certainly trade-offs.
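To make the "same asset, different configurations" idea concrete, here is a minimal sketch (mine, not from the thread), assuming a recent Dagster release where assets can take a Pydantic-style `Config` object; the `email_publisher` fields are purely hypothetical:

```python
# Minimal sketch: one asset definition, different outputs per materialization,
# depending on the run config supplied at launch time.
from dagster import Config, asset


class EmailPublisherConfig(Config):
    # Hypothetical fields, just for illustration.
    audience: str = "all-users"
    template: str = "weekly-digest"


@asset
def email_publisher(config: EmailPublisherConfig) -> str:
    # The output changes with the config passed when the asset is materialized.
    return f"sent '{config.template}' to {config.audience}"
```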
Oh OK, thanks. I’ll ask my question in #dagster-support with that in mind then
Hey Sylvain, I'm the lead eng on Dagster and a fan of Hugging Face. Let me know if it would be helpful to chat.
❤️ 1
Sure! I would love to. I'm not a data engineer and many concepts are new to me. I asked my first question in #dagster-support, but overall I'm still wondering how well Dagster (or other similar tools like Airflow) is suited to my problem, and if so, how to organize my code once it depends on Dagster (for example, how to trigger jobs: via the GraphQL API, directly in Python, etc.).
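As a rough illustration of the "trigger jobs directly in Python" option mentioned above, here is a minimal sketch (not from the conversation), assuming Dagster's Python API; the `dataset_insights` asset is a hypothetical stand-in for a Hub pre-processing step:

```python
# Minimal sketch of launching work in-process from Python.
from dagster import asset, materialize


@asset
def dataset_insights():
    # Hypothetical placeholder: compute statistics to show on a dataset page.
    return {"num_rows": 100}


if __name__ == "__main__":
    # materialize() executes the listed assets in-process; launching runs via
    # the web UI, schedules/sensors, or the GraphQL API are alternatives.
    result = materialize([dataset_insights])
    assert result.success
```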