# announcements
o
Hi Dagsters, I know that Dagster doesn't have a scheduler out of the box. But can you recommend any schedulers that would be easy to pick up and integrate?
a
take a look at this https://dagster.readthedocs.io/en/0.5.5/sections/deploying/deploying.html and let me know if you have any other questions
t
These are really great guides I hadn't seen before! Thanks @alex!
o
Ok, that is what I was trying to avoid - Airflow and Dask. I'm a data analyst, not an engineer, so I was hoping for something lightweight, and here I see yet another framework I need to learn. So going with cron is the simplest solution for me right now.
a
Ya, I'm not personally aware of any other lightweight alternatives besides just using cron. A rough sketch of that approach is below.
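Something like this could work - a minimal, untested sketch, assuming a Dagster version where the @solid/@pipeline decorators and execute_pipeline are available (the pipeline, file, and log paths here are made up; the exact decorator API depends on your Dagster version):
```python
# run_daily.py - a cron-driven Dagster run, one script per pipeline.
from dagster import execute_pipeline, pipeline, solid


@solid
def extract(context):
    context.log.info("pulling source data")
    return [1, 2, 3]


@solid
def load(context, rows):
    context.log.info("loaded {n} rows".format(n=len(rows)))


@pipeline
def daily_pipeline():
    load(extract())


if __name__ == "__main__":
    # One crontab line per pipeline, e.g.:
    # 0 6 * * * /usr/bin/python /path/to/run_daily.py >> /var/log/daily_pipeline.log 2>&1
    result = execute_pipeline(daily_pipeline)
    assert result.success
```
You lose the nice UI, but each pipeline stays a single crontab entry you can toggle by commenting it out.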
o
Anyway, thanks for the help. I'll try to learn more about Airflow or Dask.
a
If you have a second, could you describe how your dream dagster-integrated scheduler would work? It would be nice to better understand what people in different scenarios are looking for.
o
Sure. Ideally I'd like to have a page (maybe in dagit) where I can see a list of all scheduled pipelines, configure the time and the job/pipeline to execute, toggle them on and off, see last runs, etc. No need for cross-dependencies between jobs - that should be handled by dagster itself. One crontab line per pipeline, so to speak. The closest tool I've found is https://github.com/thieman/dagobah, but it's for Python 2.7 and most likely outdated.
a
Thanks!
s
^ Seconded ideal scheduler requirements
b
I have found that CloudWatch Events and Step Functions executing an ECS task in Fargate is a good way to get something scheduled. You also get an execute button in the AWS console with Step Functions, and console logs from ECS land in CloudWatch. You could even trigger alerts off execution errors. My plan was to give people the ability to create dagster apps and push to GitHub; then we would take the app, upload it to S3, download it into the container on run, and build all the AWS stuff on deploy.
If you needed to split the solids onto separate containers, like in the Airflow use case, I was thinking of a way to generate the Step Functions definition from the dagster pipeline definition, similar to the Airflow deployment example (rough sketch below). Might be a cool project to work on.
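To make that concrete, here's an untested sketch of generating an Amazon States Language definition that runs each solid as its own Fargate task. Assumptions: the solid names are passed in already in execution order (extracting them from a Dagster pipeline definition is left out), the container name `dagster-app` and the `python -m my_app.run_solid <name>` entrypoint are hypothetical, and the required Fargate NetworkConfiguration is omitted for brevity:
```python
# sfn_from_pipeline.py - build a Step Functions definition, one ECS task per solid.
import json


def step_functions_definition(solid_names, cluster_arn, task_definition_arn):
    states = {}
    for i, name in enumerate(solid_names):
        state = {
            "Type": "Task",
            # ".sync" makes Step Functions wait for the ECS task to finish
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "Cluster": cluster_arn,
                "TaskDefinition": task_definition_arn,
                "LaunchType": "FARGATE",
                # NOTE: a real Fargate run also needs NetworkConfiguration here.
                "Overrides": {
                    "ContainerOverrides": [
                        {
                            "Name": "dagster-app",
                            "Command": ["python", "-m", "my_app.run_solid", name],
                        }
                    ]
                },
            },
        }
        if i == len(solid_names) - 1:
            state["End"] = True
        else:
            state["Next"] = solid_names[i + 1]
        states[name] = state
    return {
        "Comment": "Generated from a Dagster pipeline",
        "StartAt": solid_names[0],
        "States": states,
    }


if __name__ == "__main__":
    definition = step_functions_definition(
        ["extract", "load"],
        cluster_arn="arn:aws:ecs:us-east-1:123456789012:cluster/example",
        task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/dagster-app:1",
    )
    print(json.dumps(definition, indent=2))
```
The output JSON could then be deployed as a state machine and triggered by a CloudWatch Events schedule rule.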
o
@braunk Yeah, serverless functions are a good way to schedule something, and what you've described is pretty cool. But I have a limitation - no AWS or Google Cloud. We have our data on private servers and can only deploy there, which means I have bare hardware and Docker -_-