Hello everyone, my name is Horatio
I am a beginner with Dagster and new to data science in general. My upcoming job requires me to work with data pipelines, so I have been learning Dagster, but I am still quite confused.
I have a question: what are the best practices for regularly syncing data from MongoDB to Elasticsearch?
Since the data volume is huge (hundreds of millions of large JSON documents), I don’t plan to read them all at once, but rather in pages. In that case, how should I define a MongoDB collection as an asset? Should I not define it as an asset at all, or is Dagster not even the right tool for this task?
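To make my idea concrete, here is roughly the paging pattern I had in mind: cursor pagination on `_id`, where each page picks up after the last `_id` of the previous one. This is just a sketch, not working Dagster code; the names (`iter_pages`, `find_after`, the fake collection) are placeholders I made up, and the in-memory list only stands in for a real pymongo collection so the loop can be shown end to end.

```python
from typing import Any, Callable, Iterator, List

def iter_pages(
    find_after: Callable[[Any, int], List[dict]],
    page_size: int = 1000,
) -> Iterator[List[dict]]:
    """Yield documents in _id-ordered pages using cursor pagination.

    find_after(last_id, limit) must return up to `limit` docs with
    _id greater than last_id, sorted ascending by _id. With pymongo
    this query would look roughly like:
        collection.find({"_id": {"$gt": last_id}}).sort("_id", 1).limit(limit)
    """
    last_id = None
    while True:
        page = find_after(last_id, page_size)
        if not page:
            return  # no more documents, stop paging
        yield page
        last_id = page[-1]["_id"]  # resume after the last doc seen

# In-memory stand-in for a MongoDB collection, for demonstration only.
DOCS = [{"_id": i, "payload": f"doc-{i}"} for i in range(10)]

def fake_find_after(last_id, limit):
    start = 0 if last_id is None else last_id + 1
    return DOCS[start:start + limit]

pages = list(iter_pages(fake_find_after, page_size=4))
# Each page would then be bulk-indexed into Elasticsearch
# (e.g. via elasticsearch.helpers.bulk) inside an asset or op.
```

My uncertainty is whether each page (or each collection) should be its own asset or partition, or whether this whole loop belongs inside a single asset.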