# random
a
Hello everyone! I work as a software engineer at an outbound marketing company where we handle a large amount of people data. We want to streamline our data processing through automation, and we are investigating Dagster for this. Nobody on the team (myself included) is familiar with Dagster yet, so we are still exploring its capabilities.

Today, data processing involves several manual steps in Google Sheets, including data validation and error checking (for example, finding trailing spaces, emojis, and invalid inputs). Our goal is a complete data processing workflow with the following stages:

1. Data Collection: We gather data in CSV format using data scraping tools such as Skrapp, SalesNav, and Apollo.
2. Data Validation and Cleaning: We want to automate validation, cleaning, and sanitization of the collected data, catching common errors such as trailing spaces, emojis, and incorrect inputs.
3. External Validation: We also plan to validate data externally with tools such as ZeroBounce to further improve its accuracy.
4. Database Integration: Finally, we want to load the processed and validated data into a database, most likely PostgreSQL.

We would like to understand whether Dagster is a feasible and efficient fit for this workflow, and whether this approach is both possible and practical for our scenario. Any advice, recommendations, or experience using Dagster for similar data processing workflows would be greatly appreciated. Many thanks😊
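For the validation and cleaning stage, here is a minimal sketch of the kind of checks currently done by hand in Google Sheets, written with only the Python standard library. It assumes hypothetical column names (`first_name`, `last_name`, `email`); real exports from Skrapp, SalesNav, or Apollo will use their own headers. The emoji pattern is an approximate heuristic over common emoji code-point blocks, not a complete Unicode emoji definition, and the email regex is only a loose syntax check; it does not replace deliverability checks from a service like ZeroBounce. Since Dagster assets are ordinary Python functions, logic like this could be called from inside an asset, but the wiring is left out here.

```python
import csv
import io
import re
from dataclasses import dataclass, field

# Heuristic: common emoji blocks only (not exhaustive, may catch some symbols like dingbats).
EMOJI_RE = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero-width joiner used inside emoji sequences
    "]+"
)

# Loose syntactic email check: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical required columns; adjust to the real CSV headers.
REQUIRED = ("first_name", "last_name", "email")


@dataclass
class RowResult:
    cleaned: dict
    issues: list = field(default_factory=list)


def clean_value(value: str) -> str:
    """Remove emojis, collapse internal whitespace, and trim the ends."""
    value = EMOJI_RE.sub("", value)
    return " ".join(value.split())


def validate_row(row: dict, required=REQUIRED) -> RowResult:
    """Record problems found in the raw row, and return a cleaned copy."""
    issues = []
    cleaned = {}
    for column, raw in row.items():
        if column is None:
            # csv.DictReader puts surplus fields under the key None.
            issues.append("row has more fields than the header")
            continue
        raw = raw or ""  # missing trailing fields come back as None
        if raw != raw.strip():
            issues.append(f"{column}: leading/trailing whitespace")
        if EMOJI_RE.search(raw):
            issues.append(f"{column}: contains emoji")
        cleaned[column] = clean_value(raw)
    for column in required:
        if not cleaned.get(column):
            issues.append(f"{column}: missing")
    email = cleaned.get("email", "")
    if email and not EMAIL_RE.match(email):
        issues.append(f"email: invalid format {email!r}")
    return RowResult(cleaned, issues)


def process_csv(text: str) -> list:
    """Validate and clean every row of a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    return [validate_row(row) for row in reader]
```

For example, `process_csv("first_name,last_name,email\n Jane ,Doe😊,jane@example.com\nJohn,,not-an-email\n")` flags the padded first name and the emoji in the first row, and the missing last name and malformed email in the second. Keeping the issues separate from the cleaned values makes it possible to auto-fix cosmetic problems while routing rows with missing or invalid data to review before loading them into PostgreSQL.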
p
Welcome to the community! Let us know if we can help in any way. You might want to join #data-quality-asset-checks, as we're thinking a lot about how Dagster can improve the ability to ensure data quality, which is in line with your second point too. Would love your feedback, and good luck with your journey!