sephi

03/04/2020, 9:07 AM
We want to work with Dask DataFrames as a Dagster type. What is the best strategy for running such a process in Dagster?

abhi

03/04/2020, 3:33 PM
To better understand your question: are you asking how to use Dask DataFrames as Dagster types, or how Dagster works with the Dask executor? For the former, it would be like any other Dagster type; I would check out the custom type section in the tutorial. For the latter, I would check out our Dagster Dask integration: https://dagster.readthedocs.io/en/master/sections/deploying/dask.html
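For the custom type route, a rough sketch might look like the following. It is not the official integration, just an illustration that assumes `DagsterType` accepts a `type_check_fn(context, value)` callable; the helper names `make_instance_check` and `build_dask_dataframe_type` and the type name `DaskDataFrame` are made up for this example.

```python
from typing import Any, Callable


def make_instance_check(expected_cls: type) -> Callable[[Any, Any], bool]:
    """Return a type-check function with the (context, value) shape Dagster calls."""

    def type_check(_context: Any, value: Any) -> bool:
        # The context argument is unused here; only the runtime class is checked.
        return isinstance(value, expected_cls)

    return type_check


def build_dask_dataframe_type():
    """Wire the check into a Dagster type (hypothetical helper).

    Imports are done lazily so this module can still be imported on machines
    where dask or dagster is not installed.
    """
    import dask.dataframe as dd
    from dagster import DagsterType

    return DagsterType(
        name="DaskDataFrame",  # assumed name, pick whatever suits your pipelines
        type_check_fn=make_instance_check(dd.DataFrame),
        description="A Dask DataFrame passed between solids.",
    )
```

The object returned by `build_dask_dataframe_type()` would then be used wherever a Dagster type is expected on a solid input or output. This only checks the class of the value; column-level validation would need a richer check function.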

sephi

03/04/2020, 3:49 PM
We are thinking about the former. From your reply, we understand that we should follow the Dagster type tutorial. Would it be relevant to contribute this type to the project?

abhi

03/04/2020, 4:44 PM
That’s right. I would use that pattern and see if it expresses everything you need. If not, let us know. As far as contributing it, I’d say implement it in your pipelines first, and as it becomes a recurring pattern in your pipelines, file a PR and we can discuss there. Super curious to see how you all end up using it in y’all’s pipelines!

sephi

03/05/2020, 3:20 PM
We have implemented most of dagster_dask_dataframe_type and hope to share the code in the future. Regarding the test_date_column function in the pandas test suite: what is the test checking for? The current test puts `date` as the column name and not as a value in the column.
In test_config_driven_df.py there is an overwrite of the `filename` parameter. We implemented the test without the overwrite; is that OK? Once we verify the implementation in our pipeline, we will be happy to submit a PR.
Please let us know if a PR is relevant.