Luke — 07/28/2022, 4:38 AM
Huib Keemink — 07/28/2022, 7:21 AM
Luke — 07/28/2022, 3:35 PM
Pipeline nodes could be primitive transformations (
```python
from pipelines.features import SeasonalFeaturizer

# create feature extraction pipeline from default boilerplate
pipeline = SeasonalFeaturizer()

# inspect default pipeline
# returns `yaml` definition?
# a display method would be nice too (DOT, html, etc)
pipeline.inspect()

# create custom nodes
...

# modify pipeline
# drop unneeded node / step
# add two new custom ones that are project specific
custom_pipeline = (
    pipeline
    .drop_node(...)
    .add_node(...)
    .add_node(...)
)

# could be local (single process or local spark)
# or remote (spark)
results_df = custom_pipeline.run(df, config)

# save transformed data frame
results_df.save(...)
```
), flow control (
), other pipelines, etc… Sklearn Pipelines, neuraxle, feature_engine, pdpipe, sspipe, Apache Beam, etc… each cover portions of the workflow I'm looking for. Seems like Dagster, with just the Python API for pipeline definition and inspection (without dagit, logging, scheduling, etc…), could be a fit.
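For comparison, here is a minimal sketch of how far sklearn Pipelines get toward the inspect/drop/add workflow above. The step names (`"scale"`, `"pca"`) and the stand-in pipeline are hypothetical; the point is that sklearn exposes the step list for inspection, but "dropping" or "adding" a node means rebuilding the pipeline rather than calling a fluent `.drop_node(...)` / `.add_node(...)` API:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# a default pipeline (hypothetical stand-in for SeasonalFeaturizer)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# inspection: sklearn exposes the steps directly (no yaml/DOT out of the box)
print(pipeline.named_steps)

# "drop a node" / "add a node" = rebuild the step list by hand
steps = [(name, step) for name, step in pipeline.steps if name != "pca"]
steps.append(("pca3", PCA(n_components=3)))
custom_pipeline = Pipeline(steps)

# run locally; no built-in remote/spark execution or .save() on results
df = np.random.rand(10, 4)
results = custom_pipeline.fit_transform(df)
print(results.shape)  # (10, 3)
```

So sklearn covers composition and local execution, but not the inspection formats, immutable modification methods, or pluggable local/remote runners sketched above.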