#ask-community

Dias Wesley

03/14/2023, 2:25 PM
Hi great community, I am looking for someone who can help me understand how to write an IO manager for PySpark.

Tim Castillo

03/14/2023, 7:43 PM
Hi Dias! Glad to help you out here. Which IO manager are you basing yours off of and what part doesn't work?

Dias Wesley

03/14/2023, 9:14 PM
Hi Tim, I'm trying to run a Spark job, and it looks like if I want to read data from various sources I have to set up an IO manager (define the file path, handle input, and handle output). I'm new to Dagster and I don't really understand the main concepts behind these pieces. I thought it would be straightforward, but obviously it isn't for me.

    import os

    from dagster import IOManager, io_manager
    from pyspark.sql import SparkSession


    class LocalParquetIOManager(IOManager):
        def _get_path(self, context):
            # One directory per run, step, and output name.
            return os.path.join(context.run_id, context.step_key, context.name)

        def handle_output(self, context, obj):
            # obj is the Spark DataFrame returned by the upstream step.
            obj.write.parquet(self._get_path(context))

        def load_input(self, context):
            spark = SparkSession.builder.getOrCreate()
            # Read from the path the upstream output was written to.
            return spark.read.parquet(self._get_path(context.upstream_output))


    @io_manager
    def local_parquet_io_manager():
        return LocalParquetIOManager()

How should I understand the blue line above?

Thanks for your help.
Regards, Dias
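
For anyone reading along, here is a minimal sketch of the flow that class follows, with Dagster and Spark swapped out for plain Python so it runs on its own. `FakeContext`, `LocalJsonIOManager`, `base_dir`, and `demo` are hypothetical names made up for illustration and are not part of Dagster; JSON files stand in for Parquet. The idea it tries to show: `handle_output` stores a step's return value at a path built from the output's context, and `load_input` rebuilds that same path from `context.upstream_output` to read the value back for the next step.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeContext:
    """Hypothetical stand-in for the context object an IO manager receives."""
    run_id: str
    step_key: str
    name: str
    upstream_output: Optional["FakeContext"] = None


class LocalJsonIOManager:
    """Same shape as LocalParquetIOManager, but writes JSON instead of Parquet."""

    def __init__(self, base_dir: str):
        # base_dir is an assumption added here so the demo writes to a temp folder.
        self.base_dir = base_dir

    def _get_path(self, context: FakeContext) -> str:
        # One location per (run, step, output name), so outputs do not collide.
        return os.path.join(self.base_dir, context.run_id, context.step_key, context.name)

    def handle_output(self, context: FakeContext, obj: Any) -> None:
        # Called after a step returns: persist the returned object.
        path = self._get_path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(obj, f)

    def load_input(self, context: FakeContext) -> Any:
        # Called before a downstream step runs: the input context points at
        # the upstream output, so the same path is rebuilt and read back.
        with open(self._get_path(context.upstream_output)) as f:
            return json.load(f)


def demo() -> Any:
    manager = LocalJsonIOManager(tempfile.mkdtemp())
    out_ctx = FakeContext(run_id="run1", step_key="make_rows", name="result")
    manager.handle_output(out_ctx, [{"id": 1}, {"id": 2}])
    in_ctx = FakeContext(
        run_id="run1", step_key="count_rows", name="rows", upstream_output=out_ctx
    )
    return manager.load_input(in_ctx)
```

Calling `demo()` writes the list under the first step's path and then reads it back through the second step's input context, which is the same round trip the Parquet version performs with a DataFrame.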