# announcements
b:
How is the pickling/unpickling of internal dagster objects supposed to be handled? I'm following the code in the `dagster-aws` EMR module to see how it serializes and stores `DagsterEvent`s from a pipeline executed remotely (trying to create a similar Databricks integration), but it looks like events aren't rehydrated properly?
```python
>>> event
DagsterEvent(event_type_value='STEP_START', pipeline_name='pyspark_pipe', step_key='make_df_solid.compute', solid_handle=SolidHandle(name='make_df_solid', parent=None), step_kind_value='COMPUTE', logging_tags={'pipeline': 'pyspark_pipe', 'step_key': 'make_df_solid.compute', 'solid': 'make_df_solid', 'solid_definition': 'make_df_solid'}, event_specific_data=None, message='Started execution of step "make_df_solid.compute".')
>>> pickle.loads(pickle.dumps(event))
_DagsterEvent(event_type_value='STEP_START', pipeline_name='pyspark_pipe', step_key='make_df_solid.compute', solid_handle=_SolidHandle(name='make_df_solid', parent=None), step_kind_value='COMPUTE', logging_tags={'pipeline': 'pyspark_pipe', 'step_key': 'make_df_solid.compute', 'solid': 'make_df_solid', 'solid_definition': 'make_df_solid'}, event_specific_data=None, message='Started execution of step "make_df_solid.compute".')
```
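For context, the round trip being attempted boils down to the sketch below: a generic pickle round trip through a hypothetical local file standing in for the remote object store, not the actual dagster-aws code.

```python
import pickle

from dagster import DagsterEvent
from dagster.core.events import EngineEventData

events = [
    DagsterEvent(
        event_type_value='ENGINE_EVENT',
        pipeline_name='demo',
        event_specific_data=EngineEventData(),
    )
]

# Hypothetical local file standing in for the remote object store.
with open('/tmp/events.pkl', 'wb') as f:
    pickle.dump(events, f)

with open('/tmp/events.pkl', 'rb') as f:
    rehydrated = pickle.load(f)

# This is the check that comes back wrong in the session above.
assert type(rehydrated[0]) is DagsterEvent
```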
s:
Weird - `_DagsterEvent` is the superclass of `DagsterEvent`. What version of Python are you using? I just tried to reproduce this on Python 3.6.8 and seemed to get the right thing back from pickle.
```python
import pickle
from dagster import DagsterEvent
from dagster.core.events import EngineEventData

event = DagsterEvent(
    event_type_value='ENGINE_EVENT',
    pipeline_name='b',
    event_specific_data=EngineEventData(),
)

print(type(event))
print(pickle.loads(pickle.dumps(event)))
```
which produced:
```
<class 'dagster.core.events.DagsterEvent'>
DagsterEvent(event_type_value='ENGINE_EVENT', pipeline_name='b', step_key=None, solid_handle=None, step_kind_value=None, logging_tags={}, event_specific_data=EngineEventData(metadata_entries=[], error=None, marker_start=None, marker_end=None), message=None)
```
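For context, dagster builds its event classes with the common namedtuple-subclass pattern, roughly the sketch below (simplified field list, not the actual definition), which is why `_DagsterEvent` appears as the base class:

```python
from collections import namedtuple

# Roughly how dagster-style event classes are built (simplified sketch,
# not the real field list): the public class subclasses a namedtuple
# whose typename is the underscore-prefixed name.
class DagsterEvent(namedtuple('_DagsterEvent', 'event_type_value pipeline_name')):
    pass

print(DagsterEvent.__bases__[0].__name__)  # '_DagsterEvent'
```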
b:
Figured it out in this issue (https://github.com/dagster-io/dagster/issues/2458): it looks like PySpark hijacks the serialization of namedtuples.
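A minimal sketch of that hijack, assuming pyspark is installed (`_Base` and `Wrapper` are illustrative names, not dagster or pyspark code):

```python
import collections
import pickle

# Importing pyspark patches collections.namedtuple at import time, so
# namedtuples defined afterwards get a custom __reduce__.
import pyspark  # noqa: F401

# Stand-ins for the namedtuple('_DagsterEvent', ...) base and its subclass.
_Base = collections.namedtuple('_Base', 'x')

class Wrapper(_Base):
    pass

roundtripped = pickle.loads(pickle.dumps(Wrapper(x=1)))
print(type(roundtripped).__name__)  # '_Base': the subclass is lost in the round trip
```

If that is the failure mode, one general workaround is to define `__reduce__` on the subclass itself so pickle reconstructs the subclass rather than the hijacked base.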