sean
12/14/2020, 10:47 PM
When I call `dagstermill.yield_result(my_dataarray, 'name')` at the end of my notebook solid, I get an error -- it appears dagstermill is attempting to serialize the arrays using scrapbook, and this fails.
How do I bypass scrapbook and use my own serialization logic? What I want to do is just keep an in-memory xarray dataset that stores my arrays. I attempted to do this by creating a custom `asset_store`, but this makes no difference -- dagstermill is still trying to use scrapbook when I call `dagstermill.yield_result`. Here is the code I used for the custom asset store; am I perhaps missing something important?
```python
import dagster as dg
import xarray as xr

class DatasetAssetStore(dg.AssetStore):
    def __init__(self):
        super().__init__()
        self.dataset = xr.Dataset()

    def get_asset(self, context):
        name = context.output_name
        return self.dataset[name]

    def set_asset(self, context, obj):
        name = context.output_name
        self.dataset[name] = obj

@dg.resource
def dataset_asset_store(_):
    return DatasetAssetStore()

dataset_mode = dg.ModeDefinition(
    resource_defs={"asset_store": dataset_asset_store}
)

my_pipeline = dg.PipelineDefinition(
    name='test',
    solid_defs=[...],
    dependencies={...},
    mode_defs=[dataset_mode]
)
```
max
12/14/2020, 10:50 PM

sean
12/14/2020, 10:51 PM

max
12/14/2020, 10:51 PM
`yield_result` serializes an output back from the notebook to the dagster machinery

netCDF not the right thing to do (too slow, etc)?

sean
12/14/2020, 10:54 PM

max
12/14/2020, 10:55 PM

sean
12/14/2020, 10:56 PM

max
12/14/2020, 10:56 PM

sean
12/14/2020, 10:56 PM

max
12/14/2020, 10:57 PM

sean
12/14/2020, 10:57 PM
`yield_result` call `set_asset` on my provided `asset_store` resource

max
12/14/2020, 10:57 PM

sean
12/14/2020, 10:57 PM
scrapbook

max
12/14/2020, 10:58 PM
`file_manager` resource to get a file handle to write out to, and then pass that handle

sean
12/14/2020, 10:58 PM

max
12/14/2020, 10:58 PM

sean
12/14/2020, 10:59 PM
```
papermill.exceptions.PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [4]":
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-6e97b5f4f935> in <module>
      1 for k in ['neuron', 'orientation_bin', 'position_bin', 'position_tuning', 'orientation_tuning']:
----> 2     dm.yield_result(DS[k], k)

~/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/dagstermill/manager.py in yield_result(self, value, output_name)
    270
    271         out_file = os.path.join(self.marshal_dir, "output-{}".format(output_name))
--> 272         scrapbook.glue(output_name, write_value(dagster_type, value, out_file))
    273
    274     def yield_event(self, dagster_event):

~/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/scrapbook/utils.py in wrapper(*args, **kwds)
     63         if not is_kernel():
     64             warnings.warn("No kernel detected for '{fname}'.".format(fname=f.__name__))
---> 65         return f(*args, **kwds)
     66
     67     return wrapper

~/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/scrapbook/api.py in glue(name, data, encoder, display)
     86         name, scrap_to_payload(encoder_registry.encode(Scrap(name, data, encoder))), encoder
     87     )
---> 88     ip_display(ipy_data, metadata=metadata, raw=True)
     89
     90     # Only display data that is marked for display

~/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/IPython/core/display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    309     for obj in objs:
    310         if raw:
--> 311             publish_display_data(data=obj, metadata=metadata, **kwargs)
    312         else:
    313             format_dict, md_dict = format(obj, include=include, exclude=exclude)

~/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/IPython/core/display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    117     kwargs['transient'] = transient
    118
--> 119     display_pub.publish(
    120         data=data,
    121         metadata=metadata,

~/.local/lib/python3.8/site-packages/ipykernel/zmqshell.py in publish(self, data, metadata, source, transient, update)
    127         # hooks before potentially sending.
    128         msg = self.session.msg(
--> 129             msg_type, json_clean(content),
    130             parent=self.parent_header
    131         )

~/.local/lib/python3.8/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

~/.local/lib/python3.8/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

~/.local/lib/python3.8/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

~/.local/lib/python3.8/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

~/.local/lib/python3.8/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    195
    196     # we don't understand it, it's probably an unserializable object
--> 197     raise ValueError("Can't clean for JSON: %r" % obj)

ValueError: Can't clean for JSON: <xarray.DataArray 'neuron' (neuron: 1000)>
array([   1,    2,    3, ...,  998,  999, 1000])
Coordinates:
  * neuron   (neuron) int64 1 2 3 4 5 6 7 8 ... 993 994 995 996 997 998 999 1000

  File "/Users/smackesey/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/dagster/core/errors.py", line 180, in user_code_error_boundary
    yield
  File "/Users/smackesey/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/dagstermill/solids.py", line 217, in _t_fn
    raise exc
  File "/Users/smackesey/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/dagstermill/solids.py", line 186, in _t_fn
    papermill.execute_notebook(
  File "/Users/smackesey/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/papermill/execute.py", line 108, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/Users/smackesey/stm/code/apps/buzhc/analysis/.conda-env/lib/python3.8/site-packages/papermill/execute.py", line 192, in raise_for_execution_errors
    raise error
```
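(Editor's note: the failure at the bottom of this traceback comes from `scrapbook.glue` handing the value to IPython's display machinery, whose `json_clean` step requires JSON-serializable data -- which an `xarray.DataArray` is not. A minimal stdlib illustration of the same failure mode, with a hypothetical `FakeArray` standing in for the DataArray:)

```python
import json

class FakeArray:
    """Stand-in for xarray.DataArray: an arbitrary object json can't encode."""
    def __init__(self, data):
        self.data = data

try:
    json.dumps({"neuron": FakeArray([1, 2, 3])})
except TypeError as exc:
    # ipykernel's json_clean raises a similar error: "Can't clean for JSON: ..."
    print("not JSON-serializable:", exc)
```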
max
12/14/2020, 11:00 PM

sean
12/14/2020, 11:01 PM
`dm.yield_result(array, name)`

max
12/14/2020, 11:03 PM

sean
12/14/2020, 11:08 PM
Asset Stores and `SystemStorageDefinition`, where it is not really clear how they interact.

max
12/14/2020, 11:09 PM

sean
12/14/2020, 11:12 PM
scrapbook

max
12/14/2020, 11:13 PM
```python
import dagstermill as dm
from dagster import ModeDefinition, local_file_manager

context = dm.get_context(
    mode_def=ModeDefinition(resource_defs={"file_manager": local_file_manager}),
    run_config={"resources": {"file_manager": {"config": {"base_dir": ...}}}},
)
file_handle = context.resources.file_manager.write_data(dataset.to_netcdf())
dm.yield_result(file_handle.path, 'out_path')
```
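(Editor's note: the round trip suggested here -- serialize the dataset to bytes in the notebook, persist them via a file handle, yield only the path, and reopen downstream -- can be sketched with just the stdlib. `write_data` below is a hypothetical stand-in for the file manager's method, and the payload stands in for the bytes `dataset.to_netcdf()` returns:)

```python
import os
import tempfile

def write_data(data: bytes) -> str:
    """Hypothetical stand-in for file_manager.write_data: persist bytes, return a path."""
    fd, path = tempfile.mkstemp(suffix=".nc")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path

payload = b"netcdf-bytes"         # in the real flow: dataset.to_netcdf()
out_path = write_data(payload)    # in the real flow: file_handle.path
# Downstream, the consumer reopens the file (xr.open_dataset(out_path) in real code)
with open(out_path, "rb") as f:
    assert f.read() == payload
```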
sean
12/14/2020, 11:14 PM

max
12/14/2020, 11:14 PM

sean
12/14/2020, 11:14 PM

max
12/14/2020, 11:15 PM

sean
12/14/2020, 11:16 PM

max
12/14/2020, 11:16 PM

sean
12/14/2020, 11:22 PM