#announcements

Will Brown

04/02/2020, 4:28 PM
I'm trying to add dagstermill to my pipeline and having a few problems. The first two I was able to work around, but the last is going to need a bug fix in dagstermill.
The first was the "Can't execute a dagstermill solid from a pipeline that wasn't instantiated using an ExecutionTargetHandle" error. I'm developing the pipeline one task at a time and re-executing with execute_pipeline so I can iterate quickly in the debugger. I was able to work around this by building the pipeline from an ExecutionTargetHandle. This could be documented somewhere.
handle = ExecutionTargetHandle.for_pipeline_python_file('rightsizedag.py', 'rightsize_pipeline')
pipeline = handle.build_pipeline_definition()
result = execute_pipeline(pipeline, instance=instance, environment_dict=env, run_config=rc)
The second issue was an exception thrown from papermill, here in papermill/execute.py:
# Fetch the kernel name if it's not supplied
kernel_name = kernel_name or nb.metadata.kernelspec.name
I was able to work around this by injecting a kernel name into the ipynb. I created the notebook with the dagstermill CLI and it didn't have one. It would be nice if the kernel name could be specified as config to the solid. edit: the exception was a key error that 'kernelspec' didn't exist.
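For anyone hitting the same thing, here is a minimal sketch of that workaround. It is not dagstermill's or papermill's own code: the helper names ensure_kernelspec and patch_notebook_file are made up for illustration, and the "python3" kernel name is an assumption about what is installed locally. It treats the .ipynb as plain JSON and adds a kernelspec entry only when one is missing.

```python
import json


def ensure_kernelspec(notebook, name="python3", display_name="Python 3", language="python"):
    """Add a kernelspec entry to a notebook dict if it has none (hypothetical helper)."""
    metadata = notebook.setdefault("metadata", {})
    # setdefault leaves an existing kernelspec untouched.
    metadata.setdefault(
        "kernelspec",
        {"name": name, "display_name": display_name, "language": language},
    )
    return notebook


def patch_notebook_file(path, **kernel):
    """Rewrite the .ipynb at `path` in place so it carries a kernelspec (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    ensure_kernelspec(notebook, **kernel)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
```

Running patch_notebook_file on the generated notebook once should be enough, since the entry is kept on later saves.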

alex

04/02/2020, 4:33 PM
ya 1) should definitely be documented better - we are also exploring ways to improve https://dagster.phacility.com/D2357

Will Brown

04/02/2020, 4:35 PM
The third issue, the one I'm stuck on, is that my config contains ' characters. The parameters injected into the notebook look like:
context = __dm_dagstermill._reconstitute_pipeline_context(**__dm_seven.json.loads('...
This string doesn't have the ' characters escaped, so the whole cell fails with a syntax error. I tried manually editing the cell to escape every ', and then the notebook can run the cell and load all my inputs.

alex

04/02/2020, 4:38 PM
cc @max
thanks for the reports - we’ll get this fixed
👍 1
thanks for the kernelspec report too
@Will Brown I'm having trouble reproducing your string escaping issue
did this occur when running a pipeline using the python API, or using config from YAML and dagit?

Will Brown

04/02/2020, 9:41 PM
solids:
  query_vm_resources:
    config:
      custom_projections: "vm_size = tostring(properties['hardwareProfile']['vmSize'])"
I have this in my config ^. Those ' characters get serialized right into the cell as is, inside the __dm_seven.json.loads('...') call, so the first ' terminates the string and breaks the syntax of the cell.
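A small standalone reproduction of the failure mode, in case it helps. This is only a sketch of the general problem, not dagstermill's actual templating code: the variable names and the is_valid_python helper are made up, and using repr() to build the literal is one possible fix, not necessarily the one dagstermill should adopt.

```python
import ast
import json

config = {
    "custom_projections": "vm_size = tostring(properties['hardwareProfile']['vmSize'])"
}
payload = json.dumps(config)

# Naive templating: wrap the JSON text in single quotes by hand.
# The ' characters inside the payload end the string literal early.
naive_cell = "params = json.loads('" + payload + "')"

# Safer templating: let repr() produce a correctly escaped string literal.
safe_cell = "params = json.loads(" + repr(payload) + ")"


def is_valid_python(source):
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_valid_python(naive_cell))  # False
print(is_valid_python(safe_cell))   # True
```

Executing safe_cell round-trips the config unchanged, quotes included.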

max

04/02/2020, 9:43 PM
thanks much
🙏 1

Will Brown

04/03/2020, 12:14 AM
I can't seem to find a clean way to access the notebook in a subsequent solid. I want to follow up the notebook solid with a solid that uses nbconvert to create a report. The path is only yielded in a materialization. It would be great if there was an option to have the solid yield an output with the output notebook path (or content).

max

04/03/2020, 12:52 AM
yep
i'll take a look at that tonight

Will Brown

04/03/2020, 11:59 AM
0.7.6
* Fixed an issue when executing dagstermill solids with config that contained quote characters.
* The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel flag.
🙂 so fast!
😄 2

max

04/03/2020, 7:42 PM
I think we have something to offer for the solid notebook output as well 🙂
https://dagster.phacility.com/D2431 will merge to master today -- basically this lets you specify an additional argument to define_dagstermill_solid that configures it to yield the notebook as an output

Will Brown

04/03/2020, 7:53 PM
Oh dang. I already implemented the functionality myself. I went a simpler route, deferring to the automatic materialization of outputs. I have the same flag and add the extra output definition exactly like you did. However, I output the actual NotebookNode (it's returned by papermill execute, so you don't have to read the notebook off disk). I added a NotebookNodeDagsterType. In my downstream solid I can just take the incoming NotebookNode object and hand it to nbconvert.
I can share the patch if you're interested.
If nothing else, there is a little optimization you can make:
instead of using scrapbook.read_notebook, you can store the result of your papermill execute call (it returns a NotebookNode) and create a scrapbook.models.Notebook with that result as the arg. It accepts either a str path or a NotebookNode already in memory.
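Roughly what I mean, as a sketch only. The function name execute_and_wrap and its arguments are made up for illustration and this is not the patch itself. It assumes papermill and scrapbook are installed, and they are imported inside the function so the snippet loads without them.

```python
def execute_and_wrap(input_path, output_path, parameters=None):
    """Run a notebook with papermill and wrap the in-memory result for scrapbook.

    Hypothetical helper: avoids re-reading the output notebook from disk.
    """
    # Imported lazily so this module can be loaded without the optional packages.
    import papermill
    import scrapbook.models

    # execute_notebook returns the executed notebook as a NotebookNode.
    executed = papermill.execute_notebook(
        input_path, output_path, parameters=parameters or {}
    )

    # Notebook accepts a NotebookNode as well as a str path.
    wrapped = scrapbook.models.Notebook(executed)
    return executed, wrapped
```

The NotebookNode in the first slot of the return value is the same kind of object a downstream solid could hand to nbconvert.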

max

04/03/2020, 8:17 PM
yes would love to see the patch!

Will Brown

04/03/2020, 8:28 PM
This is against 0.7.6
heh, I forgot I used the wrapped NotebookNode exposed by the scrapbook Notebook object, but it's identical to what execute returns. The read from disk by scrapbook could be eliminated.