Hi how do I change the base dir of dagstermill output notebo dagster #ask-community


jasono

04/04/2022, 12:39 AM

Hi, how do I change the base_dir of the dagstermill output notebook? The source code indicates that base_dir can be defined through the init_context argument, so I'm trying to create an init_context object that reflects the desired output base_dir. I'm stuck on creating this init_context object, especially this part: init_context.instance.storage_directory()

claire

04/04/2022, 5:01 PM

Hi jasono. The dagstermill library contains a get_context function that should be what you're looking for: https://docs.dagster.io/_apidocs/libraries/dagstermill#dagstermill.get_context You can specify base_dir through the run config:

import dagstermill as dm
from dagster import ModeDefinition, local_file_manager
context = dm.get_context(
    mode_def=ModeDefinition(resource_defs={"file_manager": local_file_manager}),
    run_config={"resources": {"file_manager": {"config": {"base_dir": ...}}}},
)

jasono

04/04/2022, 10:34 PM

Thank you for your response. Unfortunately I’m getting this error. I seem to be putting context in the wrong place?

jasono

04/04/2022, 10:34 PM


Dagster.check.ParameterCheckError: Param "init_context" is not a InitResourceContext. Got <dagstermill.context.DagstermillExecutionContext object at 0x00000013E96B7D90> which is type <class 'dagstermill.context.DagstermillExecutionContext'>.

jasono

04/04/2022, 10:36 PM

Here is my code


context1 = dm.get_context(
    mode_def=ModeDefinition(resource_defs={"file_manager": local_file_manager}),
    run_config={
        "resources": {
            "file_manager": {
                "config": {"base_dir": "t:/data/output/me_recon/recon_8510R"}
            }
        }
    },
)

notebook_op_8510R = dm.define_dagstermill_op(
    "recon_8510R",
    script_relative_path("datapipeline/me_recon_supports/recon_8510R.ipynb"),
    output_notebook_name="recon_8510R_output",
    config_schema={
        "text1": Field(
            Int,
            default_value=777,
            is_required=False,
            description="The number of clusters to find",
        )
    },
)


@job(
    resource_defs={
        "output_notebook_io_manager": dm.local_output_notebook_io_manager(context1)
    }
)
def run_8510R():
    notebook_op_8510R()

claire

04/04/2022, 11:09 PM

Hi jasono, got it. In this case, because you are providing config to a specific resource, output_notebook_io_manager, you can provide run config to the job directly without creating a context object:

@job(
    resource_defs={
        "output_notebook_io_manager": dm.local_output_notebook_io_manager
    },
    config={
        "resources": {
            "output_notebook_io_manager": {
                "config": {"base_dir": "t:/data/output/me_recon/recon_8510R"}
            }
        }
    },
)
def run_8510R():
    notebook_op_8510R()
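
If several jobs need the same kind of config, the nested dictionary above could be built by a small helper. This is only a sketch: notebook_io_config is a hypothetical name, not part of dagster or dagstermill, and it assumes the resource key stays output_notebook_io_manager as in the job above.

```python
def notebook_io_config(base_dir: str) -> dict:
    """Build the run config that sets base_dir on the notebook IO manager.

    Hypothetical helper: it only assembles the same nested dict used in the
    job above and does not call into dagster or dagstermill.
    """
    return {
        "resources": {
            "output_notebook_io_manager": {
                "config": {"base_dir": base_dir}
            }
        }
    }


# Same directory as in the thread; pass the result as config=... to @job.
run_8510R_config = notebook_io_config("t:/data/output/me_recon/recon_8510R")
```

The result would be passed as config=run_8510R_config in the @job decorator in place of the inline dictionary.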

jasono

04/04/2022, 11:16 PM

It worked!! One more question if you don’t mind. I noticed the output notebook is placed in a folder within a randomly named zip file. Is there a way to make it just save a file (ideally with a custom name) without a folder or zipping?

claire

04/04/2022, 11:50 PM

cc @yuhan
