
Scott Peters

06/16/2021, 10:20 PM
Ideally, `dagster` will be running as a background service, where end users simply pass arguments to some kind of CLI. Are there methods or practices within `dagster` that lower the front-end complexity so that we don't have to map every argument so explicitly? This is especially necessary for our tooling, since the methods will always require user input for every run and will likely not be tied to any scheduler.

owen

06/16/2021, 10:29 PM
hi @Scott Peters! You probably want the Permissive config type in this case (docs here: https://docs.dagster.io/_apidocs/config#dagster.Permissive). This would allow you to create a single resource def with
@resource(config_schema=Permissive({'thing':str, 'other_thing':str}))
etc.
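A minimal sketch of the CLI side, assuming every argument arrives as a `--key value` pair (not `--key=value`). `parse_free_args` is a hypothetical helper built on plain `argparse`, not a Dagster feature: it collects arbitrary flags into a dict without declaring each one.

```python
import argparse
from typing import Dict, List, Optional


def parse_free_args(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Collect arbitrary `--key value` pairs into a dict (hypothetical helper)."""
    # No options are declared, so every token ends up in the "unknown" list.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    _, unknown = parser.parse_known_args(argv)
    if len(unknown) % 2 != 0:
        parser.error("arguments must come in --key value pairs")
    config: Dict[str, str] = {}
    for flag, value in zip(unknown[::2], unknown[1::2]):
        if not flag.startswith("--"):
            parser.error(f"expected a --key flag, got {flag!r}")
        # '--asset-name' becomes 'asset_name'
        config[flag[2:].replace("-", "_")] = value
    return config


if __name__ == "__main__":
    # e.g. python run.py --endpoint /home/speters --asset-name billy
    print(parse_free_args())
```

Because a `Permissive` (or plain `dict`) schema accepts keys it does not list, the returned dict could be dropped into the resource's `config` entry of `run_config` without a per-argument mapping.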

Scott Peters

06/16/2021, 10:30 PM
ah thanks! ... yeah I ended up using a dict for the moment and got it working.
I do have one small question, if you're still there?

owen

06/16/2021, 10:30 PM
`dict` and `Permissive({})` are identical 🙂; `Permissive` just lets you require some fields (and accept any additional ones)
still here!
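The `dict` vs `Permissive` behaviour described here can be mimicked with a plain-Python toy check. `check_permissive` is a hypothetical helper, not Dagster's validator: fields named in the schema are required and type-checked, and any other keys pass through untouched, so an empty schema behaves like a bare `dict`.

```python
from typing import Any, Dict, Mapping


def check_permissive(schema: Mapping[str, type], config: Mapping[str, Any]) -> Dict[str, Any]:
    """Toy stand-in for a Permissive-style check; not Dagster's implementation."""
    for key, expected in schema.items():
        # Every field listed in the schema must be present with the right type.
        if key not in config:
            raise ValueError(f"missing required field {key!r}")
        if not isinstance(config[key], expected):
            raise TypeError(f"field {key!r} should be {expected.__name__}")
    # Keys not named in the schema are accepted unchanged.
    return dict(config)


print(check_permissive({"thing": str}, {"thing": "a", "other_thing": "b"}))
```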

Scott Peters

06/16/2021, 10:30 PM
oh great!
so, when running a pipeline, say:
@pipeline
def pipe():
    this_value = do_something()
    print(this_value)

    do_another_thing(do_something())
When I call `print(this_value)` I get the object's repr rather than the returned value, but when I nest the calls as `do_another_thing(do_something())`, the value seems to actually get unpacked (in this case a string). Is there a reason that values returned by solids are not accessible from the pipeline scope?
path <dagster.core.definitions.composition.InvokedSolidOutputHandle object at 0x7faf265c9ee0>
vs
/my/resolved/path
I can see that leading to something like this if you have calls that depend on one another:
do_a_thing(do_another_thing(do_other_things()))
I guess it's fair to say that I'm confused in general about getting return values out of a solid. Even in the above case, I get:
AttributeError: 'InvokedSolidOutputHandle' object has no attribute 'split'
so clearly, the receiving method is not unpacking the return value from the `InvokedSolidOutputHandle`.

owen

06/16/2021, 10:45 PM
Ah sorry, hopped into a meeting. When you define a pipeline, you're not actually executing any code, you're just telling dagster what the shape of your pipeline is (the @solid wrapper turns the function you wrote into an object that is manipulated by dagster internals). Generally solids will only be run in the context of a pipeline (or unit tests)
so solid outputs should only really be consumed by other solids
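The define-then-run split can be illustrated with a small plain-Python toy. `step`, `OutputHandle` and `run` are hypothetical names, and this is not how Dagster is implemented internally; it only shows why a value captured in the pipeline body is a placeholder, while step bodies receive real values once the run happens.

```python
from typing import Any, Callable, Dict, List, Tuple


class OutputHandle:
    """Placeholder handed back while the pipeline is being defined (toy model)."""

    def __init__(self, step_name: str) -> None:
        self.step_name = step_name

    def __repr__(self) -> str:
        return f"<OutputHandle of {self.step_name}>"


# Steps recorded at definition time: (name, function, upstream handles).
_recorded: List[Tuple[str, Callable[..., Any], Tuple[OutputHandle, ...]]] = []


def step(fn: Callable[..., Any]) -> Callable[..., OutputHandle]:
    """Toy decorator: calling a step records it instead of running it."""

    def record(*upstream: OutputHandle) -> OutputHandle:
        _recorded.append((fn.__name__, fn, upstream))
        return OutputHandle(fn.__name__)

    return record


def run() -> Dict[str, Any]:
    """Execute the recorded steps in order, feeding real values downstream."""
    outputs: Dict[str, Any] = {}
    for name, fn, upstream in _recorded:
        # Swap each upstream handle for the value its step actually produced.
        outputs[name] = fn(*(outputs[h.step_name] for h in upstream))
    return outputs


@step
def get_path() -> str:
    return "/home/speters/stage_manager/model/billy"


@step
def split_path(path: str) -> List[str]:
    # Inside the step body, `path` is a real str, so .split works.
    return path.split("/")


# The "pipeline body": only wiring happens here, nothing is computed yet.
path = get_path()
print(path)  # prints <OutputHandle of get_path>, not the string
split_path(path)

# Every intermediate value is available once the run has happened.
print(run())
```

Keying outputs by function name keeps the toy short, but it means each step can only be invoked once per pipeline.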

Scott Peters

06/16/2021, 10:46 PM
ah ok ... so, say I have 2 solids:
get_path() -> str
and
exists(path: str) -> bool
and a pipeline that does this:
@pipeline
def pipe():
    path = get_path()
    exists(path)
When the `exists` method gets hold of the `path` value, it attempts to treat it like a string, but it is, of course, still an instance of `SolidOutput`. How can I unpack that `str` from the first solid?
Ideally I'd be able to access the output from various solids and use it as input for others without having to nest them like `solid_1(solid_2())`, which could get messy quickly.

owen

06/16/2021, 10:59 PM
hm, maybe I'm not understanding properly, but dagster will handle that "unpacking" for you (you really shouldn't even have to think about that part of the system). So something like
@pipeline
def my_pipe():
    path = get_path()
    exists(path)
    some_other_solid(path)
will work just fine
you're only really going to run into trouble if you try to put non-solid stuff inside the @pipeline function (like a print statement), because the pipeline is just fancy syntax for defining a PipelineDefinition

Scott Peters

06/16/2021, 11:02 PM
hmmm ... I assume that this sort of solid definition should be unpacked as a string?
@solid(
    config_schema={'create_data' : dict},
    required_resource_keys={'create_data'}
    )
def get_path(context) -> str:

    endpoint   = context.resources.create_data.get('endpoint')
    collection = context.resources.create_data.get('collection')
    asset_type = context.resources.create_data.get('asset_type')
    asset_name = context.resources.create_data.get('asset_name')

    path = f'{endpoint}/{collection}/{asset_type}/{asset_name}'
    print('path', path)
    return path
but when I run the pipeline it complains:
tokens = path.split(self.path_delimiter)
AttributeError: 'InvokedSolidOutputHandle' object has no attribute 'split'
this is my pipeline:
@pipeline
def create():
    path = get_path()
    exists(path)
If I inspect `path.__dict__`, I get:
{'solid_name': 'get_path', 'output_name': 'result'}
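The string building in `get_path` could also live in an ordinary function that is easy to unit-test and print from. `build_path` and `PATH_FIELDS` are hypothetical names; the field order matches the f-string in the solid above. Indexing instead of `.get()` means a missing field raises `KeyError` rather than quietly putting `None` into the path.

```python
from typing import Mapping

# Field order used by the solid in this thread.
PATH_FIELDS = ("endpoint", "collection", "asset_type", "asset_name")


def build_path(create_data: Mapping[str, str]) -> str:
    """Join the path fields with '/' (hypothetical helper, plain Python)."""
    # Indexing instead of .get(): a missing field raises KeyError
    # rather than silently putting 'None' into the path.
    return "/".join(create_data[field] for field in PATH_FIELDS)


args = {
    "endpoint": "/home/speters",
    "collection": "stage_manager",
    "asset_type": "model",
    "asset_name": "billy",
}
print(build_path(args))  # /home/speters/stage_manager/model/billy
```

The solid body could then shrink to `return build_path(context.resources.create_data)`, assuming the resource still yields the same dict.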

owen

06/16/2021, 11:06 PM
where is the tokens = path.split(...) call happening? and how are you invoking the pipeline?

Scott Peters

06/16/2021, 11:08 PM
that is happening in a library that is being passed in... it is responsible for determining if a path exists via `rclone`, so it does some string parsing
and... calling the pipeline like this:
from dagster import DagsterInstance, execute_pipeline

if __name__ == '__main__':

    args = {
        'endpoint': '/home/speters',
        'collection': 'stage_manager',
        'asset_type': 'model',
        'asset_name': 'billy'
    }

    # `create` is the @pipeline defined above
    output = execute_pipeline(
        create,
        run_config={
            'resources': {
                'create_data': {
                    'config': args
                }
            },
            'solids': {
                'get_path': {
                    'config': {
                        'create_data': args
                    }
                }
            }
        },
        instance=DagsterInstance.get()
    )
also tried this one:
if __name__ == '__main__':

    args = {
        'endpoint': '/home/speters',
        'collection': 'stage_manager',
        'asset_type': 'model',
        'asset_name': 'billy'
    }

    output = execute_pipeline(
        create,
        run_config={
            'resources': {
                'create_data': {
                    'config': args
                }
            },
            'solids': {
                'get_path': {
                    'config': {
                        'create_data': args
                    }
                },
                'exists': {'config': {}}
            }
        },
        instance=DagsterInstance.get()
    )
where I included `exists` in the `solids` dict
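Since the same `args` dict feeds both the resource config and the `get_path` solid config, one option is to assemble `run_config` in a single place so the CLI layer only hands over one dict. `make_run_config` is a hypothetical helper; the nesting just mirrors the `run_config` used above.

```python
from typing import Any, Dict, Mapping


def make_run_config(args: Mapping[str, str]) -> Dict[str, Any]:
    """Build the run_config layout used in this thread from one args dict."""
    return {
        # Separate copies so neither section can mutate the other's config.
        "resources": {"create_data": {"config": dict(args)}},
        "solids": {"get_path": {"config": {"create_data": dict(args)}}},
    }


print(make_run_config({"endpoint": "/home/speters", "asset_name": "billy"}))
```

It would then be passed as `execute_pipeline(create, run_config=make_run_config(args), ...)`.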

owen

06/16/2021, 11:12 PM
can I see the definition of the exists() solid real quick?
this is pretty weird, my first thought is that you might have missed the @solid decorator on that function

Scott Peters

06/16/2021, 11:13 PM
🤦‍♂️
def exists(path: str) -> bool:
    rco = RcloneObject(path)
    return rco.exists
yup
omg
ok .. it ran
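The failure is easy to reproduce outside Dagster. In this plain-Python sketch, `OutputHandle` is a hypothetical stand-in for `InvokedSolidOutputHandle`, and `path_tokens` plays the role of the undecorated `exists`: without a decorator it runs immediately, on the placeholder rather than on a string.

```python
from typing import List


class OutputHandle:
    """Stand-in for the placeholder a solid call returns at definition time (toy)."""

    def __init__(self, step_name: str) -> None:
        self.step_name = step_name


def path_tokens(path: str) -> List[str]:
    """Undecorated helper: it runs immediately on whatever it is given."""
    # Mirrors the library's `tokens = path.split(self.path_delimiter)` call.
    return path.split("/")


# Inside a pipeline body, get_path() would hand back a placeholder like this one.
handle = OutputHandle("get_path")

try:
    path_tokens(handle)  # type: ignore[arg-type]
except AttributeError as err:
    print(f"definition-time failure: {err}")

# At run time, a decorated step receives the real string instead.
print(path_tokens("/home/speters/stage_manager/model/billy"))
```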

owen

06/16/2021, 11:14 PM
🙌

Scott Peters

06/16/2021, 11:15 PM
so, is there an easy way to unpack results from solids to step through and troubleshoot?
I would imagine people want to be able to see intermediate results by printing values as they come back?
path = get_path()
print(path.result)
or something
thanks so much for catching that missing @solid decorator

owen

06/16/2021, 11:17 PM
I would say that the most convenient development loop with dagster involves dagit (https://docs.dagster.io/concepts/dagit/dagit#dagit-ui).

Scott Peters

06/16/2021, 11:19 PM
ah, hmmm .. ok.

owen

06/16/2021, 11:19 PM
but yeah you can't really print within the @pipeline definition, you'd probably just want to do that within the solid itself

Scott Peters

06/16/2021, 11:19 PM
thank you

owen

06/16/2021, 11:20 PM
if you're using dagit, you get all sorts of nice options for structured outputs that will show up in a pretty way in the UI
but of course print() is always an option 🙂

Scott Peters

06/16/2021, 11:20 PM
yeah, ideally, everything would stay in python except for keeping track of the runs, logs etc
right right
ok
I'll get to know it
thanks again for your help... baby steps

owen

06/16/2021, 11:22 PM
yep no problem! I will say that using dagit doesn't really change what code you write, just makes it easier to run it and iterate on it while visualizing what it's doing
but nothing wrong with just starting out simple
glad it's working now! 😄