ideally, `dagster` will be running as a background...
# ask-community
s
ideally, `dagster` will be running as a background service, where end users are simply passing arguments to some kind of `cli`. Are there methods or practices within `dagster` that lower the front-end complexity such that we don't have to map every argument so explicitly? This is especially necessary for our tooling since the methods will always require user input for every run and will not likely be tied to any scheduler
o
hi @Scott Peters! You probably want the Permissive config type in this case (docs here: https://docs.dagster.io/_apidocs/config#dagster.Permissive). This would allow you to create a single resource def with `@resource(config_schema=Permissive({'thing':str, 'other_thing':str}))` etc.
s
ah thanks! ... yeah I ended up using a dict for the moment and got it working.
I do have one small question if you're still there?
o
dict and Permissive({}) are identical 🙂, Permissive just lets you require some fields (and accept any additional ones)
still here!
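A rough sketch of the "require some fields, accept any additional ones" behaviour described above. This is a plain-Python toy, not Dagster's implementation; `check_permissive` and the sample field names are hypothetical and exist only for illustration.

```python
def check_permissive(required, config):
    """Toy check: listed fields must be present with the right type; extras pass through."""
    for key, expected_type in required.items():
        if key not in config:
            raise KeyError(f"missing required field: {key}")
        if not isinstance(config[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")
    # Undeclared keys are kept as-is, which is the "permissive" part.
    return dict(config)


required = {"thing": str, "other_thing": str}

# Declared fields are validated; the undeclared "extra" key is accepted untouched.
ok = check_permissive(required, {"thing": "a", "other_thing": "b", "extra": 3})

# Leaving out a declared field is rejected.
try:
    check_permissive(required, {"thing": "a"})
    missing_rejected = False
except KeyError:
    missing_rejected = True
```

With an empty `required` mapping every key counts as "additional", which matches the remark that a bare `dict` and `Permissive({})` behave the same.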
s
oh great!
so, when running a pipeline, say:
@pipeline
def pipe():
    this_value = do_something()
    print(this_value)

    do_another_thing(do_something())
when I call `print(this_value)` I get the instance statement for the return type, but when I nest the calls `do_another_thing(do_something())` it seems to actually unpack that value, in this case a string... is there a reason that values returned by solids are not accessible from the pipeline scope?
path <dagster.core.definitions.composition.InvokedSolidOutputHandle object at 0x7faf265c9ee0>
vs
/my/resolved/path
I guess I see that leading to something like this if you have calls that depend on one another:
do_a_thing(do_another_thing(do_other_things()))
I guess it's possible to say that I am confused in general about getting return values out of a solid. even in the above case, I get the:
AttributeError: 'InvokedSolidOutputHandle' object has no attribute 'split'
so clearly, the receiving method is not unpacking the return value from the `InvokedSolidOutputHandle`
o
Ah sorry, hopped into a meeting. When you define a pipeline, you're not actually executing any code, you're just telling dagster what the shape of your pipeline is (the @solid wrapper turns the function you wrote into an object that is manipulated by dagster internals). Generally solids will only be run in the context of a pipeline (or unit tests)
so solid outputs should only really be consumed by other solids
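A minimal sketch of that definition-time vs. execution-time split, written as a plain-Python toy rather than with Dagster itself. `ToyPipeline`, `OutputHandle`, and `toy.step` are hypothetical stand-ins for illustration only, and the toy assumes each step is invoked at most once.

```python
class OutputHandle:
    """Placeholder returned while the pipeline is being defined (toy stand-in)."""

    def __init__(self, step_name):
        self.step_name = step_name

    def __repr__(self):
        return f"<OutputHandle for {self.step_name}>"


class ToyPipeline:
    """Records which step feeds which; nothing runs until run() is called."""

    def __init__(self):
        self.steps = []  # (name, function, input handles) in call order

    def step(self, fn):
        def invoke(*inputs):
            # Definition time: record the call and hand back a placeholder.
            self.steps.append((fn.__name__, fn, inputs))
            return OutputHandle(fn.__name__)
        return invoke

    def run(self):
        # Execution time: call each step with the real outputs of its inputs.
        outputs = {}
        for name, fn, inputs in self.steps:
            args = [outputs[handle.step_name] for handle in inputs]
            outputs[name] = fn(*args)
        return outputs


toy = ToyPipeline()


@toy.step
def get_path():
    return "/my/resolved/path"


@toy.step
def exists(path):
    # Receives a real str at execution time, so str methods work.
    return len(path.split("/")) > 1


# "Pipeline body": only wiring happens here, no step code runs.
path = get_path()
exists(path)
print(path)  # <OutputHandle for get_path>

results = toy.run()
print(results["get_path"])  # /my/resolved/path
```

The point of the toy is that `path` in the wiring section is only a reference to "the output of `get_path`"; the actual string exists only once the steps are executed and is passed between them by the framework.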
s
ah ok ... so, say I have 2 solids called `get_path() -> str` and `exists() -> bool`, and a pipeline that does this:
@pipeline
def pipe():
    path = get_path()
    exists(path)
when the `exists` method gets ahold of the `path` value, it attempts to treat it like a string, but it is, of course, still an instance of `SolidOutput`. how can I unpack that `str` from the first solid?
which seems to point to being able to access output from various `solids` and use them as input for others without having to nest them like `solid_1(solid_2())`
that could get messy quick
o
hm maybe I'm not understanding properly, but dagster will handle that "unpacking" stuff for you (you really shouldn't even have to think of that part of the system). so something like
@pipeline
def my_pipe():
    path = get_path()
    exists(path)
    some_other_solid(path)
will work just fine
you're only really going to run into trouble if you try to put non-solid stuff inside the @pipeline function (like a print statement), because the pipeline is just a fancy syntax for defining a PipelineDefinition
s
hmmm ... I assume that this sort of solid definition should be unpacked as a string?
@solid(
    config_schema={'create_data': dict},
    required_resource_keys={'create_data'},
)
def get_path(context) -> str:
    # Read each path component from the `create_data` resource.
    endpoint = context.resources.create_data.get('endpoint')
    collection = context.resources.create_data.get('collection')
    asset_type = context.resources.create_data.get('asset_type')
    asset_name = context.resources.create_data.get('asset_name')

    path = f'{endpoint}/{collection}/{asset_type}/{asset_name}'
    print('path', path)
    return path
but when I run the pipeline it complains:
tokens = path.split(self.path_delimiter)
AttributeError: 'InvokedSolidOutputHandle' object has no attribute 'split'
this is my pipeline:
@pipeline
def create():
    path = get_path()
    exists(path)
if I call for the `__dict__` from the path variable, I get:
{'solid_name': 'get_path', 'output_name': 'result'}
o
where is the tokens = path.split(...) call happening? and how are you invoking the pipeline?
s
that is happening in a library that is being passed in... it is responsible for determining if a path exists via `rclone`, so it does some string parsing
and... calling the pipeline like this:
if __name__ == '__main__':

    args = {
        'endpoint':'/home/speters',
        'collection': 'stage_manager',
        'asset_type': 'model',
        'asset_name': 'billy'
    }

    output = execute_pipeline(
        create,
        run_config={
            'resources': {
                'create_data': {'config': args},
            },
            'solids': {
                'get_path': {'config': {'create_data': args}},
            },
        },
        instance=DagsterInstance.get(),
    )
also tried this one:
if __name__ == '__main__':

    args = {
        'endpoint':'/home/speters',
        'collection': 'stage_manager',
        'asset_type': 'model',
        'asset_name': 'billy'
    }

    output = execute_pipeline(
        create,
        run_config={
            'resources': {
                'create_data': {'config': args},
            },
            'solids': {
                'get_path': {'config': {'create_data': args}},
                'exists': {'config': {}},
            },
        },
        instance=DagsterInstance.get(),
    )
where I included `exists` in the `solids` dict
o
can I see the definition of the exists() solid real quick?
this is pretty weird, my first thought is that you might have missed the @solid decorator on that function
s
🤦‍♂️
def exists(path: str) -> bool:
    rco = RcloneObject(path)
    return rco.exists
yup
omg
ok .. it ran
o
🙌
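A minimal sketch of why the missing decorator produced that `AttributeError`, using a hypothetical `OutputHandle` class in place of Dagster's `InvokedSolidOutputHandle`. Without the decorator, `exists` is an ordinary Python function, so calling it inside the pipeline body runs it immediately, with the placeholder as its argument instead of the eventual string.

```python
class OutputHandle:
    """Hypothetical stand-in for the placeholder a decorated step returns at definition time."""

    def __init__(self, step_name):
        self.step_name = step_name


def exists_undecorated(path):
    # Plain function: runs right away and assumes `path` is already a str.
    return len(path.split("/")) > 1


# What the pipeline body effectively did: pass the placeholder straight in.
handle = OutputHandle("get_path")
try:
    exists_undecorated(handle)
    message = None
except AttributeError as err:
    message = str(err)

print(message)
```

Once the function is decorated, the call in the pipeline body only records the dependency, and the function body runs later with the real string.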
s
so, is there an easy way to unpack results from solids to step through and troubleshoot?
I would imagine people want to be able to see intermediate results by printing values as they come back?
path = get_path()
print(path.result)
or something
thanks so much for catching that `@solid` decorator
o
I would say that the most convenient development loop with dagster involves dagit (https://docs.dagster.io/concepts/dagit/dagit#dagit-ui).
s
ah, hmmm .. ok.
o
but yeah you can't really print within the @pipeline definition, you'd probably just want to do that within the solid itself
s
thank you
o
if you're using dagit, you get all sorts of nice options for structured outputs that will show up in a pretty way in the UI
but of course print() is always an option 🙂
s
yeah, ideally, everything would stay in python except for keeping track of the runs, logs etc
right right
ok
I'll get to know it
thanks again for your help... baby steps
o
yep no problem! I will say that using dagit doesn't really change what code you write, just makes it easier to run it and iterate on it while visualizing what it's doing
but nothing wrong with just starting out simple
glad it's working now! 😄