# announcements
c
hey, quick question. one of my solids calls a command line tool that writes a large json file (gigabytes) to disk (or s3) and then a downstream solid needs to read that file from disk (or s3). not sure if the tool that i'm using can handle stdin / stdout. i'm using celery + s3 for intermediates. what's the right way to pass this json file around? should i just pass the s3 url as the intermediate between solids?
a
ya, you can make a little dagster type class if you don't want to just use a string. check out `PythonObjectDagsterType`
c
ok cool, so something like create a type for the s3 url, pass that to downstream solid, then have downstream solid read from that url?
a
yep
c
cool. thanks!
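
The approach agreed on above (pass a pointer to the file, not the file itself) could be sketched roughly as follows. `S3Url`, its fields, and the example bucket and key are hypothetical names made up for illustration; only `PythonObjectDagsterType` comes from the thread, and the wiring to it is shown as a comment so the sketch runs without Dagster installed.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class S3Url:
    """Hypothetical pointer to an object in S3; only this small value moves between solids."""

    url: str

    def __post_init__(self):
        # Reject anything that is not of the form s3://bucket/key.
        parsed = urlparse(self.url)
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"not a valid s3://bucket/key URL: {self.url!r}")

    @property
    def bucket(self) -> str:
        return urlparse(self.url).netloc

    @property
    def key(self) -> str:
        return urlparse(self.url).path.lstrip("/")


# Assumption: with Dagster installed, the class could be wrapped for solid signatures,
# since the check here is a plain isinstance:
#   from dagster import PythonObjectDagsterType
#   S3UrlType = PythonObjectDagsterType(S3Url, name="S3UrlType")

ref = S3Url("s3://my-bucket/exports/big.json")  # made-up bucket and key
print(ref.bucket, ref.key)
```

The upstream solid would return the `S3Url` after the command line tool finishes writing, and the downstream solid would read from `ref.bucket` / `ref.key`, so the gigabytes of JSON never pass through the intermediate store.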
s
to add on this, may I ask @alex: if the json is rather small (I hit an API with a JSON return), would I return this in my solid as type `Any`, or is there a way to directly have a type JSON? Or is that a `PythonObjectDagsterType` as well? I just don't like `Any` so much. Thanks a lot. I guess a Type would make sense, so I can also introduce checks if the JSON is valid and so on. Correct? Any example of this already maybe? =)
a
double check https://docs.dagster.io/tutorial/types for examples - but I think in your case a direct `DagsterType` will allow you to write the `type_check_fn` that does as many checks as you want
`PythonObjectDagsterType` works best when your check would just be `isinstance`, which for this json i am guessing won't be what you are looking for
s
thanks alex. I will try with `DagsterType` as below:
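import json

from dagster import DagsterType


def isjson(_, value):
    # Return True only if value can be parsed by json.loads().
    try:
        json.loads(value)
        return True
    except (ValueError, TypeError):
        # ValueError: malformed JSON; TypeError: value is not str/bytes.
        return False


JsonType = DagsterType(
    name="JsonType",
    description="A valid JSON string, validated with json.loads().",
    type_check_fn=isjson,
)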
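
Since a `type_check_fn` is just a function of a context and a value that returns a bool, it can be tried out without running a pipeline. The sketch below is a stricter variant, assuming the API response is expected to be a JSON object (not a bare number or list); `is_json_object` is a hypothetical name, not something from Dagster or from the thread.

```python
import json


def is_json_object(_context, value):
    """Return True only if value is JSON text whose top level is an object."""
    # json.loads() accepts only str/bytes/bytearray; reject everything else up front.
    if not isinstance(value, (str, bytes, bytearray)):
        return False
    try:
        parsed = json.loads(value)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return isinstance(parsed, dict)


# The context argument is unused here, so None is enough for a quick check.
print(is_json_object(None, '{"a": 1}'))
print(is_json_object(None, "[1, 2]"))
print(is_json_object(None, "not json"))
print(is_json_object(None, {"a": 1}))
```

A function like this could be passed as `type_check_fn` in the same way as `isjson`; whether the top-level-object restriction is wanted depends on what the API returns.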