Auster Cid (02/26/2020, 8:06 PM)

dwall (02/26/2020, 11:37 PM)
@usable_as_dagster_type have required_resource_keys?

user (02/27/2020, 2:41 AM)

cat (02/27/2020, 2:55 AM)

Vincent Goffin (02/27/2020, 9:36 AM)

Simon Späti (02/27/2020, 2:06 PM)

abhi (02/27/2020, 11:19 PM)

Slackbot (02/28/2020, 11:00 PM)

osayamen omigie (02/29/2020, 7:19 AM)

Simon Späti (03/02/2020, 6:18 AM)

Mikael Ene (03/02/2020, 4:51 PM)

Kris Wilson (03/03/2020, 6:32 AM)

Eric (03/03/2020, 10:34 PM)
…read_excel and read_csv). For example, being able to define options for read_csv in a yaml file like this is great. Having the IntelliSense in Dagit for config makes it a really useful tool for beginners and others on the team developing pipelines:
solids:
  employees_csv:
    config:
      csv:
        header: true
        date_format: '%m/%d/%Y'
        sep: '|'
        ...
However, this is something I keep running into repeatedly. One of the arguments to read_csv is converters, which is defined like this:

converters : dict, optional
    Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

How would you represent an argument like this in a yaml file? Does it even belong in a yaml file, despite being a kwarg like the rest of the args? If it doesn't belong in a yaml config file, isn't it strange that some arguments fit neatly into the config while others don't?

I understand these are somewhat loaded questions, but what I'm getting at is: instead of yaml files, is there any reason Python files shouldn't be used to define the environment_dict? Using Python dicts as the config instead of yaml would allow the creation of any config dictionary, including covering the case of the converters argument above. Thoughts?

sephi
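Eric's point can be illustrated without dagster at all: a Python dict config can carry callables, which YAML has no way to express. A minimal stdlib sketch (column names and the apply_converters helper are invented for illustration; the converter semantics mirror pandas' read_csv converters kwarg, whose keys may be column labels or integer positions):

```python
# A Python-dict config can hold functions, which YAML cannot express.
# "salary"/"name" are hypothetical column names for this sketch.
converters = {
    "salary": lambda s: float(s.replace(",", "")),  # strip thousands separators
    0: str.strip,  # integer key: applies to the first column by position
}

def apply_converters(row, header):
    # Look up a converter by column label first, then by position,
    # and pass values through unchanged when neither matches.
    converted = []
    for position, (label, value) in enumerate(zip(header, row)):
        fn = converters.get(label) or converters.get(position)
        converted.append(fn(value) if fn else value)
    return converted
```

A YAML file could name these columns, but it could not carry the lambda itself, which is the mismatch Eric is describing.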
(03/04/2020, 9:07 AM)

Andrew Madonna (03/04/2020, 7:24 PM)

Auster Cid (03/04/2020, 10:02 PM)

user (03/05/2020, 1:40 AM)

yuhan (03/05/2020, 1:49 AM)

Pedram (03/05/2020, 5:07 AM)

Simon Späti (03/05/2020, 8:05 AM)

Joshua Marango (03/05/2020, 4:33 PM)

Eric (03/05/2020, 6:40 PM)
def test_my_dummy_pipeline():
    res = execute_pipeline(
        my_pipeline,
        environment_dict=<yaml file ?>,
    )
Something similar to this, except for config:

preset_defs=[
    PresetDefinition.from_files(
        'development',
        mode='development',
        environment_files=[
            file_relative_path(__file__, 'environments/dev_database_resources.yaml'),
            file_relative_path(__file__, 'environments/dev_file_system_resources.yaml'),
            file_relative_path(__file__, 'environments/trips.yaml'),
        ],
    ),
],
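Roughly speaking, PresetDefinition.from_files loads each yaml file and merges the fragments into one environment dict. The merge step can be sketched with the stdlib alone (deep_merge is a hypothetical helper, not the dagster API, and the fragment contents are invented stand-ins for the three files listed above):

```python
def deep_merge(base, overlay):
    """Recursively merge overlay into base; later fragments win on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Invented stand-ins for dev_database_resources.yaml,
# dev_file_system_resources.yaml, and trips.yaml.
fragments = [
    {"resources": {"database": {"config": {"hostname": "localhost"}}}},
    {"resources": {"filesystem": {"config": {"base_dir": "/tmp/dagster"}}}},
    {"solids": {"trips": {"config": {"sep": "|"}}}},
]

environment_dict = {}
for fragment in fragments:
    environment_dict = deep_merge(environment_dict, fragment)
```

Splitting resources and solid config across files this way lets each environment swap in its own resource fragment while sharing the solid config.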
Kate Ho
(03/05/2020, 7:27 PM)

Basil V (03/05/2020, 7:57 PM)
…dagit gets executed. Can anyone point me to further documentation or examples for dagster.yaml, etc. (beyond what is in https://docs.dagster.io/latest/deploying/instance/)? Maybe what the "default" dagster.yaml would look like, and specifically the options for dagit_settings. Thanks!

Simon Späti (03/06/2020, 10:47 AM)

Basil V
(03/06/2020, 10:13 PM)
In dagit, how do you specify the mode to run for a pipeline / where can I configure this? I'm getting this error running dagit:

dagster.check.CheckError: Failure condition: Could not find mode default in pipeline <PIPELINE_NAME>

because I am trying to run my pipeline in a mode called "development" rather than "default". When I run via execute_pipeline I'm able to pass the mode in via RunConfig, but I can't find the equivalent way to configure it in a yaml file, for instance (sorry if this is in the docs and I just haven't been able to locate it). Thanks for any help! (I guess I should caveat and ask: is the only way to specify the mode via the dagit UI?)

Darin
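The error above is a mode-lookup failure: dagit asks for the mode named "default" unless told otherwise, and Basil's pipeline only defines "development". A toy reproduction of the check (this is an illustrative sketch, not dagster's actual implementation):

```python
class CheckError(Exception):
    pass

def validate_mode(pipeline_name, requested_mode, available_modes):
    # dagit requests mode "default" unless told otherwise, so a pipeline
    # that only defines a "development" mode fails this lookup.
    if requested_mode not in available_modes:
        raise CheckError(
            "Failure condition: Could not find mode {} in pipeline {}".format(
                requested_mode, pipeline_name
            )
        )
```

Asking for "default" against a pipeline whose only mode is "development" raises exactly this kind of error, while asking for "development" passes.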
(03/07/2020, 12:30 AM)

Pedram (03/07/2020, 12:46 AM)

Travis Cline (03/07/2020, 11:24 PM)

Pedram (03/08/2020, 5:12 AM)
…file_handle_to_s3 solid, which expects a FileHandle. My previous step is a postgres job that writes a CSV file to disk. How do I give this file handle to the next task? I tried returning the file name, and returning open(fn), but neither seems right.
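One pattern that avoids passing an open file object between steps is to pass a small value object that carries only the path, and let the downstream step open the file itself; dagster's local file handles work along these lines. The sketch below is a stdlib stand-in with invented function names, not the real dagster classes:

```python
import csv
from typing import NamedTuple

class LocalFileHandle(NamedTuple):
    """Stand-in for a dagster file handle: carries the path, not an open file."""
    path: str

def dump_query_to_csv(rows, path):
    # Upstream step: write the CSV to disk, return a handle to its location.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return LocalFileHandle(path)

def next_step(handle):
    # Downstream step: open the file via the handle when it's actually needed
    # (here we just read it back to show the handoff works).
    with open(handle.path, newline="") as f:
        return list(csv.reader(f))
```

Returning the handle instead of open(fn) means no step holds a file object across the boundary, and the consumer controls when and how the file is opened.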