# announcements
k
I'm trying to set up the airline demo: I downloaded the repo at dagster 0.10.6, created a venv, switched to the airline_demo directory, ran pip install .[full], then pip install dagit==0.10.6, and then dagit. I expected the repo to be loaded properly, but got the following error:
dagster.core.errors.DagsterUserCodeProcessError: dagster.core.errors.DagsterInvariantViolationError: Encountered error attempting to parse yaml. Loading YAMLs from package resources [('airline_demo.environments', 'local_base.yaml'), ('airline_demo.environments', 'local_fast_ingest.yaml')] on preset "local_fast".
What do I need to do to get it running?
a
I think you have found a bug - we don't appear to specify package_data in the setup.py for that example
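roughly, the fix would look something like this in examples/airline_demo/setup.py - an untested sketch, and the name/packages arguments here are just placeholders:

from setuptools import find_packages, setup

setup(
    name="airline_demo",
    packages=find_packages(exclude=["airline_demo_tests"]),
    # ship the preset YAMLs so a plain `pip install .` includes them;
    # without this, loading the presets from package resources fails
    package_data={"airline_demo": ["environments/*.yaml"]},
)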
huh - though I am not able to repro. I did dagit -m airline_demo.repository after pip install -e examples/airline_demo and the presets are loading. Do you have the full error output?
k
if I use your commands it works for me too... I used the exact commands I posted above, which seem to be wrong
it would be great if launching the examples were documented
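e.g. a snippet like this would have saved me - running a preset straight from Python instead of dagit (the pipeline import path is from memory, so treat it as an assumption):

from dagster import execute_pipeline
from airline_demo.pipelines import airline_demo_ingest_pipeline  # assumed module path

# run the ingest pipeline with the same preset dagit exposes as "local_fast"
result = execute_pipeline(airline_demo_ingest_pipeline, preset="local_fast")
assert result.success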
now I get the following error on step april_on_time_s3_to_df:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
also I get config errors if I select the local_full or prod_fast presets
seems I need to add S3 credentials
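for anyone else hitting this: a quick sanity check that boto3 can see your credentials before launching a run (just a standalone snippet, not part of the demo):

import boto3

# boto3 resolves credentials from env vars, ~/.aws/credentials, instance roles, etc.
creds = boto3.Session().get_credentials()
if creds is None:
    print("no credentials - set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or run aws configure")
else:
    print("credentials found")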
I've created an AWS account and added the credentials as described. Now I get the following Engine Error in local_fast mode:
OSError: [Errno 5] Input/output error

  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/grpc/impl.py", line 76, in core_execute_run
    yield from execute_run_iterator(recon_pipeline, pipeline_run, instance)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/api.py", line 798, in __iter__
    yield from self.iterator(
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/api.py", line 722, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/executor/in_process.py", line 36, in execute
    yield from inner_plan_execution_iterator(pipeline_context, execution_plan)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/plan/execute_plan.py", line 55, in inner_plan_execution_iterator
    with pipeline_context.instance.compute_log_manager.watch(
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/storage/compute_log_manager.py", line 55, in watch
    with self._watch_logs(pipeline_run, step_key):
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/storage/local_compute_log_manager.py", line 47, in _watch_logs
    with mirror_stream_to_file(sys.stderr, errpath):
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/compute_logs.py", line 30, in mirror_stream_to_file
    with redirect_to_file(stream, filepath):
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/compute_logs.py", line 22, in redirect_to_file
    with redirect_stream(file_stream, stream):
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/klst/PycharmProjects/dagster/venv/lib/python3.8/site-packages/dagster/core/execution/compute_logs.py", line 57, in redirect_stream
    from_stream.flush()
I changed into the example directory and ran dagit from there; now it is working
the warehouse_pipeline is also running successfully, however when I download the pdf file from the s3 bucket the graphs seem to be broken, e.g.:
[screenshot of broken graphs omitted]
is this intended for local_fast mode?
Also, I wonder whether the presets for local_full and prod_fast are outdated - the config editor shows errors.
As suspected, the local_full preset was outdated. In case anyone is interested, the fixed local_full_ingest.yaml looks like this:
solids:
  process_q2_coupon_data:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: Origin_and_Destination_Survey_DB1BCoupon_2018_2.zip
      archive_member:
        value: Origin_and_Destination_Survey_DB1BCoupon_2018_2.csv
    config:
      subsample_pct: 100
      table_name: q2_coupon_data
  process_q2_market_data:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: Origin_and_Destination_Survey_DB1BMarket_2018_2.zip
      archive_member:
        value: Origin_and_Destination_Survey_DB1BMarket_2018_2.csv
    config:
      subsample_pct: 100
      table_name: q2_market_data
  process_q2_ticket_data:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: Origin_and_Destination_Survey_DB1BTicket_2018_2.zip
      archive_member:
        value: Origin_and_Destination_Survey_DB1BTicket_2018_2.csv
    config:
      subsample_pct: 100
      table_name: q2_ticket_data
  april_on_time_s3_to_df:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2018_4.zip
      archive_member:
        value: On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2018_4.csv
  may_on_time_s3_to_df:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2018_5.zip
      archive_member:
        value: On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2018_5.csv
  june_on_time_s3_to_df:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2018_6.zip
      archive_member:
        value: On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2018_6.csv
  master_cord_s3_to_df:
    inputs:
      archive_member:
        value: "954834304_T_MASTER_CORD.csv"
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: 954834304_T_MASTER_CORD.zip
  join_q2_data:
    config:
      subsample_pct: 100
  load_q2_on_time_data:
    config:
      table_name: q2_on_time_data

  download_q2_sfo_weather:
    inputs:
      s3_coordinate:
        bucket: dagster-airline-demo-source-data
        key: sfo_q2_weather.txt
  load_q2_sfo_weather:
    config:
      table_name: q2_sfo_weather
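to smoke-test a file like this without clicking through dagit, I merged it with local_base.yaml and passed the result as run_config - a rough sketch, where the import path and mode name are assumptions on my part:

import yaml

from dagster import execute_pipeline
from airline_demo.pipelines import airline_demo_ingest_pipeline  # assumed module path

run_config = {}
for path in ["environments/local_base.yaml", "environments/local_full_ingest.yaml"]:
    with open(path) as f:
        run_config.update(yaml.safe_load(f))  # shallow merge; the real presets deep-merge

result = execute_pipeline(
    airline_demo_ingest_pipeline,
    run_config=run_config,
    mode="local",  # assumed mode name for the local presets
)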
a
Thanks for following up with details - I filed #3724 to get this back into a good state.