#announcements

Chris Roth

04/21/2020, 3:18 PM

prha

04/21/2020, 3:19 PM
What gets output when you
crontab -l
?

Chris Roth

04/21/2020, 3:22 PM
0 3 * * * /opt/dagster/dagster_home/schedules/scripts/pipeline_repository.staging_daily_esri.sh # dagster-schedule: pipeline_repository.staging_daily_esri

prha

04/21/2020, 3:23 PM
hmm. and when you click through to
staging_daily_esri
link, you see no tick attempts?

Chris Roth

04/21/2020, 3:24 PM
nope

prha

04/21/2020, 3:24 PM
what’s the underlying OS?

Chris Roth

04/21/2020, 3:24 PM
ubuntu-small-latest

prha

04/21/2020, 3:25 PM
@sashank ^^

Chris Roth

04/21/2020, 3:25 PM
strangely there is no
/var/log/syslog
makes me think cron is not set up correctly on the os

prha

04/21/2020, 3:25 PM
very curious
does that
staging_daily_esri.sh
file exist?
i guess i’d also try editing the crontab manually to check cron… something like
* * * * * date -u >> /opt/dagster/dagster_home/test_cron_log
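A quick way to confirm that the heartbeat entry above is firing is to check how recently the log file was modified. This is a minimal sketch, not something from the thread; `heartbeat_is_fresh` and the 120-second threshold are hypothetical choices.

```python
import os
import time


def heartbeat_is_fresh(log_path, max_age_seconds=120):
    """Return True if log_path exists and was modified within max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(log_path)
    except OSError:
        # The file is missing or unreadable, so cron has not written to it.
        return False
    return age <= max_age_seconds
```

If cron is working, calling `heartbeat_is_fresh("/opt/dagster/dagster_home/test_cron_log")` a couple of minutes after saving the crontab would be expected to return `True`.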

Chris Roth

04/21/2020, 3:40 PM
yup it exists
hm

sashank

04/21/2020, 3:52 PM
Hey Chris, investigating now
Just set up the same environment as you to try and reproduce

max

04/21/2020, 4:47 PM
@sashank this reminds me of my issue from a few weeks ago
which ended up being a permissions issue
(tho on osx not ubuntu)

sashank

04/21/2020, 5:05 PM
Hey @Chris Roth, was able to reproduce the issue

Chris Roth

04/21/2020, 5:07 PM
nice! what was the issue?

sashank

04/21/2020, 5:07 PM
It seems that cron can’t run anything from 
/usr/local/bin
, even python

Chris Roth

04/21/2020, 5:07 PM
interesting
hm

sashank

04/21/2020, 5:07 PM
So the crontab and script are running as expected, but
dagster-graphql
fails to run
Trying a few things to fix
❤️ 1
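One way to check for this kind of failure without waiting on cron is to resolve the executable against the PATH that cron uses. This is a minimal sketch, assuming a Debian/Ubuntu-style default of `/usr/bin:/bin` for user crontabs (an assumption, not confirmed in the thread); `resolves_under_cron` is a hypothetical helper.

```python
import shutil

# Assumption: cron runs user crontab entries with this minimal PATH unless
# the crontab sets PATH itself. Adjust it to match your system.
ASSUMED_CRON_PATH = "/usr/bin:/bin"


def resolves_under_cron(command, cron_path=ASSUMED_CRON_PATH):
    """Return the full path found for `command` on cron_path, or None."""
    return shutil.which(command, path=cron_path)


if __name__ == "__main__":
    for name in ("sh", "dagster-graphql"):
        print(name, "->", resolves_under_cron(name))
```

A `None` result for `dagster-graphql` would be consistent with the script starting under cron but failing to find the binary.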

Res Dev

04/21/2020, 8:22 PM
could it be timezone difference in docker container?
it's usually UTC in docker containers

sashank

04/21/2020, 8:23 PM
Hm idts - I was able to reproduce this with a schedule running every minute

Res Dev

04/21/2020, 8:26 PM
I don't see any issue with entryscript.sh
#!/bin/sh

# This block may be omitted if not packaging a repository with cron schedules
####################################################################################################
# see: https://unix.stackexchange.com/a/453053 - fixes inflated hard link count
touch /etc/crontab /etc/cron.*/*

service cron start

export DAGSTER_HOME=/opt/dagster/dagster_home

# Add all schedules
dagster schedule up

# Restart previously running schedules
dagster schedule restart --restart-all-running
####################################################################################################

DAGSTER_HOME=/opt/dagster/dagster_home dagit -h 0.0.0.0 -p 3000
However my baseimage in dockerfile is
FROM continuumio/miniconda3:latest

sashank

04/21/2020, 8:43 PM
@Res Dev and schedules work as expected for you?

Res Dev

04/21/2020, 8:44 PM
yes

Chris Roth

04/27/2020, 4:42 PM
@sashank did you manage to get the schedule to run after you reproduced this?

sashank

04/27/2020, 7:26 PM
Hey, managed to do some more digging this weekend and I think I know what you’re running into. @John Mav had run into the same thing and we were able to fix his setup
It seems to be a python path issue - when we load your repository we’re probably running into an import error
I’ll be adding better tooling to surface these errors automatically in the next release, but for now if you have access to the machine running the scheduler, you can run
crontab -e
and edit your schedule like so:
* * * * * /dagster_home/schedules/scripts/my_repo.my_pipeline.sh >> debug_file.txt 2>&1  # dagster-schedule: my_repo.my_pipeline
(We’re adding a small output redirect at the end to a file)
In that file, you’ll probably see an import error like this
truncated…
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'pipelines'
The workaround for now is just to manually add the path to your folder containing your modules in the file you define your repository definition. For example, if you have a
repos.py
, add this to the top:
import os
import sys
SCRIPT_PATH = os.path.dirname(os.path.abspath(__file__))
sys.path.append(SCRIPT_PATH)
^ This is 100% not ideal, and we’ll have cleaner error reporting in the next release. Happy to jump on a call with you to go through this fix.
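To see why appending to `sys.path` helps, the effect can be reproduced outside Dagster. This is a minimal sketch using a throwaway module; `import_from_dir` and `demo_pipelines` are hypothetical names standing in for the repository file and the `pipelines` module from the traceback.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(module_name, directory):
    """Append `directory` to sys.path if it is missing, then import the module."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Clear finder caches so a freshly written module file is picked up.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module standing in for the thread's `pipelines` module.
        with open(os.path.join(tmp, "demo_pipelines.py"), "w") as f:
            f.write("NAME = 'demo'\n")
        try:
            importlib.import_module("demo_pipelines")
        except ModuleNotFoundError as exc:
            print("before:", exc)
        print("after:", import_from_dir("demo_pipelines", tmp).NAME)
```

The first import fails because the directory is not on the search path, which mirrors what happens when cron launches the script from a different working directory.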

Chris Roth

05/18/2020, 8:53 PM
i'm looking at this now, finally. it seems like the issue is that
dagster-graphql
is actually in
/usr/local/bin/dagster-graphql
(local)
so i can add a quick fix by doing this:
ln -s /usr/local/bin/dagster-graphql /usr/bin/dagster-graphql
and then i get
No module named 'repos'
but it can find graphql now
getting this:
{
  "data": {
    "startScheduledExecution": {
      "__typename": "PythonError",
      "cause": null,
      "message": "UnboundLocalError: local variable 'execution_plan_index' referenced before assignment\n",
      "stack": [
        "  File \"/usr/local/lib/python3.8/dist-packages/dagster_graphql/implementation/utils.py\", line 14, in _fn\n    return fn(*args, **kwargs)\n",
        "  File \"/usr/local/lib/python3.8/dist-packages/dagster_graphql/implementation/execution/scheduled_execution.py\", line 138, in start_scheduled_execution\n    raise exc\n",
        "  File \"/usr/local/lib/python3.8/dist-packages/dagster_graphql/implementation/execution/scheduled_execution.py\", line 123, in start_scheduled_execution\n    run, result = _execute_schedule(graphene_info, external_pipeline, execution_params, errors)\n",
        "  File \"/usr/local/lib/python3.8/dist-packages/dagster_graphql/implementation/execution/scheduled_execution.py\", line 169, in _execute_schedule\n    execution_plan_snapshot=execution_plan_index.execution_plan_snapshot,\n"
      ]
    }
  }
}

sashank

05/18/2020, 9:24 PM
This is in the most recent release? Taking a look

Chris Roth

05/18/2020, 9:29 PM
yup
👍 1
two issues actually - it's not running the cron jobs, which i am debugging atm - i suspect it is trying to run them in the user's home directory and failing to find repos.py when it should be running it in /opt/dagster/app

sashank

05/18/2020, 9:31 PM
try redirecting the output of the cron command and seeing if that error is showing up
*/2 * * * * /opt/dagster/dagster_home/schedules/scripts/my_repo.my_pipeline.sh >> ~/log.txt 2>&1 # dagster-schedule: my_repo.my_pipeline
Just adding the
>> ~/log.txt 2>&1
at the end

Chris Roth

05/18/2020, 9:32 PM
just tried this to fix the repos.py not found issue
ENV PYTHONPATH="$PYTHONPATH:/opt/dagster/app"
now i can run
/opt/dagster/dagster_home/schedules/scripts/pipeline_repository.test_scheduler.sh
from the root home directory, but it still isn't running on its own
will try what you just suggested

sashank

05/18/2020, 9:33 PM
let me know what you see
i see what’s going on
whenever you have invalid config for the scheduler, there’s a framework error being thrown
which is the one you’re seeing:
UnboundLocalError: local variable 'execution_plan_index' referenced before assignment
what I would recommend to unblock is make sure your config is valid, and you’ll skip past this error. we’ll have a fix out for this soon
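The general shape of this kind of error is a local variable that is only assigned on the success path and then read unconditionally. This is an illustrative sketch only: the functions are hypothetical and are not Dagster's actual `scheduled_execution` code.

```python
def launch_run(config_is_valid):
    """Hypothetical stand-in showing how the error arises."""
    if config_is_valid:
        plan = "execution-plan"  # only bound when the config validates
    # With invalid config, `plan` was never assigned, so this read raises
    # UnboundLocalError instead of reporting the real validation problem.
    return plan


def launch_run_checked(config_is_valid):
    """Same flow, but the invalid case is reported explicitly."""
    if not config_is_valid:
        raise ValueError("schedule config failed validation")
    plan = "execution-plan"
    return plan
```

In the first function the underlying problem (invalid config) is masked by the `UnboundLocalError`, which matches the symptom described in the thread.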

Chris Roth

05/18/2020, 10:40 PM
interesting
would that be
scheduler.yaml
?

sashank

05/18/2020, 10:41 PM
Sorry, I meant the environment dict returned from your schedule definition

Chris Roth

05/18/2020, 10:41 PM
oh ok

sashank

05/18/2020, 10:41 PM
There’s a validation error happening with the environment dict you return

Chris Roth

05/18/2020, 10:42 PM
side question - is scheduler.yaml deprecated since the scheduler config was moved into the
schedule_defs

sashank

05/18/2020, 10:43 PM
Yup it’s currently backward compatible, but you should move the schedule defs to the repository definition
Under
schedule_defs

Chris Roth

05/18/2020, 10:44 PM
ok great. i deleted it today since i added schedule_defs last week

sashank

05/18/2020, 10:44 PM
Did you have a separate repository.yaml and scheduler.yaml?

Chris Roth

05/18/2020, 10:44 PM
i did yeah
my scheduler.yaml was:
repository:
  file: repos.py
  fn: pipeline_repository
scheduler:
  file: repos.py
  fn: define_schedules

sashank

05/18/2020, 10:46 PM
And your repository.yaml was just the repository part?

Chris Roth

05/18/2020, 10:46 PM
schedule_defs is now:
repository.yaml:
repository:
  module: repos
  fn: define_repo
schedule_defs:
schedule_defs=[
            ScheduleDefinition(
                name='staging_daily_esri',
                mode='staging',
                cron_schedule='0 3 * * *',
                pipeline_name='esri_pipeline_all',
                environment_dict={},
            ),
            ScheduleDefinition(
                name='test_scheduler',
                mode='staging',
                cron_schedule='* * * * *',
                pipeline_name='reseed_geoserver_pipeline',
                environment_dict={},
            ),
        ],

sashank

05/18/2020, 10:46 PM
Yup that’s perfect

Chris Roth

05/18/2020, 10:47 PM
i can't think of what would be wrong with the environment though that causes that cron error

sashank

05/18/2020, 10:47 PM
Try pasting into the playground?

Chris Roth

05/18/2020, 10:47 PM
OH, i see what you're saying
i think i know what's going on

sashank

05/18/2020, 10:50 PM
there’s this open in playground button that will put the config in the playground for you

Chris Roth

05/18/2020, 11:32 PM
so it now runs successfully if i run
/opt/dagster/dagster_home/schedules/scripts/pipeline_repository.test_scheduler.sh
from the home directory - you were right that there was an environment error
but for some reason it's not running on its own through cron
(i have it set to run every minute)
now debugging - gonna tell cron to log errors to a file
👍 1

sashank

05/18/2020, 11:33 PM
that should tell you what’s going on

Chris Roth

05/18/2020, 11:34 PM
nope 😕
i did
* * * * * /opt/dagster/dagster_home/schedules/scripts/pipeline_repository.test_scheduler.sh > ~/wtf.log # dagster-schedule: pipeline_repository.test_scheduler
wtf.log was created but is empty

sashank

05/18/2020, 11:35 PM
try piping stderr as well
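The empty `wtf.log` is what one would expect when a command fails on stderr while only stdout is redirected. This is a minimal sketch of the difference, using a child process that writes only to stderr; `CHILD` and `capture` are hypothetical names, and discarding stderr stands in for cron not writing it to the file.

```python
import os
import subprocess
import sys
import tempfile

# Child process that fails the way a broken schedule script might:
# it writes its error to stderr and nothing to stdout.
CHILD = [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n')"]


def capture(log_path, include_stderr):
    """Run CHILD with stdout sent to log_path; optionally merge stderr (like 2>&1)."""
    with open(log_path, "w") as log:
        subprocess.run(
            CHILD,
            stdout=log,
            # STDOUT merges stderr into the log; DEVNULL drops it instead.
            stderr=subprocess.STDOUT if include_stderr else subprocess.DEVNULL,
            check=False,
        )
    with open(log_path) as log:
        return log.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.txt")
        print("stdout only:", repr(capture(path, include_stderr=False)))
        print("with stderr:", repr(capture(path, include_stderr=True)))
```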

Chris Roth

05/18/2020, 11:35 PM
oh good point

sashank

05/18/2020, 11:35 PM
>> ~/log.txt 2>&1

Chris Roth

05/18/2020, 11:57 PM
that did it, thank you!!
i had to give cron access to environment variables

sashank

05/18/2020, 11:57 PM
sorry for the rough debugging process - trying to improve this currently
here’s how to pass through the env vars. alternatively, you can source your environment within your crontab to copy all your env vars
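The underlying behavior is that a process started by cron does not inherit the variables from an interactive shell or from Docker `ENV` lines. This is a rough sketch of how to observe that from Python; `cron_like_env` and `visible_in_child` are hypothetical helpers, and the values in `cron_like_env` are assumed approximations rather than cron's exact environment.

```python
import os
import subprocess
import sys


def cron_like_env(extra=None):
    """Approximate cron's sparse environment (assumed values, not cron's exact set)."""
    env = {
        "PATH": "/usr/bin:/bin",
        "SHELL": "/bin/sh",
        "HOME": os.path.expanduser("~"),
    }
    env.update(extra or {})
    return env


def visible_in_child(var_name, env):
    """Return var_name's value as seen by a child process started with `env`."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, var_name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing `cron_like_env()` as `env` should report `DAGSTER_HOME` as `<unset>`, while `cron_like_env({"DAGSTER_HOME": "/opt/dagster/dagster_home"})` should report the path, which is the effect of declaring the variable in the crontab.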

Chris Roth

05/19/2020, 12:06 AM
interesting. i did this: https://stackoverflow.com/a/41938139 which also worked
it seems like the disable button might not be working

sashank

05/19/2020, 12:28 AM
the turn off schedule button?

Chris Roth

05/19/2020, 12:28 AM
yup
i tried disabling but it's still going

sashank

05/19/2020, 12:29 AM
hm i can’t repro that one - try running
dagster schedule debug
and see if it shows you anything

Chris Roth

05/19/2020, 12:29 AM
good idea