Hi there, I'm exploring Dagster (and Dagit), this ...
# announcements
s
Hi there, I'm exploring Dagster (and Dagit), this awesome tool that you guys gave to the community. My question is, what's the best way to set the directory (`base_dir`) for sqlite to store run information? Can I use the sqlite that comes shipped with Python? Currently, when it's set to
`base_dir: /home/stany/anaconda3/bin/sqlite3`
or
`base_dir: /home/stany/anaconda3/envs/dagster/bin/sqlite3`
it returns the following error:
`yaml.scanner.ScannerError: mapping values are not allowed here`
Any suggestion? Thanks, Stany
d
Hi Stany - would you mind posting the contents of your dagster.yaml file?
s
Hi Daniel, thanks a lot for being willing to help me out. Here's the content:
# there are two ways to set run_storage to SqliteRunStorage
# this config manually sets the directory (`base_dir`) for Sqlite to store run information in:
run_storage:
  module: dagster.core.storage.runs
  class: SqliteRunStorage
  config:
    base_dir: /home/stany/anaconda3/bin/sqlite3 #/home/stany/anaconda3/envs/dagster/bin/sqlite3
# and this config grabs the directory from an environment variable
run_storage:
  module: dagster.core.storage.runs
  class: SqliteRunStorage
  config:
    base_dir:
      env: SQLITE_RUN_STORAGE_BASE_DIR
# there are two ways to set `event_log_storage` to SqliteEventLogStorage
# the first manually sets the directory (`base_dir`) to write event log data to:
event_log_storage:
  module: dagster.core.storage.event_log
  class: SqliteEventLogStorage
  config:
    base_dir: /home/stany/anaconda3/bin/sqlite3

# and the second grabs the directory from an environment variable
event_log_storage:
  module: dagster.core.storage.event_log
  class: SqliteEventLogStorage
  config:
    base_dir: /home/stany/anaconda3/bin/sqlite3
        env: SQLITE_EVENT_LOG_STORAGE_BASE_DIR
I initially added the sqlite db that I created, but that didn't work either.
d
Ah, this is a problem with parsing the yaml file here - you can set base_dir either to a nested "env:" key or to a plain string, but not both.
base_dir: /home/stany/anaconda3/bin/sqlite3
        env: SQLITE_EVENT_LOG_STORAGE_BASE_DIR
The error message isn't very clear here, but some YAML parsers include a line number that sometimes gives you a clue where to look to find the problem
(taking out the "/home/stany/anaconda3/bin/sqlite3" after base_dir should fix it).
Separate from the specific problem you posted about - on my machine at least, /usr/bin/sqlite3 points to an executable file that runs sqlite, not a directory. You probably want to store your runs somewhere specific to dagster rather than in the /bin/ folder. The default location if you don't configure the base_dir at all is to put the data in a "runs" subfolder in your DAGSTER_HOME folder.
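For reference, here is a minimal sketch of how the last block of the posted file might look once the stray path is removed, assuming the rest of the file stays as it is. The environment variable name is the one already used in the file; nothing else is new.

```yaml
# base_dir has no value on its own line here; the nested env: key is its
# entire value, so the YAML parser no longer sees a string followed by a mapping.
event_log_storage:
  module: dagster.core.storage.event_log
  class: SqliteEventLogStorage
  config:
    base_dir:
      env: SQLITE_EVENT_LOG_STORAGE_BASE_DIR
```

With this form the variable has to be set in the environment that starts Dagster, otherwise loading the config will fail at startup.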
s
I'm using VS Code, which is giving me line numbers. I did try to store it to "/home/stany/runs.db" but that didn't work. When I took out the path after base_dir, I got the following error message:
Traceback (most recent call last):
  File "/home/stany/anaconda3/envs/dagster/bin/dagster-daemon", line 11, in <module>
    sys.exit(main())
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/daemon/cli/__init__.py", line 139, in main
    cli(obj={})  # pylint:disable=E1123
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/daemon/cli/__init__.py", line 30, in run_command
    with DagsterInstance.get() as instance:
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/core/instance/__init__.py", line 292, in get
    return DagsterInstance.from_config(_dagster_home())
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/core/instance/__init__.py", line 325, in from_config
    return DagsterInstance.from_ref(instance_ref, skip_validation_checks=skip_validation_checks)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/core/instance/__init__.py", line 340, in from_ref
    run_storage=instance_ref.run_storage,
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/core/instance/ref.py", line 220, in run_storage
    return self.run_storage_data.rehydrate()
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/serdes/config_class.py", line 84, in rehydrate
    raise DagsterInvalidConfigError(
dagster.core.errors.DagsterInvalidConfigError: Errors whilst loading configuration for {'base_dir': <dagster.config.source.StringSourceType object at 0x7f0a5771fe80>}.
Error 1: Post processing at path root:base_dir of original value {'env': 'SQLITE_RUN_STORAGE_BASE_DIR'} failed:
dagster.config.errors.PostProcessingError: You have attempted to fetch the environment variable "SQLITE_RUN_STORAGE_BASE_DIR" which is not set. In order for this execution to succeed it must be set in this environment.
Stack Trace:
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/config/post_process.py", line 72, in _post_process
    new_value = context.config_type.post_process(config_value)
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/config/source.py", line 42, in post_process
    return str(_ensure_env_variable(cfg))
  File "/home/stany/anaconda3/envs/dagster/lib/python3.8/site-packages/dagster/config/source.py", line 16, in _ensure_env_variable
    raise PostProcessingError(
The default location doesn't allow me to schedule anything.
Should I create the environment variable SQLITE_RUN_STORAGE_BASE_DIR?
d
Yeah, if you’re using the env: format in your yaml, it assumes that that is an environment variable that you have created and set to the location that you want. If you want to specify the base_dir yourself, that’s totally fine and you can set the base_dir field directly rather than using ‘env’.
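As a rough sketch of the second option, setting `base_dir` directly could look like the snippet below. The directory shown is a hypothetical placeholder, not a path from this thread; the assumption, based on the note earlier about `sqlite3` being an executable rather than a directory, is that `base_dir` should point at a writable folder rather than at the sqlite binary or a `.db` file.

```yaml
# Hypothetical directory: substitute any writable folder you want Dagster to use.
run_storage:
  module: dagster.core.storage.runs
  class: SqliteRunStorage
  config:
    base_dir: /home/stany/dagster_storage/runs
```

No environment variable is needed with this form, since the path is read straight from the file.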
Could you say more about why the default location doesn’t let you schedule anything? What error does it give you?
s
Sorry Daniel, it was my bad. I thought I had to specify the sqlite db that I created for it. It's working now 🙂 Thank you so much, Daniel!! I now need to go and explore it. You guys did an amazing job developing this awesome tool.