Klaus Stadler
02/03/2021, 2:14 PM

Charles Lariviere
02/03/2021, 2:45 PM

Caleb
02/03/2021, 2:56 PM

Ryan
02/03/2021, 5:20 PM

Dylan Bienstock
02/03/2021, 7:24 PM
I'm getting the error: "Must pass the output from previous solid invocations or inputs to the composition function as inputs when invoking solids during composition."
Does anyone have any ideas on why this might be happening? Thanks 🙂
Code example
config = {
    "solids": {
        "solid_b": {
            "config": {"path": "test.txt"}
        }
    }
}

@pipeline
def test_pipeline():
    test_solid = solid_b(solid_a, config_schema=config)
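For reference, a minimal sketch of the composition pattern this error points at (the solid bodies here are illustrative): the output of solid_a() has to be passed, not the solid definition itself, and run config is supplied at execution time rather than as a config_schema argument during composition.

from dagster import execute_pipeline, pipeline, solid

@solid
def solid_a(_):
    return "some value"

@solid(config_schema={"path": str})
def solid_b(context, value):
    context.log.info(f"{context.solid_config['path']}: {value}")

@pipeline
def test_pipeline():
    solid_b(solid_a())  # invoke solid_a and pass its output

# run config is supplied when the pipeline is executed, not during composition
run_config = {"solids": {"solid_b": {"config": {"path": "test.txt"}}}}
execute_pipeline(test_pipeline, run_config=run_config)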
Edward Chen
02/03/2021, 8:07 PM
import json

@solid
def hello(_):
    return json.loads('{"msg": "hello"}')

@solid
def send_http_call(context, results):
    # Does an HTTP POST using the value from hello()
    ...

@pipeline
def hello_pipeline():
    send_http_call(hello())

The actual error message is Dependencies for hello_pipeline().compute failed, so I'm trying to figure out how I can dive deeper into where .compute is occurring. I've been reading up on the docs in case there's a concept I missed.

Simon Späti
02/04/2021, 12:34 PM
We're updating to 0.10.0. In earlier versions we returned a DataFrame as in the airline demo (see also image attached). I understand from the docs that PySpark DataFrames cannot be pickled, which means that IO managers like the fs_io_manager won't work for them. If we use the LocalParquetStore as illustrated above, do we need to add this IO manager to the pipeline? Does it mean that other outputs in the pipeline must be parquet as well? Or can there be multiple IO managers? How does the default pickle-based IO manager work alongside such a LocalParquetStore?
Haven't gotten my hands on it myself, but my co-worker is struggling with it, so I thought I'd ask quickly for a guideline before he starts adding stuff to io_manager.py in dagster-aws.
My understanding was that with IO managers, not much in the solids needs to change. But for DataFrames and Spark, do we need to change the way we output DataFrames in our solids? Thanks for your help. Not sure if others have already updated to 0.10 with Spark?
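For reference, a minimal sketch of what a parquet-backed IO manager for PySpark DataFrames can look like (paths and names here are illustrative, not the exact docs code). IO managers are assigned per output, so only the Spark-producing outputs need this one; everything else can stay on the default pickle-based manager.

import os

from dagster import IOManager, io_manager
from pyspark.sql import SparkSession

class LocalParquetIOManager(IOManager):
    def _path(self, context):
        # one parquet directory per run/step/output (layout is an assumption)
        return os.path.join("/tmp/dagster_parquet", context.run_id, context.step_key, context.name)

    def handle_output(self, context, obj):
        obj.write.parquet(self._path(context))  # obj is a pyspark DataFrame

    def load_input(self, context):
        spark = SparkSession.builder.getOrCreate()
        return spark.read.parquet(self._path(context.upstream_output))

@io_manager
def local_parquet_io_manager(_):
    return LocalParquetIOManager()

Per-output assignment would then look like OutputDefinition(io_manager_key="parquet_io_manager") on the Spark solids, with that key mapped to this manager in the mode's resource_defs, leaving other outputs on fs_io_manager.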
danil
02/04/2021, 7:00 PM
I'm struggling to configure workspace.yaml with 2 pipelines under the same directory. Reading the documentation was of no help since it covers only simple use cases.
Here is the structure of the folder from where I run `dagit`:
- local_repo.py
- workspace.yaml
- dagster_baby_pipeline (folder with pipeline definition)
- dagster_try (folder with pipeline definition)
In local_repo.py I try to import baby_pipeline.py from dagster_baby_pipeline and hello_cereal.py from dagster_try to return from the @repository function. From the logs I suspect that those modules aren't getting loaded into the Dagster process, hence the relative imports don't work.
If I specify working_directory for ONLY the dagster_baby_pipeline folder in workspace.yaml and don't import dagster_try, then it works. It seems like if I create a repository for each pipeline in its respective module and import them separately in workspace.yaml, it will achieve what I want, but that requires boilerplate overhead.
Can you please clarify what is happening under the hood with imports in this scenario, and what is the best practice for using these abstractions? Is there a way to load both the dagster_baby_pipeline and dagster_try modules so they can be accessed by a single local_repo.py?
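For reference, a sketch of a workspace.yaml that usually makes this layout work (assuming local_repo.py sits next to workspace.yaml, as in the structure above): pointing working_directory at the folder that contains both packages puts them on sys.path for the loaded file, so a single repository file can import from both.

load_from:
  - python_file:
      relative_path: local_repo.py
      working_directory: .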
user
02/04/2021, 10:10 PM

owen
02/04/2021, 10:18 PM

Edward Chen
02/04/2021, 11:38 PM
load_from:
  - python_environment:
      executable_path: /usr/local/bin/python
      target:
        python_file:
          relative_path: /opt/dagster/dags/example/hello_dagster.py
          working_directory: /opt/dagster/dags
The curl call that I do is also pretty simple:
curl -v 'http://localhost:3000/graphql' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Connection: keep-alive' \
  -H 'DNT: 1' \
  -H 'Origin: http://localhost:3000' \
  --data-binary '{"query":"query RepositoriesQuery {\n repositoriesOrError {\n ... on RepositoryConnection {\n nodes {\n name\n location {\n name\n }\n }\n }\n }\n}"}' --compressed
user
02/05/2021, 4:58 AM

johann
02/05/2021, 6:01 AM

Klaus Stadler
02/05/2021, 8:51 AM

Ryan
02/05/2021, 11:53 AM
SensorExecutionContext has a DagsterInstance, but it's not clear how to get from there to any resource information?
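For what it's worth, sensor evaluation in 0.10 doesn't inject pipeline resources, so a common workaround is to construct whatever client the resource wraps directly in the sensor body. A minimal sketch, where my_pipeline and create_db_connection are hypothetical names:

from dagster import RunRequest, sensor

@sensor(pipeline_name="my_pipeline")  # hypothetical pipeline name
def my_db_sensor(context):
    # context.instance is the DagsterInstance; resource objects themselves
    # aren't available here, so build the underlying client directly
    conn = create_db_connection()  # hypothetical helper shared with the resource
    if conn.has_new_rows("events"):
        yield RunRequest(run_key=None, run_config={})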
Jason
02/05/2021, 3:16 PM
Schedules and presets: if I have a pipeline with one or multiple presets (say "dev" and "prod" related) and I'm using the UI, I can easily pick which preset I need. And I understand the function decorated by @schedule must return a run config, but how do I get it to return a preset? (How do I get the pipeline to run a specific preset on a schedule?)
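A sketch of one way to do this, assuming the preset is defined as a PresetDefinition object (the names and cron string here are illustrative): since a schedule only needs run config, the preset's run_config and mode can be reused directly.

from dagster import PresetDefinition, schedule

prod_preset = PresetDefinition(
    name="prod",
    run_config={"solids": {"my_solid": {"config": {"env": "prod"}}}},  # illustrative
    mode="prod",
)

@schedule(cron_schedule="0 6 * * *", pipeline_name="my_pipeline", mode=prod_preset.mode)
def prod_schedule(_context):
    return prod_preset.run_config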
02/05/2021, 5:31 PM
log.info("the next train is at %s", never)
Brian Abelson
02/05/2021, 7:39 PM
I'm upgrading to 0.10.4 (from 0.9.22post0) and am having issues installing dagster and dagit together using Docker. For one, there doesn't seem to be an explicitly stated requirement for gevent in dagit:
The 'gevent' distribution was not found and is required by dagit, gevent-websocket
I then seem to have to jump through an endless string of adding different dependencies or addressing version mismatches (most related to graphql). Is there some reason why these dependency issues would occur in a Docker env and not in my local dev env (Mac OS X)? My requirements.txt and Dockerfile are pasted below:
requirements:
dagster>=0.10.4
dagit>=0.10.4
dagster-pandas>=0.10.4
dagster-slack>=0.10.4
dagster_postgres>=0.10.4
dagster_dbt>=0.10.4
pandas<=1.1.4
marketorestpython==0.5.8
pandasql==0.7.3
python-dotenv==0.10.3
retry>=0.9.2
simple-salesforce==1.10.1
SQLAlchemy==1.3.20
PyMySQL>=1.0.0
pytz<2021.0
pyaml==20.4.0
psycopg2>=2.8.0
CensusData>=1.11.post1
Dockerfile:
FROM python:3.9.0-slim
RUN apt-get update -yqq && \
apt-get install -yqq cron gcc git && \
apt-get install -yqq libpq-dev && \
apt-get install -yqq default-libmysqlclient-dev && \
apt-get install -yqq libmariadbclient-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
ENV DAGSTER_HOME=/opt/dagster/dagster_home/
COPY . /opt/dagster/dagster_home
WORKDIR /opt/dagster/dagster_home
RUN python3 setup.py install
EXPOSE 3000
ENTRYPOINT ["dagit", "-h", "0.0.0.0", "-p", "3000"]
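A possible workaround for the gevent error above, offered as a sketch rather than a confirmed fix: declare the transitive dependencies explicitly in requirements.txt so pip resolves them during the image build.

# explicit transitive deps for dagit's websocket support (workaround sketch)
gevent
gevent-websocket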
Felipe Saldana
02/05/2021, 10:02 PM

Noah K
02/05/2021, 10:03 PM

Noah K
02/05/2021, 10:03 PM

Thomas
02/06/2021, 10:06 AM

Thomas
02/06/2021, 4:42 PM

David Farnan-Williams
02/06/2021, 6:58 PM

danil
02/07/2021, 12:28 AM
Is there a way to do conditional execution with the composite_solid abstraction?
I am getting UserWarning: Error loading repository location local_repo.py:dagster.core.errors.DagsterUserCodeProcessError: AttributeError: 'InputMappingNode' object has no attribute 'startswith'
when passing a value to composite_solid that will then, based on that value, determine which solid to execute. Code is attached in the thread to spare the channel.
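For context, inputs inside a @composite_solid body are symbolic InputMappingNode placeholders at composition time, so Python-level branching on them (anything that inspects the value, e.g. string methods) fails with exactly this kind of AttributeError. A sketch of one workaround, with illustrative solids: decide the branch from a plain Python value when the composite is built, not from a runtime input.

from dagster import composite_solid, solid

@solid
def solid_a(_, data: str):
    return data.upper()

@solid
def solid_b(_, data: str):
    return data.lower()

def build_processor(flavor):  # plain Python value, known at definition time
    @composite_solid(name=f"process_{flavor}")
    def _process(data: str):
        # branch on `flavor` (a real value), not on `data` (an InputMappingNode)
        if flavor == "a":
            return solid_a(data)
        return solid_b(data)
    return _process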
Avinash Varma Kanumuri
02/07/2021, 5:27 PM
dagster.core.errors.DagsterSubprocessError: dagster.check.CheckError: Member of list mismatches type. Expected <class 'dagster.core.execution.plan.inputs.StepInput'>. Got UnresolvedStepInput(name='region', dagster_type_key='String', source=FromPendingDynamicStepOutput(step_output_handle=StepOutputHandle(step_key='get_regions', output_name='result', mapping_key=None), solid_handle=SolidHandle(name='execute_per_region', parent=None), input_name='region')) of type <class 'dagster.core.execution.plan.inputs.UnresolvedStepInput'>.
Hamza Khurshid Butt
02/08/2021, 1:46 PM
The solid's status shows SUCCESS, but the bar for the solid keeps on moving indefinitely.
Here is a screencast of the bug:
https://www.loom.com/share/b4f6b031cbea43c1893e45a552f917aa
Any help or suggestions would be appreciated 🙂👍

Charles Lariviere
02/08/2021, 9:09 PM
What's the recommended pattern for a solid that may return None or an empty list?
I have an extract solid that pulls data from a REST API as a list of dicts, which I take as input in a to_df solid to format as a dataframe. The API is not guaranteed to return results for a given partition, which then causes issues downstream with my IO manager and Dagster's type validation. I could address that in the to_df solid, but I'm curious if there is a better pattern, like some kind of conditional execution logic in the pipeline definition?
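One pattern that fits this is an optional output; a minimal sketch using the solid names from the question (call_api is a hypothetical stand-in for the REST call): mark the output as not required and only yield it when there is data, and downstream solids are skipped for empty partitions.

from dagster import Output, OutputDefinition, pipeline, solid

@solid(output_defs=[OutputDefinition(name="records", is_required=False)])
def extract(context):
    records = call_api()  # hypothetical stand-in for the REST call
    if records:  # only emit when the API returned data
        yield Output(records, "records")

@solid
def to_df(context, records):
    import pandas as pd

    return pd.DataFrame(records)

@pipeline
def etl_pipeline():
    to_df(extract())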
Xu Zhang
02/09/2021, 4:32 PM

Xu Zhang
02/09/2021, 4:33 PM