Flavien
03/08/2021, 6:54 PM
tag:key=value*
Rubén Lopez Lozoya
03/08/2021, 8:06 PM
sashank
03/08/2021, 10:17 PM
collect
step
• @owen will discuss upcoming plans for the asset system, including asset lineage
• @sandy will demo the new backfill and partition features in Dagit
• @sashank will demo the new documentation infrastructure and design
There will also be a Q&A and discussion section for any questions you have. If you’d like to submit a question in advance, feel free to do so here: Questions Form
DM me your email or fill out this form to be added to the calendar invite: https://forms.gle/K2dni4iPq48GnUEq6. If you aren’t able to make it, we’ll share a recording afterward.
bklau-zap
[Messages from bklau-zap, sashank, paul.q, Sasha Gorelikov, David, Deveshi, max, dwall, Laura Moraes, Rubén Lopez Lozoya, and Noah K, posted between 03/08/2021, 11:20 PM and 03/09/2021, 10:17 PM, were not captured in this export.]
Henry
03/09/2021, 11:37 PM
@repository
def tmp_pipeline_repository():
    tmp_pipeline = pipeline_factory()
    return [tmp_pipeline, tmp_pipeline_schedule]

where pipeline_factory returns a pipeline whose functionality changes based on some external data that changes asynchronously from the pipeline, will the pipeline be reconstructed before each scheduled run? If not, is there a way to do this that conforms to Dagster design patterns? Happy to clarify the details further if this is too vague.
Spandan Pyakurel
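The repository question above comes down to when pipeline_factory() runs. As far as I know, if the repository function calls the factory each time the repository is loaded, a fresh load picks up external changes, while a value built once at import time stays frozen. A minimal pure-Python sketch of that distinction (hypothetical names, not Dagster API):

```python
# Hypothetical sketch, not Dagster API: illustrates why a pipeline built once
# at import time stays frozen, while one rebuilt by calling the factory on
# each repository load picks up changes in the external data.

external_data = {"steps": ["extract"]}  # stands in for the external source

def pipeline_factory():
    # Construct the "pipeline" from the external data as it looks right now.
    return {"name": "tmp_pipeline", "steps": list(external_data["steps"])}

frozen = pipeline_factory()            # built once, at "import time"

external_data["steps"].append("load")  # the external source changes

rebuilt = pipeline_factory()           # built again, at "load time"

print(frozen["steps"])   # ['extract']
print(rebuilt["steps"])  # ['extract', 'load']
```

In actual Dagster it is worth checking the repository docs for lazily constructed definitions, which make this load-time behavior explicit.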
03/10/2021, 4:29 AM
I executed a pipeline with dagster.execute_pipeline. The execution, however, wasn't displayed in Dagit's pipeline runs list. Is there anything I can do to fix this?
Johnathan Brooks
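On the execute_pipeline question above: if I remember the 2021-era API correctly, execute_pipeline defaults to an ephemeral DagsterInstance unless you pass instance=, so the run never reaches the storage Dagit reads; passing instance=DagsterInstance.get() is the usual fix, though that is worth verifying against the docs. A conceptual plain-Python sketch (hypothetical names, not Dagster code) of why the run stays invisible:

```python
# Conceptual sketch with hypothetical names: a run recorded in a throwaway,
# in-memory store is invisible to a UI that reads a shared store.

shared_run_store = []  # what the UI ("Dagit") would read

def execute_pipeline(name, run_store=None):
    # Default: a fresh, ephemeral store that is discarded after execution.
    store = [] if run_store is None else run_store
    store.append({"pipeline": name, "status": "SUCCESS"})
    return store[-1]

execute_pipeline("my_pipeline")                              # lands in a throwaway store
execute_pipeline("my_pipeline", run_store=shared_run_store)  # visible to the UI

print(len(shared_run_store))  # 1 -- only the second run shows up
```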
03/10/2021, 10:44 AM
The dbt_cli_snapshot solid has incorrect configs. On line 340 here, you can see that dbt snapshot is set with options thread, exclude and models... however, that command does not accept the models flag, but rather uses the select flag to choose which snapshot(s) to run (confirmed here). Does that look right to others as well? Anything I'm missing there, or is this just one thing that needs to be fixed?
sk4la
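For context on the flag mismatch above: dbt run selects resources with --models, while dbt snapshot historically accepts only --select. A hypothetical sketch (invented helper, not dagster-dbt code) of mapping the selection config onto the flag each command actually accepts:

```python
# Hypothetical sketch: dbt's `run` command selects resources with --models,
# while `snapshot` uses --select, so a solid that blindly emits --models for
# both would produce an invalid `dbt snapshot` invocation.

FLAG_FOR_COMMAND = {"run": "--models", "snapshot": "--select"}

def build_dbt_args(command, selection):
    # Map the solid-level selection config onto the flag this command accepts.
    return ["dbt", command, FLAG_FOR_COMMAND[command], *selection]

print(build_dbt_args("run", ["my_model"]))      # ['dbt', 'run', '--models', 'my_model']
print(build_dbt_args("snapshot", ["my_snap"]))  # ['dbt', 'snapshot', '--select', 'my_snap']
```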
03/10/2021, 2:34 PM
@pipeline
def ingestion_pipeline():
    def ingest(file):
        ingest_file(path=file)

    def spread(file):
        gather_files(path=file).map(ingest)

    gather_files().map(spread)

When I execute this pipeline, I get the following:

Solid "ingest_file" cannot be downstream of more than one dynamic output. It is downstream of both "gather_files" and "gather_files_2"

As I understand it, nesting dynamic outputs is not currently implemented.
Alright, so instead of mapping from inside the spread function, I tried to map over its result, like this:

@pipeline
def ingestion_pipeline():
    def ingest(file):
        ingest_file(path=file)

    def spread(file):
        return gather_files(path=file)

    spread_files = gather_files().map(spread)
    spread_files.map(ingest)

It does not work either, which makes sense, since the underlying dependency graph should be the same.
Does anyone have tips on how to overcome this kind of situation using Dagster?
David
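One common way around the one-level dynamic-output limit described above is to do the nested gathering inside a single step, so there is only one dynamic fan-out for ingest to map over. A plain-Python sketch of the flattening idea (hypothetical data and names, not Dagster API):

```python
# Plain-Python sketch, not Dagster API: instead of mapping over a dynamic
# output of a dynamic output, flatten the two levels of gathering into one
# function that yields every leaf file, leaving a single fan-out to map over.

FILESYSTEM = {
    "top": ["dir_a", "dir_b"],
    "dir_a": ["a1.csv", "a2.csv"],
    "dir_b": ["b1.csv"],
}

def gather_files(path):
    # Stand-in for the real gather_files solid.
    return FILESYSTEM.get(path, [])

def gather_all_files(root):
    # One level of "dynamic output": yields each leaf file directly.
    for directory in gather_files(root):
        for file in gather_files(directory):
            yield file

ingested = list(gather_all_files("top"))
print(ingested)  # ['a1.csv', 'a2.csv', 'b1.csv']
```

In Dagster terms, that would mean a single solid that lists files recursively and yields one DynamicOutput per leaf file; whether that fits depends on how expensive the recursive listing is inside one step.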
03/10/2021, 2:49 PM
Charles Lariviere
03/10/2021, 4:01 PM
index to understand which column is the primary key (and uses it as the merge key). It looks like my only option right now would be to exclude that “column” from the pandas dagster type, though I would prefer to have the dagster validation on it as well (i.e. non_nullable, unique, plus the nice-to-have documentation!)
Lidor
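One workaround for the index-validation gap above: keep the primary key as an ordinary column so the dagster-pandas constraints (non_nullable, unique) apply, and only promote it to the index inside the solid after validation. The check itself is simple; a plain-Python sketch (hypothetical names, no pandas or dagster dependency):

```python
# Plain-Python sketch with hypothetical names: validate the would-be index as
# an ordinary column -- non-nullable and unique -- before promoting it.

rows = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]

def validate_key(rows, key):
    values = [row.get(key) for row in rows]
    non_nullable = all(v is not None for v in values)
    unique = len(values) == len(set(values))
    return non_nullable and unique

print(validate_key(rows, "id"))  # True
```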
[Messages from Lidor, Michael, Johnathan Brooks, Alex V, and Deveshi, posted between 03/10/2021, 5:04 PM and 03/11/2021, 10:52 AM, were not captured in this export.]