# dagster-feedback
l
the dagster documentation is really awful
dagster docs feedback 1
d
hey luis, where are you encountering issues?
l
The documentation is pretty bad (no offence intended): complex, lacking useful examples. I can't even easily find in the docs how to pass multiple parameters to one asset (to call an external API).
d
sorry you’re having trouble. if you can point out the specific areas you think are challenging, happy to submit them to our docs team for improvement
l
The docs have a lot of different objects and pieces, but it's not clear when and how to use them. Just a bunch of concepts and random code, no useful examples. Sorry, I gave it a try.
d
I have to disagree. Dagster docs have been absolutely incredible for me to learn both the high level concepts and specific issues that have come up dealing with the library. Used in tandem with the #dagster-support community, we have converted several critical pieces of our infrastructure over to Dagster in a relatively short amount of time
🌈 2
thistbh 2
d
@Luis Pinto could you share more about the specific use case you’re trying to accomplish? I’d love to try to recreate your learning experience.
l
@Danny Steffy I'm happy for you and I wish you success. But the fact that it works for you doesn't mean it will work for everyone or suit everyone; your reality is not absolute.
d
I never said it was, merely wanted to give a different perspective and mention that the community resources in #dagster-support have been fantastic to clear up any confusion with the library or documentation.
f
Hey @Luis Pinto - it’s a fair call. Do you have suggestions on what improvements might make it more accessible to you? Something like a concept map, perhaps? What would be useful would be some examples of other documentation that worked for you so we can compare.
b
So @Fraser Marlow @Dagster Jarred, my 2 cents (warning: tl;dr). I am almost a year into Dagster adoption and I love the promised features of Dagster, but even today I often find myself scratching my head trying to make heads or tails of the advanced features. Just today I came across this statement:
> For a given dynamic partition set, partition keys can be added and removed. One common pattern is detecting the presence of a new partition through a sensor, adding the partition, and then triggering a run for that partition:
Now, the above says the common pattern for detecting the presence of a new partition is through a sensor. So is a sensor the recommended way, or the best way? What is so special about putting the detection mechanism in a sensor that cannot be achieved anywhere else? For example, what if I just add dynamic partitions in a loop inside the asset that is creating the sub-partitions?

My current complexity is data processing with daily partitions that are fanned out into smaller chunks and then finally aggregated again on a daily basis. This is not a unique pattern for me; it's an industry pattern for large-scale data processing. I have been reading about partitions and their related APIs again and again, but it's not clear. There are overlapping features between Assets + Dynamic Partitions vs. Ops with Dynamic Outs; one supports yielding, the other does not.

I have been doing software development for a long time, and I understand documentation can be a pain, but it needs to be written for smart people as well as the not so smart. Some specifics:
• Examples (lots of examples) of APIs and their usages. Dagster supports ~10 partition definitions and ~10 PartitionMappings, but not all of them have code samples. I go scouring for code examples down to reading the unit-test code for that new and shiny API that might help me, but unit tests are scaffolded to hide away a lot of usage patterns. No complaints about that, just pointing out the futile effort of trying to understand the usage of any XYZ API that ships with no examples or use cases.
• The "Fully Featured Project" in the examples does not use all the newly introduced APIs. It might help to show "imaginary" use cases for each of the available APIs; that would help adoption.
• Additional proofreading might help. In a lot of places I found statements too complex to grasp the context.

I want Dagster to be super successful, and I am awed at the velocity at which it is growing features. I think the documentation will need to keep up with the amazing features it supports.
dagster docs feedback 2
f
Amazing. Thanks @Binoy Shah for taking the time to write up this thoughtful feedback. These are very actionable points. Yes, it has been a challenge on our side to both evolve the platform with constant releases and keep all the documentation caught up, but your input is the kind that will help us focus our efforts. Thanks for this, and believe me, we will take it to heart.
m
My question here is very relevant I believe.
👀 1
r
the documentation is great, there's just so much going on that it's easy to get lost. the problem that I see is that Dagster offers too many different ways to do the same thing. when I'm trying to build something robust quickly, I don't want options - I want one very opinionated API that's been tested thoroughly and works as expected. I'd rather have dumb + simple + works-as-expected vs an ever-evolving black box that tries to do everything for you but has edge cases.

a good example is simply building an asset graph where my assets materialize daily. how many different ways can you think of to do this in Dagster? there are various types of explicit sensors with different policies, and the catch-all reconciliation sensor. I can also avoid the standalone concept of sensors entirely and use an asset job + schedule based on the partitions definition. there are also materialization policies that can generate your assets based on freshness and other attributes instead of partitions. even then, there are hidden gotchas and restrictions in these approaches that don't come up until runtime. very quick examples:
• all assets in a given job must have the same partitions definition. even if the cadence is the same, the start dates must be the same. that does not work in a repo where folks are adding assets on a daily or weekly basis with new start dates moving forward. if we set all start dates to the min of the graph, then new assets have false flags in the UI for missing partitions.
• the asset reconciliation sensor doesn't work for graphs where the absolute root has a static partition or ad-hoc materialization. a common use case is a downstream asset that reads from a static asset (e.g., a file) + a source asset sink (e.g., an API table). when using the sensor, the downstream just gets stuck.

IMO, the project moves very fast and is always introducing new features and ideas. this is nice behavior for a research project, but frustrating when deploying and maintaining this on production systems. btw the Slack community is great and the Dagster team is so incredibly nice and helpful, but IMO it's not a good signal when users are frequently coming to Slack for help.
👍 3