Is it useful to try and capture the story of a predicate or a feature in the docs?

The title ends in a question mark, so I guess I feel the answer is “NO”; however, I would like to know how others see this. This post was prompted by a suggestion from @abaljeu in that thread.

My own experience with “versioned docs” is a mixed bag. Just the other day I was fighting my way through the docs of Python’s ipaddress module, and there are some real gems there, for example:

https://docs.python.org/3/library/ipaddress.html#address-objects

I don’t know if those are auto-generated or hand-written, but we have incompatible behavior going back and forth between versions :smiley: [EDIT: not sure really, let’s just say it is quite confusing]

I would rather circumvent the issue altogether and stick with the “latest”; or, if I can’t find the motivation to upgrade/test/refactor, just never touch the code again until it is time to throw it away.

(I missed a lot of words here; what I meant was, I write my code for the latest, shiniest software at the time. Then I have two obvious options: either update regularly and aggressively refactor every time upstream has breaking changes; or decide to freeze until it is time to throw my code away, and only fix bugs in my own code. What ends up happening in practice is that most people, most of the time, are forced by circumstances to take some middle road. For example, you don’t want to change any features in your own software, but a vulnerability fix upstream forces you to upgrade your dependencies, and in order to do this you must refactor. Maybe that’s just how it is, and in that case studying the changelogs is inevitable…)

Another consideration is that this kind of documentation is invaluable for closed-source software, but the combination of open source and git supersedes the need for it to some degree.

On the other hand, there is a rather big precedent with the “SWI-Prolog extensions” section, but I would argue that this is on a different level conceptually.

What are your experiences and opinions?


SWIMM is a startup venture dedicated to continuous documentation: it integrates documentation with git and git-based code versioning.

Apparently, SWIMM gets a lot of traction in organizations that see a lot of value in documentation for onboarding – given the average churn of 2 years in high tech.

It’s apparently also free for open source projects …

It is of course good they call it SWI MM :slight_smile: It is totally unclear what they do, though. Their claim that it works out of the box anywhere is rather unlikely, considering SWI-Prolog’s documentation comes from various sources. If anyone wants to try it, please report.

Overall, we have seen some hints on how to find things. One option might be to integrate these more closely into the website. For example, a button that tries to extract relevant commit messages and/or the source. One of the problems is that changes indirectly propagate to closely related predicates where functionality is shared. So, you’d need a dependency tree. That is doable, of course. It could also be used to browse the relevant source quickly online. I don’t know the feasibility of this. It is also hard to guess how many people this would make happy.

Python’s documentation takes some skill to read. There’s the reference documentation, often one or more PEPs (Python Enhancement Proposals), and long discussions on the python-ideas mailing list. Plus various tutorials, which are often wrong in subtle ways (which doesn’t stop them from being popular). It’s a somewhat flawed system, but at least the PEPs help in understanding the rationale and intended use of a feature (the PEPs also summarize the mailing list discussions).

Yeah sure, but ain’t nobody got time for that :smiley:

Here is what I was referring to (how to confuse a cat):

Changed in version 3.8: Leading zeros are tolerated, even in ambiguous cases that look like octal notation.

Changed in version 3.10: Leading zeros are no longer tolerated and are treated as an error. IPv4 address strings are now parsed as strict as glibc inet_pton().

Changed in version 3.9.5: The above change was also included in Python 3.9 starting with version 3.9.5.

Changed in version 3.8.12: The above change was also included in Python 3.8 starting with version 3.8.12.
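To make the confusion concrete, here is a minimal sketch of what those notes mean in practice. The address string `192.168.001.1` is an assumption on my part, chosen to trigger the leading-zero case; the same code parses or fails depending on which interpreter runs it:

```python
# Illustration of the ipaddress behavior change quoted above.
# "192.168.001.1" (leading zero in one octet) was tolerated in
# Python 3.8.0-3.8.11 and 3.9.0-3.9.4, but is rejected from
# 3.8.12 / 3.9.5 / 3.10 onwards, matching glibc inet_pton().
import ipaddress

addr = "192.168.001.1"
try:
    print("parsed as", ipaddress.ip_address(addr))  # older interpreters
except ValueError as err:
    print("rejected:", err)  # 3.8.12+ / 3.9.5+ / 3.10+
```

So the exact same line of code is either fine or a bug, depending on the patch level of the interpreter – which is precisely the kind of thing those stacked “Changed in version” notes are trying (and, arguably, failing) to communicate.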

I really didn’t mean to shit on Python in particular; I just used this as a practical example of the perils of (automated?) annotation of historical developments.

Agreed. But nobody seems to have come up with a better way of doing things. :wink:
Even with full-time tech writers. (When I was at Google, there were running jokes about the documentation being always out of date … and that was with a team of tech writers who did a very good job of organizing things and keeping things up to date. IBM did an excellent job of keeping its external documentation up-to-date, but the cost was significant.)

I haven’t tried SWIMM, but did attend a webinar, which looked very promising.

I work with a similar tool that has a graphical user interface I find more appealing – but my tool currently has a significant gap: it doesn’t synchronize with code changes to show me where documentation I wrote may have become outdated.

SWIMM has this feature built in, and it’s one of their killer features to help overcome documentation staleness.

They promote a continuous documentation process, and I think they recently integrated with JIRA as well.

P.S. As an aside: as a startup, they were well funded some time ago, with 27M USD.

Burning through venture capital?

I guess that requires the tool to at least know where the documentation for a particular piece of code is. For SWI-Prolog it resides in a .pl file, a .md file, or a LaTeX file. For the Prolog code, the PlDoc comment at least documents the code (typically) below it. Foreign predicates in packages are also documented in the .pl file, but the code is in some C(++) file. How is any tool going to make this link? Then, changes may well be further down the combined Prolog and C(++) call tree. How is a general tool going to find these dependencies? [I came across cflow yesterday, which promises to create a call tree for C code, but apparently it does so using rather superficial analysis, as it gets it completely wrong when applied to the SWI-Prolog source.]

Even if you could fix these problems using a tool where you can add plugins/rules that explain how the source is organized, you get the really hard problem that changes to some predicate/function in the call tree do affect their callers. It is typically rather unclear whether the documentation needs an update and, if so, which part.

My gut feeling is that this can only work if the project organization and comment/docs organization follow pretty rigid rules. For a new project that is quite likely worth a try. For an old project the effort to get there is way too high. Of course, if SWIMM can do all this magic, I’m glad to use it :slight_smile: For now I do not believe in fairy tales :slight_smile:


Not so sure. I have this obsession with side-stepping hard problems. (Some call it “laziness”). Yes, it is difficult, expensive, time consuming to write technical documentation, and to keep it up to date for software in particular. However, there is a way to side-step the problem; it isn’t novel as such and I don’t expect to write anything that you don’t already have experience with. [Disclaimer: I am currently testing this approach in practice. However, I am not using a scientifically rigorous approach because I cannot afford it, so the value of whatever “wisdom” I am dispensing is questionable…]

Do not write technical documentation that reflects the codified parts of the system.

Do not write specifications in a natural language.

Do not document what your program/module/class/function/predicate… does.

Do not make diagrams and other visualizations that simplify and obscure the behavior of your system. (Here, “system” is your software and all the upstream software, running on a concrete machine, within the context of a larger system with practically unpredictable behavior.)

You will of course do all of these things as you develop and maintain your system. It is unavoidable that in the process you will produce many artifacts of fuzzy nature. Their purpose is to help people unite towards a common goal, share a narrative to give direction, and motivate. Once you have codified the behavior, you can and should discard those artifacts from your “documentation” and keep them only as a part of an audit trail or proof of work.

There are two underlying assumptions to my argument.

  1. The codified parts of your system are available for inspection; this means both readable by humans and, crucially, in practice executable on a concrete machine within a predictable environment, i.e., they are reproducibly testable.
  2. You do provide enough documentation, strictly for humans, that delineates the purpose of your system within some larger context. There, you can and should go wild with your diagrams and visualizations, as long as you remember to leave out all the codified parts.

This is my current understanding of the issue of technical documentation, very superficially. I tried to keep it short and purposefully omitted a lot of relevant detail. For example, the definition of a “concrete machine” is quite fuzzy itself. I don’t mean a particular CPU architecture, I mean any “machine” that is in practice a black box that behaves somewhat reproducibly. Under this definition, even a system built with some (or several) cloud service provider’s product offering is still a “concrete machine”.

I reached my self-imposed word count limit. Thank you for your attention :slight_smile:


Boris,

It happens that I have been following the company for some time now.

And they are, in my mind, a model startup venture – they got seed capital, the founders lived the problem first hand, and they first worked with a design partner – a company that wanted their solution to work for them.

Once they had worked out a first working feature set in collaboration with design partners, they expanded to other beta customers and eventually raised (as said) 27M USD, etc.

The problem they are solving is real (how to systematically onboard new developers into existing code bases, with an average churn rate of 2 years in industry), and their git-based solution is getting traction …

Much like the tool I use, it’s a clean-slate documentation tool.

You write the documentation as you code, or shortly after, with their tool. It doesn’t automatically integrate already existing documentation.

You can link the documentation to any file – in the tool I use it’s simply a generated token I add to .pl files, but I could also add it to C, C++, or any other text-based file. These tokens are picked up by the tool to link and visualize documentation.

So, it works for projects that can move to, and use, their documentation platform.

To me it looks like they are either reinventing existing technologies or, even worse, knowingly neglecting to clearly put their work in the context of existing prior art. Back in the day, when programs were commonly written in what we now describe as “low-level programming languages”, and when computers were prohibitively expensive and quite unique, Donald Knuth came up with the concept of “literate programming”. This has been iterated on and has influenced more modern technologies (Jupyter comes to mind). I don’t want to make too far-reaching assumptions, but it might be that SWISH shares some of that pedigree.

From what I can see, they are not showing their code. This automatically makes them ineligible for certain purposes that I am interested in.


Developers are people, shockingly enough. If you treat them like inexpensive, easy-to-replace resources, you will indeed get exactly that. Maybe it is something we can no longer avoid, but on some level I really dislike the idea.

It doesn’t really matter if they reuse or reinvent prior technologies or ideas … it’s not really about innovation.

What matters is that they successfully create a product based on those ideas and bring the product to market – this is a difficult and non-trivial task.

It’s the customers who vote with their feet and money on whether the startup succeeds or not, and so far it seems they have a good first shot at succeeding.

This is a strange argument. If you are using a proprietary product and the business does not succeed, what do you do then?

You can find some guidelines for writing literate programs here. I share this because, among all the other papers and books about literate programming, some of them 300 pages long (sic!), this one was refreshingly short!


Source:

Literate Programming, A Practitioner’s View
Bart Childs - 2001
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.365.8708


I went deep into literate programming territory at one point :slight_smile: The take-home message at the time was that modern languages have to some degree superseded the need for literate programming as initially proposed by Knuth. A non-exhaustive list of features present in a modern Prolog (Java, Python…) that had to be fixed (for Pascal!) by Knuth’s original WEB tool:

  • “Names” can be as long as necessary and the “syntactic constraints” are much more relaxed. For example, 'Reverse each line of input' is a valid predicate name (even if a bit unconventional).
  • There are very few places where the order of “code chunks” within the text of the program really matters. Artificial constraints on how you organize your code in files are far less common.
  • General availability of libraries for low-level algorithms (so you rarely have to write highly optimized and hence “unreadable” code).
  • Portability between platforms is the responsibility of the run-time (so you rarely have to get into the nitty-gritty details where extensive explanations are necessary).

Another issue was brought up by Michael Richter on the old mailing list, from 2013-04-22:

Literate programming relies on someone exceedingly rare to pull off: someone who’s good at coding and who’s good at writing. It’s a great idea in theory. In practice it’s an almost-guaranteed horror.

I gotta say that back then I was a bit offended by this off-hand remark but now I have accepted the realities :smiley:

Some of the ideas have survived. R vignettes, Jupyter, and notebooks as implemented by SWISH have all been influenced by literate programming.

I still have a suspicion that programming, as practiced today by the overwhelming majority of programmers, and all the tooling around it, is stuck in a local optimum purely by a historical accident; and that literate programming does hint towards a different hill or mountain altogether. Who knows…

I like to think about code as a combination of algorithm + data structure + architecture that maps onto features (scenarios).

Most, if not all, of these relationships are not well captured in linear code text dispersed across text files/modules.

You need code stories that start from features (which explain the why) and that selectively refer to the parts to pull them together.

The code documentation tools suggest pursuing this model … it’s like a documented walk-through by the developer who wrote the code.

Edit:

By “feature” I mean, for example, a user-facing functionality, and by “code story” I mean the documentation artifact those documentation tools promote.