Text Files Everywhere

Date: 2025-12-21 Tags: intro, redefissues

“Goin’ back to nest in my family tree”

Lamont Dozier

There is a mix of thoughts swirling in my head about software development workflows. I feel a need to organize and make sense of them; and then see if they will take me on a new journey.

The last software development project I was a part of was managed using a particularly hefty list of different products (most from different vendors, some internally developed). I will avoid naming them; the specific products used are not important anyway. Here is just a list of activities where each had a dedicated product:

  • “A”: management of different types of requirements and risks

  • “B”: product management

  • “C”: technical specification reviews

  • “D”: issue tracking

  • “E”: time tracking

  • “F”: code reviews and CI

  • “G”: end-to-end verification testing management

  • “H”: deployment management

  • “I”: incident response management

This is not a complete list; I’ve narrowed it down to activities I wish to focus on. And sure, there are technical opportunities for merging some of these activities to be performed using a single tool. However, anyone who has tried doing something similar will know the stark difference between what is technically possible and what can actually be accomplished in an organization of people.

Here comes one more incomplete list: why was the above frustrating?

  • Traceability between all listed activities is required. Everything needed to be connected, and demonstrably so to auditors.

  • Since the previous point involves a laborious manual process, we did not verify it continuously for everything, which left us open to missing things.

  • Data for these activities, even though conceptually related, was fragmented across the respective products’ databases, which we did not always own. Automated integration was only possible if the vendor provided it, or if it was technically and legally possible to develop it ourselves.

  • As arbitrary analysis of everything as a whole was not practical, we did not even try to consider what useful insights could be gained from doing so.

  • The complexity introduced by this many components resulted in an overhead in operations that reduced efficiency. On many occasions it felt like we were fighting with these products and our procedures for using them more than we were spending time on the actual software development we were supposed to be doing.

  • The company had to pay some hefty licensing fees for the privilege of being in this situation.

Every listed point can be explained, rationalized and justified. You have probably heard it all before; I don’t wish to go there right now. What I do wish for is an alternative. Not even an alternative that is necessarily better in most cases and for most people, but an alternative with a different set of compromises that can be viable in some cases and that is worth considering.

You might think “Why should I care? Most of the activities from the list are often not formally done for software projects”. Well, that’s the thing; my belief is that they should be done more—they work.

Over time, we can see how software projects around us have adopted more and more activities like these. Thirty years ago, version control systems were novel for many. I remember freshly graduated software engineers twenty years ago not knowing what that even was. Continuous integration was exotic. Code reviews and automated verification did not feel as mandatory as they do now. It’s worth stressing that creating good software involves much more than coding. Keeping this in mind is especially relevant now in the age of vibe coding. Therefore, I’m thinking about how doing all of the listed activities could be made easier and more natural, even for small software projects.


This challenge has been brewing in my mind for some time. I currently believe an effective approach could be accomplished by:

  1. first turning the data model on its head,

  2. then generalizing all of these activities into a single type of problem,

  3. and finally embracing the Unix philosophy of responsibility fragmentation.

The data model headstand

In short, what I mean by this is no data silos. With the usual approach, different applications in play each own their dedicated and isolated databases:

"With data silos"

Instead of that, data is to be collaboratively managed by users and applications:

"Without data silos"

You know… like what we’ve been doing with files for decades.

Recognition of the unifying software pattern in play

All the products listed above have the following in common regarding the records/artifacts they manage for the listed activities (a minimal sketch follows the list):

  • They are structured units of information.

  • They have a type.

  • They have a state, accompanied by a finite state machine (FSM) for changing it.

  • They may link to each other, and these links can have types (relates to, blocks, mitigates, etc.).

  • They contain a set of fields; different sets for different types.

  • Validation rules for those fields can depend on state and associated links.

  • They exist independently of who works on them,

  • but people do need to be associated with them in different ways (fields for assignee, created by, reviewer, etc.).

  • History of changes is stored.

  • A list of comments can be associated with them.

  • A list of arbitrary file attachments can be associated with them.
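
To make this pattern concrete, here is a minimal, hypothetical sketch in Python. Nothing in it comes from an existing tool; it simply restates the bullets above as data types, including the FSM for state changes.

    from __future__ import annotations

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Link:
        link_type: str   # e.g. "relates to", "blocks", "mitigates"
        target_id: str   # identifier of the linked artifact

    @dataclass
    class Artifact:
        artifact_id: str
        artifact_type: str                                        # "requirement", "issue", "test case", ...
        state: str                                                # current state within the type's FSM
        fields: dict[str, object] = field(default_factory=dict)   # type-specific fields
        links: list[Link] = field(default_factory=list)           # typed links to other artifacts
        people: dict[str, str] = field(default_factory=dict)      # role -> person: "assignee", "reviewer", ...
        comments: list[str] = field(default_factory=list)
        attachments: list[str] = field(default_factory=list)      # paths to attached files
        history: list[tuple[datetime, str]] = field(default_factory=list)

    # The FSM: allowed state transitions per artifact type (illustrative values only).
    TRANSITIONS: dict[str, dict[str, set[str]]] = {
        "issue": {
            "open": {"in progress"},
            "in progress": {"resolved", "open"},
            "resolved": {"closed", "open"},
        },
    }

    def change_state(artifact: Artifact, new_state: str) -> None:
        """Move an artifact to a new state if its type's FSM allows it, recording history."""
        allowed = TRANSITIONS.get(artifact.artifact_type, {}).get(artifact.state, set())
        if new_state not in allowed:
            raise ValueError(f"{artifact.state!r} -> {new_state!r} not allowed for {artifact.artifact_type!r}")
        artifact.history.append((datetime.now(), f"state: {artifact.state} -> {new_state}"))
        artifact.state = new_state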

While this software pattern looks recognizable, I could not find a satisfactory existing name for it. Issue Tracking System is the closest I managed to find.

It is pretty close though; the main issue I have with the name is the word “issue” itself: many things tracked by such systems do not fit that classification (“requirement”, “user story”, “specification”, to name just a few). Therefore, rather than getting hung up on semantics too much and making up a new name, I’ll use this already recognizable one. By this point, people know not to take the word “issue” too literally in issue tracking.

Plain text and RDF to the rescue

Software engineers are used to working with plain text files. Why not take advantage of that familiarity? It can also be argued that plain text has the best longevity and portability (and people have argued this, profusely… you won’t have difficulty finding examples). For people who don’t want to touch a text file, graphical user interfaces can be developed that look familiar yet use those text files under the hood.

Once text files are treated as more than inert documents, a familiar pattern begins to appear.

As soon as we want to say that one artifact depends on another, that something is provisional rather than settled, or that an idea evolved over time, we are no longer just writing text. We are making statements about artifacts and their relationships.

People routinely model knowledge without explicitly naming it as such.

The moment we distinguish between drafts and finished work, track relationships between artifacts, or reason about how something came to be, we are operating with an implicit model of entities, relationships, and change.

This understanding may live in comments, filenames, or mental shortcuts, but it exists nonetheless. Recognizing the unifying pattern means making it explicit, so it can be reasoned about, shared, and extended.

When artifacts, relationships, and changes are all treated as first-class, graph-shaped structures tend to emerge naturally. At that point, the shape of the model matters more than the particular way it is encoded.

Resource Description Framework (RDF) is one such encoding. It is not compelling because of its syntax, but because it matches the structure that has already emerged.

Putting plain text and RDF together almost feels like cheating.

Side note: a very short introduction to RDF

Resource Description Framework (RDF) represents information as simple statements: subject, predicate, object. This makes entities and relationships explicit and allows new facts to be added without rewriting existing ones.
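
As a tiny, hedged illustration, here is how three such statements about one artifact could look using the Python rdflib library; the example.org vocabulary and the issue identifiers are made up for this sketch:

    # Three RDF statements (subject, predicate, object) about a hypothetical artifact.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/its/")
    g = Graph()

    issue = URIRef(EX + "issue-42")
    g.add((issue, EX.hasState, Literal("open")))       # issue-42 has state "open"
    g.add((issue, EX.blocks, URIRef(EX + "issue-7")))  # issue-42 blocks issue-7
    g.add((issue, EX.assignee, Literal("alice")))      # issue-42 is assigned to alice

    # New facts can be added later without rewriting the existing ones.
    print(g.serialize(format="turtle"))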

RDF is less interesting for its syntax than for the kinds of systems it supports. Some well-known examples include:

  • Schema.org, which describes the structure of the public web

  • Wikidata, a collaboratively maintained global knowledge graph

  • PROV-O, a W3C standard for modeling provenance and change

  • SKOS, used by libraries and public institutions to manage vocabularies

  • OBO Foundry ontologies (e.g. Gene Ontology, Human Phenotype Ontology), used in open biomedical research

These systems differ widely in purpose, but share a need to represent entities, relationships, and change explicitly across tools and over time.

Not that there is anything novel about the combination; RDF has had textual representations since its inception. The twist is skipping the usual triple-store ingest pipeline and keeping plain text files on a file system (or systems) as the source of truth. Integrity checks and queries would load an in-memory representation on the fly.
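
A minimal sketch of what that could look like, assuming hypothetical file paths and the same made-up vocabulary as in the side note above (Python with rdflib):

    # Sketch: treat .ttl files on disk as the source of truth, load them into an
    # in-memory graph on demand, and query that graph. All paths are assumptions.
    from pathlib import Path
    from rdflib import Graph

    def load_dataset(root: str) -> Graph:
        g = Graph()
        for ttl_file in Path(root).rglob("*.ttl"):
            g.parse(str(ttl_file), format="turtle")   # merge each file into one graph
        return g

    g = load_dataset("./issues")

    # Example query: every artifact currently in state "open".
    for row in g.query("""
        PREFIX ex: <http://example.org/its/>
        SELECT ?artifact WHERE { ?artifact ex:hasState "open" . }
    """):
        print(row.artifact)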

The major downside is complexity.

When I used RDF in the past, there was often some gap in my understanding, followed by a failure to locate the right bit of documentation to bring clarity. It was frustrating to know that something could be done but not to know how. Luckily, this is now less of a problem, as LLMs have a good understanding of RDF and explain it well. This gives me hope that the learning curve for mastering it can now be surmounted with less difficulty, and that better tooling will result from more people doing so.

Another factor contributing to complexity is that when implementing an issue tracking system in this way, there is little choice but to make it distributed. On the bright side, distributed issue tracking systems are not something we have in abundance, so it could be good to have more options. I realize that many will immediately feel troubled by distributed state management. Synchronization of conflicting changes to state can quickly become non-trivial. This is where some centralization can come into play. Remember that distributed is not the same as decentralized. Nothing is stopping us from having a single point of control and truth to simplify conflict resolution. Such a point could additionally support the more classic client-server graphical UI many people will demand, built on top of RDF and plain text files.


Trying to condense these ideas into a single sentence, I get:

A file-backed, RDF-ontology-driven model for an issue tracking system, where all higher-level behavior is delegated to specialized tools.

Immediately, the following problems with this approach come to mind:

  • Required skill set. Expertise in RDF/Ontologies and things like SHACL is not common in typical software development teams.

  • Performance and scalability. Databases exist for good reasons… The bet is that modern workstations are “good enough” for some brute force approaches that will likely be necessary, and that good designs for file synchronization can resolve scalability and data access control challenges.

  • Data integrity validation. Direct control over data means that it would be frighteningly likely for a change to break its integrity. For good UX and confidence in the system, the tools used with the data must make data integrity validation their central focus (see the sketch after this list).

  • Ontology change management. With complex knowledge models and fragmented data storage, necessary data migration due to ontology changes needs to have a robust solution users can rely on.

  • History of changes for data. One extreme is the complete reliance on the underlying version control system for files, and the other is to store detailed history of all changes in the files themselves. Neither is particularly attractive. Middle-of-the-road solutions need to be explored.

  • Aversion to writing RDF formats (like Turtle, RDF/XML, JSON-LD) by hand. In most cases there will be a need for an intermediate format (probably a text markup format, or even annotations in code) that gets transformed to RDF on the fly.
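
To make the data integrity point above more concrete, here is a hedged sketch of what a validation step could look like using SHACL shapes with the Python pyshacl library; the file names and the shapes themselves are assumptions:

    # Hypothetical sketch: validate one issue file against SHACL shapes with pyshacl.
    from pyshacl import validate
    from rdflib import Graph

    data = Graph().parse("issues/issue-42.ttl", format="turtle")
    shapes = Graph().parse("ontology/its-shapes.ttl", format="turtle")

    conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
    if not conforms:
        # e.g. a "resolved" issue that is missing its mandatory resolution field
        print(report_text)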

Even after considering all that, I still find the approach promising. Admittedly, I have been known to be excessively optimistic… so to keep things as grounded as possible I would have to think rigorously about validation.

Comparison to existing systems

The Comparison of issue-tracking systems page on Wikipedia gives an overview of many available options. This page also effectively illustrates how rare distributed issue tracking systems backed by flat files are. Maybe that’s for good reason and I should take a hint? Nah, there are too many hypotheses I wish to test first.

Looking through these and seeing the mountain of features available on top of issue tracking, I additionally realize that I’m really not interested in making “a killer new ITS”. What I am interested in is making a framework, with some accompanying tooling, documentation and examples, with a highly limited and specific scope, that people can leverage as just one small (though foundational) part in developing “a bespoke killer ITS” that they need.

If some of the example ITSs become more widely useful as they are, that is a bonus.

The large number of issue tracking systems that exist reminds me of a similar situation I encountered with workflow systems. My theory for why there are hundreds of them is that it is tempting to make one, and that there is no perfect one-size-fits-all solution. Something similar might be true for issue management systems. In which case, it would be sensible to prepare a good foundation for building your own.

In that light, I’ll skip listing comparisons with specific existing ITSs. They are either not intended to be used in a way I describe here, or they can be used in conjunction with the framework I’m imagining.

But what about…

An honorable mention is Org Mode for GNU Emacs. For decades, Emacs users have successfully managed requirements, tasks, documentation, code, and even executable workflows using plain text alone.

Seen in that light, Org Mode is not an outlier but an early demonstration: plain text can support far more than static documents when paired with a sufficiently rich internal model. Its longevity is evidence that the idea itself is sound.

At the same time, this approach has largely remained tied to its original environment. My suspicion is that this is not due to a lack of expressiveness, but to a missing ingredient: a way for plain-text artifacts to serve as a general substrate on which arbitrary software systems can be built.

To truly have the best of both worlds, the text we work with must be usable both as a human-facing authoring surface and as a stable interface for other tools. This includes tools that look nothing like the original editor.

This is where a more explicit, tool-agnostic representation becomes relevant. RDF is designed to model knowledge independently of any single application, which makes it a natural fit here.

Potential future applications

While the primary target is software development management, the framework is meant to abstract away domain-specific references. The aim is to provide a generalized set of tools common to all issue tracking systems.

The primary advantage of this framework is frictionless interconnection of multiple bespoke ITSs, providing inherent cross-referencing and traceability between related entities from different systems. Instead of each such ITS having its own data silo, all issues from the various interconnected ITSs are logically centralized and in a common format.

The secondary advantage of this framework lies in the fact that it is inherently distributed, provided a distributed version control system is used for the files. This reduces reliance on centralized remote infrastructure.

Taking this secondary advantage to its possible extreme, the framework could be part of a foundation for creating a completely (or partially) decentralized software forge, covering management of all stages of the software development lifecycle (SDLC). Procedures, and software to enforce them, could be developed that make it possible to do things like code reviews and CI/CD locally on workstations, with dedicated ITSs backing them. It would need to involve containerization for repeatability and digital signatures for trustworthiness, but it would all technically be possible.

With this in mind, in the long run I’m interested in experimenting with this framework as a foundation for many different applications; without prejudice. This includes everything you might find in a software forge, but some unconventional examples that come to mind are:

  • User interface modeling. Representing a UI as a directed graph where nodes correspond to states and edges to possible actions and their consequences. Apply algorithms from graph theory to UI/UX design analysis and optimization. Then additionally, use this model for automating UI interactions in tests.

  • Test case management. Use the source code of the test implementation as the single source of truth. Derive RDF triples related to test cases from source code annotations. Applicable to manual test cases as well, by leaving the implementation empty.

  • Test run reports. Structured results of test runs permanently stored. Could be used for durable and distributed test execution by having the test runner use this report as its runtime state management.

Plan for a POC

To prove these concepts can work in practice, I’ll need to at least make the following:

  • A base RDF ontology appropriate for the ITS software pattern described above.

  • An example extension of that ontology covering most of the software development activities from the beginning of this article; to be used for managing the design and implementation of the POC itself (as the first example application of the framework).

  • Use of a markup format as an intermediate storage format (that is more pleasant to work with by hand); a hypothetical sketch follows this list.

  • A backend service (imagined to run locally) responsible for:

    • Dataset integrity validation.

    • Dataset query execution.

    • Automated plain text file updates.

  • UX tooling for using a text editor as the main UI for authoring issues (probably an extension of an existing LSP server).

  • Web, TUI and CLI applications for interacting with the dataset.

  • A large mock dataset to stress the above applications with.
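
For the intermediate markup format item above, here is a hypothetical sketch of how an issue kept as Markdown with YAML front matter could be transformed into RDF on the fly (Python with pyyaml and rdflib; the file layout, field names, and vocabulary are all assumptions, not a proposed design):

    # Hypothetical sketch: an issue authored as Markdown with YAML front matter,
    # converted to RDF triples on the fly. All names and the layout are made up.
    #
    # A file like issues/issue-42.md might look like:
    #   ---
    #   id: issue-42
    #   type: issue
    #   state: open
    #   assignee: alice
    #   blocks: [issue-7]
    #   ---
    #   Clicking "Save" loses unsaved comments.
    from pathlib import Path

    import yaml
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/its/")

    def issue_file_to_rdf(path: str) -> Graph:
        _, front_matter, body = Path(path).read_text().split("---", 2)
        meta = yaml.safe_load(front_matter)
        g = Graph()
        subject = URIRef(EX + meta["id"])
        g.add((subject, EX.hasType, Literal(meta["type"])))
        g.add((subject, EX.hasState, Literal(meta["state"])))
        g.add((subject, EX.assignee, Literal(meta["assignee"])))
        for target in meta.get("blocks", []):
            g.add((subject, EX.blocks, URIRef(EX + target)))
        g.add((subject, EX.description, Literal(body.strip())))
        return g

    print(issue_file_to_rdf("issues/issue-42.md").serialize(format="turtle"))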

It’s quite a list… well, challenge accepted! I’ll give it a go at https://gitlab.com/toniruzadev/redefissues.

To be continued.