A Minimalist Approach to Software
- What is minimalism? Minimalism does not mean writing less code,
but writing code that counts. Minimalist programs are elegant, have
a tight code structure, and do one thing well.
- Brutal minimalism means removing all features and code that are
superfluous, that is, not needed to achieve the minimum viable
program (MVP). Features that are "good to have" but not "must have"
should be eliminated when aiming for a brutally minimalistic design.
- Minimalism goes hand-in-hand with simplicity. A
program should be as simple as possible to do the task at
hand. Over-engineering is a serious issue in software, leading to
extreme bloat, unnecessary layering and an unmaintainable mess.
- Programming is never about lines of code or less typing or other
such superficial measures (though all these are typically the
outcome of minimalist design). Programming is about expressing
executable ideas cleanly. It is a very difficult art and requires
losing the fear of (full or partial) rewrites and keeping a sharp,
mathematical and axiomatic focus on the minimal concepts required to
implement features efficiently.
- In our notation an object is a unit in which related data are
kept together. Example: instances of a C struct containing
plain-old-data (POD). To achieve a clean design, data and operators
on those data must be kept separate. This is the mathematically
correct thing to do as it allows constructing different systems
of functions to manipulate the same data in an independent and
non-intrusive manner.
- Data in objects should be treated as read-only and not directly
modified. Data modification should only happen via functions.
- Though sometimes flexibility and extensibility are important, very
often they are not, and in such situations only the special case
should be handled, but handled well.
- Flexibility and extensibility should not be based on the existence
of common terms in ordinary language to describe two otherwise
disparate systems. Ordinary language is not precise enough to
express commonality, and only through very careful analysis does one
discover commonality (or the lack thereof).
- Different systems should not be shoehorned into one without
significant analysis. In fact, flexibility is usually increased
when systems are cleanly separated but allow structured
communication between them. Consider Unix command-line tools and
their simple and elegant chaining mechanisms via pipes and
output/input redirection.
- When flexibility and extensibility are required they should be
implemented with an elegant and minimalist design without the need
for complicated class hierarchies and fat interfaces. A hierarchical
class structure that first suggests itself usually does not work
cleanly in practice. Object (data) nesting is fine; class
inheritance usually is not, as it leads to incestuously shared state.
- Inheritance should not be used to implement feature extensions as
it leads to code bloat and brittle class hierarchies. Some code
duplication is fine (and duplicate code should be refactored into
functions).
- Separation of data and operators allows dispatch on multiple object
types. That is, functions can be written that take two or more
objects to perform an action. This removes the incestuous state
sharing that occurs when data and operators are mixed.
- Dependencies should be minimized, and dependencies that one does
not understand deeply should especially be avoided. Exponentially
growing dependencies (where each dependency adds two more) should be
avoided at all costs. If dependency management leads
to adoption of a complex package manager that does "magical things"
like install everything under the sun from scratch, then the
situation should be re-examined very carefully and simplification
undertaken.
- Modern scripting languages are very flexible and powerful. Some,
like Lua, are specifically designed for embedding in larger
applications and have a very tiny footprint. C code (or a C API) is
very easy to bind to from multiple languages. Hence a good
architectural motif (used in redis, haproxy and most games) is to
write the low-level, performance-critical code in C and use
scripting to provide higher-level control.
- It should be remembered that not all control structures need be
possible in C. Higher-level scripting languages allow more complex and
elegant control structures (like lexical closures or coroutines) even
when they are missing from the low-level language used to implement
the performance-critical aspects of the code.
- The API exposed to the scripting language should be fine-grained
enough to allow use of complex control structures like lexical
closures, coroutines and iterators. Users should be able to pass
structured data between the script and the compiled layer.
- Proper use of C structs and function pointers can lead to
surprisingly elegant designs and clean code. Memory management is not
the burden it is made out to be. Recall that highly robust and
reliable software like the Linux kernel, redis, haproxy and sqlite is
written in C. Static analysis tools and valgrind are your friends.
Remember:
at first one wants results but very soon one wants control. C gives
you complete control.
- Last year's problems should not be papered over with yet another
layer of code. Layered software design is good but layers should be
used in the sense of indirections and not bandages.
- When evaluating an existing software library or framework,
popularity should not be used as a metric. Some popular libraries
may have high-quality code, but more often popularity is simply an
indicator of good marketing (driven by funding pressures or corporate
greed to establish platform tie-in). Further, popularity, especially
when it comes with a promise of quick initial returns, often
indicates mediocrity, as popularity can only be achieved by targeting
people who can't be bothered to develop a deep understanding and
create minimalist programs. Typical minimalist applications do not
have extensive enough needs to require including
everything-under-the-sun frameworks. In fact, it is a good idea to
avoid anything that has the word "framework" or other buzzwords in
it.
- Minimalist programs and MVPs should be quick to build. Incremental
builds should not take more than a few seconds and a full-system
distclean rebuild should not take more than several seconds. Note
that some heavily (infernally) templated C++ libraries are notorious
for slowing builds. These infernal templated libraries (ITLs) should
be avoided.
- There is no need for the development and deployment
build systems to be the same. In fact, GNU make is a good option for
all builds. Recall that at deployment one does not need full
dependency tracking and so it is sufficient to simply build
everything. Efficient development, on the other hand, requires fast
incremental builds and hence a fast minimum-dependency build system is
desirable.
- Consider sqlite, which takes the extreme step of creating a single
monster C file that can be simply built with "cc -c -O
sqlite3.c". This is not always possible or desirable for all
projects, and perhaps an "amalgamated directory" approach is
better. In this approach a script or build target generates a
deployment directory, constructs Makefiles and/or shell scripts to
compile all code, and packs everything into a tar.gz archive. Note
that cmake-generated Makefiles are not stand-alone and so can't be
used in amalgamated deployment archives. Obviously, this amalgamation
approach does not apply to script code, but it is also not needed
there: amalgamation is meant to ease builds, and scripts do not
require building.
- In summary: creating efficient and innovative software requires a
minimalist or even brutally minimalist approach. The goal should be
to construct one or more MVPs that have structured data exchange
protocols instead of giant monolithic programs. The latter are
almost invariably slower, harder to maintain (despite their
developers having used the latest OOP and "Agile" fads to make them
extensible) and difficult to understand.
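
Several of the points above (objects as plain-old-data, operators kept separate from data, data treated as read-only, functions that take two or more object types, and function pointers in place of class hierarchies) can be illustrated with a minimal C sketch. All names here (Point, Box, Circle, point_translate, box_contains_circle, count_circles) are hypothetical and chosen only for illustration; this is one possible arrangement under those assumptions, not a prescribed design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example types: objects hold plain-old-data only.
   Nesting (a Box holds two Points) stands in for inheritance. */
typedef struct { double x, y; } Point;
typedef struct { Point lo, hi; } Box;
typedef struct { Point center; double radius; } Circle;

/* Operator on a Point. The input is read-only (const) and the
   result is returned as a new value instead of mutating *p. */
Point point_translate(const Point *p, double dx, double dy)
{
    assert(p != NULL);
    Point q = { p->x + dx, p->y + dy };
    return q;
}

/* Operator on two different object types at once. It belongs to
   neither struct, so no state is shared between them. Returns 1 if
   the circle lies entirely inside the box, 0 otherwise. */
int box_contains_circle(const Box *b, const Circle *c)
{
    assert(b != NULL && c != NULL);
    return c->center.x - c->radius >= b->lo.x &&
           c->center.x + c->radius <= b->hi.x &&
           c->center.y - c->radius >= b->lo.y &&
           c->center.y + c->radius <= b->hi.y;
}

/* A function pointer lets the caller plug in behavior without any
   class hierarchy or fat interface. */
typedef int (*CirclePredicate)(const Box *, const Circle *);

/* Counts the circles in cs[0..n-1] for which pred(b, circle) is true. */
size_t count_circles(const Box *b, const Circle *cs, size_t n,
                     CirclePredicate pred)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (pred(b, &cs[i]))
            ++count;
    return count;
}
```

A second, independent set of functions (say, for serialization or plotting) could be written against the same structs without touching this code, which is the non-intrusive property the separation of data and operators is meant to provide.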
Ammar Hakim and Murtaza Hakim. Updated November 3rd 2022