Scale in Prose

An outline of the problems of scale that technical writers face in prose, in relation to the scaling needs of fiction writers, with some notes on why the tools of tech writers should carry over to ease the labors of fiction writers.

One night recently, during a writing group meeting, the question of tooling came up and I made an offhand comment that I prefer programmatic solutions for typesetting documents for submission and publication. It occurred to me after saying it that while my system is completely open source, I do not have an effective method of explaining what it is or how it works.

This article is the first in a series that attempts to ameliorate this by explaining my tooling and process in writing fiction, in particular how my tooling addresses certain problems of scale, which should give a clear indication of why I prefer working in text editors over word processors of any ilk.

Problems of Scale

Most fiction writers manage their documents using word processors. The reason for this is that most writers are not actually working at scale. A novelist may be shuffling through a host of files amounting to several hundred pages of text, but they tend to operate on very small portions of that text at any given point in time.

The effect is that a word processor seems like a more convenient and useful tool. It's pretty, it's easy to get started, you don't have to think too much about how it works, and it's rare that you actually have to work at a scale where its limitations become obvious, so you might as well just do it all by hand and not get sidetracked.

Technical writers, and especially technical writers working in documentation, have to face a very different animal.

When a novelist sits down to a new project, they have a blank document. They'll spend a year or two writing on that document and when they're done they've something in the neighborhood of three to six hundred pages. When a docs writer sits down to a project, they're typically inheriting a document from Engineering. This document is usually a website, but in print it amounts to a book that easily dwarfs *War and Peace.* Depending on the release cycle, the writer is required to edit, maintain and expand the book, with changes and new materials sometimes coming annually and sometimes as often as monthly.

This is a very different problem than the one the novelist faces. A word processor can sometimes open a book and sluggishly work through it, but try to feed it even a small docs project and it'll fall on its face and cry. Try to edit that project in a word processor and the writer will be the one falling on their face and crying.

Imagine you have a thousand page book and the task of renaming a character throughout, moving a scene or a chapter to a different position, finding a particular turn of phrase, or checking the total word count.

Now try to imagine how long it would take you to solve these tasks effectively. Pervasive changes and complete searches in a book that long would take hours to work through in a word processor.

But... when you work from a text editor, the changes take seconds.

Now imagine if instead of a thousand page .docx file you had a thousand pages of content spread over a couple hundred plain text files.

Renaming Characters

Imagine you have a thousand page novel more or less together, but you've decided that the name of one character is not working for you any more. You want to give them a new name, perhaps tinker with their ethnicity or gender or whatever.

In a word processor you would open the base document, or the file for every chapter in the book, and perform a find and replace. It might take you the better part of an hour if the book is long enough, and that's just the text you have now. Moving forward you would have the new problem of wondering whether you called the character by the new name or the old one, meaning you'd have to do the whole thing over again during the editorial process.

That's in a word processor, when you work from a text editor it's a very different story. For instance, say you have a character named Debbie and you want to rename her Daphne.

user@hostname$ sed -i 's/Debbie/Daphne/g' *.rst

And we don't want to miss the diminutive, so let's throw another operation in there to catch it as well.

user@hostname$ sed -i 's/Deb/Daph/g' *.rst

The command called here is sed, which is short for stream editor. It opens the files matched by the wildcard at the end of the command and streams the text, performing the operations defined by the quoted expression. In this case the expression says to substitute the text of the first argument with the text of the second, and the g at the end means to do it for every instance on a line, not just the first. Lastly, the -i flag means to perform the edit in place, updating the file rather than streaming the result to standard output.

These two commands are the equivalent of running two find-and-replace-all operations on every file provided, and being an old school unix program operating on plain text files, sed can iterate over a thousand page book changing every instance in well under a second.
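
One caveat worth noting: a bare s/Deb/Daph/g will also rewrite any longer word that happens to contain that string. If that's a concern, GNU sed's word-boundary escape restricts the match to the name standing alone (a minimal sketch, assuming GNU sed):

user@hostname$ sed -i 's/\bDeb\b/Daph/g' *.rst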

That's a lot faster than doing it by hand, but it gets better. When you write in plain text you need a build chain to generate PDF documents to share with others. Personally, I like to define this build chain in a Makefile.

build: clean-text


clean-text:
    sed -i 's/Debbie/Daphne/g' *.rst
    sed -i 's/Deb/Daph/g' *.rst 

There are two facts about this Makefile that are important. First, we've taken the sed commands listed above and set them under the clean-text target. We can list other sed commands here as new editorial concerns arise, and writing it this way means that we can call:

user@hostname$ make clean-text

to run every command in that indented block.

The second important point is that the build target lists clean-text as a dependency. build typically calls all the commands necessary to render the document from plain text source files into a polished and typeset PDF. Having clean-text as a dependency means that those sed commands run before the build, which means I never have to worry about a Debbie or a Deb creeping into my text. The computer automatically fixes them for me.
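
For reference, a minimal sketch of what a filled-in build target might look like for a Sphinx project (the paths and builder here are illustrative, not a prescription):

build: clean-text
    sphinx-build -b latex . _build/latex
    make -C _build/latex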

Moving Text

Let's say you are pretty far into your book and you've submitted it to a friend for developmental editing. Amid the feedback is a note that you need to move a section of text from its current position to some later position.

In a word processor, where one document contains the entire book, this means cutting and pasting a chunk of text from one position to another and hoping you get all the text you need to move, and that you place it in the right position in a way that won't otherwise disrupt the formatting. It is, in short, a manual process and necessarily error prone.

How you solve this from a text editor depends on the tool you use to render plain text files to PDF. Personally, I write using the reStructuredText markup language and render the files to PDF using Sphinx. The specific method for moving text varies depending on whether you want to move a scene or an entire chapter.

Moving Chapters

Let's say you want to move a whole chapter.

Generally, I use the term primary character to refer to any character that can become the focus of narration for a chapter or scene. When I plot out a book, there are usually at least three primary characters and I expect the chapters to alternate between them. Let's say each filename carries a digit designating that chapter's order for its character, but I want to change the overall ordering so that the character Bob doesn't have two chapters in sequence.

In the source directory of the build there is a file, usually called index.rst, that contains the top level structural specifications for the book. It might look something like this:

###########################
Words: A Novel
###########################

Part One
=========

.. toctree::

   part1/bob-1
   part1/bob-2
   part1/jill-1

The toctree directive shown above searches for the three .rst files located in the part1/ directory and renders them in the order given. Say I've made a mistake here, and jill-1, which currently occurs after bob-2, should occur before it.

To change the ordering of the chapters in the novel, I simply change the ordering of the file calls in the toctree directive.

.. toctree::

   part1/bob-1
   part1/jill-1
   part1/bob-2

This changes the order to Bob's first, Jill's first, then Bob's second. When I next run a build it'll automatically update the PDF to display the chapters in the correct order.

Moving Scenes

Moving bodies of text within an individual chapter is a little more complicated, given the particular way in which I structure chapters versus books and parts. While some writers prefer to organize their books 1:1 chapter to file, I prefer 1:N chapter to files.

In my project hierarchy, I use the .rst extension to indicate the structural elements of the book, that is, the main book file, the files that organize parts and the files that organize chapters. For the text itself, I use .inc files, with a configuration line added to vim to ensure that it always treats .inc files as though they were .rst files. I then use the include directive to pull the .inc content into the .rst files during the build.
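
One way to wire that up in vim is a single autocommand in your .vimrc (a minimal sketch; adjust to taste):

autocmd BufRead,BufNewFile *.inc set filetype=rst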

A chapter .rst file might look something like this:

###################
Chapter Title
###################

Section 1
===========

Subsection 1.1
--------------------

.. include:: subsection-1-1.inc

When the build runs, the include directive pulls the contents of the specified file and dumps it into the source where indicated.
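
For concreteness, a hypothetical layout for one chapter under this scheme (the filenames are illustrative):

part1/
    bob-1.rst             structural file: title, section headings, includes
    subsection-1-1.inc    the actual prose for subsection 1.1
    subsection-1-2.inc    the actual prose for subsection 1.2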

In the output I suppress the individual section titles, collecting them into a block as the chapter heading. This gives the book an old timey feel, listing out in a poetical way what the reader is about to dig through. Organizing the document this way lets me operate on the titles collectively and on the text separately.

As you might guess from the above, moving a body of text within a chapter is as simple as moving the include directive to a different position, then running a new build.

Finding Text

Finding text in a single file is easy, but what happens when you have text spread across several documents? In a word processor you can open each file and run find to see if the text is there, but that's about all you can hope for.

By contrast, searching plain text files for a particular word or pattern of words is simple to the point of being silly.

user@hostname$ grep -i "thingamabob" *.rst

Grep searches every file in the current directory with a .rst extension for the pattern *thingamabob*. The -i flag makes the search case-insensitive, and you can search for any pattern you like using regular expressions.

Unlike the find operation in a word processor, grep is fast. It can search a host of files in seconds, and with the recursive option it can search every file in a directory tree if you like.
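
For instance, to search every file under a project directory, whatever its depth (the path here is illustrative):

user@hostname$ grep -ri "thingamabob" ~/documents/novel/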

Word Count

When you have your whole document in one file, you can go into the menu and pull up the document statistics to check your word count. When you have your document spread across multiple word processor files, you can check the current file's word count against your notes to see what's what. But when you have your document spread across multiple plain text files, you can total the word count from the command line using wc.

user@hostname$ wc ~/documents/*.inc

 13   450  2509 /home/user/documents/file1.inc
 55  1325  6865 /home/user/documents/file2.inc
 46  1549  8089 /home/user/documents/file3.inc
 44   811  4253 /home/user/documents/file4.inc
 22   505  2702 /home/user/documents/file5.inc
  3   104   540 /home/user/documents/file6.inc
 24   940  4943 /home/user/documents/file7.inc
 14   274  1443 /home/user/documents/file8.inc
 23   520  2814 /home/user/documents/file9.inc
244  6478 34158 total

This is the wc command, another old school unix utility, used to get the line, word and character count for a file. If you pass it a set of files by wildcard, it counts them all and adds up a total for you.
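
When only the word count matters, the -w flag trims the output to just that column:

user@hostname$ wc -w ~/documents/*.inc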

In cases where your directory hierarchy makes it difficult to garner this count with wc, there's a LaTeX utility for retrieving such statistics called texcount. Since I'm only interested in the word count, I pipe the output to grep to boil it down to the line I want.

user@hostname$ texcount document.tex | grep "Words in text"

Words in text: 6461

It's not exactly the same figure, since wc counts some things that texcount doesn't, but it's in the ballpark, which is what matters. Put that command in a Makefile and tie it into a quieted build process, and when I call make from my text editor I get that "Words in text" line back as a return value.
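
As a sketch, that Makefile target might look like this, with the @ prefix quieting make's usual command echo (the target name is arbitrary):

count:
    @texcount document.tex | grep "Words in text"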

Concluding Notes

Moving forward, I expect I'll spend a lot of time writing on different aspects of my workflow, but for the present I hope this illustrates the clear advantages of a text editor over a word processor when working at scale.