Appendix A: When Editing Code Files, Use a Text Editor, Not(!) a Word Processor
On several occasions in this text you will encounter exercises that require you to create files containing text: perhaps a PHP program, some XML, or some JSON. Although there are several ways to create a file with text in it, most of us will use a software application in which we type or copy and paste the text. When you do this for this book’s exercises, we want you to use a so-called text editor, not a word processor.
For a quick lookup of common free and open source text editors for Windows, macOS and Linux, see the table at the end of this appendix. If you want to understand the difference between text editors and word processors, read on.
A Word Processor and a Text Editor Are Not the Same Thing
From experience we know that few students who are new to coding realize that what looks like text to them does not look the same to a machine. For instance, suppose that someone types foo, presses the Enter key, and then types goo into a text editing program such as Windows’ Notepad.
No one will be surprised when it appears on the screen as two lines:
foo
goo
Let us assume that we now store these two lines of text in a file called foo.txt.
When asked how many symbols (letters, bytes) foo.txt contains, most people will say “six”: three for the word foo and three for the word goo. However, if we actually check how many bytes foo.txt contains, we may find one of several answers, none of which is “six.”
Let’s see what happens when we create the file on a Linux machine (not with Notepad, which is not a Linux text editor) and ask how many bytes the resulting file has. One of the Linux commands for doing this is wc (for word count):
>wc foo.txt
2 2 8 foo.txt
wc tells us that the file has two lines (the first “2”), two words (the second “2”), and eight (“8”) bytes.
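To make wc's three numbers concrete, here is a minimal Python sketch that computes the same counts from the raw bytes of foo.txt. The function name wc_counts is our own, hypothetical choice. The sketch assumes wc's default behavior: lines are counted as newline characters, words as runs of non-whitespace, and the third column as bytes.

```python
def wc_counts(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes) the way `wc` reports them by default."""
    lines = data.count(b"\n")       # wc counts newline characters, not "visible" lines
    words = len(data.split())       # runs of bytes separated by ASCII whitespace
    return lines, words, len(data)  # the third column is a byte count

# The file created on Linux: foo, newline, goo, newline
linux_bytes = b"foo\ngoo\n"
print(wc_counts(linux_bytes))  # (2, 2, 8)
```

Applied to the Windows version of the file discussed below (b"foo\r\ngoo"), the same sketch returns (1, 2, 8): with no newline after goo, wc counts only one line.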
Another way is to check the byte count with the ls command:
>ls -l foo.txt
-rw-rw-r--. 1 userid userid 8 Nov 20 14:22 foo.txt
Notice the ‘8’ indicating the file size in bytes.
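The size shown by ls -l is the number of bytes the file system records for the file. As a sketch, Python's os.path.getsize reports the same number. The helper write_and_measure is hypothetical. Note that it writes in binary mode: in text mode, Python on Windows would silently turn each \n into \r\n and change the count.

```python
import os
import tempfile

def write_and_measure(content: bytes) -> int:
    """Write `content` to foo.txt in a temporary folder and return its size in bytes."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "foo.txt")
        # Binary mode ("wb") stores the bytes exactly as given:
        # no platform-specific newline translation takes place.
        with open(path, "wb") as f:
            f.write(content)
        return os.path.getsize(path)  # the same number `ls -l` shows

print(write_and_measure(b"foo\ngoo\n"))  # 8
```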
To see each of the bytes in the file, we use the od (octal dump) command:
>od -c foo.txt
f o o \n g o o \n
Sure enough, the file has eight one-byte characters in it: six to form the words foo and goo, and two so-called newline characters (\n). The effect of these newlines, of course, is that when you open the file in a text editor, foo and goo each appear on their own line.
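For readers without od at hand, here is a rough Python imitation of its character column. The name char_dump is hypothetical, and the offset column that the real od prints is omitted. Control characters get escape names such as \n, and other non-printable bytes are shown as three-digit octal numbers, as od -c does.

```python
# Escape names for the control characters that `od -c` prints symbolically
ESCAPES = {ord("\n"): "\\n", ord("\r"): "\\r", ord("\t"): "\\t", 0: "\\0"}

def char_dump(data: bytes) -> list[str]:
    """Return one label per byte, roughly like the character column of `od -c`."""
    labels = []
    for byte in data:
        if byte in ESCAPES:
            labels.append(ESCAPES[byte])
        elif 32 <= byte < 127:            # printable ASCII: show the character itself
            labels.append(chr(byte))
        else:                             # anything else: three-digit octal
            labels.append(format(byte, "03o"))
    return labels

print(" ".join(char_dump(b"foo\ngoo\n")))  # f o o \n g o o \n
```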
If we try this on Windows, however, we get a different result (this time we create the file in Notepad and then, once again, read it with the od command on Linux):
>od -c foo.txt
f o o \r \n g o o
We once again have eight bytes, but this time foo and goo are separated by a carriage-return character (\r) followed by a newline character (\n), and there is no newline after goo.
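A short sketch makes the comparison explicit: the two files have the same length but different bytes. The helper line_ending_style is hypothetical; it only distinguishes \n from \r\n and ignores lone \r characters (the line-ending convention of classic Mac OS).

```python
def line_ending_style(data: bytes) -> str:
    """Classify the line endings in `data` as 'LF', 'CRLF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # newlines not preceded by \r
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf_only:
        return "LF"
    return "none"

linux_file = b"foo\ngoo\n"     # as created on Linux
windows_file = b"foo\r\ngoo"   # as created in Notepad
print(len(linux_file), len(windows_file))   # 8 8
print(linux_file == windows_file)           # False
print(line_ending_style(linux_file), line_ending_style(windows_file))  # LF CRLF
```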
If you think that is confusing, look at the size of the file after we type the same foo and goo text in Microsoft Word and save it as foo.docx:
>ls -l foo.docx
-rwxr-xr-x. 1 userid userid 11561 Nov 20 14:46 foo.docx
Holy moly! This time we end up with 11,561 bytes!
So what is going on here? Programs such as Microsoft Word and Apple’s TextEdit are word processors. Their job is twofold: store text, and format that text in any way the user specifies. This formatting can take many forms, from font type and font size to line indentation, line spacing, etc. Since all this formatting information must be stored along with the text, word-processed files typically contain far more bytes than the text alone. Hence the 11,000+ bytes for the simple foo/goo Word file.
If, on the other hand, we use a text editor (not a word processor), such as Notepad, Notepad++, TextMate, Sublime Text, nano, Emacs, vi, Bluefish, or any of a host of others, all we get is plain, unformatted text. If we want some indentation at the beginning of a line, we must type spaces or tabs; if we want a new line, we must press the Enter key to type a newline character. Those characters are stored in the file, but nothing else is.
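Whereas this distinction between word processors and text editors explains the differences in size and content between the files they produce, it does not yet explain why the foo/goo text files created with a text editor on Linux and on Windows differ (same size, different content). That difference comes from Linux and Windows following different conventions for plain text files: Linux (like macOS) ends each line with a single newline character (\n), whereas Windows separates lines with a carriage-return/newline pair (\r\n). The missing newline after goo, in turn, is a habit of the editor: Notepad stores exactly what you typed, while Linux editors typically end the last line with a newline.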
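Converting between the two conventions is a simple byte substitution. The following sketch shows the idea; to_lf and to_crlf are hypothetical names, and in practice utilities such as dos2unix do this job. It converts line breaks only: it does not add the missing newline after goo.

```python
def to_lf(data: bytes) -> bytes:
    """Convert Windows line breaks (CRLF) to Linux/macOS line breaks (LF)."""
    return data.replace(b"\r\n", b"\n")

def to_crlf(data: bytes) -> bytes:
    """Convert LF line breaks to CRLF, without doubling existing CRLFs."""
    # Normalize to LF first so that an existing \r\n does not become \r\r\n
    return to_lf(data).replace(b"\n", b"\r\n")

windows_file = b"foo\r\ngoo"
print(to_lf(windows_file))       # b'foo\ngoo'
print(to_crlf(b"foo\ngoo\n"))    # b'foo\r\ngoo\r\n'
```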
Which Text Editor to Use?
In this text, you will use only(!) a text editor. Programmers write their code with text editors, not word processors. As explained above, word processors are used to make text look good for human readers. Program code, however, needs no formatting other than being spread over multiple lines and indented, all of which is easily accomplished with the Enter, Space, and Tab keys.
This does not mean, however, that coders are indifferent to which text editor they use. Most coders are very attached to their favorite text editor and will stick with it unless something much better comes along.
We, as authors of this text, have no axe to grind as to which text editor you use. However, we want you not(!) to use a word processor, as this will likely cause all sorts of problems when you try to run the exercises. Unfortunately, telling whether a program is a word processor or a text editor is not always easy. Apple’s TextEdit is a confusing case: it is not(!) a text editor but a word processor, and by default it saves formatted (rich) text rather than plain text, so it is better not to use it.
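If you are unsure whether your editor saved plain text, the first bytes of the file usually give it away. The sketch below is a heuristic of our own, not a standard tool, and looks_like_plain_text is a hypothetical name. It relies on three facts: .docx files are ZIP archives, which start with the bytes PK\x03\x04; Rich Text Format files start with {\rtf; and plain UTF-8 text contains no NUL bytes and decodes cleanly. Because it assumes UTF-8, it would also reject UTF-16 text files.

```python
ZIP_SIGNATURE = b"PK\x03\x04"   # .docx files are ZIP archives
RTF_SIGNATURE = b"{\\rtf"        # Rich Text Format (TextEdit's default format)

def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: True if `data` looks like a plain UTF-8 text file."""
    if data.startswith(ZIP_SIGNATURE) or data.startswith(RTF_SIGNATURE):
        return False            # a word-processor format, not plain text
    if b"\x00" in data:
        return False            # NUL bytes essentially never occur in UTF-8 text
    try:
        data.decode("utf-8")    # plain text should decode without errors
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_plain_text(b"foo\ngoo\n"))            # True
print(looks_like_plain_text(b"{\\rtf1\\ansi foo}"))    # False
```

To check a real file, pass the function the result of open(path, "rb").read().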
The following table should help you find a few suitable (free and open source) text editors. Once again, as authors we really do not care which one you use. Just try one. If you like it, stick with it. If you do not, grab another one. (Just do not spend all your time comparing the specs of these editors, only to find at the end that you have no time left to actually use them.)
- In these examples, characters are represented by single bytes, i.e., one byte per character. In an encoding such as UTF-16, however, each of the characters used here takes two bytes, which would double the counts shown.
- Files with unformatted text are known as ‘plain text’ files.
- Many text editors can be set to adjust the text layout automatically for a specific programming language, e.g., C, C++, HTML, or Python. In these cases, the editor automatically inserts spaces, newlines, tabs, and even parentheses and curly braces, each of which, of course, becomes part of the text.