Plain English Programming

Having programmed for many years in many languages, I often find myself thinking in English pseudo-code and then translating my thoughts into whatever artificial syntax I’m working with at the time. So one day I thought, “Why not simply code at a natural language level and skip the translation step?” My elder son (also a programmer) and I talked it over, and we decided to test the theory. Specifically, we wanted to know:

1. Is it easier to program when you don’t have to translate your natural-language thoughts into an alternate syntax?

2. Can natural languages be parsed in a relatively “sloppy” manner (as humans apparently parse them) and still provide a stable enough environment for productive programming?

3. Can low-level programs (like compilers) be conveniently and efficiently written in high-level languages (like English)?

And so we set about developing a Plain English compiler (in Plain English) in the interest of answering those questions. And we are happy to report that we can now answer each of those three questions, from direct experience, with a resounding, “Yes!”

The Theory

Our parser operates, we believe, something like the parsing centers in the human brain. Consider, for example, a father saying to his baby son…

“Want to suck on this bottle, little guy?”

…and the kid hears…

“blah, blah, SUCK, blah, blah, BOTTLE, blah, blah.”

…but he properly responds because he’s got a “picture” of a bottle in the right side of his head connected to the word “bottle” on the left side, and a pre-existing “skill” near the back of his neck connected to the term “suck.” In other words, the kid matches what he can with the pictures (types) and skills (routines) he’s accumulated, and simply disregards the rest. Our compiler does very much the same thing, with new pictures (types) and skills (routines) being defined — not by us, but — by the programmer, as he writes new application code.
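As a rough illustration of this kind of matching, here is a toy sketch in Python. It is not the actual compiler (which is written in Plain English); the vocabulary sets and the function name `sloppy_parse` are made up for this example. The idea is simply to keep the words that match a known type or routine and disregard everything else.

```python
import re

# Hypothetical vocabulary: the "pictures" (types) and "skills" (routines)
# the listener has accumulated so far.
KNOWN_TYPES = {"bottle", "blanket"}
KNOWN_ROUTINES = {"suck", "sleep"}


def sloppy_parse(sentence):
    """Return (routines, types) recognized in the sentence, ignoring the rest."""
    words = re.findall(r"[a-z]+", sentence.lower())
    routines = [w for w in words if w in KNOWN_ROUTINES]
    types = [w for w in words if w in KNOWN_TYPES]
    return routines, types


print(sloppy_parse("Want to suck on this bottle, little guy?"))
# prints (['suck'], ['bottle'])
```

Everything that is not in the vocabulary ("want", "to", "on", "this", "little", "guy") plays the role of the "blah, blah" above.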

The Practice

A typical type definition looks like this:

A polygon is a thing with some vertices.

Internally, the name “polygon” is now associated with a dynamically-allocated structure that contains a doubly-linked list of vertices. “Vertex” is defined elsewhere (before or after this definition) in a similar fashion; the plural is automatically understood.

A typical routine looks like this:

To append an x coord and a y coord to a polygon:
Create a vertex given the x and the y.
Append the vertex to the polygon’s vertices.

Note that formal names (proper nouns) are not required for parameters and variables. This, we believe, is a major insight. A real-world chair or table is never (in normal conversation) called “c” or “myTable” — we refer to such things simply as “the chair” or “the table”. Likewise here: “the vertex” and “the polygon” are the most natural names for these variables.

Note also that spaces are allowed in routine and variable names (like “x coord”). It’s surprising that all languages don’t support this feature; this is the 21st century, after all. “Nicknames” are allowed too (such as “x” for “x coord”), and possessives (“polygon’s vertices”) are used in a very natural way to reference fields within records.

Note, as well, that the word “given” could have been “using” or “with” or any other equivalent since our sloppy parsing focuses on the pictures (types) and skills (routines) needed for understanding, and ignores, as much as possible, the rest.
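For readers more used to conventional syntax, the polygon type and the append routine above might correspond roughly to the following Python. This is only an analogy: the class and function names are invented here, and a `collections.deque` stands in for the doubly-linked list the compiler builds internally.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Vertex:
    x: int
    y: int


@dataclass
class Polygon:
    # "A polygon is a thing with some vertices."
    # The deque is a stand-in for the doubly-linked list mentioned above.
    vertices: deque = field(default_factory=deque)


def append_coords_to_polygon(x_coord, y_coord, polygon):
    # "Create a vertex given the x and the y."
    vertex = Vertex(x_coord, y_coord)
    # "Append the vertex to the polygon's vertices."
    polygon.vertices.append(vertex)


polygon = Polygon()
append_coords_to_polygon(3, 4, polygon)
print(polygon.vertices)
```

Notice how much of the Python is naming: `vertex`, `polygon`, and `x_coord` all had to be invented, where the Plain English version just says “the vertex” and “the polygon”.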

Like a Math Book

At the lowest level, things look like this:

To add a number to another number:
Intel $8B85080000008B008B9D0C0000000103.

Note that in this case we have both the highest and lowest of languages — English and machine code (in hexadecimal) — in a single sentence. The insight here is that a program should be written primarily in a natural language, with snippets of code in more appropriate syntax as (and only as) required. Like a typical math book: mostly natural language with formula snippets interspersed.
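To make the hexadecimal less opaque, here is a small Python sketch that splits the byte string into instructions. The mnemonics are a hand annotation that reads the bytes as 32-bit x86; that reading is an assumption, and nothing here runs a real disassembler. The comments about the two numbers being passed as addresses at `[ebp+8]` and `[ebp+12]` are inferred from the bytes, not taken from the compiler's documentation.

```python
MACHINE_CODE = "8B85080000008B008B9D0C0000000103"

# (hex bytes, assumed 32-bit x86 meaning); annotated by hand, not disassembled.
ANNOTATED = [
    ("8B8508000000", "mov eax, [ebp+8]    ; address of the first number"),
    ("8B00", "mov eax, [eax]      ; value of the first number"),
    ("8B9D0C000000", "mov ebx, [ebp+12]   ; address of the other number"),
    ("0103", "add [ebx], eax      ; add the value into the other number"),
]

# Sanity check: the annotated pieces must reassemble into the original string.
assert "".join(piece for piece, _ in ANNOTATED) == MACHINE_CODE

raw = bytes.fromhex(MACHINE_CODE)
print(len(raw), "bytes")
for piece, meaning in ANNOTATED:
    print(f"{piece:<14}{meaning}")
```

Read this way, the whole routine is four instructions in 16 bytes: fetch the first number, then add it in place to the other one.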

We hope someday the technology will be extended, at the high end, to include Plain Spanish, Plain French, Plain German, and so on; and at the low end to include “snippet parsers” for the most useful domain-specific languages. Español Llano, thanks to our helper Pablo in Argentina, is now up and running.

An Objection Answered

Now perhaps you’re thinking natural language programming is a silly idea. But have you considered the fact that most of the code in most programs does simple stuff like “move this over there” and “show that on the screen” — things that can be most conveniently and most naturally expressed in a natural language? Let’s consider an example we can examine in detail:

Our compiler — a sophisticated Plain-English-to-Executable-Machine-Code translator — has 3,050 imperative sentences in it.

1,306 of those (about 43%) are conditional statements, and at least half of those are trivial things like these:

If the item is not found, break.
If the compiler’s abort flag is set, exit.

The remainder of those conditional statements are slightly more complex, but all of them fit on a single line (with our font, in our editor). Here are a couple of the longer ones:

If the length is 4, attach $FF32 to the fragment’s code; exit.
If the rider’s token is any numeric literal, compile the literal given the rider; exit.

Of the remaining sentences:

272 (about 9%) are simple assignment statements:

Put the type name into the field’s type name.

202 (about 7%) are just the infrastructure for various loops:

Loop.
Get a field from the type’s fields.
[other stuff here]
Repeat.

183 (about 6%) simply add something to the end of this or that list, like so:

Add the field to the type’s fields.

164 (about 5%) are trivial statements used to return boolean results, start and stop various timers, show the program’s current status, and write interesting things to the compiler’s output listing:

Say no.
Say yes.
Set the variable’s compiled flag.
Start the compiler’s timer.
Stop the compiler’s timer.
Show status “Compiling…”.
List the globals in the compiler’s listing.

119 (about 4%) advance the focus in the source code, sentences like:

Bump the rider.
Move the rider (code rules).

92 (about 3%) are used to create, destroy and keep internal indexes up to date, sentences like:

Create the type index using 7919 for the bucket count.
Index the type given the type’s name.
Destroy the type index.

58 (about 2%) are used to find things in various lists:

Find a variable given the name.

37 (about 1%) are calls to various conversion routines:

Convert the rider’s token to a ratio.

31 (about 1%) are used to generate actual machine code (plus those that appear in conditional statements, as above):

Attach $E8 and the address to the fragment.

And that accounts for 80% of the code in our compiler.
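The tally can be checked with a few lines of Python. The category labels are shorthand invented here; the counts are the ones quoted in the breakdown. The subtotal comes to 2,464 of 3,050 sentences, or just over 80%.

```python
TOTAL_SENTENCES = 3050

# Counts quoted in the breakdown, by category (labels are shorthand).
COUNTS = {
    "conditionals": 1306,
    "assignments": 272,
    "loop infrastructure": 202,
    "list appends": 183,
    "trivial statements": 164,
    "advancing the focus": 119,
    "index upkeep": 92,
    "finds": 58,
    "conversions": 37,
    "machine code generation": 31,
}

for name, count in COUNTS.items():
    print(f"{name:<26}{count:>6}{count / TOTAL_SENTENCES:>8.1%}")

subtotal = sum(COUNTS.values())
print(f"{'subtotal':<26}{subtotal:>6}{subtotal / TOTAL_SENTENCES:>8.1%}")
```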

Only 57 of the remaining sentences (less than 2% of the whole) are mathematical in nature, a line here and there like these:

Add 4 to the routine’s parameter size.
Subtract the length from the local’s offset.
Multiply the type’s scale by the base type’s scale.
Calculate the length of the field’s type.
Round the address up to the nearest multiple of 4096.

And the rest are not formulaic at all. Stuff like:

Copy the field into another field.
Append the fragment to the current routine’s fragments.
Abort with “I was hoping for a definition but all I found was ‘” then the token.
Initialize the compiler.
Remove any trailing backslashes from the path name.
Reduce the monikette’s type to a type for utility use.
Eliminate duplicate nicknames from the type’s fields.
Prepend “original ” to the term’s name.
Extend the name with the rider’s token.
Unquote the other string.
Read the source file’s path into the source file’s buffer.
Generate the literal’s name.
Extract a file name from the compiler’s abort path.
Write the compiler’s exe to the compiler’s exe path.
Swap the monikettes with the other monikettes.
Skip any leading noise in the substring.
Scrub the utility index.
Fill the compiler’s exe with the null byte given the compiler’s exe size.
Position the rider’s token on the rider’s source.
Pluralize the type’s plural name.
Link.
Finalize the compiler.
Check for invalid optional info on the type.

And that’s why we say that most of what most programs do is easy stuff, stuff that can be conveniently expressed in a natural language. And that, in turn, is why we like programming in Plain English: the thoughts in our heads are typed in as Plain English “pseudo-code” and, with a tweak here and there, that pseudo-code actually compiles and runs. And is self-documenting, to boot.

Another Objection Answered

You may be thinking that natural language is just too verbose for programming. But is it really that bad? Let’s consider a couple of examples. In a traditional programming language, we might draw a box using a statement like this:

substring.draw ( box, color, source.text.font, source.text.alignment ) ;

Which is 10 words and 11 punctuation marks: 21 total elements.

The Plain English equivalent would be:

Draw the substring in the box with the color and the source’s text’s font and alignment.

Which is 16 words and 3 punctuation marks: 19 total elements.

Admittedly, the Plain English version requires a few more easy-to-type alphabetic characters (it’s difficult to say exactly how many since traditional coders put spaces in different places); but that’s a small price to pay for not having to learn (or think in) an artificial syntax.

Here’s another example:

if ( ! source.colorized ( ) ) color = black ;

Which is 5 words and 8 punctuation marks: 13 total elements.

Compared with the Plain English:

If the source is not colorized, put black into the color.

Which is 11 words and 2 punctuation marks: 13 total elements.
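The counts for all four statements can be reproduced with a short Python sketch. The counting rule is an assumption chosen to match the figures quoted here: a word is a run of letters (a possessive like “source’s” is one word), and every non-alphanumeric, non-space character is one punctuation mark, so the apostrophe in a possessive also counts as a mark.

```python
import re

# A word is a run of letters, optionally with an internal apostrophe ("source's").
WORD = re.compile(r"[A-Za-z]+(?:[’'][A-Za-z]+)*")
# A punctuation mark is any character that is not a letter, digit, underscore or space.
PUNCT = re.compile(r"[^\w\s]")


def count_elements(statement):
    """Return (words, punctuation marks) for one statement."""
    return len(WORD.findall(statement)), len(PUNCT.findall(statement))


EXAMPLES = [
    "substring.draw ( box, color, source.text.font, source.text.alignment ) ;",
    "Draw the substring in the box with the color and the source’s text’s font and alignment.",
    "if ( ! source.colorized ( ) ) color = black ;",
    "If the source is not colorized, put black into the color.",
]

for statement in EXAMPLES:
    words, marks = count_elements(statement)
    print(f"{words:>2} words + {marks:>2} marks = {words + marks:>2}  {statement}")
```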

Again, it’s mostly a matter of whether you like to type words or (specialized) punctuation. And whether you like to think in two different syntactical and grammatical forms simultaneously. And whether you want your code to be self-documenting. And whether you want code that’s friendly for beginners. And whether you want to code in a language (like English) that will still be in common use 100 years from now. Personally, we think you may have lost some human perspective if you’ve come to think that “(!source.colorized())” is a good way of saying anything!

The Prototype

If you’re interested, you can download the whole shebang here:

www.osmosian.com/cal-4700.zip

It’s a small Windows program, less than a megabyte in size. But it’s a complete development environment, including a unique interface, a simplified file manager, an elegant text editor, a handy hexadecimal dumper, a native-code-generating compiler/linker, and even a WYSIWYG page layout facility (that we used to produce the documentation). It is written entirely in Plain English. The source code (about 25,000 sentences) is included in the download. No installation is necessary; just unzip. Start with the “instructions.pdf” in the “documentation” directory and before you go ten pages you won’t just be writing “Hello, World!” to the screen, you’ll be re-compiling the entire thing in itself (in less than three seconds on a bottom-of-the-line machine from Walmart).

Thanks for your time and interest.


Gerry Rzeppa
Grand Negus of the Osmosian Order of Plain English Programmers


Dan Rzeppa
Prime Assembler of the Osmosian Order of Plain English Programmers

13 thoughts on “Plain English Programming”

  1. I generally agree with your ideas. At the very least, there is something tempting about having technology that could translate plain English into instructions for a computer. Given the obvious questions regarding practicality, I have the following questions:
    1. Do you attempt to convert existing code to plain English in an automated fashion? I imagine that for C code that could give very interesting results.
    2. How could complex data types like structs be defined? I do not see examples of that. Data structures could loosely be called “objects” or “abstractions” of real-world things that have some properties.
    3. The next point relates to classes: how can they be effectively modeled as a combination of the aforementioned “objects” and the operations declared on them?
    4. Once we declare operations on objects, given the loose nature of Plain English, there should be polymorphic calls defined somehow.

    I have even more questions, but they are mostly technicalities right now.

    1. Hello, Andrey!

      Thanks for writing.

      1. Do we attempt to convert existing code to plain English in automated fashion?

      No; we simply “code” in English (or Spanish; we have a Spanish version now as well).

      Your other questions will be answered, and will be easier for me to elaborate on, if you will first:

      (a) Download our system and peruse the “instructions.pdf” in the documentation folder; then
      (b) Write me directly (gerry.rzeppa@pobox.com) and we’ll discuss!

      The download is less than a megabyte, and no installation is required. The instructions are in large type with wide margins.

      Thanks again,

      Gerry

  2. I have just stumbled on this today and, after starting to go through the tutorial in the documentation, am very fascinated by it.
    What was your intention with this language? Do you intend it as a teaching tool for children, or do you see some other applications for it? What was your inspiration for it?

    1. Our intention (and inspiration) was to make programming simple and fun like it was in the Old Days — when a programming product was fully described in a 100-page manual that explained the development environment, the language itself, and all of the built-in types, variables and routines one would need to create whatever one was dreaming of creating. Kind of a 32-bit version of Apple BASIC, or Microsoft’s QuickPascal for DOS.

      Regarding our intended audience, we think the same interface and language is suitable for “kids of all ages”: novices and experts alike. It’s a system for kids, yes, but it isn’t just a system for kids. We used this interface and language ourselves to conveniently and efficiently create the whole shebang. So when a student is ready to dig deeper, he can simply dig. He doesn’t have to invest in a new shovel (language) or find another plot of land (development environment). It’s all in one place, from “Hello, World!” down to the machine code.

      Please address future communications to me directly (gerry.rzeppa@pobox.com) since I check that address more regularly. Thanks!

  3. Thanks. It really is a cool project; I showed it to my students. I’m also doing NLP and, ultimately, want to program using English. Is there reference documentation, BTW? I’m not sure how the grammar works precisely.

  4. The entire system is here:

    http://www.osmosian.com/cal-4700.zip

    The download is less than a megabyte; no installation necessary. The instructions are in the “documentation” directory in both PDF and native formats. The source code for the whole system is in the six files described, in brief, on page 4 of the manual, and in more detail on pages 5 thru 10. A single-page summary of the language syntax is on page 11, and a more technical description is in the ebnf file in the same directory.

    Don’t hesitate to write me directly with questions and comments: gerry.rzeppa@pobox.com

      1. It’s a false positive. Please contact your antivirus vendor and let them know.

        Our executable file includes only five of the many sections that less efficient compilers generate so a few (sloppy) virus-detection programs mistake it for something malicious. There are other innocuous differences, as well (our source files, for example, have no extension — like “.txt” or “.cpp” — on them). The code is safe. Check it with virustotal.com, for example, where only four of 89 virus-detection programs make this “false-positive” mistake (last time I checked).

      2. VirusTotal again reports only 4 programs which give it a false positive. A year ago, it was 5. I can add one: Microsoft Defender. Defender is vicious, giving little opportunity to mark a program as safe and attacking newly-compiled programs even if they’re identical to others marked safe. I wrote up my experiences as a comment on VirusTotal, along with a workaround which seems to work for simple programs and a theory on how it got this way.

        I’m reproducing my comment here as VirusTotal mangled the quote marks:

        This is a compiler for “plain English programming.” Anything and everything this compiler produces may be flagged as a severe threat, though it may not be if you swap the order of tests for different events. For instance, I have this in my code (‘\’ starts a comment):

        If the event’s kind is “key down”, check if it is quitting time given the event’s key.
        If the event’s kind is “set cursor”, show the arrow cursor. \ Keep set cursor before refresh to escape stupid malware “recognition”.

        If I exchange the order of those two lines, Microsoft Defender flags the newly-compiled program as a severe threat. After the first “detection”, it’s likely to destroy the newly-created program before I can mark it as not a threat.

        The compiler is built into the IDE which is more complex. I don’t know what to change. I evidently managed to mark my personal fork of the IDE as not a threat years ago, but now I want to develop a cross-compiler. I forgot how severe Microsoft Defender’s actions are (it’s been a couple of years), and now it prevents me from compiling my cross-compiler. It prevents the original compiler from writing the file. I haven’t begun to work on the cross-compiler yet, so what’s happening is Defender is attacking a program which is identical to one marked safe, just freshly created with a different name.

        I can imagine how this situation arose. It’s similar to a friend’s account of his rebellious teen years in the 90s. As a wannabe hacker, he would go onto certain websites, download “modules”, and use them to gain access to Linux servers. The idea was to get in and keep others out. He said “I didn’t know what I was doing,” meaning he didn’t know the technical details, and this must be true because even 20 years later he was still not capable of being the programmer he wanted to be. In this account, we see criminals using unskilled kids to do their dirty work for them. I think some criminal group, possibly called Wacatac, has tried to use this compiler to enable similarly unskilled kids to write malware. In turn, malware-detection programs now key on entirely innocent parts of the compiler’s output.
