Parsing with Riders

A Plain English compiler, as you might expect, needs to parse text simply, flexibly, and quickly. We do that with a thing we call a rider. It’s a generalized version of an idea (with the same name) that Niklaus Wirth taught us about in his magnificent Oberon system. To understand our implementation and use of riders, however, you must first understand the way we implement…

Strings

The prototype string record, in Plain English, is defined like this:

A string has a first byte pointer and a last byte pointer.

The data bytes of a string are dynamically allocated in a contiguous sequence on the Heap. A string is considered blank if the first byte pointer is nil (or greater than the last byte pointer). When a string consists of just one-byte, the pointers are equal. The length of a string is thus the last byte’s address minus the first byte’s address plus one. Our compiler generates all the code and calls necessary to manage string memory.

And that brings us to…

Substrings

substring is any contiguous part of a string, up to and including the whole string. In Plain English it is defined like this:

A substring is a string.

In other words, the substring type not only looks like a string (with a first byte pointer and a last byte pointer), but is also compatible with string type, which allows most of our string routines to operate on either type.

And now we’re ready to talk about…

Riders

rider, in Plain English, is defined like this:

A rider has an original substring, a source substring and a token substring.

When we…

Slap a rider on a string.

…the pointers in the original and source substrings are set to span the entire string. At the same time, the token’s first byte pointer is set to the first byte of the string, and the token’s last byte pointer is set to nil (making the token substring initially blank, but ready to be extended).

Now when we…

Bump a rider.

…we add 1 to the source’s first byte pointer, and add 1 to the token’s last byte pointer. It thus appears that we have “moved” a byte of the source into the token, though no physical movement of string data has actually taken place. Very fast, even on huge strings. We keep the original substring pointers intact so we can check, as necessary, to make sure we don’t fall off either end of the source.

Note that we can peek back at previous bytes in the source string simply by subtracting from the source’s first byte pointer.

Note also that we can have as many riders on a string as we need, so we can parse different parts of the source at the same time.

Note, thirdly, that we can save substrings of the source of any length (as just two pointers) so we can process them later without having to hunt them down again in the source.

Finally, note that we can code up a wide variety of “Move a rider” routines to extract any kind of token from any kind of source. Here, from our compiler, are some…

Examples

Our compiler ignores spaces, tabs, linefeeds, carriage returns, and other noise between meaningful characters. When we find ourselves sitting on a character that doesn’t interest us, we use this routine to move past it:

To move a rider (code rules – noise):
Bump the rider.
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is noise, repeat.

comment, in Plain English, starts with a backslash and ends at the end of a line. When we find ourselves sitting on a comment (ie, the first remaining byte of the rider’s source is a backslash), we suck it up into a token using this routine:

To move a rider (code rules – comment):
Bump the rider.
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is not the return byte, repeat.

remark, in Plain English, is an “inline” comment surrounded by square brackets. When we find ourselves sitting on a remark, we call this guy:

To move a rider (code rules – remark):
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is the return byte, break.
If the rider’s source’s first’s target is the left-bracket byte,
add 1 to a count.

If the rider’s source’s first’s target is the right-bracket byte,
subtract 1 from the count.

Bump the rider.
If the count is 0, break.
Repeat.

Remarks can be nested, so that routine is a little more complex.

When we find ourselves at the beginning of a literal string (ie, the first remaining byte of the source is a double-quote mark), we call this routine to suck the string into a token:

To move a rider (code rules – string):
Bump the rider.
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is the return byte, exit.
If the rider is on any nested double-quote, bump the rider; repeat.
If the rider’s source’s first’s target is the double-quote byte,
bump the rider; exit.

Repeat.

Note that we don’t allow strings to span multiple lines (to avoid common errors), and that we use doubled-up double-quotes in string to allow for double quotes within strings. For example…

“This is a string with the next word “”in”” double quote marks.”

…is interpreted like this:

This is a string with the next word “in” double quote marks.

Qualifiers are used to distinguish special cases of similar routines. In Plain English, they’re enclosed in parentheses. This is the “move a rider” routine that handles qualifiers:

To move a rider (code rules – qualifier):
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is the return byte, break.
If the rider’s source’s first’s target is the left-paren byte,
add 1 to a count.

If the rider’s source’s first’s target is the right-paren byte,
subtract 1 from the count.

Bump the rider.
If the count is 0, break.
Repeat.

Qualifiers can also be nested, so we have to take that into account as we did with remarks.

Punctuation marks in Plain English are all single characters, so we just suck ’em up into the token:

To move a rider (code rules – mark):
Bump the rider.

Possessives typically come at the end of names and can either be an apostrophe followed by the letter “s”, or an apostrophe all by itself, if the preceding letter is “s”. This is the routine that parses possessives:

To move a rider (code rules – possessive):
Bump the rider.
If the rider’s source is blank, exit.
If the rider’s source starts with “s”, bump the rider.

Below is the higher-level routine that calls the above routines, and one more at the end:

To move a rider (code rules):
Position the rider’s token on the rider’s source.
If the rider’s source is blank, exit.
If the rider’s source’s first’s target is noise,
move the rider (code rules – noise); exit.
If the rider’s source’s first’s target is the backslash byte,
move the rider (code rules – comment); exit.
If the rider’s source’s first’s target is the left-bracket byte,
move the rider (code rules – remark); exit.
If the rider’s source’s first’s target is the double-quote byte,
move the rider (code rules – string); exit.
If the rider’s source’s first’s target is the left-paren byte,
move the rider (code rules – qualifier); exit.
If the rider’s source’s first’s target is any mark,
move the rider (code rules – mark); exit.
If the rider is on any possessive,
move the rider (code rules – possessive); exit.
Move the rider (code rules – glom).

glom is any character or collection of characters that can be processed at the next level up — the semantic, rather than the syntactic level. Since noise, comments, remarks, strings, qualifiers, punctuation marks and possessives have been weeded out at this point, the “move a rider” routine for gloms is quite simple:

To move a rider (code rules – glom):
Bump the rider.
If the rider’s source is blank, exit.
If the rider is on any possessive, exit.
If the rider’s source’s first’s target is any glom byte, repeat.

We have to check for possessives a second time in case one pops up at the end of a glom.

Glom bytes are defined as follows:

To decide if a byte is any glom byte:
If the byte is any letter, say yes.
If the byte is any digit, say yes.
If the byte is the tilde byte, say yes.
If the byte is the at-sign byte, say yes.
If the byte is the number-sign byte, say yes.
If the byte is the percent-sign byte, say yes.
If the byte is the ampersand byte, say yes.
If the byte is the underscore byte, say yes.
If the byte is the single-quote byte, say yes.
If the byte is the dash byte, say yes.
If the byte is the cross byte, say yes.
If the byte is the slash byte, say yes.
Say no.

I imagine you get the idea. Using riders in this way, we can simply, flexibly, and quickly parse our source files without having to think about too much at any one time.

Plain English Programming — The Malevolent Mathemagician

A long, long time ago, when I was in seventh grade, a guest speaker came to our math class. We were all very excited to hear him speak, but I was a little sad because my friend Zeno was sent to detention just before the presentation was to begin. Nobody knew why. Anyway, the guest speaker’s name was Mr Georg. He started by asking the class if we thought there were (a) more odd numbers, or (b) more even numbers, or (c) more of both put together. We all said, “More of both put together.”

And it was then that Mr Georg got very red in the face and shouted, “No. No! NO!” Then he turned and started scribbling on the chalk board, mumbling to himself about Hebrew letters the whole time. This, best I can remember, is what he wrote (redrawn in our built-in WYSIWYG page editor):

It seemed like all that writing calmed him down a bit, and he turned back to the class and said, softly but sternly, “I hope you little idiots can now see that you are wrong. The number of even numbers, and the number of odd numbers, and the number of both together are all the same!”

It was then that my deskmate, Leopold (the smartest kid in our class), whispered to me, “I’m not buying it. I think this guy is nuts.” I shushed him because Mr Georg was mumbling and writing on the chalkboard again, and I wanted to hear what he was saying. But Leopold said, “Be careful, friend. I suspect this man is not just a Mathemagician, but a Malevolent Mathemagician, a corrupter of youth.”

This time Mr Georg wrote something like this:

When Mr Georg was finished scribbling, Leopold glanced at the board and then whispered to me, “I don’t know if this charlatan is talking philosophy or theology, but I’m pretty sure there’s no mathematics there.” But now Mr Georg had found a yardstick and was using it to draw really nifty and precise pictures on the board. Something like this, if I remember correctly, which I thought made a lot of sense:

I had just finished copying it down when Leopold whispered, “Ask him to draw the picture for a triangle with both of the short sides just one unit long.” Hoping to get Leopold to stop whispering, I raised my hand and posed the question. But instead of drawing the picture I asked for, Mr Georg erased the board and drew a triangle with the sides labeled a, b, and c. Then he started writing mathemagical formulas on the board as he mumbled names that sounded like Greek to me.

pythagorean 2 corrected

Then he got a stick and tapped on the board as he showed us how the length of the long side of a right triangle was equal to the square root of the sum of the squares of the other two sides, and “proved” that he was right by letting a=3 and b=4 and using his formula to calculate c=5. Then he harrumphed and said, “Now I will answer that little mutt’s question.” So he let a=1 and b=1 and plugged the numbers into his formula. Then he knelt down and started writing the value for c. It was a very long number, and it took him quite a while to calculate it in his head. Sally in the front row asked him where he was getting all those digits and he stopped just long enough to reply, “It’s a technique that is clearly beyond your understanding, little girl. Perhaps you should be in art class.”

Then Leopold leaned over to whisper to me again, but I said, “If you have a question, Leo, ask him, not me.” So he did. Without waiting to be called on, he shouted, “Are you really asking us to believe that the lengths of the sides of a 3-4-5 right triangle are all whole numbers, exact (in some unit of measure), and easily calculated; but that the long side of a right triangle whose other sides are one unit long, can only be represented by a different kind of number altogether? A number that is both inexact and very difficult to compute? Is that what you’re selling here?” Mr. Georg hesitated, and our teacher, to fill the silence, said, “Leopold, you will now join Zeno in detention.” But on the way out Leo turned and shouted back at the class, “The good God created the whole numbers; all else is menshenwerk!”

And that was the day two things happened in my head. First, I began to suspect that there was some shady sleight-of-hand in the whole mathemagical shebang, however popular and useful it appears to be. Second, I began to dream of a day when a different way of understanding problems such as these might be discovered. So I got myself a degree in mathemagics (mainly by keeping my mouth shut)…

…and let the important matters stew in the back of my brain for several decades. I recently discovered that there are others of my ilk, like Master Norman, who wrote this book on the subject:

You can purchase that book here.

But when it came to dealing with trigonometry in our whole-number-only Plain English programming language, I was still looking for something even simpler. I will now describe what steps we’ve taken, which, I admit, are only baby steps toward a more natural and more rational mathematics.

We began by borrowing some ideas from non-mathemagicians, like old-time roofers and ancient explorers. From the roofers we got the idea that angles can be expressed in “rise over run” terms (instead of degrees); and from explorers we got the idea for something more natural than the infamous “unit circle” of the mathemagic world: the compass. This is a picture of our Osmosian compass:

Note that zero is at the top, not the right, that headings increase in the clockwise direction, and that all the way around is 384 points, not 360 degrees. If you’re curious, this is the whole-number-only Plain English routine that draws that image:

To draw the compass: \ note circle illusion in the center
Start fresh.
Center a box 6-1/2 inches by 6-1/2 inches in the work area.
Wipe the box with the tan color (left to right).
Write “X” with the black pen in the middle of the box.
Put 0 into a count.
Loop.
If the user clicks on the choices, break.
Start in the middle of the work area. Move 1/4 inch.
If the count is even, draw a long fancy arrow 1-3/4 inches long with the brown pen.
If the count is odd, draw a short fancy arrow 1 inch long with the black pen.
Turn right 1/16 of the way.
Refresh the screen.
Wait for 10 grains of sand to fall.
Add 1 to the count. If the count is less than 16, repeat.
Start in the center of the box facing north minus 48 points.
Write “000…024…048…072…096…120…144…168…192…216…240…264…288…312…336…360…” with the black pen 2-1/4 inches from the box’s center.
Write “0/0………1/8………1/4………3/8………1/2………5/8………3/4………7/8………” with the brown pen 2-1/2 inches from the box’s center.
Write “.N….NNE…N.E…ENE….E….ESE…S.E…SSE….S….SSW…S.W…WSW….W….WNW…N.W…NNW…” with the black pen 2-3/4 inches from the box’s center.
Refresh the screen.

Now let’s draw some triangles. Here’s a simple 3-4-5 triangle with the center of the screen marked by a red dot…

…and this is the routine that drew it:

To run:
Start up.
Clear the screen to the lightest gray color.
Use the fat black pen.
Start in the center of the screen facing west.
Stroke a line 3 inches long.
Turn right. Stroke another line 4 inches long.
Get a rise/run given the pen’s current spot and the screen’s center.
Stroke a third line given the rise/run.
Draw a dot 1/16 inch wide at the screen’s center with the red color.
Refresh the screen.
Wait for the escape key.
Shut down.

The critical juncture in this routine occurs after we’ve drawn the 3-inch horizontal line and the 4-inch vertical line. At that point we’re sitting on the uppermost vertex of the triangle, facing north. To get back to the center of the screen we need two things. Mathemagicians would say, “Of course you do. You need to know how far to turn around (an angle) and how long your next stroke should be. But we don’t want to say that, for two reasons: first, because it hurts our heads to figure out exactly what angle we need, and secondly because it can be difficult (or even impossible) to figure out the exact stroke length using mathemagical techniques. “If it’s hard, it’s wrong,” is a standing motto among us Osmosians.

So back to the third stroke of our triangle. Exactly where are we? We’re 4 inches north of center (that’s the rise) and three inches west of center (that’s the run). That’s our exact position. In what direction do we want to go? Obviously, we want to go 4 inches south and 3 inches east, by the shortest possible route. And exactly how far do we want to go? Same answer: 4 inches south and 3 inches east. Isn’t it curious that both the angle of our line and the length of our line can be described in the same, whole-number-only terms? And note that this works, not only for “mathemagician-friendly” 3-4-5 triangles, but for all triangles. So all we need to do is compute the rise and the run, which is simply the difference between two spots on the screen…

To get a rise/run given a spot and another spot:
Put the spot minus the other spot into the rise/run.

…and then we stroke the third line based on that rise/run:

Stroke a third line given the rise/run.

At the very bottom we’ve got Bresenham’s integer-only line drawing algorithm, so we’re not cheating when we plot the lines. So far, so good.

Now let’s see if it actually works with that troublesome “unit triangle” — a right triangle with short sides just one unit in length. The routine to draw it is the same as above, except that the first two strokes are only one inch long:

To run:
Start up.
Clear the screen to the lightest gray color.
Use the fat black pen.
Start in the center of the screen facing west.
Stroke a line 1 inches long.
Turn right. Stroke another line 1 inches long.
Get a rise/run given the pen’s current spot and the screen’s center.
Stroke a third line given the rise/run.
Draw a dot 1/16 inch wide at the screen’s center with the red color.
Refresh the screen.
Wait for the escape key.
Shut down.

Here’s the output:

Whoohoo! It works! And what is the exact length of those sides? Let me see… the first side has a rise of exactly 0 and a run of exactly -1; the second side has a rise of exactly 1 and a run of exactly 0. And the third side has has rise of exactly -1 and a run of exactly 1. All integers, all exact, all easily calculated.

But will it work if the triangle is not a right triangle, or is rotated in some arbitrary direction? Let’s see. Here’s a routine that draws eight, scalene triangles at eight different rotations:

To run:
Start up.
Clear the screen to the lightest gray color.
Start in the center of the screen facing west.
Use the fat black pen.
Loop.
Stroke a line 3 inches long.
Turn right 1/8 of the way around.
Stroke another line 2 inches long.
Get a rise/run given the pen’s current spot and the screen’s center.
Stroke a third line given the rise/run.
Turn right 1/4 of the way around.
Add 1 to a count. If the count is 8, break.
Repeat.
Draw a dot 1/16 inch wide on the screen’s center with the red pen.
Refresh the screen.
Wait for the escape key.
Shut down.

And here’s the output:

Baby steps toward a more natural, more rational mathematics. BIG baby steps. In Plain English.

Plain English Programming — Nested IFs

This is a picture of Sharon, the Mother of all Osmosians:

Her favorite saying is, “Just tell us what you need, sonny, and we’ll show you how to live without it.” She insisted, in fact, that we use nothing but black dots of different sizes to show you what she looks like.

Another of her favorite sayings is about nested IFs. She says, “Two deep is too deep, boys. No nesting.”

Now everyone knows that “If Momma ain’t happy, ain’t nobody happy,” so when we developed our Plain English compiler we intentionally left out nested IFs. Turns out she was right, as usual — they’re not necessary. Search through the 25,000 Plain English sentences that comprise our system — unique desktop, simplified file manager, elegant text editor, handy hexadecimal dumper, native-code-generating compiler/linker, and WYSIWYG document editor — and you won’t find a single nested IF. Nada. Zip. Zilch. No “elses” either.

An Example

So how does one live without nested IFs? Let’s consider an example given by R S Abhishek, a Research Associate at Ati Motors that popped up when I asked Google for a “real life nested IF statement”:

 

if(age < 60){
    if(income <= 250000){
        tax_percent = 0
    }elseif(income >= 250001 && income <= 500000){
        tax_percent = 0.1
    }elseif(income >= 500001 && income <= 1000000){
        tax_percent = 0.2
    }else{
        tax_percent = 0.3
    }
}elseif(age >= 60 && age < 80){
    if(income <= 300000){
        tax_percent = 0
    }elseif(income >= 300001 && income <= 500000){
        tax_percent = 0.1
    }elseif(income >= 500001 && income <= 1000000){
        tax_percent = 0.2
    }else{
        tax_percent = 0.3
    }
}else{
    if(income <= 500000){
        tax_percent = 0
    }elseif(income >= 500001 && income <= 1000000){
        tax_percent = 0.2
    }else{
        tax_percent = 0.3

Twenty-seven lines there, with lots of indention, highlighted keywords, and too many squiggly braces for me to count. Now Let’s consider the non-nested, Plain English equivalent:

To get a tax rate for an age and an income:
If the age is less than 60, get the tax rate for young people with the income; exit.
If the age is between 60 and 79, get the tax rate for old folks with the income; exit.
If the income is less than or equal to 500000, put 0 into the tax rate, exit.
If the income is between 500001 and 500000, put 2/1000 into the tax rate; exit.
Put 3/1000 into the tax rate.

To get a tax rate for young people given an income:
If the income is less than or equal to 250000, put 0 into the tax rate, exit.
If the income is between 250001 and 500000, put 1/1000 into the tax rate; exit.
If the income is between 500001 and 1000000, put 2/1000 into the tax rate; exit.
Put 3/1000 into the tax rate.

To get a tax rate for old folks given an income:
If the income is less than or equal to 300000, put 0 into the tax rate, exit.
If the income is between 300001 and 500000, put 1/1000 into the tax rate; exit.
If the income is between 500001 and 1000000, put 2/1000 into the tax rate; exit.
Put 3/1000 into the tax rate.

Just eighteen lines (including the optional blanks between the routines), and a mere handful of punctuation marks — all used in the usual ways.

The trick, you see, is to properly factor the problem by handling (and thus eliminating) special cases at the top, exiting after each. The normal/default/fall-through case then appears, unconditioned, at the bottom.

Some notes

If you’re wondering whether “between” in Plain English is inclusive or exclusive, just do what we did. Head down to the local Walmart and ask the folks coming and going to “Pick a number between 1 and 10.” You’ll find that lots of people choose 1 and 10, which shows that the normal interpretation of “between” is inclusive.

And if you’re wondering why we put “3/1000” into the tax rate (instead of 0.3, like the other guy), it’s because Plain English is a whole-number only language. Why? Because our Osmosian Mom used to put us to sleep with stories from an ancient Osmosian adventure book entitled, The Epic Battle for the Soul of Mathematics: Kronecker vs Cantor. I remember, in fact, this painting hanging over my bed:

But that’s really a whole ‘nother story. It will have to wait for another article. Suffice it to say, at this juncture, that Mom was right about nested IFs — you don’t need ’em.

Kobayashi Maru Primes

It has been said that “Great engineering strikes a balance between competing objectives.” I believe this to be true, in both mechanical and software engineering. In the latter, the competing objectives are typically clarity, size, and speed.

So I thought it a great opportunity to apply this principle when a computer scientist in South America challenged me to write a Plain English program, based on the Sieve of Eratosthenes, that could count the number of primes less than 250,000 in less than a second (on a bottom-of-the-line computer from Walmart).

Now Plain English is not the fastest language on earth, because clarity trumps speed when the primary goal of a language is to be natural. But Plain English is a reasonably fast language, and by balancing competing objectives, we can usually find an acceptable solution to any problem laid before us.

In this case, we first needed to make the famous Sieve of Eratosthenes clear; easy to understand. So we started with Wikipedia’s description of the algorithm…

To find all the prime numbers less than or equal to a given integer n by Eratosthenes’ method:
1. Create a list of consecutive integers from 2 through n: (2, 3, 4, …, n).
2. Initially, let p equal 2, the smallest prime number.
3. Enumerate the multiples of p by counting to n from 2p in increments of p, and mark them in the list (these will be 2p, 3p, 4p, …; the p itself should not be marked).
4. Find the first number greater than p in the list that is not marked. If there is no such number, stop. Otherwise, let p now equal this new number (which is the next prime), and repeat from step 3.

…which in Plain English reads like this:

To make a list of prime numbers less than or equal to a number:
Create the list of consecutive integers from 2 to the number. \ wiki’s #1
Get an entry from the list. \ wiki’s #2
Loop. Mark higher multiples of the entry. \ wiki’s #3
Get the entry for the next lowest possible prime. If the entry is not nil, repeat. \ wiki’s #4

That takes care of the clarity objective. If you’re wondering, the type definitions and support routines that make that actually work are equally clear:

An entry is a thing with a number and a non-prime flag.
A list is some entries.

To create a list of consecutive integers from a number to another number:
Privatize the number.
Loop.
If the number is greater than the other number, exit.
Create an entry for the number.
Append the entry to the list.
Add 1 to the number.
Repeat.

To create an entry for a number:
Allocate memory for the entry.
Put the number into the entry’s number.
Clear the entry’s non-prime flag.

To mark higher multiples of an entry:
Privatize the entry.
Loop.
Put the entry’s next into the entry.
If the entry is nil, exit.
If the entry’s number is evenly divisible by the original entry’s number,
set the entry’s non-prime flag.
Repeat.

To get an entry for the next lowest possible prime:
Put the entry’s next into the entry.
If the entry is nil, exit.
If the entry’s non-prime flag is set, repeat.

So clarity, good. Size, good (the executable is only 150k). But what about speed? A little over 4 minutes to make the whole list for numbers between 1 and 250,000. Not good. Hmm… What would Captain Kirk do? What if we sacrificed a little size for speed? Why, after all, do we keep calculating primes over and over again when they never change? So I ran the program again, this time with an additional routine like this…

To make a prime string from a list:
Put “N” into the prime string.
Loop.
Get an entry from the list.
If the entry is nil, exit.
If the entry’s non-prime flag is set, append “N” to the prime string; repeat.
Append “Y” to the prime string.
Repeat.

…to make a string with all the primes marked. Then I saved the string in a text file and pasted it into a library I called “Super fast prime number checker for numbers between 1 and 250,000.” This is what that library looks like:

The prime string is a string equal to “NYYNYNYNNNYNYN…”

To decide if a number is prime (fast check):
If the number is less than 1, say no.
If the number is greater than the prime string’s length, say no.
Put the prime string’s first into a byte pointer.
Add the number to the byte pointer.
Subtract 1 from the byte pointer.
If the byte pointer’s target is the big-Y byte, say yes.
Say no.

The prime string in the source is, of course, much longer than the one shown above. And it could be 8 times shorter if we used bits instead of “Y”s to mark the primes. But it hurts my head to have to think that hard. At any rate, this program…

To run:
Start up.
Write “Working…” on the console.
Start a timer.
Get a count of prime numbers less than or equal to 250000.
Stop the timer.
Write the count then ” prime numbers.” on the console.
Write the timer’s string then ” milliseconds.” on the console.
Wait for the escape key.
Shut down.

To get a count of prime numbers less than or equal to a number:
Privatize the number.
Clear the count.
Loop.
If the number is prime (fast check), add 1 to the count.
Subtract 1 from the number.
If the number is less than 2, break.
Repeat.

…which is very clear, and reasonably small (about 400k), is also remarkably fast. It can count the 22,044 primes less than or equal to 250,000 in 16 milliseconds or less. And, since it’s (indirectly) based on the Sieve of Eratosthenes, it meets the requirements of the original specification!

I think Captain Kirk would approve.

Plain English Programming — Smoothing Polygons

This is how we define a polygon in Plain English:

A polygon is a thing with some vertices.

And this is how we define a vertex:

A vertex is a thing with an x coord, a y coord, and a spot at the x.

The part that says “and a spot at the x” overlays the vertex’s x and y coordinates with a spot, just for convenience.

Now this is a page I whipped up in Plain English’s built-in WYSIWYG page editor to describe how we smooth polygons:

And these are the Plain English sentences we use to actually do the job:

To smooth a polygon:
If the polygon is nil, exit.
If the polygon’s vertices’ count is less than 3, exit.
If the polygon is closed,
append the polygon’s first vertex’s next’s spot to the polygon; set a flag.
Put the polygon’s first vertex into a left vertex.
Loop.
If the left vertex’s next is nil, break.
Put the left vertex’s next into a right vertex.
Get a center spot given the left vertex’s spot and the right vertex’s spot.
Insert the center into the polygon after the left vertex.
Put the left vertex’s next into a new vertex.
If the left vertex’s previous is nil, put the right vertex into the left vertex; repeat.
Get another center spot given the left vertex’s previous’ spot and the new vertex’s spot.
Get a difference between the other center and the left vertex’s spot.
Divide the difference by 2.
Add the difference to the left vertex’s spot.
Put the right vertex into the left vertex.
Repeat.
If the flag is not set, exit.
Destroy the polygon’s first vertex given the polygon.
Destroy the polygon’s last vertex given the polygon.

It’s a little more complicated than the picturesque explanation because it has to deal with both open and closed polygons, and with left over vertices when the total number of vertices isn’t evenly divisible by three.

Here are some sample polygons, shown left-to-right (a) in their original state, (b) smoothed once, (c) smoothed twice, and (d) smoothed three times:

The first example shows that you can put a square peg in a round hole, if you smooth it up first. The red line in the final example shows how smoothing can be used to fit a curve to an irregular dataset.

And that, folks, is…

Whoops! I think that looks more natural smoothed:

Thanks for reading. I hope things go “smoothly” for you in the future.

A Simple Merge Sort

The only compound data types native to Osmosian Plain English are records and doubly-linked lists. When we need to sort a list, we use the simple recursive merge sort that I’ll show you below. But first we need something to sort. Let’s sort fruits, and let’s begin with a type definition:

A fruit is a thing with a name.

When the word “thing” appears in a type definition, our compiler works a little magic behind the scenes to make things (pun intended) more powerful and flexible. The above definition, for example, actually causes the following data types to be defined:

So a Fruit is really nothing but a pointer containing the address of a Fruit Record.

And each 16-byte Fruit Record has a hidden 8-byte prefix with two pointers for linking these records into lists, together with the fruit’s name, which is a string. Plain English strings are stored in the Heap and can be any length. So the Name in the Fruit Record is actually just two pointers to the first and last bytes of the string on the Heap, respectively. String memory is managed automatically, but thing memory is managed by the programmer.

The third type generated by the compiler serves as the anchor for lists of Fruit Records. Such lists are simply (and intuitively) called Fruits, the plural of Fruit.

Now let’s add some fruits to a list, in random order, and sort them. Here are the top level sentences in our test program:

To run:
Start up.
Create some fruits.
Write the fruits on the console.
Skip a line on the console.
Sort the fruits.
Write the fruits on the console.
Destroy the fruits.
Wait for the escape key.
Shut down.

And here are the routines that will be called to “Create some fruits”:

To create some fruits:
Add “banana” to the fruits.
Add “apple” to the fruits.
Add “orange” to the fruits.
Add “bacon” to the fruits.
Add “pineapple” to the fruits.
Add “pomegranate” to the fruits.
Add “tomato” to the fruits.
Add “grape” to the fruits.
Add “fig” to the fruits.
Add “date” to the fruits.

To add a name to some fruits:
Allocate memory for a fruit.
Put the name into the fruit’s name.
Append the fruit to the fruits.

Now we’re ready to sort. This is the sort routine:

To sort some fruits:
If the fruits’ first is the fruits’ last, exit.
Split the fruits into some left fruits and some right fruits.
Sort the left fruits.
Sort the right fruits.
Loop.
Put the left fruits’ first into a left fruit.
Put the right fruits’ first into a right fruit.
If the left fruit is nil, append the right fruits to the fruits; exit.
If the right fruit is nil, append the left fruits to the fruits; exit.
If the left fruit’s name is greater than the right fruit’s name,
move the right fruit from the right fruits to the fruits; repeat.
Move the left fruit from the left fruits to the fruits.
Repeat.

When we run this program, the output on the console looks like this:

But is it fast? Let’s see using this modified test program:

To run:
Start up.
Write “Working…” on the console.
Put 10000 into a count.
Create some fruits using “apple” and the count.
Start a timer. Sort the fruits. Stop the timer.
Write the timer then ” milliseconds for ” then the count on the console.
Destroy the fruits.
Put 100000 into the count.
Create the fruits using “apple” and the count.
Start the timer. Sort the fruits. Stop the timer.
Write the timer then ” milliseconds for ” then the count on the console.
Destroy the fruits.
Put 1000000 into the count.
Create the fruits using “apple” and the count.
Start the timer. Sort the fruits. Stop the timer.
Write the timer then ” milliseconds for ” then the count on the console.
Destroy the fruits.
Wait for the escape key.
Shut down.

The fruits this time, at the start, look like this:

apple 0010000
apple 0009999
apple 0009998

And the output on the console looks like this:

Not quite linear, I admit. But not exponentially bad, either. Ten times as many records take roughly ten times as long to sort. There’s a technical way of saying this using big “O’s” and little “n’s’ and “logs” and stuff, but Plain English programmers don’t generally think that way. And it’s stable, as a Plain English programmer would expect — records with duplicate sort values retain their original sequence.

Good, simple, useful stuff.

Musings on the “A” in “AI”

A while back I wrote a program that could draw simple landscapes.

Version 1

The first version was entirely deterministic: I knew exactly what shape the horizon would take, where the sun would appear, how many birds would adorn the sky, etc. It produced acceptable drawings, but was too predictable to be interesting. Just a machine.

Version 2

The next version was entirely random in its workings: I let the program decide what to draw and where to draw it, and really had no idea what it would produce. Almost every drawing it rendered was tasteless nonsense. Fast and prolific, but no artist. A broken machine.

Version 3: Vincent

The third version was a combination of the two — deterministic enough to stay within reasonable bounds (horizon somewhere near the middle of the page, no more than three birds at time) yet free enough to produce original works that I could not predict, even though I wrote every line of the code. Most of the time this version performs reasonably well, and now and then it draws something so striking that I print it off and hang it on the wall. I call this third version, Vincent.

Here is a sample of Vincent’s work:

Inside Vincent’s Head

Here are some of the Plain English sentences that constitute Vincent’s soul. First, some type definitions:

A landscape is a thing with an horizon, a sun, and some birds.

An horizon is a polygon.
A sun is a polygon.
A bird is a polygon.

Then a global variable so Vincent can remember what he’s drawn:

The drawings are some landscapes.

And now, his primary talent:

To draw a landscape:
If you’re tired, say “I’d rather not, I’m tired”; exit.
Say “Okay”.
Imagine the landscape.
Render the landscape.
Remember the landscape.

The “imagining” phase, as any human artist knows, is the critical part. This is how I taught Vincent to imagine a landscape:

To imagine a landscape:
Allocate memory for the landscape.
Imagine an horizon in the landscape.
Imagine a sun in the landscape.
Imagine some birds in the landscape.

The horizon comes first:

To imagine an horizon in a landscape:
Allocate memory for the landscape’s horizon.
Put the screen’s left-center into a spot.
Move the spot 1 inch up or down.
Add the spot to the landscape’s horizon.
Loop.
Move the spot 1/2 inch to the right and 1/4 inch or so up or down.
Add the spot to the landscape’s horizon.
If the spot is not on the screen, break.
Repeat.
Smooth the landscape’s horizon 2 times.

Here comes the sun:

To imagine a sun in a landscape:
Allocate memory for the landscape’s sun.
Start with “A1 A9 I9 I1 A1” for the landscape’s sun.
Smooth the landscape’s sun 3 times.
Pick a spot anywhere above the landscape’s horizon.
Scale the landscape’s sun given the spot and the landscape’s horizon.
Center the landscape’s sun on the spot.

The sun is actually just a square rounded off. The scaling makes it smaller the further it is from the horizon:

To scale a sun given a spot and an horizon:
Put 1 inch plus the spot’s y into a ratio’s numerator.
Put the horizon’s top into the ratio’s denominator.
Reduce the ratio.
Scale the sun to the ratio.

And last of all, the birds:

To imagine some birds in a landscape:
Pick a number between 1 and 3.
Say “I’ve decided to draw ” then the number then ” birds this time” and wait.
Loop.
If the number is 0, exit.
Imagine a bird in the landscape.
Add the bird to the landscape’s birds.
Subtract 1 from the number.
Repeat.

To imagine a bird in a landscape:
Allocate memory for the bird.
Start with “E1 D3 G5 E7 G9” for the bird.
If you feel like it, flip the bird left to right.
Randomly scale the bird between 10 and 25 percent.
Smooth the bird.
Pick a spot for the bird in the landscape.
Center the bird on the spot.

Once Vincent has a landscape “in mind,” so to speak, the rendering of it is easy:

To render a landscape:
If the landscape is nil, exit.
Fill the screen with the tan color.
Render the landscape’s horizon with the black color.
Put masking tape below the landscape’s horizon.
Render the landscape’s sun with the black color.
Render the landscape’s birds with the black color.
Take the masking tape off.

The “masking tape” is used to make sure the sun appears behind the horizon.

Committing a landscape to memory is a trivial thing for Vincent:

To remember a landscape:
Append the landscape to the drawings.

The Musings

Playing with Vincent tends to make me philosophical about things like will, memory, consciousness, emotions and understanding. My thoughts generally run along these lines:

• Vincent makes decisions on his own. What the horizon will look like, where the sun will be placed, and how many birds will appear are unknown until he decides. Does he have a will of his own?

• Vincent can tell us, when asked, what he has done. “I’ve drawn 7 landscapes today,” for example. Does he remember?

• Vincent is aware of what it he is doing: “I’ve decided to draw 3 birds this time.” Is he conscious?

• Vincent gets weary: under specific conditions, he refuses to draw, saying “I’d rather not, I’m tired.” Does he have emotions?

• Vincent responds, properly, to various commands. When asked to draw, for example, he draws. When asked what he’s done, he responds verbally, with appropriate remarks. Does he understand?

• One version of Vincent had a strong instinct for self-preservation. When I hit the quit button, he said, “I don’t want to die. Do that again and I’ll make a thousand copies of myself on your disk.” He would, and he did. But he won’t again.

My conclusions?

I believe that Vincent’s capabilities are congruent, but not equivalent, to the corresponding capabilities in human beings. They are implemented differently, for one thing. His will is less free but also less fickle than ours. His memory is less capacious but more exacting. His consciousness is more narrowly focused, but less easily distracted. And his moods and understanding are not as deep or rich as ours, but they are significantly more predictable.

I also believe that his attributes and ours are not on the same inclined plane, where differences are mere matters of degree. More code will never make Vincent a real person. But I do believe that Vincent’s capabilities and ours are on different steps of the same staircase, and that it is therefore appropriate to use words like will, memory, consciousness, emotion, and understanding to describe them. And that we can learn things about ourselves by studying creatures like Vincent. For example:

• Vincent has both a hardware “body” and a software “soul.” The two can be easily distinguished, and even separated. And the same soul can be reanimated in a different body.

• Vincent did not “evolve” via a long series of random mutations and natural selections; he was designed and created by a being greater than he. I cannot prove that all animate beings come to life through a similar creative process, but I can say that in every case where I have been able to discover, first hand, the exact origins of such things, purposeful design has been involved.

• Vincent has little or nothing to be proud of. Whatever talents and skills he possesses were graciously bestowed upon him by his creator. Now and then he may surprise me with a particularly striking drawing based on his own decisions, but he wouldn’t be making those decisions at all were it not for me.

Bottom line: Vincent is an AI, but not an Artificial Intelligence; he’s nothing more or less than an Apparent Intelligence. As all his kind will always be.