One of the bright sides of computers is that they do exactly what they are told. Scripts execute in a completely predictable fashion, with no possibility of misinterpretation by the computer. Any errors, stack overflows and infinite loops are the fault of fallible humans and the error-filled code they write. For computers, unlike humans, “I was only following orders!” is a wholly acceptable excuse.

The trouble, then, is how to communicate with them. The answer, of course, is programming languages. There are hundreds of different computer languages, at different syntactic “levels”. A “high-level” programming language is one that hides the details of the hardware and whose syntax is closer to human language. Examples include Python, JavaScript and SQL. Their commands use plenty of recognizable English words and often resemble sentences that even non-programmers can understand.

“Lower-level” languages, on the other hand, use fewer words and are harder for non-experts to parse. The lowest-level language that humans routinely write is assembly, which maps almost one-to-one onto the processor’s own instructions. Today it is written mostly for performance-critical routines, operating-system and embedded code, and as homework for computer science students. Assembly programs are hard to read and far more verbose, since every single step must be spelled out explicitly. Here’s a routine, courtesy of MadWizard.org, that calculates the length of a string:

.586
.MMX
.model flat,stdcall
option casemap:none
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
LongStrLen proc Arg1
lpString equ [esp+4]
mov eax,lpString
pxor mm(0),mm(0)
pxor mm(1),mm(1)
@@:
pcmpeqb mm(0),[eax+0]
pcmpeqb mm(1),[eax+8]
packsswb mm(0),mm(1)
add eax,16
packsswb mm(0),mm(0)
movd ecx,mm(0)
test ecx,ecx
je @B
sub eax,16+1
@@:
inc eax
cmp byte ptr [eax],0
jne @B
sub eax,lpString
emms
ret 4

OPTION PROLOGUE:DEFAULT
OPTION EPILOGUE:DEFAULT

LongStrLen endp

end

…compare these 30-some lines of hell to this JavaScript functional equivalent:

// length is a property, not a method; an expression-bodied arrow needs no "return"
const stringLength = string => string.length;
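Strictly speaking, the two are not doing the same work. The one-liner reads a length that the JavaScript engine already keeps track of, and it counts UTF-16 code units rather than bytes. The assembly routine has no such field to consult. It walks through memory until it finds a zero byte, which is the convention C uses to mark the end of a string. The MMX instructions only let it check 16 bytes per loop iteration.

Below is a byte-at-a-time sketch of that scan. It assumes a `Uint8Array` as a stand-in for raw memory, and the function name is our own, purely for illustration:

```javascript
// Mimic the assembly routine: walk a byte buffer until the first zero byte
// (a C-style "null terminator") and return how many bytes came before it.
function nullTerminatedLength(bytes) {
  let i = 0;
  // Stop at the terminator, or at the end of the buffer if there is none
  while (i < bytes.length && bytes[i] !== 0) {
    i++;
  }
  return i;
}

// "hello" as ASCII bytes, then a terminating 0, then unrelated leftover data
const buffer = new Uint8Array([104, 101, 108, 108, 111, 0, 42, 42]);
console.log(nullTerminatedLength(buffer)); // 5
```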

Over time, languages have become higher-level and much easier for newcomers to handle. These lower barriers to entry mean that laymen such as myself can accomplish a wide range of tasks without a Ph.D. in mathematics or electrical engineering.

All programming languages are, in the end, abstractions layered on top of the one form of communication that computers actually ‘comprehend’: machine language, which consists of long series of binary digits, 0s and 1s.
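In practice, people rarely write those 0s and 1s out by hand. They use hexadecimal instead, where each digit stands for exactly four bits, so converting between the two is purely mechanical.

A small sketch of the conversion follows. The name `hexToBinary` is our own, and the function assumes its input contains only valid hex digits:

```javascript
// Each hexadecimal digit encodes exactly four bits, so hex is just a compact
// way of writing binary: convert digit by digit and pad each to 4 bits.
function hexToBinary(hex) {
  return [...hex]
    .map(digit => parseInt(digit, 16).toString(2).padStart(4, "0"))
    .join("");
}

console.log(hexToBinary("8B54")); // "1000101101010100"
```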

According to Wikipedia, the following machine code calculates the n-th number of the Fibonacci sequence (written here in hexadecimal rather than binary):

8B542408 83FA0077 06B80000 0000C383
FA027706 B8010000 00C353BB 01000000
B9010000 008D0419 83FA0376 078BD989
C14AEBF1 5BC3
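For comparison, here is a JavaScript sketch of the same computation. It mirrors the control flow we get by reading those bytes as 32-bit x86 instructions. That decoding is our own, so treat the register correspondences in the comments as an assumption:

```javascript
// Follows the decoded machine code: n = 0 -> 0, n <= 2 -> 1,
// otherwise start from the pair (1, 1) and add until n reaches 3.
// Assumes n is a non-negative integer (the machine code uses unsigned values).
function fib(n) {
  if (n === 0) return 0;
  if (n <= 2) return 1;
  let prev = 1;           // roughly register EBX
  let curr = 1;           // roughly register ECX
  let next = prev + curr; // roughly LEA EAX, [ECX+EBX]
  while (n > 3) {
    prev = curr;
    curr = next;
    next = prev + curr;
    n--;                  // roughly DEC EDX
  }
  return next;
}

console.log(fib(10)); // 55
```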

So how did we go from binary… to Assembly… to C… to C++… to Python? The answer is in the title: compilers!

Compilers are programs that take code written in a higher-level language and translate it into a lower-level form, typically the binary machine code that a CPU can execute. Typically, this is a multi-stage process that looks something like this:

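  1. Front-End: “verifies syntax and semantics” via “lexical analysis, syntax analysis, and semantic analysis”. Basically, the compiler first groups the raw characters of the source into tokens (keywords, names, numbers, operators). It then checks that those tokens form valid grammatical structures, and then checks that the result makes sense (for example, that variables are declared and types match). Finally, it produces an “intermediate representation” to be analyzed in the next step…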
  2. Middle-End: This stage performs optimizations that don’t depend on the target CPU: e.g. “removal of useless or unreachable code, discovery and propagation of constant values, relocation of computations”. The middle-end takes said intermediate representation and spits out an optimized version of it, which gets processed during the final step…
  3. Back-End: The back-end applies further optimizations tailored to a particular CPU (such as deciding which values live in which registers) and then generates the actual, final code to be run: “Typically the output of a back end is machine code specialized for a particular processor and operating system.”
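To make the three stages concrete, here is a toy compiler for arithmetic expressions like `x * (2 + 3)`. It is a sketch under heavy simplifying assumptions, and all of the following are our own choices for illustration:

- the grammar;
- the AST shape, which doubles as the intermediate representation;
- the single optimization, constant folding;
- the target, an invented stack-machine instruction set (`PUSH`, `LOAD`, `ADD`, …) rather than real machine code.

```javascript
// Toy compiler. Assumed grammar:
//   expr := term (("+" | "-") term)*    term := factor (("*" | "/") factor)*
//   factor := NUMBER | IDENT | "(" expr ")"

// Front-end, part 1: lexical analysis (characters -> tokens)
function lex(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) { i++; continue; }
    let j = i + 1;
    if (/\d/.test(ch)) {
      while (j < source.length && /\d/.test(source[j])) j++;
      tokens.push({ type: "num", value: Number(source.slice(i, j)) });
    } else if (/[A-Za-z_]/.test(ch)) {
      while (j < source.length && /\w/.test(source[j])) j++;
      tokens.push({ type: "id", value: source.slice(i, j) });
    } else if ("+-*/()".includes(ch)) {
      tokens.push({ type: "op", value: ch });
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at ${i}`);
    }
    i = j;
  }
  return tokens;
}

// Front-end, part 2: syntax analysis (tokens -> tree), by recursive descent
function parse(tokens) {
  let pos = 0;
  const peekIs = (...values) => pos < tokens.length && values.includes(tokens[pos].value);
  function binaryLevel(next, ...ops) {
    let node = next();
    while (peekIs(...ops)) {
      const op = tokens[pos++].value;
      node = { type: "bin", op, left: node, right: next() };
    }
    return node;
  }
  const expr = () => binaryLevel(term, "+", "-");
  const term = () => binaryLevel(factor, "*", "/");
  function factor() {
    const tok = tokens[pos++];
    if (!tok) throw new SyntaxError("Unexpected end of input");
    if (tok.type === "num" || tok.type === "id") return tok;
    if (tok.value === "(") {
      const node = expr();
      if (!peekIs(")")) throw new SyntaxError("Expected ')'");
      pos++;
      return node;
    }
    throw new SyntaxError(`Unexpected token '${tok.value}'`);
  }
  const tree = expr();
  if (pos < tokens.length) throw new SyntaxError("Unexpected trailing input");
  return tree;
}

// Middle-end: constant folding (replace constant subexpressions by their value)
const evaluate = { "+": (a, b) => a + b, "-": (a, b) => a - b, "*": (a, b) => a * b, "/": (a, b) => a / b };
function fold(node) {
  if (node.type !== "bin") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.type === "num" && right.type === "num") {
    return { type: "num", value: evaluate[node.op](left.value, right.value) };
  }
  return { ...node, left, right };
}

// Back-end: emit instructions for a hypothetical stack machine (post-order walk)
const opcodes = { "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV" };
function generate(node, out = []) {
  if (node.type === "num") out.push(`PUSH ${node.value}`);
  else if (node.type === "id") out.push(`LOAD ${node.value}`);
  else {
    generate(node.left, out);
    generate(node.right, out);
    out.push(opcodes[node.op]);
  }
  return out;
}

const compile = source => generate(fold(parse(lex(source))));

console.log(compile("x * (2 + 3)")); // [ 'LOAD x', 'PUSH 5', 'MUL' ]
```

Note how the middle-end has already replaced `2 + 3` with `5` before any instructions are emitted. A real compiler does the same kind of work, just across many more passes and on a far richer intermediate representation.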

In summary, the compiler parses the high-level code, transforms and optimizes it over several passes, and finally produces low-level, optimized machine code that the processor can execute quickly.
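The final step, actually running the generated code, can also be sketched. A real CPU decodes binary instructions in hardware. The toy below instead interprets textual instructions for an invented stack machine, whose opcodes (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`, `DIV`) are our own hypothetical ones. The basic loop of fetching an instruction, decoding it and executing it is nevertheless the same idea:

```javascript
// A toy stack machine: executes simple instructions in order, keeping
// intermediate values on a stack (a hypothetical instruction set).
function run(program, variables = {}) {
  const stack = [];
  for (const instruction of program) {
    const [opcode, operand] = instruction.split(" "); // decode
    switch (opcode) {
      case "PUSH":
        stack.push(Number(operand));
        break;
      case "LOAD":
        if (!(operand in variables)) throw new Error(`Unknown variable ${operand}`);
        stack.push(variables[operand]);
        break;
      case "ADD": case "SUB": case "MUL": case "DIV": {
        const b = stack.pop(); // right operand was pushed last
        const a = stack.pop();
        if (opcode === "ADD") stack.push(a + b);
        else if (opcode === "SUB") stack.push(a - b);
        else if (opcode === "MUL") stack.push(a * b);
        else stack.push(a / b);
        break;
      }
      default:
        throw new Error(`Unknown opcode ${opcode}`);
    }
  }
  return stack.pop(); // the result is whatever is left on top
}

console.log(run(["LOAD x", "PUSH 5", "MUL"], { x: 4 })); // 20
```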

Many programmers have written their own compilers from scratch and made them freely available online. One of the most cited examples is 8cc by Rui Ueyama, a Japanese programmer who put its source code (95% in the C language) on GitHub (https://github.com/rui314/8cc). He also published a detailed, day-by-day account of how he wrote this self-hosting C compiler (one able to compile its own source code) over the span of 40 days: https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compiler-in-40-days

This article only scratches the surface of compiling. I apologize for the superficiality of my descriptions, but frankly, even after several hours of research, I’m still struggling to process some of the basic ideas. At this point, all I can say is that compilers are magical programs. Through thousands of lines of mathematically informed algorithms, they turn high-level code into the strings of 0s and 1s that our computers use to do our bidding, provided that our own bugs don’t doom us to failure from the start.
