One of the bright sides of computers is that they do precisely as they are told. Programs execute in a completely predictable fashion, with no possibility of misinterpretation by the computer. Any errors, stack overflows, and infinite loops are the fault of fallible humans and the error-filled code we compose. For computers, unlike humans, “I was only following orders!” is a wholly acceptable excuse.
“Lower-level” languages, on the other hand, use fewer words and are more difficult for non-experts to parse. The lowest-level language that humans routinely write is assembly, rarely used nowadays except as homework for computer science students. Assembly programs are nearly unreadable to the untrained eye and far more verbose, as each instruction must be meticulously crafted. Here’s a routine, adapted from MadWizard.org, that calculates the length of a string:
StrLen proc                   ; returns the string's length in eax
lpString equ [esp+4]          ; the argument: a pointer to the string
    mov  eax, lpString        ; load that pointer into eax
    cmp  byte ptr [eax], 0    ; is this byte the NUL terminator?
    ; ...the full routine repeats this test for every byte of the string...
Compare that to the equivalent in a high-level language like JavaScript:
const stringLength = string => string.length;
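To make the gap between the two concrete, here is a rough Python sketch (my own, not taken from either example above) of what the assembly is doing by hand: walking the string one byte at a time until it hits the NUL terminator.

```python
def str_len(s: bytes) -> int:
    """Count bytes up to the NUL terminator, mirroring the
    byte-by-byte scan the assembly routine performs."""
    n = 0
    while s[n] != 0:
        n += 1
    return n

print(str_len(b"compilers\x00"))  # -> 9, same as len("compilers")
```

The high-level one-liner hides exactly this loop behind a built-in property; the low-level version has to spell out every step.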
Over time, languages have become higher level and much easier for newcomers to handle. By lowering the barriers to entry, laypeople like me can carry out a wide range of tasks without needing a Mathematics or Electrical Engineering Ph.D.
Ultimately, all programming languages are abstractions over the one form of communication that computers ‘comprehend’: machine language, which consists of long sequences of binary digits: 0s and 1s.
According to Wikipedia, the following machine code calculates the n-th number of the Fibonacci sequence (shown here in hexadecimal, a compact shorthand for the underlying binary):
8B542408 83FA0077 06B80000 0000C383
FA027706 B8010000 00C353BB 01000000
B9010000 008D0419 83FA0376 078BD989
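Hexadecimal and binary are interchangeable here: each hex digit stands for exactly four bits. A quick Python check (my own illustration) expands the first word of the listing above into the raw bits the CPU actually sees.

```python
# Each hexadecimal digit encodes exactly four binary digits,
# so "8B542408" is just a compact spelling of 32 bits.
word = "8B542408"
bits = bin(int(word, 16))[2:].zfill(len(word) * 4)
print(bits)       # 10001011010101000010010000001000
print(len(bits))  # 32
```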
So how did we go from binary… to Assembly… to C… to C++… to Python? The answer is in the title: compilers!
Brief Description of the Compiler Process
Compilers are programs that take code written in higher-level languages and translate it into the simple binary instructions that computer CPUs can execute. Typically, this is a multi-stage process that looks something like this:
- Front-End: “verifies syntax and semantics” via “lexical analysis, syntax analysis, and semantic analysis”. In plain terms, the compiler reads the source character by character, groups the characters into tokens, checks that the code is valid, then produces an “intermediate representation” to be analyzed in the next step…
- Middle-End: This consists of optimizations: e.g. “removal of useless or unreachable code, discovery and propagation of constant values, relocation of computations”. The middle-end takes the intermediate representation and spits out an improved version, which gets processed during the final step…
- Back-End: The back-end goes through a series of further, even more complex optimizations, based on a particular CPU, and then generates the actual, final code to be run: “Typically the output of a back end is machine code specialized for a particular processor and operating system.”
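The front-end’s first job, lexical analysis, can be sketched with a toy tokenizer. This is a hypothetical example of my own, far simpler than anything in a production compiler, but it shows the basic move: raw characters in, labelled tokens out.

```python
import re

# A toy lexer: the front-end's first pass turns raw characters
# into a stream of labelled tokens.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str):
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":   # throw away whitespace
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("x = 40 + 2"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]
```

The syntax- and semantic-analysis passes would then check that this token stream forms a valid program before building the intermediate representation.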
In summary, the compiler does basic parsing of the high-level code, then goes through several rounds of transformations and optimizations, and finally produces a low-level, optimized binary that can be quickly read and executed by the machine.
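One of the middle-end optimizations quoted above, “discovery and propagation of constant values”, can be illustrated with a toy constant folder. The tuple-based “intermediate representation” here is a made-up sketch of my own, not the IR of any real compiler.

```python
# Toy middle-end pass: fold constant sub-expressions in a tiny
# tuple-based intermediate representation like ("+", 1, 2).
def fold(expr):
    if not isinstance(expr, tuple):
        return expr                      # a literal number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return {"+": left + right, "*": left * right}[op]  # compute it now
    return (op, left, right)             # keep what can't be precomputed

print(fold(("+", ("*", 2, 3), "x")))  # ('+', 6, 'x') -- 2*3 folded away
print(fold(("*", ("+", 1, 1), 4)))    # 8 -- fully constant, fully folded
```

The work of computing `2 * 3` happens once, at compile time, instead of every time the program runs.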
Many programmers have coded their own compilers from scratch and made them freely available online. One of the most cited examples is that of Rui Ueyama, a Japanese programmer who put his source code (95% C) on GitHub (https://github.com/rui314/8cc), along with a detailed, day-by-day description of its creation over the span of 40 days: https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compiler-in-40-days
This article just scratches the surface of compilation; I apologize for the superficiality of my descriptions, but frankly, even after several hours of research, I’m still struggling to process some of the basic ideas. At this point, all I can say is that compilers are magical programs that take high-level code and, through thousands of lines of mathematically informed algorithms, turn it into the strings of 0s and 1s that our computers use to do our bidding, provided that our own bugs don’t doom us to failure from the start.