Posts

Building a Toy Language Compiler in Go

Image
A follow-up to my previous post about building a toy language interpreter in Go. Throughout this post, I'll be referencing my repo . We'll go over a compiler which translates source code to machine code, and then a VM to execute this machine code. What is a compiler?  A compiler takes source code and translates it to executable machine code . (See my previous post for more details.) The figure below shows the high-level architecture of a simple compiler. Like the interpreter we built previously, our compiler has a lexer and a parser which turn our source code into an AST . This is our compiler's frontend . However, that's where a compiler diverges.  The compiler may translate the AST into at least one internal representation (IR) . The IR(s) may be represented using any data structure (even another AST!), but they should follow two rules: (1) be easy to produce, (2) be easy to translate into the target machine code. The IR can also be optimized...

Building A Toy Language Interpreter in Go

Image
In this post, I'll be giving a high-level overview of interpreters, why they're interesting, and how they work. I'll be referencing my repo, go_interpreter , which is a simple interpreter written in Go for a toy language. The figure below shows the structure of our interpreter with an example input: What is an interpreter?  An interpreter is a language processor that seems to directly execute the source program on user input: In contrast, a compiler first takes in a source program and translates it to an executable machine code target program. Then, the user can pass in inputs to this target program and get outputs. In general, an interpreter is better at reporting errors since it's processing statements one at a time, and a compiler is faster at generating outputs from inputs. In this post, we're going to go over a simple interpreter that is capable of supporting: integers, booleans, strings, arrays, hashmaps prefix, infix operators index opera...

The CAP Theorem

Image
The first explanation I heard about the CAP Theorem reduced it to " c onsistency, a vailability, network p artition tolerance: pick two out of three ". However, as catchy as this statement was, I discovered that it wasn't accurate. In fact, the CAP Theorem has been widely misunderstood and misused since it was named in 2000  and formalized in 2002 . In this blog post, I'll compile and clarify what I have learned from papers, blogs, books, and my distributed systems engineering class. C, A, and P  Let's define each of the letters before talking about the CAP theorem. Consistency Gilbert's and Lynch's 2002 paper considers only the linearizability consistency model . Linearizability is a guarantee on individual reads and writes of a single object. A total order on all operations must exist such that each operation appears to have executed instantly and atomically at some point between when it was called and when it sent back a response. Time constra...

tldr: Bitcoin

Image
Part of the tldr series , which summarizes papers I'm currently reading. Inspired by the morning paper . Bitcoin: A Peer-to-Peer Electronic Cash System [2008] Nakamoto  Introduction Nakamoto was motivated by how much electronic payments relied on trusting third-party financial institutions. Thus, they created Bitcoin, an "electronic payment system based on cryptographic proof instead of trust", which eliminates the need for a central bank.  Transactions An electronic coin is a chain of digital signatures . Each owner has a public and private key (also called a wallet ). They can transfer a coin to the next owner by signing the hash(previous transaction + next owner's public key) with their private key, and appending this to the end of the chain of signatures.   Transactions also contain multiple inputs and outputs, so change is easy. (For example, let's say you wanted to pay someone 0.03, but you only have a previous transaction where you received ...

tldr: Dynamo, Amazon's Highly Available Key-Value Store

Image
Part of the tldr series , which summarizes papers I'm currently reading. Inspired by the morning paper . Dynamo [2007] Introduction Dynamo is a eventually consistent, distributed key-value store built by Amazon to be highly reliable and highly scalable. It's been deployed to handle storage for some of Amazon's core services. The diagram below shows Amazon's platform, and where Dynamo fits in.  System Architecture The figure below is a table of the distributed systems techniques that Dynamo uses. Its interface is made up of two operations:  get (key) Locates object replicas corresponding to the key Returns object(s) with conflicting versions, and a context (defined below) put (key, context, object) Figures out where to store the replicas of the object based on the key Writes replicas to disk The context encodes system metadata (e.g. version) about the object, and helps the system validate the object in the put request Partitoning  In...

tldr: Chord

Image
Part of the tldr series , which summarizes papers I'm currently reading. Inspired by the morning paper Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications [2001] Stoica et. al Introduction The motivation for this paper is distributed peer-to-peer applications, which can't efficiently locate the node that stores a particular data item. Specifically, Chord supports 1 operation: given a key, it maps the key to a node.  It uses consistent hashing to balance load, and Chord nodes only needs routing information for a few other nodes.  Motivation & Context Chord is a library linked to client and server applications that: Provides a function, lookup(key) --> IP address of node with key Notifies application on node of changes in its set of keys  The above figure shows an example of an application using Chord. Chord addresses the following issues: Load balance Chord is basically a distributed hash function that even...

tldr: Bayou

Image
Part of the tldr series , which summarizes papers I'm currently reading. Inspired by the morning paper . Bayou, a Weakly Connected Replicated Storage System [1995] Terry et. al Introduction Bayou is motivated by providing a storage system for applications that are: Collaborative (conflicting, concurrent activity) Weakly connected (mobile computers with poor network connectivity) Non real-time  For example, it can support shared calendars.  In fact, Bayou only requires occasional, pairwise communication between computers. Furthermore, it follows a weakly consistent, replicated data model . Finally, it implements domain-specific, automatic conflict resolution . This allows the system to remain available.  System Overview  Applications (represented in the above figure as clients) interact with servers through the Bayou API. Although clients interact with a single server at a time, they can switch between interacting with different servers...