I'd like to first mention the book that kickstarted this for me three years ago: Exploring C++20: The Programmer's Introduction to C++, 3rd edition, by Ray Lischner. C++ can be frightening to engineers of any background. Exploring C++20 breaks the language into pieces that anyone can understand, written with the good intentions of someone who has battled "the bad" parts of C++ and wants to leave them behind. The book lets you skip C++'s dreadful history and write modern, well-written C++ today; I felt competent in the language after reading it alone.
At the bottom of this page, you will find a changelog addendum with the date and time of future changes.
RAII - Resource acquisition is initialization (RAII) is an idiom that helps manage resources safely. The idiom is also known as constructor acquires, destructor releases (CADRe), and scope-based resource management (SBRM). RAII takes advantage of the symmetry between a class's constructor and its corresponding destructor: we can acquire a resource in the constructor, and we can release it in the destructor. If we write such a class as a template, it can be reused for different types of resources.
- Smart pointers: std::unique_ptr<T>, std::shared_ptr<T>, and std::weak_ptr<T>. The control blocks of std::shared_ptr<T> and std::weak_ptr<T> are thread-safe by design, and C++20 adds a std::atomic<std::shared_ptr<T>> specialization.
- Prefer std::jthread and joinable threads. The std::thread class requires that the join() method be called explicitly to wait for the thread to finish, which can lead to programming errors. The C++20 standard provides a new thread class, std::jthread, that joins automatically in its destructor and solves this inconvenience.
- In modern C++, avoid all explicit use of new and delete.
- Take advantage of Boost.
- Prefer immutable data to mutable data.
- Write code well by default.
- Never write a global-scope using directive such as using namespace std;. Never ever do this. Ever.
- Prefer uniform initialization.
- Design to enable optimizations (by the compiler).
- Use static_assert for compile-time assertion checks.
- Use std::move and && rvalue references for move semantics.
- Use an unnamed namespace instead of global static declarations.
- Use a detail or internal namespace for implementation details.
- Data members should be private by default. Always keep data members private, and provide access only through member functions; see the open-closed principle (OCP): open for extension, closed for modification.
- Avoid trivial getters and setters.
- A class invariant is a condition that must hold for all valid instances of a class.
- Any class that has virtual functions should declare its destructor virtual too.
I'll get to the point and let it speak for itself:
export module palindrome;
import <algorithm>;
import <locale>;
import <ranges>;
export import <string_view>;
export constexpr bool is_palindrome(std::string_view str) {
    auto letters_only{str
        | std::views::filter([](char c) { return std::isalnum(c, std::locale{}); })
        | std::views::transform([](char c) { return std::tolower(c, std::locale{}); })
    };
    auto reversed{letters_only | std::views::reverse};
    return std::ranges::equal(letters_only, reversed);
}
Wow, right? That's a lot to unpack, especially if you haven't looked at C++ in a while. For starters, C++20 has a new modules feature akin to import/export in other languages. Second, C++20 has what's known as "ranges" that work on "views":
std::vector<std::string> strings{"string", "one", "two", "testing" };
auto sizes = strings
| std::ranges::views::transform([](auto&& str) { return str.size(); })
| std::ranges::views::filter([](auto size) { return size > 3; });
The astute programmer (or PLT enthusiast) will notice that range adaptors compose like higher-order operators, and that views are a lightweight abstraction guaranteeing constant-time copy, move, and assignment.
This is profound because it's very close to monadic composition; for comparison, Rust offers neither higher-kinded polymorphism nor this style of free composition.
This is just the beginning. C++23 takes the next big step — behold — the Maybe monad:
// One possible definition of the to_int helper the chain below assumes:
// parse a string to an int, yielding an empty optional on failure.
std::optional<int> to_int(std::string_view sv) {
    int r{};
    auto [ptr, ec]{std::from_chars(sv.data(), sv.data() + sv.size(), r)};
    return ec == std::errc{} ? std::optional{r} : std::nullopt;
}

using namespace std::literals;
const std::vector<std::optional<std::string>> v
{
    "1234", "15 foo", "bar", "42", "5000000000", " 5", std::nullopt, "-43"
};
for (auto&& x : v | std::views::transform(
    [](auto&& o)
    {
        // debug-print the content of the input optional<string>
        std::cout << std::left << std::setw(13)
                  << std::quoted(o.value_or("nullopt")) << " -> ";
        return o
            // if the optional is nullopt, convert it to an optional with a "" string
            .or_else([]{ return std::optional{""s}; })
            // flat-map from strings to ints (producing empty optionals where parsing fails)
            .and_then(to_int)
            // map int to int + 1
            .transform([](int n) { return n + 1; })
            // convert back to strings
            .transform([](int n) { return std::to_string(n); })
            // replace the empty optionals left by and_then
            // (and ignored by the transforms) with "NaN"
            .value_or("NaN"s);
    }))
    std::cout << x << '\n';
Moreover, C++23 adds std::flat_map, a new container adaptor with its own standard header, <flat_map>. I can't put the language down right now.
As a personal researcher in PLT and computational linguistics, C++ is hitting all the right spots, and I expect this to continue to be the case for quite some time. There are enough toys in it to keep me busy for many years. On a side note, as far as PLT goes, I am not particularly interested in Rust: the syntax is off-putting, and it can't do anything that grabs my attention.
I'd like to end this post with one last gem: C++23 has a new class template, std::expected. Functional programmers should quickly catch that this is the try monad for error handling. :-)
Catch you next time!
I've been working on a noteworthy amount of language and compiler design for the last couple of years, both professionally and at home. In compiler theory, some of the most useful tools are intermediate languages, intermediate representation (IR), and syntax-directed translation (SDT).
This article is an introduction to IRs — I plan to work on a series to cover the intricacies of what I call the 'middleware' of a compiler.
Consider the expression:
2 * (1 + 1)
Of course, there are a few ways we could represent and store the meaning of this expression. The first that comes to mind is probably a syntax tree:
*
/ \
2 +
/ \
1 1
We can recover the expression (and even compute the result) by walking the tree. We could build it with any algebraic data type or language with pointers.
What other representations do you think would be useful?
Another is Polish notation:
* 2 + 1 1
The usefulness of Polish notation may not be apparent at first, but it has many applications. First, wrap the operators and their operands in parentheses, and we have what's called an s-expression: a symbolic expression of a tree-like data construct that comes from Lisp. But there's more: that means the expression can be evaluated as-is in a Lisp program! The calculator you used in grade school evaluated its calculations in Polish or reverse Polish notation; these forms suit small or embedded machines with limited resources.
I like to call s-expressions a 'compression' of foldable data structures and algorithms. It's easier to store and re-use the expression in this representation, with far fewer characters, and we get the same result.
I've used s-expressions to compress, store, and construct immutable data structures. They are also useful in automata theory and regular languages.
On paper, it may be easier to identify optimizations in Polish notation than in the AST representation:
\[* \ 2 \ + \ 1 \ 1 \;=\; * \ 2 \ 2\]
We can fold the trailing + 1 1 into simply a 2. Congratulations, you've learned the gist of intermediate representations! The outcome is the same, and we have simplified the expression.
The generalization of choosing a representation or representation language — and the difference in facilitating efficiency and optimization before the final result — is the heart of the theory of IRs. In our first example, we represent mathematical arithmetic.
It is this simple concept that constitutes the representation of a programming language or formal system to a machine (or abstract machine, such as a Turing machine) in a compiler.
How cleverly a compiler can reason a piece of code down to platform-dependent machine code is founded on the informed assumptions it can make between its abstract representation and the capabilities of the abstract target machine.
Next, consider the program:
x = 10
y = 12
if x == 5:
x = x + 2
else:
x = x - 1
The same scientific process we used in our first example to adequately represent the equation before evaluation applies here, except instead of arithmetic becoming a calculation, a program becomes an executable for a machine.
Parser and language designers tend to think only of abstract syntax trees and parse trees when they think of program or language representation, but an AST is only one form of IR a compiler may use. How much potential and help a compiler or interpreter can offer a target platform depends on the flexibility of its IR: support more than one IR, or a composable IR, and you get better potential.
If you're reading this, you've probably heard of a stack. You probably also know that a 'program' may have what's called a stack and a heap. The abstract stack machine is to a program what Polish or reverse Polish notation is to a math equation.
An abstract stack machine is a simple representation with minimal capabilities, but it is fantastic for basic compilers and arithmetic expressions.
As Western University puts it:
The instructions are quite limited and fall into three classes:
- arithmetic operations
- stack manipulation
- control flow.
It simulates the evaluation of a postfix representation, with arguments on the stack, using a few stack-manipulation and control-flow instructions: push, pop, lvalue (pushes a data address), rvalue (pushes the contents of a data location), and goto, gotrue, gofalse.
The stack machine representation of our program above may look like:
lvalue y         -- push the address of y
push 12          -- push the constant 12
:=               -- store the value on top into the address below it (y = 12)
lvalue x         -- push the address of x
push 10          -- push the constant 10
:=               -- x = 10
rvalue x         -- push the contents of x
push 5           -- push the constant 5
eq               -- pop the last 2 values; push 1 if equal, else 0
gofalse notequal -- pop; jump if the value on top of the stack is 0
lvalue x         -- then-branch: x = x + 2
rvalue x
push 2
+                -- pop the last 2 values and run the '+' operator
:=
goto done
label notequal   -- else-branch: x = x - 1
lvalue x
rvalue x
push 1
-                -- pop the last 2 values and run the '-' operator
:=
label done
For a little more information on abstract stack machines, check this out. A couple of things to notice here: one, how limited the instructions we had available to reach our result are; two, how the representation looks like a generalization of an assembly language for an ISA. However, because an abstract stack machine is so simple to represent, a compiler targeting this IR is tremendously easier to build.
I'm sure you've noticed by now (if it hasn't irritated you already) that our program has a variable that is never used. If we were to visualize the abstract stack, notice that the assignment to y sits at the very top of the instruction stream, woven in before everything that follows. How effective is this representation at providing the things we care about in a compiler: dead-code elimination, common subexpressions, redundancy, or semantics-preserving simplifications?
It's reminiscent of the punched cards of the early Fortran programming era. Since the goal is a stack machine, if we were to build a compiler with this representation, removing the dead assignment to y to simplify the code before final code generation would require rewriting the entire instruction sequence!
Indeed, this representation suits programs as simple as arithmetic parsers, or the same situations where you'd use reverse Polish notation (such as building state machines); compilers that use it are very easy to build and could fit on computers from the mid-1900s. However, it isn't effective for programs with general control flow, or for facilitating efficiency; the lack of support for register machines or memory-to-memory operations makes the case even worse.
In my first example, Polish notation turned out to be a great representation: suddenly we could evaluate expressions as Lisp and support embedded calculators. It was easy to spot and make optimizations to the equation while keeping correctness intact.
In my next article on the topic, we will explore better representations for general-purpose languages and programs, ones that promote modern cross-platform support (such as using C as an IR, or LLVM), and the insight gained by choosing one of these IRs over another for more effective target code generation and optimization.
My research has led me to a discovery about problems prevalent in systems that use cooperative multitasking (also called non-preemptive multitasking).
First, cooperative multitasking, as the name states, allows processes and programs to voluntarily yield control (that is, suspend and resume) to one another, enabling them to solve tasks together; the classic example is an event loop.
Coroutines are software components that generalize this style of multitasking to subroutines. Let's take a look at an example. Remember (hopefully not) WinMain and GetMessage in the WinAPI?
Suppose we implemented it like so:
void MessageLoop(HWND Window) async* {
let LPMSG message = yield GetMessage(Window);
yield HandleMessage(Window, message);
}
void HandleMessage(HWND Window, LPMSG message) async* {
match message {
W_OPEN => HandleOpenEvent(Window);
W_CLICK => HandleClickEvent(Window);
W_EXIT => HandleExitEvent(Window);
}
yield MessageLoop(Window);
}
// ...
void main() {
MessageLoop(CreateWindow());
}
Here, we have designed a system that scales: many windows and widgets may cooperatively work together on messages in an event loop.
Some languages, however, do not give us this luxury, and instead provide a more limited semicoroutine (generator) that does not control where execution continues; that is, it always returns to its caller instead of possibly to another routine.
There is a remarkable way around that. Let me explain.
I want to briefly touch on trampolines. Stay with me, and take a look at this straightforward definition in Haskell:
data Trampoline a = More (() -> Trampoline a)
| Done a
runT :: Trampoline a -> a
runT (Done a) = a
runT (More k) = runT (k ())
That is, trampolines are a way to 'capture' steps of a computation, additionally providing a sort of 'stackless recursion'. In our definition above, running runT on a trampoline will 'bounce' on More's continuation k until it's Done and gives us the final value.
credit: http://bartdesmet.net/
By capturing and solving computation in a continuation-passing style on a trampoline, we can, in fact, interleave steps and thus implement coroutines!
Right. So scripting languages like Python and JavaScript don't give us some of these things. But generators look very similar to the pattern above, and very similar to 'capturing computation'. Hmm.
🍿
Generators and Promises! Holy crap. Suppose we used a generator as a dispatcher (essentially, a trampoline) over a map of child generators that call each other back and forth in continuation? Let's revisit our MessageLoop program above, in JavaScript, assuming native async generators:
async function* messageLoop(window) {
  const message = yield* getMessage(window);
  yield* handleMessage(window, message);
}

async function* handleMessage(window, message) {
  try {
    switch (message) {
      case W_CLICK: await handleClickEvent(window);
    }
  } catch (e) {
    throw new MessageException(message);
  }
  yield* messageLoop(window);
}

async function main() {
  const window = createWindow();
  // initialize the iterators using the generator function references
  const map = {
    messageLoop: messageLoop(window),
    getMessage: getMessage(window),
    handleMessage: handleMessage(window)
  };
  // dispatch: each routine yields the name of the routine to run next
  let step = { value: 'messageLoop', done: false };
  while (!step.done) {
    step = await map[step.value].next();
  }
}
We could go further, as creed has done, into a dispatch factory that generalizes contexts and allows us to trampoline an entire systematic process!
Incredible! By simulating a trampoline on a generator dispatch, we have essentially implemented a system of coroutines, thereby gaining the features that coroutines provide.
Since our dispatcher provides multiple entry points into our program for cooperative tasks, we can now build precise stack traces at any point in the event loop. We may also now deal with all uncaught exceptions in a single place by yielding to a non-preemptive global handler (such as the unhandledRejection or rejectionHandled events). Lastly, as above, we can implement our own efficient cooperative event loops and state machines.
This is huge. It solves many of the issues that many single-threaded, low-feature languages have — such as playing popcorn with exceptions and dealing with poor dependency stack traces — and allows large applications to scale!
For the last couple of years I have been researching solutions to loathsome symptoms of engines like libChromiumContent as a native platform. The biggest one is the bottleneck of operations on the DOM. Although things like scrolling originally caused an entire repaint, they don't have to anymore.
There are two reasons why I'm strongly behind them as a platform. The first is that WebAssembly is now available (with C/C++ supported, and Rust already usable with it). The second is the vigorous list of APIs readily available for powerhouse applications: WebGL, Geolocation, threads, thread-safe typed buffers, ServiceWorkers, thread-safe B+ trees, thread-safe image processing, Channel Messaging; the list goes on.
No, really. If it weren't for the prospects of WASM, I'd probably be doing Java and/or Swift.
I want to talk about repaint segmentation of algorithms.
Suppose you have a large set of elements in your application; say, thousands of backlog lines in a chat program. You represent every line as a <p>, and every new line is an appended <p>. It sounds like it wouldn't work, doesn't it? Any modern engine would freeze, as the expense would be too much, to say nothing of changing the color of or sorting those lines!
Google actually gives us a couple of solutions for dealing with visual changes. In conjunction with reducing paint complexity, you can solve scenarios like the one above. I took it a step further. Let's take a look at a sample Google gives us for breaking large tasks into smaller pieces scheduled per available frame.
Immediately when I saw this, I thought about divide-and-conquer algorithms and how this could be generalized for a class of problems (with easing!).
Applying this segmentation to algorithms of \(\mathcal{O}(n)\) time is trivial. Let's take a look at quadratic algorithms, which initially may seem confusing. I took a look at bubble sort, one of the sorts I know very well.
Here's a Codepen of it working. (source code)
Although swap here only swaps heights, the result is the same: it causes a repaint. Moreover, some elements' heights may be equal, causing the illusion of visual variation.
The code is beautifully straightforward. For \(\mathcal{O}(n^{2})\) (and, by extension, \(\mathcal{O}(n^{3})\)) algorithms, segment the algorithm at its basic operation (here, swap) whenever enough frame time is available, calling requestAnimationFrame at each 'depth'. It scales, and we can add easing or delay.
This approach works for ANY n, as it always runs the nth-depth operation when enough time in a frame is available. Mind-blowing.
Next, I took a look at quicksort, also straightforward to implement. The generalization works for all logarithmic and combinatoric problems.
To demonstrate how powerful this method is, here is the quicksort with buffered segmentation. It does not complete, however, as pivot-step atomicity is necessary.
Cool. Alright, let's break this down. First off, we notice quicksort is async. The operation we segment is the partition, the base operation for all recursive calls. Because we segment partition on available frames, we want it to take its time, so I await a Promise for the next pivot in the sort, thus scaling the algorithm for all n. The segmentation here isn't as fast as our quadratic solution: the atomicity of our pivot operation can't be buffered into a time frame. But the algorithm does complete, smoothly, for all n.
If there were many base-case operations (0th and 1st, say), we'd segment them together, scaling into the recursion.
Some game-changing stuff here. I am completely confident I could build very large applications with close-to-native (if not fully native) speed. The research mentioned above has removed many problems regarding DOM applications at scale. If you have questions or thoughts send me a message on Twitter.
— Jahan
This holiday weekend has enlightened me on some important topics in functional programming. It has persuaded me to take a real look at category theory.
It all began with a need of intuition on Applicative Functors. They really are such exquisite things.
Alright, prologue aside, let's dive in. First off, let's quickly revisit Functor, HKTs, their implications, and why they're important. The language I will be using is Haskell (for now), for obvious reasons, but I will try to explain it well.
'Behind free monads, applicatives may be #1 pattern in FP, IMO'
A Functor is defined over types of kind * -> *, that is, over unary type constructors. You've probably come across them quite a bit: they include List (or []) and Maybe. Types of kind * are what we call concrete types (like Int).
As an informal side note: in other languages they are quite literally similar to generic types such as List<T> or Option<T> (using Option<f64>, for example, in Rust). What we really want is higher-kinded polymorphism.
Let's take a look at the Functor typeclass (I'll explain):
class Functor f where
  fmap :: (a -> b) -> f a -> f b
If you're unfamiliar with Haskell syntax, that's okay. This essentially means that every instance of Functor must implement a function fmap that takes a function of type (a -> b) and an f a (where f is the type constructor, or 'context': a type of kind * -> * such as Maybe or List), and returns an f b.
The best way to understand this concretely, for someone just starting, is to look at List ([]). Remember map? Its signature looks like:
map :: (a -> b) -> [a] -> [b]
Did something just click there?! What about the Functor instance for List?
instance Functor [] where
  fmap = map
I'm sure most of you have mapped over lists. But for completeness, an example:

map (*2) [1..3]    -- gives us [2,4,6]
That's right! Indeed List is a Functor, and there are in fact many types which are Functors! We won't go over them all here. The point is: functors are things that can be mapped over.
Again, for completeness: Functor (and Applicative, and Monad) instances must obey laws. I will omit proofs or anything rigorous here, but be wary: instances should abide by the laws.
Time for the juicy insides. Let's take a look at the definition:
class (Functor f) => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
From our Functor recap, some things here should be immediately clear. For one, Applicatives are indeed themselves also Functors. Cool. We see Applicatives implement a pure function and (<*>) (an infix function here; it's also called apply in other languages).
Is it obvious what pure does here? It should be! If f is some functorial context, then a -> f a lifts a into the Applicative Functor.
What about (<*>)? It looks quite similar to our fmap above, doesn't it? Except we notice the f (a -> b). Hmm. So it takes a Functor that has a function in it and applies that function to a second Functor. If you 'erase' the f from the type signature, you get normal function application. Let's take a look at examples:
[(2*)] <*> [5]              -- [10]
pure (*) <*> [3] <*> [5]    -- [15]

Holy moly! See what's happening here? In the first example, [(2*)] is already a function inside a List (List is also an Applicative Functor!), so it applies the (2*) function into [5].
In the second example, we lift the binary (*) function into the applicative context and then apply it to our applicatives.
First off, suppose <$> is the infix version of fmap. What if we wanted to map a function over a collection of Applicative Functors?

(*) <$> Just 3 <*> Just 5    -- Just 15

Note that (*) is a binary infix function that takes two parameters, a la 2 * 2.

Wait one second here... You're telling me I can take a function and fmap it over a collection of Applicative Functors? Inside their context?

Note that we only use (*) here, which takes 2 parameters. The real magic is that we could use n-ary functions and apply them over n applicatives!
To understand how powerful this is, think about Futures, Either, Maybe, and other very useful types. Suppose you had defined a Router or Middleware over a collection of routes that is an Applicative Functor. How incredible and effective it would be to fmap a function over it and determine or compose independent effects for all routes.
When I realized how powerful this truly is, I was moved to order a category theory book straight away. Monads have their shine, and for good reason! But to touch briefly on the difference: they are dependent on the value and context, a -> f b. Applicatives are less restrictive and have many uses across all functorial types f a.
Until next time.
It is common in my study to present mathematical examples, proofs, notes, etc. online. Generally, there are tools for analytic problems: things like mathbin and texpaste. But what about the steps in problems of geometry, or sharing them soundly?
Through reviewing Intuitive Geometry, I realized how tiring and inherently tedious it was to share proofs or constructions, or how I arrived at them! Indeed, even presenting the notion of congruence for any two irregular polygons on the plane, step by step, can get annoying quickly.
Intuitive geometry is pretty important; being able to spot corollaries of the Pythagorean theorem is interesting. I built Gridpaste just for this, and I've used the tool to construct a proof of the Pythagorean theorem itself!
I realized how this could be great in academia in general: students could simply give a link to show the answer to their homework with notes, under a verified account.
We'll see where Gridpaste goes. I've implemented graphs of functions, and constructions with their graphs and annotations; with the rest of the tools, this can get to interesting places. In the future, I'd like to add linear algebra features, and probably some calculus concepts, too. Later, if I have the time, I'd like to write a version with three.js to get some of the same great features in 3-space.
That'd be pretty cool.
— Jahan