add blog posts about js backend debugging, stacks and weak refs #37

115 changes: 115 additions & 0 deletions blog/2023-02-28-debugging-the-js-backend.md
@@ -0,0 +1,115 @@
---
slug: 2023-02-28-debugging-javascript-backend
title: Debugging the JavaScript Backend
date: February 28, 2023
authors: [ luite ]
tags: [ghc, javascript, debugging ]
---

## Introduction

I recently gave a short presentation on the topic of debugging the code produced by the GHC JavaScript backend to the GHC team at IOG. This blog post is a summary of the content.

## Debugging JavaScript

Browsers come with powerful development tools for JavaScript. In particular, the Chrome developer tools are very useful for stepping through JavaScript code and inspecting data during the execution of a program. We can still use these development tools on the code generated by the GHC JavaScript backend, but due to the structure of the code, it can sometimes be difficult to figure out where exactly something goes wrong.

This blog post is an experience report that presents a couple of practical techniques for debugging various problems in the JavaScript code.

## Tracing Operations
Contributor

In the introduction you state: ... presents a couple of practical techniques ..., perhaps each technique should be labelled as such in the section header:

Suggested change
## Tracing Operations
## Technique 1: Tracing Operations


Various components of the RTS have tracing options enabled by preprocessor definitions. For example weak reference operations can be traced by compiling the `rts` package with the `-DGHCJS_TRACE_WEAK` cpp option.
Contributor

example? Maybe show some of the CPP'd code where the debug option lives


Currently, enabling the trace functionality requires rebuilding the `rts` package, while previously with GHCJS it was possible to enable the required tracing by just recompiling the final program. We will likely change this setup to include all tracing functionality in a debug rts linked when using the `-debug` flag, and easily modifiable global settings to enable or disable specific tracing modules.
Contributor

This paragraph should be the last one in the section because it is no longer talking about the technique, rather it is talking about the usability of the technique and then concludes that making this technique more usable is on our roadmap.

So the flow of the technique sections should be:

  • What the technique is
  • What information it provides
  • How to use the technique
  • Then future plans or issues with what we currently ship, i.e., the part where we say "right now this is hard because you have to rebuild with blah blah, but in the future we'll expose a flag"


All the tracing uses the `h$log` function which can be easily modified to redirect the output of the trace, for example tracing only to an array (which can be watched by the JavaScript debugger) and keeping only the last `n` entries.
Contributor

this is the key part. As a reader looking to debug my JS backend code this is the part I'm most interested in. Thus you should add the examples that you allude to. That is, add an example that demonstrates "easily modified to redirect the output of the trace". This would be a lot of value added for the audience.
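
A minimal sketch of the kind of redirection asked for here. It assumes that `h$log` is a plain global function that can be reassigned and that it is called with a variable number of arguments; neither is confirmed in this post. The names `makeRingLog`, `limit` and `entries` are hypothetical.

```javascript
// Hypothetical helper: builds a logger that keeps only the last `limit`
// entries in an array. The array can be added as a watch expression in
// the JavaScript debugger.
function makeRingLog(limit) {
  var entries = [];
  function log() {
    // Join all arguments into one line, separated by spaces.
    entries.push(Array.prototype.slice.call(arguments).join(" "));
    if (entries.length > limit) {
      entries.shift(); // drop the oldest entry
    }
  }
  return { log: log, entries: entries };
}

// Assumed usage, after the RTS has been loaded:
// var trace = makeRingLog(1000);
// h$log = trace.log;
```

Keeping the buffer in an ordinary array means no output is printed while the program runs, so the trace does not slow down the console, and the last `limit` entries are still available at the moment the error occurs.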


## Dealing With Tail Calls
Contributor

This section is an example of use case for Technique 1. So label it as such:

Suggested change
## Dealing With Tail Calls
## Use Case: Dealing With Tail Calls


All Haskell code is called from a main loop that looks as follows:

```javascript
while(!haveToYield(c)) {
  c = c();
  c = c();
  c = c();
  ...
}
```

The main loop keeps calling the function returned by the previous call, until the thread has to stop for some reason. This means that the JavaScript call stack isn't very useful for figuring out where something goes wrong in our code: It only contains function calls up to the main loop. If some `c` fails, we don't know much about what calls lead up to the error condition!
Contributor

This means that the JavaScript call stack isn't very useful for figuring out where something goes wrong in our code

I would start with this because it is the place the audience is at and a thing the audience will probably assume. So something like this:

  1. Unfortunately the call stack is not that useful...
  2. The reason is that the main loop of the RTS is ...<the example with all the c()...
  3. explain the example: the main loop keeps calling until...
  4. tie the example back into (1): If some c fails, we don't know much about what calls lead up to the error condition!


However the main loop does give us a good opportunity to add some tracing: If we log each `c = c();` call (the function name and possibly the status of the Haskell stack and some relevant global variables) we can reconstruct more easily which conditions resulted in the error. The RTS provides the useful `h$logCall` and `h$logStack` helper functions for this.
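
A minimal sketch of a traced main loop. The signatures of `h$logCall` and `h$logStack` are not shown in this post, so the sketch takes the logging function as a parameter instead of calling them; `runTraced` and `logCall` are hypothetical names. Unlike the unrolled loop above, it checks the yield condition before every call, which keeps the example short.

```javascript
// Hypothetical traced variant of the main loop.
// `c` is the function to run next, `haveToYield` decides when to stop,
// `logCall` receives every function just before it is called.
function runTraced(c, haveToYield, logCall) {
  while (!haveToYield(c)) {
    logCall(c); // record which function is about to run
    c = c();    // each call returns the next function to call
  }
  return c;
}
```

When an error occurs, the last few logged functions show the path that led to it, which is the information the JavaScript call stack cannot provide.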

Logging main loop calls generates a lot of output, even more so than tracing specific RTS features, so it's probably necessary to redirect and/or truncate the output of `h$log` here.
Contributor

great! this means an example of doing that redirection in the first section is even better!


It's often useful to make the `haveToYield` condition deterministic, by not taking wall clock time into account. This runs each thread until it blocks or finishes (`c === h$reschedule`). That makes runs reproducible, even if more than one Haskell thread is involved.
Contributor

Perhaps show how to change haveToYield to achieve this? Is there a CPP flag?
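
A minimal sketch of the difference between a time-based and a deterministic yield condition. Whether the RTS offers a preprocessor flag for this is not stated in the post. All names here (`reschedule`, `makeTimedYield`, `deterministicYield`, `runThread`) are hypothetical stand-ins, with `reschedule` playing the role of `h$reschedule`.

```javascript
// Hypothetical sentinel standing in for h$reschedule.
function reschedule() { return reschedule; }

// Time-based condition: depends on the wall clock, so two runs of the
// same program can switch threads at different points.
function makeTimedYield(sliceMs) {
  var start = Date.now();
  return function (c) {
    return c === reschedule || Date.now() - start > sliceMs;
  };
}

// Deterministic condition: only yield when the thread blocks or finishes.
function deterministicYield(c) {
  return c === reschedule;
}

// Runs a thread until the yield condition holds; returns the number of steps.
function runThread(c, haveToYield) {
  var steps = 0;
  while (!haveToYield(c)) {
    c = c();
    steps++;
  }
  return steps;
}
```

With `deterministicYield`, the number of steps before a thread switch depends only on the program, so a trace recorded in one run lines up with the next run.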


## Data Corruption
Contributor

use case


Function call traces are useful if an error condition manifests itself relatively close to the initial problem. But what if our program crashes on some malformed data. We need to know which what caused to data to be malformed in the first place!
Contributor

This paragraph is missing a sentence to tie into the next section:

Suggested change
Function call traces are useful if an error condition manifests itself relatively close to the initial problem. But what if our program crashes on some malformed data. We need to know which what caused to data to be malformed in the first place!
Function call traces are useful if an error condition manifests itself relatively close to the initial problem. But what if our program crashes on some malformed data. We need to know what caused the data to be malformed in the first place! To determine the cause of the malformed data we insert Representation Checks.


### Representation Checks
Contributor

Suggested change
### Representation Checks
### Technique 2: Representation Checks


One strategy is trying to catch the error earlier by introducing more checks to verify that the types of our data are what we expect. JavaScript is dynamically typed, so the browser will happily run our code, even if we use a `number` in a place where we'd normally use an `object`.

When generating code however, we have a lot more knowledge. When we access data fields of a data constructor or closure, we know which type of data we expect. It's straightforward to modify the code generator to add a test after each field access. The file `verify.js` in the `rts` package has some helper functions for this.
Contributor

Need to say what a representation check is: a test after each field access
the check provides: guarantees that the data is the type we expect at access time or something like that
how to use: with the helpers in verify.js insert these into some code. Should have an example of doing this.
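
A minimal sketch of what a representation check could look like. The actual helpers in `verify.js` are not shown in this post and may differ; `checkHeapObject` and `exampleFieldAccess` are hypothetical names. The check relies only on the heap object layout described in the weak references post (an entry function `f` plus data properties).

```javascript
// Hypothetical check: a Haskell heap object is assumed to be a non-null
// JavaScript object with an entry function in its `f` property.
function checkHeapObject(x, where) {
  if (typeof x !== "object" || x === null || typeof x.f !== "function") {
    throw new Error("representation check failed at " + where +
                    ": expected heap object, got " + typeof x);
  }
  return x;
}

// What generated code might look like with a check after a field access.
function exampleFieldAccess(closure) {
  var field = closure.d1;
  checkHeapObject(field, "exampleFieldAccess: closure.d1");
  return field;
}
```

The check fails at the first access of the malformed field instead of at some later use, which moves the error closer to its cause.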


### Sequence Numbering
Contributor

Suggested change
### Sequence Numbering
### Technique 3: Sequence Numbering


Representation verification does not always help us find the origin of the problem. Sometimes we'd like to know where some heap object was allocated. We can do this by combining function call tracing with sequence numbers for allocation. After allocating a Haskell heap object we call a helper function that gives the object a unique sequence number. When we run into the error condition with the incorrect data, we inspect its sequence number and match it up with the function that produced it.
Contributor

  • what does the sequence number look like?
  • what is the name of the helper function that allocates it? What field is it in?
  • how do I inspect its sequence number and match it up?
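
A minimal sketch of how sequence numbering could work. The post does not name the helper or the property that holds the number, so `tagAllocation`, `whoAllocated`, `allocLog` and the `seqNo` property are all hypothetical.

```javascript
var allocCounter = 0; // hypothetical global allocation counter
var allocLog = [];    // allocLog[n] = name of the function that allocated object n

// Hypothetical helper, called right after a heap object is allocated.
function tagAllocation(obj, allocatedBy) {
  obj.seqNo = allocCounter++; // `seqNo` is an assumed property name
  allocLog[obj.seqNo] = allocatedBy;
  return obj;
}

// Look up which function produced a suspicious object.
function whoAllocated(obj) {
  return allocLog[obj.seqNo];
}
```

With deterministic scheduling, the same object gets the same sequence number in every run, so a breakpoint conditioned on that number stops in the allocating function during the next run.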


Then in another run of the program we can step through the function that produced the wrong data using the JavaScript debugger.

## Debugging Weak References
Contributor

Use Case


Sometimes we don't want to debug a problem where the data itself is wrong, but where the data is used at the wrong time. For the JavaScript backend this issue comes up with the storage manager that keeps track of weak references.

The weak references garbage collector keeps track of every Haskell value that is still reachable from the Haskell runtime system. This means that if a Haskell value is considered to be unreachable by XXX XXX.
Contributor

this section doesn't seem finished!


## Bisection
Contributor

Suggested change
## Bisection
## Technique 4: Bisection


Sometimes we have a working reference implementation and an optimized implementation in which we want to fix some problem. If our optimization is a more efficient implementation of a specific primop, then we probably know where to look for the problem. But if our optimization is a rewrite pass of all code, things can get a lot more difficult.

The JavaScript optimizer is such a rewrite pass. It takes JavaScript code and rewrites it to a more efficient and compact form.

The pass looks conceptually like this:
Contributor

great!


```javascript
// input
function f() {
  // original function body
}

// output
function f() {
  // optimized function body
}
```

After any change to the optimizer we have to recompile all libraries. If we don't know exactly where to make our changes, this can take a lot of time.

It turned out to be very useful to narrow the search by selectively enabling the optimizer on only part of the code: Once we know which function is broken by the optimizer, we can easily run the optimizer on it separately to find out where it goes wrong.

The trick to making this work effectively was keeping around both the optimized and original code for every function:

```javascript
// input
function f() {
  // original function body
}

// output
function f() {
  if (sequence_no_for_f < threshold) {
    // original function body
  } else {
    // optimized function body
  }
}
```
Each function gets its own sequence number, starting from zero. We adjust the threshold value that determines which functions run the optimized function body to quickly close in on where the optimized version gives a different result.
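
A minimal sketch of the threshold search as a binary search. It assumes that exactly one function is broken by the optimizer (with several broken functions it finds the one with the highest sequence number) and that the program is known to fail with threshold `0` and to work with threshold `functionCount`. `findBrokenFunction` and `runWithThreshold` are hypothetical names.

```javascript
// `runWithThreshold(t)` is a hypothetical callback: it runs the program with
// functions numbered < t using the original body and all others using the
// optimized body, and returns true if the result is correct.
function findBrokenFunction(functionCount, runWithThreshold) {
  var lo = 0;             // threshold known to fail (everything optimized)
  var hi = functionCount; // threshold known to work (nothing optimized)
  while (hi - lo > 1) {
    var mid = (lo + hi) >> 1;
    if (runWithThreshold(mid)) {
      hi = mid; // still correct: the broken function is below mid
    } else {
      lo = mid; // incorrect: the broken function is mid or above
    }
  }
  return lo; // sequence number of the function that breaks when optimized
}
```

Each step halves the candidate range, so the number of program runs grows with the logarithm of the number of functions, and no library has to be recompiled between runs.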

## Conclusion

We have seen a few debugging strategies for code generated by the JavaScript backend. Most of them are a bit ad hoc and require modification of the compiler or the compiled code. Over time we will probably make more of them available through code generator flags and from a debugging version of the RTS.
102 changes: 102 additions & 0 deletions blog/2023-02-28-ghcjs-weak-references.md
@@ -0,0 +1,102 @@
---
slug: 2023-02-28-weak-references-in-the-javascript-backend
title: Weak References in the JavaScript backend
date: February 28, 2023
authors: [ luite ]
tags: [ghc, javascript, storagemanager ]
---

## Introduction

I recently gave a short presentation on the topic of weak references in the GHC JavaScript backend to the GHC team at IOG. This blog post is a summary of the content.

## Haskell Weak References

The "Stretching the Storage Manager" [ssm][1] paper describes weak references as implemented by GHC. These weak references are available through the `System.Mem.Weak` module. Each weak reference connects a key and a value. The value is kept alive by the weak reference as long as the key is alive. Optionally, weak references can have a finalizer of type `IO ()`, which is run after the key becomes unreachable.

## JavaScript Weak References

JavaScript has weak references of its own, specifically the `WeakMap`. But the functionality is quite different from Haskell's. `WeakMap` is not iterable, its size is not visible and it has no finalizers. Therefore it's impossible to observe when a weak value has become unreachable.

There have been proposals to add finalizers to `WeakMap` but so far they haven't been implemented because they introduce nondeterminism and expose reachability information which could impact security.

Specific JavaScript environments like node.js do have weak references with the required functionality to implement Haskell `Weak#`. On these platforms we could substitute the general purpose `Weak#` implementation, and we could verify consistency between the general purpose implementation and a node.js specific one.

## Checking Reachability

Since we don't have a way to determine which `Weak#` keys have become unreachable, we have to do the opposite: Check which values are still reachable. The general idea is as follows: Every Haskell heap object gets a mark property `m`, which is changed by the reachability checker.
Contributor

What is the set of value that m can have? Bool? Int?


After scanning the whole heap we can determine which `Weak#` keys are still reachable by checking if their mark has been updated.
Contributor

when does a scan occur?
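
A minimal sketch of a marking pass. It assumes that the mark is a scan counter: every scan uses a fresh number, so marks from earlier scans never have to be reset. The post does not say how the RTS encodes the mark or when a scan is triggered, so this encoding and the names `currentMark`, `markReachable` and `isReachable` are assumptions.

```javascript
var currentMark = 0; // hypothetical: bumped at the start of every scan

// Mark everything reachable from `roots` with the current mark value.
function markReachable(roots) {
  currentMark++;
  var work = roots.slice();
  while (work.length > 0) {
    var o = work.pop();
    // Skip non-objects and objects already marked in this scan.
    if (o === null || typeof o !== "object" || o.m === currentMark) continue;
    o.m = currentMark;
    work.push(o.d1, o.d2);
  }
}

// An object is reachable if its mark was updated in the latest scan.
function isReachable(o) {
  return o.m === currentMark;
}
```

Under this assumption, a numeric mark answers "was this object seen in the latest scan?" rather than counting references.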


## Weak Implementation

All Haskell heap objects have an identical object structure, with an entry function and some data properties. The entry function also contains metadata about the object, for example the constructor tag for data constructors and the arity for functions.

```javascript
// heap object (incomplete)
{ f // function, entry point
, d1 // any, first data property
, d2 // any, second data property (or indirection to more data)
}
```

To be able to keep track of reachability, we add one property `m` to each object:

```javascript
// heap object (incomplete)
{ f // function, entry point
, d1 // any, first data property
, d2 // any, second data property (or indirection to more data)
, m // number, garbage collection mark
Contributor

aha! its a number! But why is it a number? if its 2 does that mean there are two things pointing to it?

}
```

The mark gets updated by the code that checks for reachability of everything. This means that we could implement a `Weak#` as follows:

```javascript
// weak (not actual)
h$Weak {
key: heap object
, value: heap object
, finalizer: null or heap object
}
```
Comment on lines +54 to +63
Contributor

great! Much better examples in this one


But this means that our `h$Weak` keeps both the key and the value alive. For the operations that `Weak#` needs to support, this isn't necessary, and in fact we'd like to avoid it so that the JavaScript storage manager can reclaim memory as quickly as possib. That's why we make another change to the heap objects, adding an optional indirection to the mark:
Contributor

Suggested change
But this means that our `h$Weak` keeps both the key and the value alive. For the operations that `Weak#` needs to support, this isn't necessary, and in fact we'd like to avoid it so that the JavaScript storage manager can reclaim memory as quickly as possib. That's why we make another change to the heap objects, adding an optional indirection to the mark:
But this means that our `h$Weak` keeps both the key and the value alive. For the operations that `Weak#` needs to support, this isn't necessary, and in fact we'd like to avoid it so that the JavaScript storage manager can reclaim memory as quickly as possible. That's why we make another change to the heap objects, adding an optional indirection to the mark:


```javascript
// heap object (actual)
{ f // function, entry point
, d1 // any, first data property
, d2 // any, second data property (or indirection to more data)
, m // number/h$StableName, garbage collection mark
}

h$StableName {
stableNameNo: number, unique identifier
, m : number, garbage collection mark
}
```

Now we can replace a `number` mark by an `h$StableName` for the key, and then create the weak reference as follows:
Contributor

You should provide a link to the documentation of StableName and Weak.

Also you should mention that:

  1. StableNames have the crucial property here of not keeping objects they refer to alive.
  2. in the JS implementation, makeStableName is guaranteed to return the same StableName for a given heap object: heap objects have a link to their associated StableName (if any). In addition, during GC traversals, if a heap object is marked as reachable, its associated StableName (if any) is marked as reachable too.

Hence StableNames are a perfect proxy to know if a heap object is reachable without keeping the actual object alive. This is exactly what we need for the key of Weak.


```javascript
// weak (actual)
h$Weak {
key: h$StableName, the stablename of the key heap object
, value: heap object
, finalizer: null or heap object
}
```
This way the `h$Weak` does not reference they key itself. It still knows when the key is unreachable, since the mark of the `h$StableName` of the key would not be updated anymore.
Contributor

great explanation.

Contributor

Suggested change
This way the `h$Weak` does not reference they key itself. It still knows when the key is unreachable, since the mark of the `h$StableName` of the key would not be updated anymore.
This way the `h$Weak` does not reference the key itself. It still knows when the key is unreachable, since the mark of the `h$StableName` of the key would not be updated anymore.
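
A minimal sketch of how the indirection lets a weak reference observe its key without referencing it. It assumes the mark is a scan counter, which the post does not confirm; `gcMark`, `startScan`, `markObject` and `isWeakDead` are hypothetical names.

```javascript
var gcMark = 0; // hypothetical scan counter

// Begin a new scan: marks from earlier scans become stale.
function startScan() {
  gcMark++;
}

// Mark one heap object. If its mark slot holds a stable name object,
// the mark is stored in the stable name instead of the object itself.
function markObject(o) {
  if (typeof o.m === "number") {
    o.m = gcMark;
  } else {
    o.m.m = gcMark;
  }
}

// A weak reference is dead when the stable name of its key was not
// marked during the latest scan.
function isWeakDead(weak) {
  return weak.key.m !== gcMark;
}
```

The weak reference only holds the small stable name object, so the JavaScript engine is free to reclaim the key as soon as nothing else refers to it.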


## Finalizers

Every time the heap is scanned for dead weak references, the associated finalizers are collected. After the pass, if at least one finalizer needs to be run, the storage manager schedules a new thread. This thread runs all the finalizers of the pass. Exceptions are handled between finalizers, but a finalizer that takes a long time will delay execution of the others.
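
A minimal sketch of running a batch of finalizers with exceptions handled between them. In the RTS the finalizers are Haskell `IO ()` actions that run in a Haskell thread; this sketch uses plain JavaScript functions instead, and `runFinalizers` and `onError` are hypothetical names.

```javascript
// Run all finalizers collected in one pass, in order. An exception in one
// finalizer is caught so that the remaining finalizers still run.
function runFinalizers(finalizers, onError) {
  for (var i = 0; i < finalizers.length; i++) {
    try {
      finalizers[i]();
    } catch (e) {
      onError(e);
    }
  }
}
```

Because the batch runs sequentially, a slow finalizer delays every finalizer after it, which matches the behaviour described above.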

## Conclusion

We have seen the implementation of weak references in the JavaScript backend. Since we cannot use the JavaScript engine to determine which Haskell heap objects are reachable we use a custom reachability check to implement the required functionality. We have chosen the implementation in such a way that the JavaScript engine retains as little memory as possible.


[1]: Peyton Jones, Simon and Marlow, Simon and Elliott, Conal, Stretching the storage manager: weak pointers and stable names in Haskell, Proceedings of the 11th International Workshop on the Implementation of Functional Languages, 1999, https://www.microsoft.com/en-us/research/publication/stretching-the-storage-manager-weak-pointers-and-stable-names-in-haskell/
147 changes: 147 additions & 0 deletions blog/2023-02-28-js-backend-stacks.md
@@ -0,0 +1,147 @@
---
slug: 2023-02-28-stacks-in-the-js-backend
title: Stacks in the JavaScript Backend
date: February 28, 2023
authors: [ luite ]
tags: [ghc, javascript, threads, rts ]
---

## Introduction

I recently gave a short presentation on the topic of stacks in the JavaScript backend to the GHC team at IOG. This blog post is a summary of the content.

## Haskell Lightweight Stacks

In the context of a program produced by the GHC JavaScript backend, two different types of stack exist: The JavaScript call stack and Haskell lightweigt thread stacks. This blog post deals with the latter.
Contributor

Suggested change
In the context of a program produced by the GHC JavaScript backend, two different types of stack exist: The JavaScript call stack and Haskell lightweigt thread stacks. This blog post deals with the latter.
In the context of a program produced by the GHC JavaScript backend, two different types of stack exist: The JavaScript call stack and Haskell lightweight thread stacks. In this post we'll ignore the JavaScript call stack and instead focus our attention on the Haskell lightweight thread stacks.

In general it's an anti-pattern to say former...latter because it creates an indirection in the reader's head that they must keep track of. This increases the cognitive load of the writing and can make it harder to parse. So it's better to be direct if possible.

Contributor

Good point. I've certainly been guilty of using former/latter too.


Each Haskell thread has a thread state object, `t` of type `h$Thread`. This object stores the state of a lightweight thread, for example whether the thread is finished or whether asynchronous exceptions are ignored. It also contains `t.stack`, an array representing the stack and `t.sp`, a number pointing to the current top of the stack.
Contributor

Imagine ordering some food from a cafe. It's clearer to the barista if you start at a high level and then refine the information: I'm ordering 3 things: 2 drinks and a pastry. The drinks are both lattes, same size, but one with oat milk the other regular....and so on. The point is to first give a high level view: There are 3 things then refine that view over time: 2 drinks and a pastry and The drinks are both .... The other key point here is to order the things you've introduced and to stick to that order: The information for the drinks always comes before the information for the pastry because that order is how these things were introduced: 2 drinks and a pastry.

So how about:

Suggested change
Each Haskell thread has a thread state object, `t` of type `h$Thread`. This object stores the state of a lightweight thread, for example whether the thread is finished or whether asynchronous exceptions are ignored. It also contains `t.stack`, an array representing the stack and `t.sp`, a number pointing to the current top of the stack.
Each Haskell thread has a thread state object, `t` of type `h$Thread`. This object has three fields: the state, `t.status`; a stack, `t.stack`; and a stack pointer, `t.sp`. `t.status` tracks the state of the object, for example, it determines whether the thread is finished or is ignoring asynchronous exceptions. `t.stack` is a JavaScript array representing the stack of the thread, and `t.sp` is a number that tracks the top of `t.stack`.


`t.stack` grows dynamically as needed, and is occasionally shrunk to reclaim memory.
Contributor

merge this line in with the previous paragraph. It's an orphan and is continuing the topic of the previous paragraph. Thus it belongs in that paragraph.


When a thread is created, the stack is initialized with some values:
Contributor

Suggested change
When a thread is created, the stack is initialized with some values:
When a thread is created, the stack is initialized with some key values:

These aren't just any values, they are particular and key values to the correctness of the implementation.


```javascript
/** @constructor */
function h$Thread() {
  this.tid = ++h$threadIdN;
  this.status = THREAD_RUNNING;
  this.stack = [ h$done
               , 0
               , h$baseZCGHCziConcziSynczireportError
               , h$catch_e
               ];
  this.sp = 3;
  this.mask = 0; // async exceptions masked (0 unmasked, 1: uninterruptible, 2: interruptible)
  this.interruptible = false; // currently in an interruptible operation
  ...
}
```

The initial stack contains two stack frames. The top three slots contain a `catch` frame with the `h$catch_e` header, the `h$baseZCGHCziConcziSynczireportError` exception handler and `0`, for the mask state. The bottom slot of the stack holds the `h$done` frame, which only has a header and no payload.
Contributor

how could I tell from looking at the example that the stack contains two frames? I don't think I can, so maybe say that. Other than that this is good because you go from high level and then refine like I suggested above.
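
A minimal sketch of how the frame boundaries can be recovered from the `size` property of the frame headers (described in the Stack Frames section). The functions below are stand-ins for the real headers, and their `size` values are inferred from the description above (catch frame: 3 slots, done frame: 1 slot). `listFrames` is a hypothetical helper.

```javascript
// Stand-ins for the real frame headers and the exception handler.
function done() {}
done.size = 1;
function catchFrame() {}
catchFrame.size = 3;
function reportError() {}

// Walk from the top of the stack down and collect the header of each frame.
function listFrames(stack, sp) {
  var frames = [];
  while (sp >= 0) {
    var header = stack[sp]; // a frame's header is its topmost slot
    frames.push(header.name);
    sp -= header.size;      // skip the header and its payload
  }
  return frames;
}

// Same layout as the initial stack of a new thread.
var initialStack = [done, 0, reportError, catchFrame];
var initialFrames = listFrames(initialStack, 3); // ["catchFrame", "done"]
```

The array literal alone does not show where one frame ends and the next begins; only the sizes stored on the headers do.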


## Scheduling a Thread

Typical Haskell code does a lot of manipulation of values on the stack. It would be quite inefficient to do all of this through the thread state object of the current thread, `h$currentThread`. That's why the stack `h$stack` and the "stack pointer" `h$sp` to the top of the stack are global variables that are initialized when a thread is scheduled:

```javascript
// scheduling a thread t
h$currentThread = t;
h$stack = t.stack;
h$sp = t.sp;
Comment on lines +46 to +50
Contributor

explain the examples:

Perhaps like this:

We see that h$currentThread is set to the thread object t making t now the thread that is running from the view of the scheduler. Next we mutate the global variable tracking the stack to the thread stack: h$stack = t.stack and similarly set the global stack pointer to the thread stack pointer in the next line. So scheduling a stack is really a matter of assigning the proper thread information to the global variables that are used by the scheduler.

```

When a thread is suspended, the values are saved back to the thread state object:
Contributor

this is funny but I was reading through the history of chez scheme: https://legacy.cs.indiana.edu/~dyb/pubs/hocs.pdf

and this is exactly one of the ways they implemented continuations in an early version of chez (see Section 3, paragraph 3 beginning with The solution for continuations seemed obvious)


```javascript
// suspending a thread t
t.stack = h$stack;
t.sp = h$sp;
h$currentThread = null;
Comment on lines +55 to +59
Contributor

explain this example

```
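
A minimal sketch that puts scheduling and suspending together. The globals `currentThread`, `stack` and `sp` mirror `h$currentThread`, `h$stack` and `h$sp`; the function names `scheduleThread`, `suspendThread` and `push` are hypothetical.

```javascript
// Globals used by running code, mirroring h$currentThread, h$stack and h$sp.
var currentThread = null;
var stack = null;
var sp = -1;

// Load the thread's stack and stack pointer into the globals.
function scheduleThread(t) {
  currentThread = t;
  stack = t.stack;
  sp = t.sp;
}

// Running code only touches the globals, never the thread object.
function push(value) {
  stack[++sp] = value;
}

// Save the globals back into the thread object.
function suspendThread() {
  var t = currentThread;
  t.stack = stack;
  t.sp = sp;
  currentThread = null;
  return t;
}
```

While the thread runs, `t.sp` is stale; it only becomes accurate again when the thread is suspended. Saving `stack` back as well covers the case where the global was pointed at a different array in the meantime, for example after growing the stack (an assumption about how resizing is done).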

## Stack Frames
Contributor

aha! Perhaps link or reference this earlier at the place I made a comment about stack frames


Each stack frame starts with a header, which is a JavaScript function. The header is followed by zero or more slots of payload, which can be arbitrary JavaScript values.
Contributor

Perhaps mention that in JavaScript functions can have properties and that we use this feature to indicate the number of stack slots for the frame payload.

Contributor

Ah I see that you mention this later. I would put this here to first explain the structure of the frame and then how we use it.


The header serves as the "return point": When some code is done reducing some value to weak-head normal form it returns this value to the next stack frame by storing it in `h$r1` (or in more registers for large values or unboxed tuples), popping its own stack frame and calling the header of the next stack frame at `h$stack[h$sp]`.
Contributor

explain that h$r1 is a register.

I would write something like this to introduce the topic first:

GHC's calling convention for functions generated from STG is to always perform tail-calls, where the tail-call target is a continuation.
In practice, values "returned" by a function are in fact passed as arguments to its continuation.
When the continuation isn't statically known, it is passed via the stack, similarly to C's calling convention where return addresses are stored on the C stack.

In detail, what happens in this case is:

  1. "returned values" are stored into global variables corresponding to registers (h$r1...)
  2. the current function pops its own stack frame from the stack (if any?)
  3. remember that the header of a stack frame is directly a JavaScript function: the entry code of the stack frame. The current function should call this function to call the continuation.
  4. BUT remember that tail-calls aren't reliably supported by JavaScript engines, hence the current function returns the continuation to the scheduler instead of calling it directly, avoiding ever-growing call stacks. The scheduler then calls it (this method of implementing tail-calls is called "trampolining").

Here is an annotated code example of this process: ...


An example is shown below.

```javascript
function h$stackFrame_e() {
  ...
  h$r1 = somethingWHNF;
  h$sp -= 3; // pop current frame
  return h$stack[h$sp]; // return to next frame
}
```

Comment on lines +71 to +76

explain the example in a paragraph that immediately follows the example
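To make the return protocol concrete, here is a small runnable sketch of a trampoline driving two toy frames. It assumes the layout used elsewhere in the post (payload slots below the header, header at `h$stack[h$sp]`). The frames `h$done_e` and `h$addTwo_e` and the helpers `pushFrame` and `runTrampoline` are made up for this example and are not part of the RTS.

```javascript
// Toy stand-ins for the RTS globals named in the post.
var h$stack = [];
var h$sp = -1;
var h$r1 = null;

// Hypothetical bottom frame: returning null tells the loop below to stop.
function h$done_e() {
  return null;
}

// Hypothetical frame with two payload slots: adds them to the value in h$r1.
function h$addTwo_e() {
  h$r1 = h$r1 + h$stack[h$sp - 1] + h$stack[h$sp - 2];
  h$sp -= 3;            // pop header + two payload slots
  return h$stack[h$sp]; // hand the next header to the trampoline
}

// Hypothetical helper: push the payload slots, then the header on top.
function pushFrame(header, payload) {
  for (var i = 0; i < payload.length; i++) h$stack[++h$sp] = payload[i];
  h$stack[++h$sp] = header;
}

// The trampoline: keep calling whatever function the previous one returned.
// The JavaScript call stack never gets deeper than one frame function.
function runTrampoline() {
  var c = h$stack[h$sp];
  while (c !== null) c = c();
  return h$r1;
}

pushFrame(h$done_e, []);
pushFrame(h$addTwo_e, [10, 20]);
h$r1 = 1; // the value being "returned" to the top frame
console.log(runTrampoline()); // 31
```

Each frame function returns the next function to run instead of calling it, which is what keeps the native call stack flat regardless of how many frames are on `h$stack`.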

The header also contains metadata, stored in properties of the function object. Of particular interest is the `size` property, which contains the size of the stack frame in slots. Certain operations, like throwing exceptions or restarting STM transactions, need to know the size of each stack frame to be able to "unwind" the stack.

Almost all stack frames have their size stored in the `size` property of the header. An exception is the `h$ap_gen` frame, which contains a function application of arbitrary size. This frame type does not have a fixed size, and the size is stored in the payload of the frame itself. Frames `f` with the size stored in the payload of the frame have `f.size < 0`.
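JavaScript functions are ordinary objects, so metadata can be attached to a header and read back from a stack slot without calling the function. The sketch below only illustrates that language feature: the frame names, the helper `hasFixedSize` and the concrete `size` values are made up for this example.

```javascript
// Hypothetical frame headers; the names and size values are illustrative only.
function h$fixedFrame_e() { return null; }
h$fixedFrame_e.size = 2;     // metadata lives on the function object itself

function h$variableFrame_e() { return null; }
h$variableFrame_e.size = -1; // negative: the size is stored in the payload instead

// Hypothetical helper: inspect a header without calling it.
function hasFixedSize(header) {
  return header.size >= 0;
}

var stack = ["payload", "payload", h$fixedFrame_e];
var header = stack[stack.length - 1];

console.log(typeof header, header.size, hasFixedSize(header)); // function 2 true
console.log(hasFixedSize(h$variableFrame_e));                  // false
```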

## Exception Handling

this section was very good. The examples need to be explained more but other than that I thought it was very nice and clear. Good job!


During normal execution of a program, the code that manipulates the stack has knowledge of the specific stack frame it's working with: it knows which values are stored in each stack slot. However, there are also operations that require dealing with all kinds of unknown stack frames. Exceptions and STM are the most important ones.

Haskell allows exceptions to be thrown within threads and between threads as an alternate way of returning a value. The throw operation transfers control to the exception handler in the next `catch` frame on the stack.

The `catch` frame has two words of payload:

```javascript
0 // mask status
h$baseZCGHCziConcziSynczireportError // handler
h$catch_e // header
```

The code for the header is straightforward: it just pops the stack frame and returns to the next frame. This is what happens if no exception has occurred; the code just skips past the exception handler:

```javascript
function h$catch_e() {
  h$sp -= 3; // pop header + two payload slots
  return h$stack[h$sp];
}
```

An exception is thrown by the `h$throw` function, which unwinds the stack. Its implementation in simplified form looks like this:

```javascript
function h$throw(e, async) {
  ...
  while(h$sp > 0) {
    f = h$stack[h$sp];
    ...
    if(f === h$catch_e) break;
    if(f === h$atomically_e) { ... }
    if(f === h$catchStm_e && !async) break;
    if(f === h$upd_frame) { /* handle black hole */ }
    h$sp -= h$stackFrameSize(f); // pop the whole frame
  }
  if(h$sp > 0) {
    // the payload of the handler frame sits directly below its header
    var maskStatus = h$stack[h$sp - 2];
    var handler = h$stack[h$sp - 1];
    ...
  }
  /* jump to handler */
}
```

`h$throw` keeps removing stack frames from the stack until some frame of interest is found. Eventually it transfers control to an exception handler, or it reports an error if no exception handling frame could be found. `h$throw` uses the `h$stackFrameSize` helper function to determine the size of each frame.

```javascript
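function h$stackFrameSize(f) {
  if(f === h$ap_gen) {
    // variable-size frame: the size is encoded in the slot below the header
    return (h$stack[h$sp - 1] >> 8) + 2;
  } else {
    var tag = f.size;
    if(tag < 0) {
      // size stored in the payload of the frame
      return h$stack[h$sp-1];
    } else {
      // payload size from the header metadata, plus one slot for the header
      return (tag & 0xff) + 1;
    }
  }
}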
```
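The unwinding loop can be tried out on a toy stack. The sketch below reuses `h$stackFrameSize` as shown above and replaces `h$throw` with a stripped-down, hypothetical `findHandler` that only looks for a `catch` frame. The frames `h$bottom_e` and `h$plain_e`, the handler `myHandler`, and the payload layout chosen for the `h$ap_gen` frame are assumptions made so that the formulas from the post apply; they are not taken from the RTS.

```javascript
var h$stack = [];
var h$sp = -1;

// Toy frame headers. Only h$catch_e has the body shown in the post.
function h$catch_e() { h$sp -= 3; return h$stack[h$sp]; }
h$catch_e.size = 2;
function h$ap_gen() { return null; }   // stand-in for the real frame
h$ap_gen.size = -1;
function h$plain_e() { return null; }  // hypothetical frame with one payload slot
h$plain_e.size = 1;
function h$bottom_e() { return null; } // hypothetical bottom-of-stack frame
h$bottom_e.size = 0;

function h$stackFrameSize(f) {
  if (f === h$ap_gen) {
    return (h$stack[h$sp - 1] >> 8) + 2;
  } else {
    var tag = f.size;
    if (tag < 0) {
      return h$stack[h$sp - 1];
    } else {
      return (tag & 0xff) + 1;
    }
  }
}

// Hypothetical, stripped-down unwinder: pop frames until a catch frame is on top.
function findHandler() {
  while (h$sp > 0) {
    var f = h$stack[h$sp];
    if (f === h$catch_e) return h$stack[h$sp - 1];
    h$sp -= h$stackFrameSize(f);
  }
  return null; // no catch frame on the stack
}

function myHandler(e) { return "handled " + e; }

// Build the stack bottom-up; each frame's payload sits below its header.
h$stack = [
  h$bottom_e,                  // index 0
  0, myHandler, h$catch_e,     // catch frame: mask status, handler, header
  "arg", h$plain_e,            // ordinary frame: one payload slot
  "x", "y", 2 << 8, h$ap_gen   // variable-size frame: (512 >> 8) + 2 = 4 slots
];
h$sp = h$stack.length - 1;

var handler = findHandler();
console.log(handler === myHandler, h$sp); // true 3
```

After unwinding, `h$sp` points at the `catch` frame's header, so the handler and mask status can be read from the two slots below it, as `h$throw` does.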

## Conclusion

We have seen that stacks in the JavaScript backend are represented by JavaScript arrays. The contents of the stack consist of stack frames with a header and a payload. The header of each stack frame contains some metadata so that exception handling code can traverse the stack and transfer control to an exception handler.

Suggested change
We have seen that stacks of Haskell lightweight threads are represented by JavaScript arrays in the JavaScript backend. The contents of the stack consist of stack frames with a header and a payload. The header of each stack frame contains some metadata so that exception handling code can traverse the stack and transfer control to an exception handler.