Working With Janet's Threads
This post is about some internal workings of Janet and Jwno, especially how threads work and communicate. You may as well take it as me mumbling to myself. I try to explain things, but you may still need some basic knowledge about Janet and system programming to go through it.
I had never thought about how a virtual-machine-based, garbage-collected programming language manages its threads and memory until I started playing with Janet, so what I wrote here may be wrong, and I’d be grateful if someone could point out the mistakes.
A Tale of Two Event Loops
Jwno is a Windows™ application, so naturally it has a window message event loop. It’s crucial for most GUI-related stuff.
And Jwno is an application written in Janet. Janet also has its own event loop built into the core. Some of Janet’s cool features (e.g. fibers) depend on it.
In my limited programming experience, when two event loops clash, there are generally three ways to make them work together:
- “Cascade” the two event loops. This way we can avoid multi-threading completely, and most parts of the application can work together seamlessly. A good example would be cascaded epoll instances in Linux. Windows has MsgWaitForMultipleObjects to combine the window message event loop with other I/O events, while Janet uses I/O Completion Ports at the heart of its event loop. There may be a way to connect these two and fold them into one thread. While this looks like a fun research topic, I’m not sure I know enough about Windows APIs and Janet internals to take this path.
- Use a polling architecture. This can also be done in a single thread. We just ask our event sources, one by one, for new events, then sleep for a while, and repeat. The window message event loop and Janet’s event loop can both work in a “single step” mode, so this could work, but adopting it in an interactive application like Jwno just feels… wrong.
- Use threads to isolate the loops. To me this idea sounds boring and dangerous at the same time, but I know for sure it works, since it’s what Jwno is currently using. There are details to managing threads and communicating between them; we’ll get to those in the rest of this post.
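To make the polling option concrete, here is a minimal sketch in Janet. Everything in it is an assumption made for illustration: make-queue-source, poll-loop and the event names are hypothetical, and the “sources” are plain in-memory arrays standing in for the window message queue and Janet’s event loop. It is not how Jwno works.

```janet
# Hypothetical event sources: each one is a zero-argument function that
# returns an array of the events that arrived since the last call.
(defn make-queue-source
  "Wrap a mutable array as a pollable source (illustration only)."
  [queue]
  (fn []
    (def pending (array/slice queue))
    (array/clear queue)
    pending))

(defn poll-loop
  "Ask every source for new events, dispatch them, sleep, and repeat."
  [sources handler rounds interval]
  (repeat rounds
    (each source sources
      (each event (source)
        (handler event)))
    (os/sleep interval)))

(def window-messages @[:wm-paint :wm-keydown])
(def janet-events @[:timer-fired])
(def seen @[])

(poll-loop [(make-queue-source window-messages)
            (make-queue-source janet-events)]
           (fn [event] (array/push seen event))
           2      # number of polling rounds
           0.01)  # seconds to sleep between rounds

(printf "%n" seen)
```

The sleep between rounds is what makes this approach feel wrong for an interactive application: every event waits up to one interval before anyone looks at it.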
Janet’s Threading Model
Simply put, Janet creates a separate virtual machine instance for every thread it spawns. A VM instance has its own heap and garbage collector, so garbage collection also mostly works on a per-thread basis. Unlike system threads, which share the same process memory address space, Janet threads by default can’t access data that’s managed by another Janet thread, since the data may suddenly get vaporized by the other garbage collector.
But consider this simple function:
(defn test-thread-isolation []
  # thread #1
  (def facts @{:janet-is-cool true})
  (ev/do-thread
    # thread #2
    (put facts :ak-is-cool true)
    (printf "From thread #2: %n" facts))
  (printf "From thread #1: %n" facts))
And, spoiler alert, the function’s output:
From thread #2: @{:ak-is-cool true :janet-is-cool true}
From thread #1: @{:janet-is-cool true}
The code inside ev/do-thread (thread #2) sees :janet-is-cool, but the code outside (thread #1) can’t see :ak-is-cool.
Did thread #2 violate the “no access to other threads’ stuff” rule? Not really. Thread #2 was just modifying a copy of the original table, and that’s why thread #1 can’t see the modification. ev/do-thread and friends carry out a complex ritual when spawning a new thread, called “marshalling”, to transparently copy data between threads, so that we can use closures seamlessly across thread boundaries.
The Ritual To Spin Up a Thread
To create a new thread, Janet does roughly these things:
- Marshal (pack up) the actual code to be run in the new thread, together with its environment, including any variables captured in its closure.
- Create a new system thread, and send it a buffer where the marshaled data resides.
- Initialize a new VM instance in the new thread.
- In the new thread, unmarshal (unpack) everything it received, and start running the freshly unmarshaled code.
After marshalling, the data looks like a network packet. The “packet” is independent of any thread, so can be moved around safely. When the receiving thread unmarshals the data, it takes the unmarshaled objects under its own management.
This whole process essentially copies Janet objects (both code and data) between threads. An object before marshalling is distinct from its counterpart after unmarshalling. We can simulate this process in the REPL:
repl:28:> (def a @[1 2])
@[1 2]
repl:29:> (def buf (marshal a))
@"\xD1\x02\x01\x02"
repl:30:> (def aa (unmarshal buf))
@[1 2]
repl:31:> (= a aa)
false
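As a small extension of the REPL session above, this sketch shows that the copy is fully independent: mutating the unmarshaled array leaves the original untouched. It only uses core functions (marshal, unmarshal, array/push, deep=).

```janet
(def a @[1 2])
(def aa (unmarshal (marshal a)))

# Mutate only the unmarshaled copy.
(array/push aa 3)

(printf "original: %n" a)
(printf "copy:     %n" aa)
(printf "same object? %n" (= a aa))
```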
(Ab)normal Cross-Thread Communication
After the spin-up ritual, normal Janet threads usually use (ev/thread-chan) (threaded channel) objects to communicate. You put stuff in the channel from one thread, and then take stuff out of the channel from another thread. The channel does the (un)marshalling transparently. This is the happy path and you can go through the official docs for more info, so I won’t elaborate here.
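For reference, a minimal sketch of that happy path, reusing the facts table from the earlier example. The name results is made up for this illustration, and the script assumes it runs as a normal Janet program, where the main fiber lives inside Janet’s event loop.

```janet
# A threaded channel that can buffer one message.
(def results (ev/thread-chan 1))
(def facts @{:janet-is-cool true})

# The worker thread gets its own copy of `facts`, changes that copy,
# and sends it back through the channel.
(ev/spawn-thread
  (put facts :ak-is-cool true)
  (ev/give results facts))

# The main thread receives yet another copy, unmarshaled by the channel.
(def updated (ev/take results))

(printf "received: %n" updated)
(printf "original: %n" facts)
```

The table taken out of the channel carries the worker’s change, while the original facts in the main thread stays as it was, because every hop across a thread boundary is a copy.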
I took the not-so-happy path because I soon hit a blocker when trying to use channels in my application: One of my threads is not a normal Janet thread. The original idea was to isolate two event loops, so there’s no event loop for Janet in my window-message-processing thread. It can’t run channel-related code, since that code depends on Janet’s event loop.
Then after some research on Janet’s internal rituals, I came up with this… abomination called alloc-and-marshal, and its counterpart unmarshal-and-free.
Here’s how they work together:
- In one thread, alloc-and-marshal allocates a buffer that’s not managed by the garbage collector, and saves the marshaled data in it.
- The raw pointer pointing to that buffer gets sent to the other thread.
- Then in the other thread, unmarshal-and-free unmarshals the data, and frees the buffer.
So the marshaled data goes rogue with the unmanaged buffer for a little while, until it gets handled by the receiving thread. If the receiving end crashes while the data is still in flight, the buffer is never freed and the data is lost in the void. But then I’ll have problems more serious than memory leaks, so this works quite well in practice.
A Surprising Behavior of Unmarshalling
The marshaled data looks like a network packet, so I (incorrectly) assumed that marshalling works just like packet serialization. Jwno uses Win32 UI Automation event handlers, and they run in a system-controlled thread pool. A handler function can get called in different worker threads, so at one point I did things like this:
- Save my marshaled handler function to a buffer.
- When a worker thread needs to call the handler function, unmarshal it from the buffer.
- When another worker thread needs to call the same handler function, unmarshal it from the same buffer again.
Unfortunately this caused random crashes.
After some hair-pulling investigation, it turned out the culprit was the act of unmarshalling a marshaled threaded abstract object multiple times.
Janet has so-called threaded abstract types. Objects of these types are reference-counted, and different threads can hold references to the same in-memory threaded abstract object. To keep a threaded abstract object alive in-flight, the marshalling code increments its ref-count, then the unmarshalling code may accordingly decrement the ref-count.
So unlike deserializing a network packet, which is usually free of side effects, unmarshalling the same marshaled data multiple times may destroy objects that are still in use, causing crashes.
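One way to make the “unmarshal at most once” rule explicit is a small guard around the buffer. The sketch below is only an illustration of the idea in plain Janet: make-one-shot is a hypothetical helper, it works within a single thread, and the payload is ordinary data (for which repeated unmarshalling would be harmless anyway). It is not what Jwno does.

```janet
(defn make-one-shot
  "Return a function that unmarshals `buf` the first time it is called
  and raises an error on every later call (hypothetical helper)."
  [buf]
  (var spent false)
  (fn []
    (when spent
      (error "marshaled buffer was already consumed"))
    (set spent true)
    (unmarshal buf)))

(def take-payload (make-one-shot (marshal @{:handler-id 42})))

(printf "first:  %n" (take-payload))

# `protect` captures the error instead of aborting the script.
(def [ok err] (protect (take-payload)))
(printf "second: ok=%n err=%n" ok err)
```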
Customizing How Something Gets Marshaled
Jwno uses Windows’ low-level keyboard hook to intercept global key bindings, so it has a rather convoluted keymap system, to adapt to the requirements of using that hook.
Here’s roughly how Jwno handles keyboard events:
- The main thread sends the whole keymap (which contains bound commands), by marshalling it, to the thread responsible for keymap handling.
- When keyboard events arrive at the keymap thread, it tries to match a key binding in the keymap.
- If there’s a match, the keymap thread sends the corresponding command back to the main thread, by marshalling it again.
- The main thread receives the command and runs it.
Note that commands get marshaled twice in this process, and they may contain user-defined functions to run custom code. After marshalling (and unmarshalling), the command coming back to the main thread is a distinct copy of the original command, and so are all the variables that have been captured by the function closures it may contain. This imposes a limitation on these user-defined functions: They cannot access mutable state outside their own scope.
For example, suppose we have this code:
(var my-flag false)
(defn my-custom-action [&]
  (if my-flag
    :do-this
    # else
    :do-that))
(:define-key root-keymap "Win + S" [:split-frame :horizontal nil nil my-custom-action])
(:set-keymap (in jwno/context :key-manager) root-keymap)
And later, in the main thread, we try to alter the behavior of my-custom-action:
(set my-flag true)
This gives a surprising result: If triggered by the key binding, my-custom-action in the :split-frame command will always run the :do-that branch. But if we call my-custom-action directly in the main thread, it switches to the :do-this branch correctly.
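The same effect can be reproduced without Jwno or any threads. In this sketch a plain marshal/unmarshal round trip stands in for the two thread hops; copied-action is a name made up for the illustration, and the script is assumed to run as an ordinary Janet file.

```janet
(var my-flag false)

(defn my-custom-action []
  (if my-flag :do-this :do-that))

# Round-trip the function through marshal/unmarshal. The copy gets its
# own private copy of the `my-flag` variable cell.
(def copied-action (unmarshal (marshal my-custom-action)))

# Only the original variable changes here.
(set my-flag true)

(printf "original: %n" (my-custom-action))
(printf "copy:     %n" (copied-action))
```

The original function follows the flag, while the copy keeps reading the stale value it was marshaled with.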
I did find a solution to this recently, and it turned out Janet already has good support for it: Exclude functions altogether when marshalling a keymap. All we need to do is to pass a “reverse-lookup table” to janet_marshal, telling the marshalling code to replace function objects with our placeholders.
We can simulate this “customized” marshalling in the REPL too:
repl:1:> (var my-flag false)
false
repl:2:> (defn my-fn [] (if my-flag :do-this :do-that))
<function my-fn>
repl:3:> (def rlookup @{my-fn 'my-placeholder})
@{<function my-fn> my-placeholder}
repl:4:> (def lookup (invert rlookup))
@{my-placeholder <function my-fn>}
repl:5:> (def dummy-lookup @{'my-placeholder 'my-placeholder})
@{my-placeholder my-placeholder}
We have my-fn depending on my-flag to do its work, along with some lookup tables. Now we want to send an array containing my-fn to another thread, without my-fn tagging along, so we do the marshalling like this:
repl:6:> (def buf (marshal @[my-fn :other-info] rlookup))
@"\xD1\x02\xD8\x0Emy-placeholder\xD0\nother-info"
Then we send both buf and dummy-lookup to the other thread. If the other thread needs to access the array, it can still unmarshal the data using dummy-lookup:
repl:7:> (def arr (unmarshal buf dummy-lookup))
@[my-placeholder :other-info]
Notice how my-fn turned into my-placeholder. The other thread can also use dummy-lookup to marshal and send a block of data back to the main thread:
repl:8:> (array/push arr :more-info)
@[my-placeholder :other-info :more-info]
repl:9:> (def buf2 (marshal arr dummy-lookup))
@"\xD1\x03\xD8\x0Emy-placeholder\xD0\nother-info\xD0\tmore-info"
And when the main thread uses lookup (instead of dummy-lookup) to unmarshal the data that came back, it can “restore” my-fn:
repl:10:> (def arr2 (unmarshal buf2 lookup))
@[<function my-fn> :other-info :more-info]
Now we can verify that it’s indeed the original function, not a copy:
repl:11:> (= my-fn (first arr2))
true
repl:12:> (set my-flag true)
true
repl:13:> (apply (first arr2))
:do-this
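The REPL session above stays in one thread. As a rough sketch, the same round trip can be run across a real thread boundary with two threaded channels carrying the marshaled buffers. The channel names to-worker and to-main are assumptions for this example; Jwno itself does not use channels for its keymap thread, as explained earlier.

```janet
(var my-flag false)
(defn my-fn [] (if my-flag :do-this :do-that))

(def rlookup @{my-fn 'my-placeholder})
(def lookup (invert rlookup))
(def dummy-lookup @{'my-placeholder 'my-placeholder})

(def to-worker (ev/thread-chan 1))
(def to-main (ev/thread-chan 1))

# The worker only ever sees the placeholder symbol, never my-fn itself.
(ev/spawn-thread
  (def arr (unmarshal (ev/take to-worker) dummy-lookup))
  (array/push arr :more-info)
  (ev/give to-main (marshal arr dummy-lookup)))

# Main thread: swap my-fn for its placeholder on the way out...
(ev/give to-worker (marshal @[my-fn :other-info] rlookup))
# ...and swap the placeholder back for my-fn on the way in.
(def arr2 (unmarshal (ev/take to-main) lookup))

(set my-flag true)
(printf "same function? %n" (= my-fn (first arr2)))
(printf "result: %n" ((first arr2)))
```

Because my-fn never leaves the main thread, the function that comes back is the original object and still sees later changes to my-flag.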
Some Conclusions
When building Jwno, I learned to be careful about these gotchas:
- Spawning a new thread is quite a heavy operation, due to all the data copying. Instead of spawning threads ad-hoc, using fibers or a thread pool is usually better.
- Sending mutable data structures across thread boundaries often leads to surprising behavior.
- It’s dangerous to unmarshal the same buffer more than once (when using JANET_MARSHAL_UNSAFE).
But I think Janet’s threads are generally quite nice to work with. The high-level APIs are concise, and the low-level C APIs have enough “escape hatches”, that I can use to realize my crazy ideas. The Janet people really did a great job landing an elegant design.
And thanks for reading through this long post, you’re really tolerant of my mumbling 😄.
Get Jwno
Jwno
A tiling window manager for Windows 10/11, built with Janet and ❤️.
Status | In development |
Category | Tool |
Author | Agent Kilo |
Tags | janet, tiling, uiautomation, window-manager, windows |