Keymap optimization: language statistics and important indicators

Welcome back to this series where we’re designing kick-ass keymaps! After covering basics like how good (or bad) QWERTY is, the power of layers and the potential of custom keymaps, we took the first real steps toward your tailor-fit keymap by looking into options for compiling a corpus, both a general one and a more useful personal one.
Quick recap: in this context, corpus is simply a fancy name for a big chunk of text.
Today, we’re going to analyze your corpus (or pretty much any text if you haven't done your homework yet) and discuss some basic language statistics along with common metrics that can be used to quickly evaluate a keymap and to compare layouts. This is the next logical step in our journey if you're aiming to craft the optimal keymap for yourself.
Character/bigram/trigram frequencies
To begin with, let's examine the character frequencies in our corpus. The occurrence of different letters can vary significantly not only between languages but also within texts written in the same language. That's why I prefer a personal corpus over a general one and invest time and effort into compiling my own.
Tip: I put together a quick and dirty tool to calculate letter frequencies. Feel free to input any text, hopefully your own corpus, and press the 'Calculate' button. Results include character frequencies and also bigram and trigram frequencies.
A bigram is simply a pair of letters or other characters. Bigram frequencies are in fact much more important than mere character frequencies, because a bigram represents an actual movement of the fingers while typing -- from one key to another. After all, typing is all about movement rather than a static state. We will use bigram frequencies to calculate most of the common metrics and indices mentioned in this and future articles.
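If you'd rather run the numbers yourself, the counting behind such a tool can be sketched in a few lines of Python. This is a minimal illustration, not the tool mentioned above: the function name `ngram_frequencies` is hypothetical, and it assumes lowercased text, letters only, and n-grams counted within words only (any space, digit or punctuation mark breaks a sequence). Real analyzers make different choices here, for example treating spaces or punctuation as characters too.

```python
import re
from collections import Counter


def ngram_frequencies(text, n):
    """Relative frequency (0..1) of every n-letter sequence, most common first.

    Assumptions: case is ignored, only letters (including accented ones) are
    kept, and sequences never span across spaces, digits or punctuation.
    """
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i : i + n]] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {gram: count / total for gram, count in counts.most_common()}


if __name__ == "__main__":
    sample = "The theory of the thing"
    for n in (1, 2, 3):  # characters, bigrams, trigrams
        top = list(ngram_frequencies(sample, n).items())[:3]
        print(n, [(gram, round(freq, 3)) for gram, freq in top])
```

Calling it with `n=1`, `2` and `3` gives the character, bigram and trigram tables respectively.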

Image: Letter frequencies of this very article

Trigrams, similarly, are combinations of three characters. They are primarily used to evaluate the frequency of redirects (more on this later) but can also be used for assessing rolls and hand alternation. Using heat maps to represent letter frequencies is an excellent way to highlight one of the most striking shortcomings of QWERTY:

Image: Letter frequencies vs QWERTY...

Home row, home box, home positions
Knowing the most frequent letters is important so you can put them on the home row. You can think in terms of the 8 home positions (the home row without its inner and outer keys), or use another approach called the home box and its variants, which also include some easy-to-reach keys on the top and bottom rows.
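Checking how much of a text lands on the home positions takes only a few lines once you have the letters. The sketch below assumes the 8 home positions of a standard keyboard, `asdf` `jkl;` for QWERTY and `arst` `neio` for Colemak; the constant and function names are hypothetical, and a home box variant would simply pass a longer key string.

```python
# Assumed 8 home positions: the home row minus the inner index-finger keys.
QWERTY_HOME = "asdfjkl;"
COLEMAK_HOME = "arstneio"


def home_position_share(text, home_keys):
    """Fraction of the letters in text that are typed on the given home keys."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    home = set(home_keys)
    return sum(ch in home for ch in letters) / len(letters)


if __name__ == "__main__":
    sample = "Knowing the most frequent letters is important"
    print(f"QWERTY:  {home_position_share(sample, QWERTY_HOME):.1%}")
    print(f"Colemak: {home_position_share(sample, COLEMAK_HOME):.1%}")
```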

Image: Characters in the home positions - QWERTY (orange) vs Colemak (blue)

All in all, our goal is to place the most common alphas in the best and most comfortable places. This way, frequently used letters are right under your fingertips in their resting positions, minimizing the need to leave the home row.
Finger travel
Since we know all the relevant dimensions of a keyboard, we can calculate the exact distance our fingers travel while typing a specific text. This is an interesting metric for describing how easily a text can be typed. The physical layout can affect it significantly -- think of a standard row-staggered board vs a column-staggered split, or standard key spacing vs Choc or CFX. That said, your keymap will make a huge difference!
Row changes
Leaving the home row may not sound that scary -- but it is. (Boo!) It might seem like a minor inconvenience, and while we will certainly meet much worse key combinations on our journey, minimizing row changes is still important. Row changes can also increase typos, especially when you're trying to find your way back to the home positions.
Hurdles (row skip)
August Dvorak, the creator of the Dvorak layout, referred to this as a 'hurdle', so let's stick with that terminology. It is the worst kind of row change, where you also skip one or more rows in between. Consider typing 'minimum' or 'December' on QWERTY. It's a slow and uncomfortable pattern that you should aim to avoid.
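Finger travel, row changes and hurdles all come from the same ingredient: key coordinates. Here is a rough sketch under several assumptions that are mine, not a standard: a 3x10 letter grid with ANSI row stagger (home row shifted 0.25u and bottom row 0.75u relative to the top row), conventional touch-typing finger assignment, each finger staying where it last pressed, distances in key units (1u is 19.05 mm on standard spacing), and row changes counted over every within-word bigram regardless of hand. All names are hypothetical, and a column-staggered split or Choc spacing would need different coordinates.

```python
import math
import re

# Assumed 3x10 letter grid (rows: 0 = top, 1 = home, 2 = bottom).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
COLEMAK_ROWS = ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"]
# Assumed ANSI row stagger in key units (u), relative to the top row.
ROW_OFFSET = [0.0, 0.25, 0.75]
# Conventional touch-typing finger per column, left pinky to right pinky.
COLUMN_FINGER = ["LP", "LR", "LM", "LI", "LI", "RI", "RI", "RM", "RR", "RP"]
# Home-row column each finger rests on at the start.
HOME_COLUMN = {"LP": 0, "LR": 1, "LM": 2, "LI": 3, "RI": 6, "RM": 7, "RR": 8, "RP": 9}


def key_x(row, col):
    """Horizontal key-centre position in u, including the row stagger."""
    return col + ROW_OFFSET[row]


def geometry_metrics(text, rows=QWERTY_ROWS):
    """Row changes and hurdles over within-word bigrams, plus total finger travel."""
    pos = {ch: (r, c) for r, line in enumerate(rows) for c, ch in enumerate(line)}
    finger_at = {finger: (1, col) for finger, col in HOME_COLUMN.items()}
    bigrams = row_changes = hurdles = 0
    travel = 0.0
    for word in re.findall(r"[a-z]+", text.lower()):
        keys = [pos[ch] for ch in word if ch in pos]
        for i, (row, col) in enumerate(keys):
            # The finger moves from wherever it last pressed (or its home key).
            finger = COLUMN_FINGER[col]
            prev_row, prev_col = finger_at[finger]
            travel += math.hypot(key_x(row, col) - key_x(prev_row, prev_col), row - prev_row)
            finger_at[finger] = (row, col)
            if i > 0:
                bigrams += 1
                jump = abs(row - keys[i - 1][0])
                row_changes += jump >= 1  # any change of row
                hurdles += jump >= 2  # top <-> bottom, skipping the home row
    return {"bigrams": bigrams, "row_changes": row_changes,
            "hurdles": hurdles, "travel_u": travel}


if __name__ == "__main__":
    for name, rows in (("QWERTY", QWERTY_ROWS), ("Colemak", COLEMAK_ROWS)):
        print(name, geometry_metrics("minimum December", rows))
```

Under these assumptions, 'minimum' gives 6 row changes, all of them hurdles, on QWERTY, versus 4 row changes and 2 hurdles on Colemak. Restricting row changes to same-hand bigrams would be an easy variation.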

Image: Row skips with QWERTY (orange) vs Colemak (blue)

Same finger keypress, SFB
Pressing two consecutive keys with the same finger is relatively slow and inefficient. Some of these same finger bigrams (SFBs) can be avoided by rearranging your keys, while others are naturally ingrained in the language, such as double consonants or vowels. The double letters in 'Mississippi' are SFBs on QWERTY and on any optimized layout alike, because, well, you have to hit the very same key wherever it's located on your keyboard. (Pro tip: you can define a dedicated "repeat" key for all these occurrences, but this is really nerdy territory.)
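Separating natural SFBs from avoidable ones only requires a finger assignment per key. A minimal sketch, assuming conventional touch-typing fingering on a 3x10 grid (the helper names are hypothetical): a bigram counts as natural when the same key repeats, and as "other" when two different keys share a finger.

```python
import re

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
COLEMAK_ROWS = ["qwfpgjluy;", "arstdhneio", "zxcvbkm,./"]
# Conventional touch-typing finger per column, left pinky to right pinky.
COLUMN_FINGER = ["LP", "LR", "LM", "LI", "LI", "RI", "RI", "RM", "RR", "RP"]


def sfb_counts(text, rows):
    """Return (natural, other) same finger bigram counts over within-word bigrams.

    natural: the same key pressed twice ('ss' in 'Mississippi'); no keymap avoids these.
    other:   two different keys under the same finger; a better keymap can reduce these.
    """
    finger = {ch: COLUMN_FINGER[c] for line in rows for c, ch in enumerate(line)}
    natural = other = 0
    for word in re.findall(r"[a-z]+", text.lower()):
        for a, b in zip(word, word[1:]):
            if finger[a] == finger[b]:
                if a == b:
                    natural += 1
                else:
                    other += 1
    return natural, other


if __name__ == "__main__":
    text = "Mississippi is used as an example"
    print("QWERTY ", sfb_counts(text, QWERTY_ROWS))
    print("Colemak", sfb_counts(text, COLEMAK_ROWS))
```

For 'mississippi', both layouts report three natural SFBs and no others, while a word like 'used' yields one avoidable SFB ('ed') on QWERTY and none on Colemak.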

Image: Same finger keypresses - QWERTY (orange) vs Colemak (blue), and natural SFBs (magenta)

However, the number of other SFBs can be drastically decreased with proper planning and optimization.
Hand alternation vs rolls
These two indicators are antagonistic, so let's cover them together. Some typists prefer alternating hands, as it can create a more rhythmic typing experience. For example, placing all the vowels on one side of the keyboard (as in the Dvorak layout and many others) is a telltale sign of optimizing for alternation. Then there are rolls, which are my personal favorites. Rolls are those smooth sequences that you can execute lightning-fast by rolling the fingers of the same hand. I hope you know what I'm talking about, because I find most definitions out there unsatisfactory: too loose or vague for no reason. Instead, I prefer to define a select few 'winner' moves, excluding not just stretches and scissors, but also all combinations involving the pinkies.
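Since definitions vary, any roll counter has to commit to one. The sketch below classifies within-word bigrams as alternation, same finger, roll, or other same-hand pairs. Its roll rule is a hypothetical, deliberately strict stand-in for the 'winner' moves described above, not the definition behind the charts: different fingers of one hand, no pinky, no inner-column stretch, and no jump across two rows (a crude proxy for scissors).

```python
import re
from collections import Counter

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
# Conventional touch-typing finger per column: hand (L/R) + finger (P/R/M/I).
COLUMN_FINGER = ["LP", "LR", "LM", "LI", "LI", "RI", "RI", "RM", "RR", "RP"]
INNER_COLUMNS = {4, 5}  # keys reached by stretching the index finger sideways


def classify_bigram(a, b, pos):
    (row_a, col_a), (row_b, col_b) = pos[a], pos[b]
    finger_a, finger_b = COLUMN_FINGER[col_a], COLUMN_FINGER[col_b]
    if finger_a[0] != finger_b[0]:
        return "alternation"
    if finger_a == finger_b:
        return "same_finger"
    # Hypothetical strict roll: no pinky, no inner-column stretch, no two-row jump.
    is_roll = (
        "P" not in (finger_a[1], finger_b[1])
        and col_a not in INNER_COLUMNS
        and col_b not in INNER_COLUMNS
        and abs(row_a - row_b) <= 1
    )
    return "roll" if is_roll else "other_same_hand"


def bigram_profile(text, rows=QWERTY_ROWS):
    """Count bigram classes over all within-word bigrams of text."""
    pos = {ch: (r, c) for r, line in enumerate(rows) for c, ch in enumerate(line)}
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for a, b in zip(word, word[1:]):
            counts[classify_bigram(a, b, pos)] += 1
    return counts


if __name__ == "__main__":
    print(bigram_profile("er ou few the sad"))
```

With these rules, QWERTY's 'er', 'ou' and both bigrams of 'few' count as rolls, while 'the' is pure alternation.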

Image: Rolls - QWERTY (orange) vs custom1 (cyan) vs custom2 (green) keymap

QWERTY is quite bad at rolls, though 'er' and 'ou' are common examples, and 'few' is a stacked roll ("onehand"). In the example above, I generated a custom but relatively balanced keymap that favors rolls while penalizing redirects (cyan), and a second one clearly optimized for rolls (green). Obviously, alternation kills rolls and vice versa, so you either strike a good balance or optimize for one at the expense of the other. However, hyper-focusing on rolls may lead to unwanted redirects; they start to appear in the custom2 (green) example as well, for instance in 'that'.
Redirects
What are redirects? They are pure evil. If you optimize your keymap with the goal of maximizing rolls, you may end up with a lot of redirects: changes of direction within a single one-handed trigram. Take 'sad' on QWERTY. In theory, it consists of two roll(ish) patterns, but in practice, it's a really awkward and uncomfortable movement.
Inner/outer keypress, lateral stretch
Keypresses in the inner or outer columns (using the index or pinky fingers, respectively) are inconvenient on their own, but combining two of them in a row can be literally painful. The English alphabet has relatively few letters, but in languages with many accented or national characters, frequently used letters often end up on these keys. In such cases, minimizing lateral stretch becomes especially important.
Other indicators
Of course, there are many more metrics and indicators: scissors, disjoint finger bigrams, hand and finger balance, specific finger patterns, etc. You can also come up with original indices for your own optimization process. But the key point is that when comparing two layouts, you need to consider all the important indices simultaneously. As we've seen, some of these metrics are antagonistic or even mutually exclusive, meaning that improving metric A often leads to a decline in metric B.
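Redirects and lateral stretches can be detected with the same kind of finger map. A minimal sketch, assuming a QWERTY grid extended with the outer pinky column (`[` and `'`), fingers ordered from pinky (outermost) to index (innermost), a redirect defined as a one-hand trigram whose direction flips, and a stretch bigram defined as two different same-hand keys that both sit in inner or outer columns. All names and thresholds are illustrative assumptions, not a standard.

```python
import re

# Hypothetical QWERTY grid with an 11th, outer pinky column ('[' and "'").
ROWS = ["qwertyuiop[", "asdfghjkl;'", "zxcvbnm,./"]
COLUMN_FINGER = ["LP", "LR", "LM", "LI", "LI", "RI", "RI", "RM", "RR", "RP", "RP"]
# Position along the hand: pinky = 0 (outermost) ... index = 3 (innermost).
FINGER_ORDER = {"P": 0, "R": 1, "M": 2, "I": 3}
# Inner index-finger columns plus the outer pinky column.
STRETCH_COLUMNS = {4, 5, 10}


def is_redirect(fingers):
    """True for a one-hand trigram whose movement changes direction, e.g. 'sad'."""
    if len({f[0] for f in fingers}) != 1:
        return False  # both hands are involved
    steps = [FINGER_ORDER[b[1]] - FINGER_ORDER[a[1]] for a, b in zip(fingers, fingers[1:])]
    return steps[0] * steps[1] < 0  # one step goes inward, the other outward


def stretch_and_redirects(text, rows=ROWS):
    pos = {ch: (r, c) for r, line in enumerate(rows) for c, ch in enumerate(line)}
    # Runs of characters that exist on this layout; anything else breaks a run.
    pattern = "[" + re.escape("".join(rows)) + "]+"
    redirects = stretch_presses = stretch_bigrams = 0
    for run in re.findall(pattern, text.lower()):
        cols = [pos[ch][1] for ch in run]
        fingers = [COLUMN_FINGER[c] for c in cols]
        stretch_presses += sum(c in STRETCH_COLUMNS for c in cols)
        for i in range(len(run) - 1):
            same_hand = fingers[i][0] == fingers[i + 1][0]
            both_stretch = cols[i] in STRETCH_COLUMNS and cols[i + 1] in STRETCH_COLUMNS
            if same_hand and both_stretch and run[i] != run[i + 1]:
                stretch_bigrams += 1
        redirects += sum(is_redirect(fingers[i : i + 3]) for i in range(len(run) - 2))
    return {"redirects": redirects, "stretch_presses": stretch_presses,
            "stretch_bigrams": stretch_bigrams}


if __name__ == "__main__":
    print(stretch_and_redirects("sad few thy"))
```

Here 'sad' counts as one redirect, while 'few', which keeps moving outward, does not.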
Layout optimization is all about this tricky process of prioritizing, balancing, and testing.
Summary
So, we've covered the most important indicators. You can generate letter frequencies and understand concepts like SFBs, alternation, rolls, hurdles, etc. Now we're ready to use this knowledge to quickly evaluate layouts and to compare your freshly baked keymaps with each other or with well-known reference layouts. After getting familiar with the basics, we can finally begin actual layout optimization -- yay! -- starting on an intuitive level. How exactly? We'll explore that next time!
dovenyi