Dean Scarff on bringing AI to climbing grades
It’s some time in The Future: the bushfire smoke has cleared, you’re out of coronavirus lockdown and you’re psyched to finally climb outdoors again. Flicking through the guidebook you spot it: a climb that’s at your grade, and it’s even got two stars. The onsight awaits.
Several hours later you’re thrutching up a section of the route that feels like it must be at least two grades harder than what the guidebook claimed. The only hope for your ego is that you’re collateral damage from a 1970s interstate sandbagging contest. If only you could have stumbled on a recently-bolted softie that climbs like a gym route instead!
In Australia we’re blessed with the best grading system in the world courtesy of the legendary John Ewbank. Ewbank used whole number grades, which took into account a range of factors. These numbers put climbs in order: you can expect a grade 20 will be harder than a grade 19, and so on.
Yet even with this system, we rely on the opinions of first ascensionists and guidebook editors for fair grading. These opinions can be warped by grade inflation, quirky styles and intentional sandbagging. Another problem arises when trying to do maths with grade numbers. Normally we can add 1 to a whole number to get the next whole number, but there’s nothing consistent and measurable that’s equivalent to adding 1 to the grade of a route.
What if there were a way to quantify which routes were soft or sandbagged, some measure that reflected the experience of many climbers? Well, obviously you’d use it to find something soft to tick and impress your friends, right?!
FIGURE IT OUT FOR YOURSELF
Imagine you’re seeking glory and wander into an unfamiliar crag without a guidebook. Your mate, who can redpoint most 19s but has been working several grade 21 projects for months, proceeds to flash a route. You guess it’s grade 19 or lower. If your mate dogged it instead, you might guess it’s above grade 19. The more climbers of different skill levels trying the route, the more information you have to pinpoint the grade.
But what if you don’t know how hard everyone else climbs to start with? That’s fine, you can watch for climbers who are taking roughly the same number of attempts as you to send particular routes. When it’s your turn on the next route, you can guess that you’ll take the same number of attempts as they did.
This is the basis for a scientific grading system. Instead of using people’s opinions about how hard routes are, you observe the actual outcomes of the ascents. To put a single number on how difficult a climb is, you can use the odds that you’ll send it.
THE CHESS GAME
Last year, on weekends I couldn’t climb, I investigated how far I could take these ideas. I drew inspiration from what may seem like a surprising source: the world of boardgame player ratings. Games like chess have ratings systems for ranking professional players, and I figured this could be applied to climbing.
These ratings systems are grounded in probability theory: the current ratings of two players can be used to calculate odds on the outcome of a game. The difference in ratings can be transformed to a probability using the ‘logistic function’, an S-shaped curve. If players have the same rating (0 difference), the odds are even. As the difference between the players’ ratings gets bigger, the probability the favourite wins gets closer to 100%. These systems allow for the ratings to be adjusted: after a game, the winner’s rating is adjusted up, and the loser’s rating is adjusted down.
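As a rough illustration of that S-shaped curve, here is a minimal Python sketch. The function name `win_probability` and the `scale` parameter are assumptions made for this example; real rating systems each pick their own scaling.

```python
import math

def win_probability(rating_diff, scale=1.0):
    """Probability that the favourite wins, given the rating difference.

    `scale` is a hypothetical parameter controlling how quickly the curve
    flattens out; it is not taken from any particular rating system.
    """
    return 1.0 / (1.0 + math.exp(-rating_diff / scale))

# Even odds at zero difference, approaching certainty as the gap widens.
for diff in (0, 1, 3, 6):
    print(diff, round(win_probability(diff), 3))
```

A negative difference gives the underdog's chances, and the two probabilities always sum to 1.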
I applied the same function for climbing. Every ascent is like a chess game, but one of the players is the rock itself. We can also make some other assumptions: it’s unlikely that climbers make big improvements overnight, and the longer it’s been since a climber has logged any climbing, the less certain we can be about their current performance level. All these statements about ‘what’s likely’ can be combined into a single statistical model. Based on this model, I built software that picks the best climber and route ratings to fit with a list of ticks.
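To make the climber-versus-rock idea concrete, here is a deliberately simplified sketch in Python. It uses a chess-style update applied one tick at a time, which is not the same thing as the model described above (that one fits all the ticks together and accounts for time between ascents). The names `send_probability` and `update_ratings`, the step size `k`, and the example ticks are all hypothetical.

```python
import math

def send_probability(climber, route):
    """Chance of a clean ascent, from the climber-minus-route rating gap."""
    return 1.0 / (1.0 + math.exp(-(climber - route)))

def update_ratings(climber, route, sent, k=0.1):
    """Nudge both ratings after one attempt (k is an assumed step size).

    A clean ascent moves the climber up and the route down; a failed
    attempt does the opposite. Surprising results move ratings the most.
    """
    expected = send_probability(climber, route)
    outcome = 1.0 if sent else 0.0
    delta = k * (outcome - expected)
    return climber + delta, route - delta

# Made-up ticks on a single route: True means a clean ascent.
climber, route = 0.0, 0.0
for sent in (True, True, False, True):
    climber, route = update_ratings(climber, route, sent)
print(round(climber, 3), round(route, 3))
```

Because the climber sends more often than not in this made-up log, the climber's rating ends up above the route's.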
All I needed now was data. To start with I just used my friends’ logbooks, which let me sanity-check the software I’d written. But with so many routes, the model needed more data to narrow down grades. So I made contact with theCrag.com, an online guidebook and logbook website. The team from theCrag gave me access to the hundreds of thousands of ascent records available from the platform, which I used to validate and refine the model.
We were stoked with the results of running theCrag’s users’ logbooks through my program. In most cases, the estimated grades are similar to guidebook grades, so the model is able to calculate a reasonable grade without anyone’s opinion as input. It’s also good at putting odds on clean ascents: adding 1 to a grade has a real meaning in terms of the odds someone can send a route. These are the theoretical improvements over conventional grading.
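One way to see what ‘adding 1 to a grade’ could mean in terms of odds is the short Python sketch below. It assumes the logistic curve from the chess analogy and, purely for illustration, that one grade equals one rating unit; the actual scaling used in the model is not stated here, and the function names are hypothetical.

```python
import math

def send_probability(climber, route):
    """Chance of a clean ascent under a logistic curve."""
    return 1.0 / (1.0 + math.exp(-(climber - route)))

def odds(p):
    """Convert a probability into odds (chance of success vs failure)."""
    return p / (1.0 - p)

# Assumption: ratings are on a scale where one grade is one rating unit.
climber = 20.0
for route in (18.0, 19.0, 20.0, 21.0):
    easier = odds(send_probability(climber, route))
    harder = odds(send_probability(climber, route + 1))
    print(route, round(harder / easier, 3))
```

Under these assumptions, each extra grade multiplies the odds of sending by the same factor, no matter where on the scale you start. That constant factor is what makes the step between grades measurable.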
However, the real gold is where the model results and the guidebook disagree. For some routes, particularly rarely repeated ones, this seems to be due to a lack of data: the clean first ascent belies many hours of working a project. But in many cases climbers’ online accusations of soft or sandbagged grading were agreeing with the model. I’d found what I’d been looking for, and soon enough my friends were asking for recommendations of soft routes at the next grade.
There are still many opportunities for improvement. How good the estimated grades are depends a lot on the ascent data. Picking the right grades relies on climbers logging all their ascents online, even their hangdogs. And there are lots of other factors that are currently left out of the model to keep it simple, such as the effect of particular styles, reachy moves, knowing the beta and broken holds.
For me, working on this project was an opportunity to combine two of my hobbies (climbing and programming), while also making something for the climbing community. I’ve made the open source estimation software available for free, and I’ve also published an academic paper detailing the theory and performance of the model. I’m also very grateful for the willingness of Simon Dale and his team at theCrag.com to work together.
As for the revised grades, a first version of them has been incorporated into route listings on theCrag.com – you can see it listed amongst the Grade Citations under the name grAId (a play on the words ‘grade’ and ‘Artificial Intelligence’).
The team from theCrag is currently working on having the grAId continuously adjusted as new data becomes available, to complement their existing CPR system.
Some Australian routes the model predicts as soft for their grade:
- The Bandoline Grip (18), Shipley Upper, NSW
- Truancy Officer (20), Dam Cliffs, NSW
- Prima Donna (22), Brooyar, Qld
- Tribal Monkeys (23), Kalbarri, WA
- Screaming Insanity (26), Mt Coolum, Qld
- Thrust Gut (26), Blue Mountains, NSW
Some Australian routes the model predicts as sandbagged for their grade:
- Wobblebuns (18), Nowra, NSW
- Thyeses Feast (19), Bob’s Hollow, WA
- Loop the Loop (25), Shipley Upper, NSW
- Rubber Lover (25), Centennial Glen, NSW