Dean Scarff on bringing AI to climbing grades

It’s some time in The Future; the bushfire smoke has cleared, you’re out of coronavirus lockdown and you’re psyched to finally climb outdoors again. Flicking through the guidebook you spot it: a climb that’s at your grade, and it’s even got two stars. The onsight awaits.

Several hours later you’re thrutching up a section of the route that feels like it must be at least two grades harder than what the guidebook claimed. The only hope for your ego is that you’re collateral damage from a 1970s interstate sandbagging contest. If only you could have stumbled on a recently-bolted softie that climbs like a gym route instead!

In Australia we’re blessed with the best grading system in the world, courtesy of the legendary John Ewbank. Ewbank used whole-number grades, which took into account a range of factors. These numbers put climbs in order: you can expect a grade 20 to be harder than a grade 19, and so on.

Yet even with this system, we rely on the opinions of first ascensionists and guidebook editors for fair grading. These opinions can be warped by grade inflation, quirky styles and intentional sandbagging. Another problem arises when trying to do maths with grade numbers. Normally we can add 1 to a whole number to get the next whole number, but there’s nothing consistent and measurable that’s equivalent to adding 1 to the grade of a route.

What if there was a way to quantify which routes were soft or sandbagged, some measure that reflected the experience of many climbers? Well, obviously you’d use it to find something soft to tick and impress your friends, right?!


Imagine you’re seeking glory and wander into an unfamiliar crag without a guidebook. Your mate, who can redpoint most 19s but has been working several grade 21 projects for months, proceeds to flash a route. You guess it’s grade 19 or lower. If your mate had dogged it instead, you might guess it’s above grade 19. The more climbers of different skill levels who try the route, the more information you have to pinpoint the grade.

But what if you don’t know how hard everyone else climbs to start with? That’s fine: you can watch for climbers who are taking roughly the same number of attempts as you to send particular routes. When it’s your turn on the next route, you can guess that you’ll take the same number of attempts as they did.

This is the basis for a scientific grading system. Instead of using people’s opinions about how hard routes are, you observe the actual outcomes of the ascents. To put a single number on how difficult a climb is, you can use the odds that you’ll send it.


Last year, on weekends when I couldn’t climb, I investigated how far I could take these ideas. I drew inspiration from what may seem like a surprising source: the world of board game player ratings. Games like chess have rating systems for ranking players, and I figured the same approach could be applied to climbing.

These rating systems are grounded in probability theory: the current ratings of two players can be used to calculate odds on the outcome of a game. The difference in ratings is transformed into a probability using the ‘logistic function’, an S-shaped curve. If the players have the same rating (a difference of zero), the odds are even. As the difference between the players’ ratings grows, the probability that the favourite wins gets closer to 100%. These systems also allow ratings to be adjusted: after a game, the winner’s rating is adjusted up, and the loser’s rating is adjusted down.
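To make the transform concrete, here is a small sketch of the logistic function. The scale constant is my assumption for illustration (it follows the Elo chess convention, where a 400-point gap means 10:1 odds); the article doesn’t specify one:

```python
import math

def win_probability(rating_diff):
    """Probability that the higher-rated player wins, given the rating gap.

    Uses the logistic (S-shaped) function.  The scale constant follows the
    Elo chess convention: a 400-point gap corresponds to 10:1 odds.
    """
    scale = 400 / math.log(10)
    return 1 / (1 + math.exp(-rating_diff / scale))

print(win_probability(0))    # evenly matched: 0.5
print(win_probability(400))  # 400-point favourite: ~0.91 (10:1 odds)
```

Note how the curve saturates: doubling the gap to 800 points pushes the favourite to roughly 99%, which is the ‘gets closer to 100%’ behaviour described above.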

I applied the same function for climbing. Every ascent is like a chess game, but one of the players is the rock itself. We can also make some other assumptions: it’s unlikely that climbers make big improvements overnight, and the longer it’s been since a climber has logged any climbing, the less certain we can be about their current performance level. All these statements about ‘what’s likely’ can be combined into a single statistical model. Based on this model, I built software that picks the best climber and route ratings to fit with a list of ticks.
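As a heavily simplified sketch of the fitting idea (not the published model, which also encodes priors like ‘no overnight improvements’ and time-varying uncertainty about climbers), one can alternately nudge climber and route ratings until the predicted send probabilities line up with the observed ticks. All names and data below are invented:

```python
import math

def fit_ratings(ticks, iterations=200, lr=0.1):
    """Fit climber and route ratings to a list of ticks.

    ticks: list of (climber, route, sent) tuples, where sent is True for a
    clean ascent.  This is a plain logistic-regression-style fit by gradient
    ascent, for illustration only.
    """
    climbers, routes = {}, {}
    for c, r, _ in ticks:
        climbers.setdefault(c, 0.0)
        routes.setdefault(r, 0.0)
    for _ in range(iterations):
        for c, r, sent in ticks:
            # Predicted send probability from the rating difference.
            p = 1 / (1 + math.exp(-(climbers[c] - routes[r])))
            error = (1.0 if sent else 0.0) - p
            climbers[c] += lr * error  # a send nudges the climber up
            routes[r] -= lr * error    # and makes the route look easier
    return climbers, routes

ticks = [("alice", "slab", True), ("bob", "slab", True),
         ("bob", "roof", False), ("alice", "roof", True)]
climbers, routes = fit_ratings(ticks)
# The roof, which shut bob down, ends up rated harder than the slab.
```

Even this toy version recovers the intuition from the crag scenario earlier: routes that stop some climbers drift up the scale, and climbers who send harder routes drift up with them.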

All I needed now was data. To start with I just used my friends’ logbooks, which let me sanity-check the software I’d written. But with so many routes, the model needed more data to narrow down the grades. So I made contact with theCrag, an online guidebook and logbook website. The team from theCrag gave me access to the hundreds of thousands of ascent records available on the platform, which I used to validate and refine the model.

We were stoked with the results of running theCrag’s users’ logbooks through my program. In most cases, the estimated grades are similar to guidebook grades, so the model is able to calculate a reasonable grade without anyone’s opinion as input. It’s also good at putting odds on clean ascents: adding 1 to a grade has a real meaning in terms of the odds someone can send a route. These are the theoretical improvements over conventional grading.
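To illustrate what ‘adding 1 to a grade has a real meaning in terms of odds’ could look like, here is a hypothetical sketch in which each grade step multiplies the odds of a clean ascent by a constant factor. The factor of one half is invented for illustration; it is not the model’s actual per-grade odds ratio:

```python
# Assumption for illustration only: one Ewbank grade corresponds to a
# fixed gap on the rating scale, so each grade step multiplies the odds
# of a clean ascent by a constant factor (here 1/2, chosen arbitrarily).
ODDS_FACTOR_PER_GRADE = 0.5

def send_odds(base_odds, grades_above):
    """Odds of a clean ascent on a route `grades_above` your comfort grade."""
    return base_odds * ODDS_FACTOR_PER_GRADE ** grades_above

def odds_to_probability(odds):
    return odds / (1 + odds)

# Even odds at your grade, 1:2 one grade up, 1:4 two grades up, ...
for step in range(4):
    p = odds_to_probability(send_odds(1.0, step))
    print(f"grade +{step}: P(send) = {p:.2f}")
```

Under this assumption, ‘one grade harder’ always means the same thing no matter where you are on the scale, which is exactly the consistency the article says conventional grade numbers lack.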

However, the real gold is where the model results and the guidebook disagree. For some routes, particularly rarely repeated ones, this seems to be due to a lack of data: the clean first ascent belies many hours of working a project. But in many cases, climbers’ online accusations of soft or sandbagged grading agreed with the model. I’d found what I’d been looking for, and soon enough my friends were asking for recommendations of soft routes at the next grade.

There are still many opportunities for improvement. How good the estimated grades are depends a lot on the ascent data. Picking the right grades relies on climbers logging all their ascents online, even their hangdogs. And there are lots of other factors that are currently left out of the model to keep it simple, such as the effect of particular styles, reachy moves, knowing the beta and broken holds.

For me, working on this project was an opportunity to combine two of my hobbies (climbing and programming), while also making something for the climbing community. I’ve made the open-source estimation software available for free, and I’ve also published an academic paper detailing the theory and performance of the model. I’m also very grateful to Simon Dale and his team at theCrag for their willingness to work together.

As for the revised grades, a first version of them has been incorporated into route listings on theCrag – you can see it listed amongst the Grade Citations under the name grAId (a play on the words ‘grade’ and ‘Artificial Intelligence’).

If you are after a soft 26 then grAId suggests you have a look at Screaming Insanity (shown above) in the Coolum Cave.

The team from theCrag is currently working on having the grAId continuously adjusted as new data becomes available, to complement their existing CPR system.

Some Australian routes the model predicts as soft for their grade:

Some Australian routes the model predicts as sandbagged for their grade:

12 thoughts on “MAKING THE GRADE”

  1. Tom Hoyle

    It is cool that someone put in the time and effort to make one of these. Is there any way of building in a conditions factor as an externality? In chess, the factors are the two players and how recently they’ve played/current form. In climbing, there is the climber, the route, the conditions when they tried it and all the past successful climbers and the conditions for all those people. It seems like an added layer of variability. Maybe it evens out over time? But for people climbing at their limit and at a crag or area that isn’t local and so they aren’t necessarily going to be able to wait for ‘better conditions’, some ascents are going to have more attempts simply due to trying in sub-optimal conditions. It seems this could influence the data. I don’t mean this as a criticism of the model, some data is better than no data, just an observation on the idea of it all.

    1. Dean Scarff

      Hi Tom, good idea. While the model could be modified to account for conditions, the challenge is collecting enough data in a structured way. Right now, none of the ascents on theCrag have data about the conditions in a structured format, so that kind of analysis would have to start collecting data from scratch.

  2. Daniel

    Such a good idea! May as well do something with all that data the crag is collecting.

    I guess a problem with this is that people are probably more likely to record their hangdog and bail attempts on harder climbs, while not as likely to put down a failed attempt where they fell off a climb at a grade they probably expected to flash/onsight?

    This would likely lead to easy/moderate climbs being graded down due to fewer incomplete ascents, and as climbs get harder they may get graded up as more ‘working’ ascents get logged?

    The crag could help with your data collection here by adding a button for number of attempts taken to send when you log an ascent that could be for data collection purposes only, rather than having to add 10x individual hangdog logs? That would probably be useful for accuracy of their CPR system as well.

    I know if I hangdog a few times before I send I don’t usually log 3-4-5 hangdogs and then a redpoint, I will probably log 1x hangdog and 1x redpoint or sometimes just a redpoint, which doesn’t really give an accurate view of how long it took me to climb it from a data perspective.

  3. Roland Foster

    Hi Dean, I enjoy a discussion about grading as much as the next person, but unfortunately I think you have completely misunderstood the challenge of grading rock climbs because difficulty can never be an objective feature of a climb like its height or the number of bolts it has. This is because difficulty exists in the relations between a subject (the climber) and the climb which isn’t completely objective either as Tom notes about conditions – think how much harder a route covered in dust is, or after rainfall has washed off all the chalk, compared with a route where the previous ascensionist has kindly brushed all the key holds (but still left it obvious where the holds are). Then there is the temperature, humidity, cloudiness and shadiness among other variables of conditions such as seepy pockets or overly caked holds. All of these factors can lead to a climb feeling hard or soft on any given day.
    Yet, variability of conditions pales in comparison to understanding the subjectivity of the climber. The most obvious is body size – grading tends to default to how hard a 180cm male thinks it is, but a 150cm female might find turning the lip of a roof that requires a mantel much easier than the male; conversely, reachy climbs may feel 2-3 grades harder for the shorter person, and compression moves may be physically impossible if you can’t span between the sidepulls, yet be quite straightforward if you can.
    There are other fairly stable things about our bodies that can also alter our perception of the difficulty of particular climbs such as the flexibility of our hips, hamstrings, shoulders and ankles, and the ingrained movements our bodies have become conditioned to perform. This might be big moves on big holds on steep problems in the bouldering gym, or like me it might be a lifetime of climbing on small holds on 80°-100° walls which combined with my 160cm stature means I tend to find that powerful moves through steep terrain seem several grades harder than the given grade, but vertical crimpfests often seem quite soft for the grade, yet they might shut down a climber who can climb 3-4 grades harder on steep ground. The ability to utilise technique quickly can also make a huge difference – just look at Magnus Midtbø and Pete Whittaker climbing Melvyn Bragg 7B on gritstone. The ability to jam various widths of cracks or horizontal breaks can allow one person to completely de-pump while the other has fallen off because they couldn’t find the jam, and of course our hand sizes vary dramatically too – one person’s rattly fingers is another’s bomber handjam. The variability of climbing bodies should never be left out to keep things simple in order to make an algorithm work just because the Crag doesn’t collect that data.
    There is also another important aspect to subjective climbing and that is our condition on and within the day. Is it our fourth day on with bleeding tips, were we on the turps till 3am the previous night, or have we had three perfect rest days with excellent nights’ sleep? Yet these factors are also sometimes confounded – you climb brilliantly when you thought you were exhausted, or have a heavy gravity day when you thought you were going to be really strong.
    Strategy during the day also complicates how difficult a climb might feel. Did you get on something too hard too early and get flash pumped and consequently ruin the rest of the day, or did you arrive at a hard route in that sweet spot of nicely warmed up but not worn out, and everyone has a different decay curve over the rest of the day, which will also vary between days for the same climber – you might be able to climb well all day at 80 per cent effort but only for an hour if you’re giving a climb 100 per cent. You are often able to recognise how well you are climbing on any given day if you’re on familiar routes, but if you have never tried a climb before it can be hard to disentangle whether it’s the route’s conditions, your dimensions and skills or your condition on the day that makes a route seem hard (I think ‘sandbag’ should be reserved for those routes that were deliberately undergraded for egotistical or comedic reasons – I’m thinking of Mark Moorhead and Mike Law in particular, but not every old thrutchy route that seems stiff due to recent grade inflation) or easy for the grade.
    I find it rather strange that people are interested in going after the soft ticks rather than the best routes at a particular crag – surely you know you’re only kidding yourself how good a climber you are. When I did Thrust Gut in 2017 I thought it was at least two grades overgraded and a quick perusal of The Crag confirmed that most people thought it was very soft too, which is why it gets 24 in the 2019 Blue Mountains guide; it didn’t take an algorithm to figure it out, but if you’re not honest with yourself then your ego will be bruised every time your breakthrough grade routes get downgraded. There is another tendency that is exacerbated by rating databases, such as theCrag’s, that contribute towards a score though, which is that some people will happily take a higher grade even though they know it felt far easier for them in order to keep their score high.
    Another aspect that complicates grading is that whole areas might have internally consistent grading but they can feel hard or soft compared to another area. I’m thinking of the Blue Mountains (the Kalymnos of Australia?) versus the sport routes at Arapiles, for instance. It is often easier to see what you have to do in the Blue Mountains and you just have to execute the moves, whereas you can be completely baffled by a crux at Arapiles, because it is more three dimensional, yet once you have figured it out it may not be that hard to redpoint, or flash if you have watched someone else figure out the moves for you.
    What we have to remember is that a route starts out being climbed by one person who gives it a grade and there are plenty of reasons why that initial grade may not be particularly accurate; they may be able to climb much harder and have a poor idea of how hard something 5-10 grades easier is, or they may overgrade to get the route attention and popularity, or undergrade in an attempt to humiliate their friends and other ascensionists, or they might have had to put in a huge amount of effort on a route that basically didn’t suit them and thought it was much harder than other people found it (I’m pretty sure I’ve been guilty of all these mistakes at some time or other, but usually not intentionally). But initial grades carry a certain inertia as repeaters don’t want to upgrade someone else’s climb or they might be happy to claim a soft tick, however once a few climbers have repeated a route it is usually easier to arrive at some sort of consensus grade – bearing in mind that difficulty occurs on a continuum not as a discrete number and climbers will experience that difficulty individually depending upon their own strengths and weaknesses and the conditions they encounter. I really don’t think an algorithm can capture the complexity of grading very successfully; rather it is more helpful to know the first ascensionist’s reputation and your own climbing preferences and abilities, and not get too hung up on grade chasing as it will undermine your enjoyment of moving on rock and solving interesting climbing problems.

    1. Daniel

      Roland, everything you have described as a critique of the algorithm there can equally be directed as a critique of any sort of grading scale. If we are going to use a grading scale at all, which is by nature highly subjective and prone to all of the weaknesses and misinterpretations that you describe, then why would we not use an algorithm to try and at least get it as close to a ‘true’ grade as possible? There are many climbs that are sandbagged or overgraded that never have their grade changed despite unanimous community consensus and the feedback of every climber getting on it being ‘that was soft/sandbagged af’. Instead of climbs just being a sandbag forever because the first ascensionist felt like it for whatever reason at the time, why not strive for a system that is more accurate?

  4. roger

    The point is there is no ‘true’ or ‘accurate’ grade. Climb grading is stupendously subjective and the pursuit of objectivity futile. Climbing would be more boring, and nothing more than gymnastics if grading wasn’t so subjective.

  5. Roland Foster

    Daniel, the whole point of my post is that there is no ‘true’ grade that is waiting to be uncovered with enough data – the difficulty of a particular climb is experienced subjectively by each individual climber based on the factors I mentioned and many others. I think one of the main problems with Dean’s article is his use of the chess game as an analogy for climbing because chess is a completely closed system – only certain moves are possible on the 64 squares and there is no physical input by the players. This is a massive simplification when compared to climbing which involves myriad rock types in all sorts of conditions and a huge array of climbing bodies who are continually fluctuating on their own performance curves.
    Is there really a problem that this ‘artificial intelligence’ helps to solve? I don’t think there are that ‘many’ climbs that are overgraded or undergraded that don’t get changed. I have only done or tried three of the climbs Dean mentioned, as I said in the previous post Thrust Gut has already been downgraded to 24 and is probably only at the lower end of that grade, I wouldn’t presume to know whether The Bandoline Grip is hard or easy for 18, and I fell where everyone does on Loop the Loop – at the last move, but it certainly didn’t feel harder than many Arapiles 25s. Its staunch reputation may stem from its undoubted quality which might encourage many climbers who aren’t quite up to the last move crux to try it, but who can do shorter less pumpy 25s.
    A route I did find at least one grade undergraded (and poorly bolted) was The Loch Ness Whippet 23 at Mt Piddington, yet when I looked on The Crag there were few if any ascents. TLNW is a really excellent climb that has one stupidly placed bolt, but it also seemed really hard for the grade in the exact style I would regard as my forte. If everyone avoids a route because they know it’s hard, or a bit scary how will the algorithm pick that up?
    On the other hand with popular routes it doesn’t take much research on the Crag, or even reading the guidebook to find the soft touches or the hard as nails problems. Phrases like ‘reachy’, ‘bunched’, ‘soft’ or ‘considered hard for the grade’ impart far more information than some mythical ‘true’ grade which relies on only a (possibly) small subset of the climbing population – those who can be bothered logging every attempt they make on a climb, who, I suspect (without having looked at the evidence), are younger and newer to climbing because I certainly know many climbers who would never bother to list their ascents on a website let alone multiple attempts or repeats of a particular climb because they would prefer to go and do more climbing instead. I keep a very detailed climbing diary – but it’s for improving my own climbing, not broadcasting to the world my successes and failures.
    Is there really a problem to be solved if climbs are soft or hard for the grade? I can see this might be an issue on severely undergraded trad routes but the algorithm specifically excludes trad routes because of the complexity involved in placing trad gear, whereas a simple exclamation mark or skull more effectively conveys the perils of certain routes. If you do get shut down on an undergraded sport route come back when you’re a better climber or realise that the climb didn’t suit your strengths or that you are very weak at a particular style, movement, or hold type – climbing is an ongoing learning process and you will always be able to improve the subtlety of your climbing technique even if strength, power or endurance are starting to wane. Conversely, what is the issue if a climb is considered soft for the grade? I usually feel disappointed if a route feels easy for the grade, especially if it is supposedly near my limit – I certainly don’t think I’m a great climber if some supposedly highly graded climb goes easily. I still vividly remember the excitement of my first climbs at a new grade but I soon came to realise that I could only think of myself as that grade climber when I could climb that grade on different styles of climb and on different types of rock. As I mentioned in the previous post if you go hunting all the soggy ticks who are you kidding – only yourself.
    I actually get far more satisfaction doing something that is considered hard for the grade by the skin of my teeth, whether that’s redpointing a 27 in optimal conditions, solving a tricky 23 boulder problem first try, or a late in the day ascent of what feels like a staunch 19 that I’m pumped witless on, yet all these climbs could have different outcomes from a moment’s indecision – does that make them any harder? I don’t think it does, rather it illustrates there are myriad ways to fail.
    One classic way of thinking about why we climb is based on the idea of intrinsic or extrinsic motivation, with the former based around personal reasons for climbing while the latter is based on external factors such as impressing others, but we all exist on a continuum, and I would say I have moved more towards the intrinsic end of that spectrum as my climbing has progressed, but when we are motivated by grade improvements to the exclusion of other types of climbing then we tend to have a more extrinsic motivation for climbing and this can more easily undermine our enthusiasm for climbing if we don’t recognise that some climbs will feel much harder for us and others will suit us down to the ground. Don’t get sucked into thinking certain climbs should be upgraded because they feel much harder than others you’ve done of the same grade because we all have our blind spots when it comes to grading climbs, including the author of the article, but thank you Dean and the other commenters for stimulating discussion on an endlessly fascinating topic, and encouraging me to try and set down my thoughts coherently. I look forward to the ongoing discussion.

    1. Dean Scarff

      Hi Roland, you bring up a lot of great points about the complexities and shortfalls of trying to use a single number to summarize how a climber will perceive the difficulty of a climb. However, the ratings produced by the model are estimates of something subtly different, namely the proportion of successful ascents we could expect on average over some large population of climbers. This is an abstract statistical concept, but it does have the advantage of being objective and scientifically verifiable (in that the error can be quantified with additional data). There are of course a bunch of caveats regarding the assumptions of the model and the quality of the data, but the essence really is an objective measure. Like conventional grades and even beta, this doesn’t define how a particular climber will feel climbing a route on a particular day, but is information climbers can use to make decisions.

      You’ll also have to forgive me for trying to engage climbers’ egos when writing the article; I hoped this would add some emotional engagement with grades to complement the relatively dry statistical explanations. We all have our own reasons to get psyched.

      In the context of Loop the Loop, I’ll leave you with what Mike Law told me about Ewbank’s grades:
      “I’d add that the difficulty of a climb is how hard it is to tick. ‘It’s easy but I keep falling off because I’m pumped’ means it’s hard”

  6. Michelle

    This is so cool! If only the crag had data on climber height, then you could create an individualised grade predictor. Nice work!

  7. Kinly

    Does this mean that as average climbing skill improves generationally (assuming it would), the grades of climbs as perceived by this model would be affected?

  8. Sam

    I’m not so sure about the statement that the Ewbank system is the best grading system, as said at the start of the article. The only grading systems that I know of which work for all the complexities involved are Mount Cook, UIAA, Scottish Winter and British Trad.

    At least from the perspective of trad/alpine climbing, there is very little transparency about all the factors that contribute to a grade, unless the guidebook also says something like “graded high for seriousness”, which many times it won’t. In that case, you’re only getting half the picture. Sure, you could say that this affords climbers the opportunity to have an adventure and be surprised. But that’s not the purpose of a simple numeric grading system. There’s no point pretending that a single number can articulate the many factors that define a climb’s difficulty. So yes, as is commonly said, grades are only a guide. Then why not make them nonspecific and have grade bands (at least for traditional climbing)? Or even better, let’s just use the British trad system.

  9. Craig Rowley

    Certainly at KP in Brisbane and some of the trad climbs in the Glasshouses, the first ascenders in the 60s and early 70s were very skilled and fearless, often using home-made gear as commercial gear wasn’t available. Hence some of their gradings are sandbagged by today’s standards. At KP some dangerous runouts remain due to this early grading, so climber skill in part relates to fear control on lead. Some climbs are highly inaccurate but most are OK. Would be great to have an AI program to suggest alternate grades for discussion.

