Two Wrongs

When Is a Counter-Strike Player Good?

I participated in a Counter-Strike tournament with a few coworkers in April. Not that I’m particularly good at the game1 (I only played very casually back in the 1.6 days, peaked at the dmg rank in cs:go and 14,000 in cs2, which puts me somewhere near the 50th percentile worldwide, i.e. as average as they come) but I was the least bad fifth player they could scrape together. Although my coworkers are better players than I am, we ended up being one of the worst teams in the tournament. However! We did improve a lot from the start to the end, and that’s the more important thing in my book.

One of the things we used to help us improve was the Leetify website. The reason is that they have a 2D match viewer, where we could upload our games, look at how we played, and spot mistakes as well as things that worked well.

They also have something they call the Leetify rating, which attempts to capture how good a player is. This is difficult business, so it’s time for a quick diversion into what Counter-Strike is.

Counter-Strike is a tactical team game

Counter-Strike is played by two teams with five players in each.

  • One of the teams has the objective of defending two locations on the map for two minutes. If they manage to do that, they get one point.
  • The other team has as its objective to assault any of the two locations on the map, gain control over it, and maintain that control for 40 seconds. If they manage to do that, they get one point.

When either team has scored a point, the game resets and the next round is played. After 12 rounds have been played, the teams swap sides, so the attackers get to defend and vice versa. The first team to score 13 points wins the match.
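To make the format concrete, here is a toy sketch of the match flow in Python (ignoring overtime and other nuances; the `round_winner` callback is purely hypothetical):

```python
def play_match(round_winner):
    """Minimal sketch of the Counter-Strike match format: teams swap
    sides after round 12, and the first team to 13 round wins takes
    the match (overtime at 12-12 is ignored here).
    round_winner(round_no, sides) returns "A" or "B"."""
    score = {"A": 0, "B": 0}
    sides = {"A": "attack", "B": "defend"}
    round_no = 0
    while max(score.values()) < 13:
        round_no += 1
        if round_no == 13:  # halftime: swap sides after 12 rounds
            sides = {"A": sides["B"], "B": sides["A"]}
        score[round_winner(round_no, sides)] += 1
    return score

# A team that wins every round on defence ends up winning 13-12.
final = play_match(lambda n, sides: "A" if sides["A"] == "defend" else "B")
print(final)
```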

We are talking about a first-person shooter, so if you’d guess that the primary means of attack and defense are shooting players on the other team, you’d be correct. But it’s worth emphasising that one of the draws of the game is that shooting is not the objective; it is theoretically possible (though highly unlikely) for either team to score a point without firing a single bullet.

This sets up the typical economy-of-force problem for the defenders (they have five players to allocate to two different locations) and concentration-of-force problem for the attackers (unless they spot an opportunity for a penetration attack, they will have to envelop the defenders, but they have just five players to allocate to fixing and flanking forces.) This is what makes it a tactical game at its heart – but it goes deeper.

Items and economy in Counter-Strike

In addition to shooting with different kinds of firearms, the attackers and defenders have access to a few items that help them achieve their goals. Two notable examples of items are:

  • Smoke screens create temporary walls on the playing field. The smoke obscures the view, but it is also difficult to pass through in practice, because a player outside the smoke has better visibility into it than a player inside of it has out. This puts the player inside the smoke at a gunplay disadvantage.
  • Flashbangs temporarily blind and deafen players who are looking at them as they detonate. While smoke screens have roughly the same effect on all players, skillful deployment of flashbangs can blind the opponent without any effect on the team that used them.

The reason I bring this up is that using these items can be counter-intuitive2 (one of my favourite aha! moments was when I realised that if my opponent has used a smokescreen to deny me entry somewhere, I can often deploy a smokescreen of my own just beyond theirs. This gives me a protective wall between myself and my opponent as I wade through their smoke, and then the choice of which angle around my smoke I want to engage them from. It’s one of those eggs of Columbus that sound obvious once you’ve heard them – yet most people don’t think of it before they hear it) and a skill unto itself, completely separate from the shooting. These items (as well as the firearms used) need to be exchanged for in-game money, which is earned primarily by scoring points. Economising on in-game money is yet another skill that matters a lot: the right firearms and items give such an advantage that it is often worth not buying anything one round (and running a large risk of giving the opponent a point) in order to be able to buy fully the next round.3 And, perhaps obviously, the team needs to coordinate its buys.

When a player dies, their firearm and items appear on the playing field for other players (including opponents) to pick up. If a player survives, they carry their firearm and items with them into the next round.

Resource management in Counter-Strike

In summary, one could say there are five resources both teams need to manage and trade off against each other during a full match:

  • Player health: many weapons in Counter-Strike kill with one well-aimed hit, but health still matters when using weaker weapons, or when fighting from behind cover.
  • Security areas: the locations one team denies the opponent access to.
  • Information: knowledge of opponent movements.
  • Time: both defender and attacker objectives are timed4 (there are some nuances here I haven’t explained, but I don’t think they matter for the thrust of the article).
  • Economy: consisting of firearms and items, as described above.

Some people would lump security areas and information together under the single umbrella of map control, but I think it’s clearer to separate them.5 It makes a difference for in-game decisions: spotting for information can be done in a way that doesn’t really let the spotter shoot anything, but also significantly reduces their risk of being shot. Spotting to be able to shoot, i.e. to deny access, requires exposing oneself to damage much more.

As examples of basic and common tradeoffs,

  • A team yields one of their security areas to conserve health for future engagements.
  • A team sacrifices player health to gain information.
  • A team spends time to extend a security area.
  • A team sacrifices information gathering to have more time to execute on their objective.

There are many variations on this.

Evaluating Counter-Strike skill

Now, perhaps, it is clear why it is hard to evaluate individual player skill in Counter-Strike. There are many ways for players to contribute to winning a match. For example, making higher-level tactical choices that increase win probability will not have much of a quantifiable effect on in-game events attributable to that player. But even something as simple as deploying a smokescreen can be hard to evaluate: a smokescreen that deploys 0.2 radii to the side of where it was meant to go can turn from an advantage into a disadvantage by giving the opponent concealment rather than obstruction.

Trying to evaluate individual skill has a history going back to at least 2010, when the hltv website launched its Rating 1.0. It is based on the observation that the traditional kills-over-deaths (k/d) ratio captures 75 % of the variance in game outcomes. Practically speaking, the hltv 1.0 rating is a linear combination of the numbers of kills, deaths, and multikills per round. Notably, it is not at all concerned with progress against the objective, because that is hard to measure. A player could make a negative contribution to their team’s chance of winning, yet do it in a way that gives them a good hltv 1.0 rating.
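From hltv’s published description, as I understand it, the rating compares a player’s per-round kill, survival, and multi-kill numbers against community averages. Here is a sketch in Python; the constants are the commonly quoted ones from hltv’s 2010 announcement, reproduced from memory, so treat this as an illustration of the shape of the formula rather than a reference implementation:

```python
def hltv_rating_1_0(rounds, kills, deaths, multikill_rounds):
    """Sketch of hltv Rating 1.0: a player matching the community
    average in every component scores exactly 1.0.
    multikill_rounds maps k -> number of rounds with exactly k kills."""
    AVG_KPR = 0.679   # community-average kills per round
    AVG_SPR = 0.317   # community-average survived rounds per round
    AVG_RMK = 1.277   # community-average multi-kill score per round
    kill_rating = kills / rounds / AVG_KPR
    survival_rating = (rounds - deaths) / rounds / AVG_SPR
    # rounds with k kills are weighted by k^2 (1, 4, 9, 16, 25)
    multikill_score = sum(k * k * n for k, n in multikill_rounds.items())
    multikill_rating = multikill_score / rounds / AVG_RMK
    return (kill_rating + 0.7 * survival_rating + multikill_rating) / 2.7
```

Note how every term is kills or a proxy for kills (surviving mostly means not being killed), which is the point made above.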

They tried to fix that with the hltv 2.0 rating. Although it does factor in a variable called impact, it seems to still decompose perfectly into kills and deaths and proxies thereof.

Now we finally get into the Leetify rating. The people at Leetify have, from what I understand, started from a big list of all events (and some combinations of events) that can occur in-game, and based on the historic record of games, estimated how each of those events affect the chances of winning the round.

This means kills can be valued very differently, based on e.g.

  • Whether it was the first kill of the round or happened late in the round. (Early kills are usually more impactful, i.e. they change the probability of winning by more.)
  • Whether it was against a fully equipped opponent or one that didn’t buy, to conserve in-game money for the next round. (Killing a lightly armed opponent usually doesn’t shift the probability of winning by as much as killing a fully equipped one.)
  • Whether it was isolated or part of a kill trade (where one player gets a kill but then dies to their opponent’s teammate right after).
  • etc.

The Leetify rating of a player is the sum of these changes in win probability over the events attributed to that player. In other words, a negative Leetify rating means the player actually reduced their team’s chances of winning compared to if they had run away and hidden somewhere on the playing field. A positive rating means the player, on average, improved their team’s chances of winning.
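Mechanically, then, the rating is just a sum of win-probability deltas over a player’s events. A trivial sketch (the event names and percentage-point deltas below are invented for illustration and are not Leetify’s actual numbers):

```python
# Hypothetical per-event changes to the team's round-win probability,
# in percentage points, attributed to one player.
events = [
    ("opening kill", +7.2),
    ("death, traded by a teammate", -2.1),
    ("kill on an eco opponent", +1.9),
    ("death without a trade", -6.0),
]

# A Leetify-style rating is the sum of these deltas.
rating = sum(delta for _, delta in events)
print(rating)  # net contribution over these events
```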

Granted, I haven’t looked into many alternatives, but from what I know, the Leetify rating is the best proxy we have for individual player performance.

Modeling skill rating from common variables

It’s one thing to look at a rating after the fact. It’s another to learn how to drive in-game decisions from data. We could perform the analysis of win chance changes ourselves, but I’m lazy. If Leetify has already done it, let’s piggyback on their results.

The situation I’m particularly curious about is whether or not to take a firefight with an opposing player. To keep things simple, we can imagine a situation in which we have to either commit to the gunfight or abandon it entirely. If we commit, either we or our opponent will die. If we don’t, nothing happens.

When should we engage, and when should we go somewhere else?

There are some variables that could change as a result of this engagement, that are typically quoted on scoreboards after a game:

  • Number of kills,
  • Number of deaths,
  • Number of assists6 (defined – I think – as dealing more than 30 % of the damage to an opponent that dies later in the round),
  • Amount of damage dealt,
  • Number of multi-kills in a round (2K, 3K, 4K, 5K).

By downloading this data from a few of my own matches, it might be possible to see how each of these variables affects the Leetify rating, which we will take as the ground truth for whether a player contributes positively to their team’s chances of winning.

Data validation

I have downloaded (well, scraped with some JavaScript and regexes) data from 28 matches I’ve played, so there are a total of 280 records, with each record specifying the per-round statistics of the above variables for each player in each match.

We can already conclude that matches where a player has killed the entire opposing team are rare enough that we shouldn’t include them in our analysis.

paste("Records with five-kill:", sum(df$mk5 > 0))
Records with five-kill: 0

How about four-kills, where one player kills everyone but one of the opposing team?

paste("Fraction of records with four-kill:", sum(df$mk4 > 0)/nrow(df))
Fraction of records with four-kill: 0

Surprisingly common! Let’s keep those for now. The other levels of multi-kill are also common.

Now, to get a better feel for the data, we can make a scatterplot of kill rate against Leetify rating.

cs-rating-pg-01.svg

Oh, wow. Already we get a clue as to how hltv concluded that the k/d ratio makes up 75 % of the explanation for winning! Just kill rate alone seems very strongly correlated with contributing positively to the team.

We do have some outliers:

  • There is one player in the data set that has 0 kills and 0 Leetify rating. If we look up the game in which that happened, it turns out that player disconnected early and truly did not contribute at all. We can remove that data point for a more solid analysis.
  • Then there’s the group of five players that have a Leetify rating greater than 15 and a kill rate greater than 1.35. I looked into what set these apart, and it seems to be a combination of a few factors. Mostly, they got their kills near the end of the round when information is sparser and the element of surprise more important. Coupled with luck, good mechanics, and, in one case, cheating, that is apparently a really good strategy at my level of play.

    But! In this case we’re interested in how we can improve our early-and-mid round play, so we’re going to strike these records from the dataset also, to avoid their leverage tainting the analysis we want to do.

df <- df[df$kpr > 0 & df$kpr < 1.35,]

cs-rating-pg-02.svg

That’s better. Now let’s validate the other variables.

  • Kills strongly correlate with rating. (+0.85)
  • Deaths correlate negatively with rating. (-0.50)
  • Damage dealt correlates strongly with rating. (+0.81)
  • Multi-kills correlate with rating, but decreasingly so:
    • 2K (+0.61)
    • 3K (+0.51)
    • 4K (+0.29)

Perhaps surprisingly, assists are not correlated with rating at all! This makes sense though, when one sees an assist as a failed kill. It does, however, hint that getting the kill may be vastly more valuable than just dealing damage.

It is also worth mentioning that while damage dealt is strongly correlated with rating, it is also strongly correlated with kills (+0.91) so they are, practically speaking, the same variable. Coupled with the observation above – that kills are much more important than assists – we are getting quite a case against damage being significant.

Kills and rating

Let’s try a first model, then. How well does kill rate predict rating?

summary(lm(rating ~ kpr, df))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.3442     0.5030  -26.53   <2e-16 ***
kpr          18.3404     0.6764   27.11   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.74 on 272 degrees of freedom
Multiple R-squared:  0.7299,	Adjusted R-squared:  0.7289
F-statistic: 735.2 on 1 and 272 DF,  p-value: < 2.2e-16

In more mathsy form, where \(L\) is Leetify rating and \(k\) is kill rate:

\[L = -13 + 18k\]

The R² metric measures how much of the variation in the rating is explained by the model, and it is already up near 75 %. For all its tactics and resource management, getting kills seems to be the main component of good Counter-Strike play. But that does not mean that to become a good player one needs to rush into more firefights! It might be that the players with high ratings perform other actions that in turn put them in a good position to get kills. Important distinction.

But it also means that if a team mechanically outclasses their opponents (i.e. shoots at them better), then a very large amount of strategising is needed to overcome that disadvantage.

Kills, deaths, and rating

When one engages an opponent in a firefight, one runs the risk of dying oneself. We’ll add death rate to the model to see what happens.

summary(lm(rating ~ kpr + dpr, df))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.8132     1.1600  -3.287  0.00114 **
kpr          16.6049     0.6274  26.467  < 2e-16 ***
dpr         -11.3417     1.2755  -8.892  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.416 on 271 degrees of freedom
Multiple R-squared:  0.7909,	Adjusted R-squared:  0.7894
F-statistic: 512.6 on 2 and 271 DF,  p-value: < 2.2e-16

In mathsy terms, we now have

\[L = -4 + 17k - 11d\]

Adding death rate did not increase the explanatory power of the model by a lot (R² went up by 7 percentage points)7 (this may at first be confusing, but it’s as simple as deaths being somewhat correlated with kills, so to a small extent the death variable was already included in the model through the proxy of kills), but it did something useful to the intercept: it moved it closer to zero.

We would expect the intercept to be close to zero, because a player does not significantly disadvantage their team just by existing. The way to interpret this is that the first model embedded the assumption of a certain number of deaths into the intercept. When we assume that a player who exists will die the average number of times, that makes their existence negative. Including the death rate as a separate variable gives players a more neutral existence, and then they can go out and make their own mistakes.
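This absorbed-into-the-intercept effect is easy to demonstrate with a small simulation. The data below is synthetic, not my match data; the correlation structure and coefficients are made up to resemble the models above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
kpr = rng.normal(0.65, 0.15, n)
# deaths are (negatively) correlated with kills
dpr = 0.60 - 0.30 * (kpr - 0.65) + rng.normal(0.0, 0.08, n)
# "true" model, mirroring L = -4 + 17k - 11d, plus noise
rating = -4 + 17 * kpr - 11 * dpr + rng.normal(0.0, 2.0, n)

# least-squares fit of rating ~ kpr only
X1 = np.column_stack([np.ones(n), kpr])
b1, *_ = np.linalg.lstsq(X1, rating, rcond=None)

# least-squares fit of rating ~ kpr + dpr
X2 = np.column_stack([np.ones(n), kpr, dpr])
b2, *_ = np.linalg.lstsq(X2, rating, rcond=None)

print("intercept, kills only:", round(b1[0], 1))       # far below zero
print("intercept, kills and deaths:", round(b2[0], 1))  # near the true -4
```

The kills-only fit pushes the average death toll into the intercept, just as the article’s first model did; adding deaths as a variable recovers an intercept near the true value.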

No further models

We already had a strong suspicion from the correlations that assists and damage dealt would be unimportant variables. This continues to be the case: they don’t add any explanatory power to the model, and assists are not even statistically significant.

Damage was a curious case, though. If we scale damage up to have a similar-sized coefficient as the other two terms8 (specifically, divide it by 100, because it takes 100 damage to kill a player) and call the variable \(a\), the model with it included turned out to be something like:

\[L = -4 + 10k + 7a - 11d\]

In other words, the only effect including damage had was shifting some of the coefficient from kills to damage. If ever two variables sort of measured the same thing, these two are them!

Multi-kills are also strongly enough correlated with kills that they appear not to add anything useful to the model. That’s good to know, though: getting one kill in each of two consecutive rounds is no better or worse than getting two kills in one round and none in the next.

And that’s it. Those were the variables in the dataset I based this on. And they do, to be fair, explain 80 % of the Leetify rating, so it’s not a bad start.

Decision rules from analysis

Now, recalling that the latest good model was

\[L = -4 + 17k - 11d\]

we can create a decision rule. First, some basic observations:

  • The intercept is -4. This means that a player, just by existing, provides negative value to their team. A player has to actively do something good to reach break-even.9 (I don’t have a good explanation for what negative thing a player’s existence effects that is not covered by the death rate, but it could be small things such as standing in the way of other players, providing a false sense of security, blinding one’s own team with flashbangs, etc.) Thus, if we think we won’t contribute positively to the match (and our goal is to improve our team’s chances of winning), we should not play.
  • But! Since the intercept is “just” -4, it is better to hide and survive than to die. If we know we’ll lose every firefight that remains in the round, we should not even try to engage, and instead go and hide somewhere. By not doing anything good we will contribute negatively, but significantly less than if we went out and died.
  • The coefficient for kill rate is higher than that for death rate. This means exposing ourselves to death while killing an opponent is a good deal.10 (Counter-Strike tradition claims kill trades are only beneficial for the attacking team. This may be true, but my data is not detailed enough to shine light on it.) In fact, a “kill trade” like that is just about enough to justify the existence of a player, because \(17 - 11 = 6\) roughly cancels the intercept of \(-4\).

    This latter point also serves as a sanity check on the model. Getting a kill and then dying puts both teams on somewhat equal footing again, so the total value of the rating should be close to zero – and it is: \(17 - 11 - 4 = 2 \approx 0\).

But what we wanted to get at was that engagement decision. We were deciding whether to engage an opponent in a do-or-die type firefight, and we want to perform the action that is net positive for our team’s chance of winning.

This is just a matter of solving the equation. Imagine there’s a probability \(p\) that we are successful in a firefight. Our rating as a function of \(p\) would then be

\[L(p) = -4 + 17p - 11(1-p)\]

or, equivalently,

\[L(p) = 28p - 15.\]

This is a linear function:

cs-rating-pg-03.svg

It seems that, in practical terms, the Leetify rating goes from -15 to +15, where the endpoints are achieved by those who virtually never and virtually always win a firefight.11 (My Leetify rating in recent games has been around -4, which must mean my success rate in firefights has been about 40 %. That seems reasonable, but it also highlights how devilish this game can be: 40 % isn’t a bad success rate, and it can even feel fairly successful thanks to natural variance around it. But it most definitely means I’ve been contributing negatively – and I know I have.) The break-even point where we start contributing positively to our team’s chance of winning is at 54 %. So there we have it!

If we have at least a 54 % chance of getting the kill in a do-or-die firefight, we should take it. Otherwise we should go somewhere else.
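The decision rule is small enough to write down directly. This sketch in Python uses the coefficients fitted above, so it is only as good as this article’s simple model:

```python
def expected_rating(p):
    """Expected rating contribution of a do-or-die duel we win with
    probability p, per the fitted model: -4 + 17p - 11(1 - p) = 28p - 15."""
    return 28 * p - 15

BREAK_EVEN = 15 / 28  # ~0.536, i.e. the 54 % quoted in the text

def should_engage(p):
    """Engage only when the duel is net positive in expectation."""
    return expected_rating(p) > 0

print(round(BREAK_EVEN, 2))  # 0.54
```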

Appendix A: Kill/death ratio analysis

We could perform the same sort of analysis but using the k/d ratio instead of separate variables for kill rate and death rate. We would end up with a function like

\[L = -11 + 10r\]

where \(r\) is the k/d ratio of a player.

This also has the negative intercept, and it implies a success rate of about 52 % is required to take a do-or-die firefight.
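To spell out where the 52 % comes from: in a do-or-die duel won with probability \(p\), the long-run k/d ratio tends to \(r = p/(1-p)\), so setting the model to zero gives

\[-11 + 10r = 0 \quad \Rightarrow \quad r = 1.1 \quad \Rightarrow \quad p = \frac{r}{1+r} = \frac{1.1}{2.1} \approx 0.52.\]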

Interestingly, whereas the previous model suggested a kill trade was a small net positive, this model indicates a kill trade is a small net negative. Further evidence, I suppose, that these models are too simplistic to evaluate the effect of a kill trade.