Brainstorming a simple ELO rating system

  • Brainstorming a simple ELO rating system

    I’ve done a small amount of research and would like to share my ideas for a simple Elo rating system. I would like to have this system tested in wbduel, or in a new arena if that is preferred. BIET, please see if there is anything you can do to make this real. It would be much appreciated. You’ve done a great job for the game and I hope you continue doing so.

    This system aims to rate a player in a way k/d and gamescore cannot. Over time, all of the little “non-boxscore” things you do (or don’t do) will be accounted for. The context you are playing under will be accounted for. An unbiased number will be formed that gives you an idea of how impactful you are relative to your peers. It’s a long-term thing and requires heaps of rounds to calibrate you properly, so don’t freak out after a few games. If you’ve played hundreds of rounds and still have a bad rating, the problem isn’t the system, it’s you.

    A couple of mandatory things:

    -Each player must start at a baseline rating. It’s arbitrary: it can be 1000, 1200, 1500 or whatever you want. I suggest 1000 or 1200 based on what I’ve seen working elsewhere.
    -Each player must have only 1 eligible name.
    -Each player must conduct themselves like a sportsman. Don’t troll. Play games to completion.

    The bot will calculate each team's odds to win at the start of the game using the following formulas:

    Ea = 1 / (1 + 10^((Tb-Ta)/400))

    Eb = 1 - Ea

    Ea = Team A’s odds to win

    Eb = Team B’s odds to win

    Ta = Team A’s average rating

    Tb = Team B’s average rating
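
A minimal Python sketch of the odds formula above, assuming Ta and Tb are plain averages of each team's ratings. The name `expected_score` is just an illustrative label, not anything the bot currently has.

```python
def expected_score(ta: float, tb: float) -> float:
    """Ea: Team A's chance to win, given average ratings ta (Team A) and tb (Team B)."""
    return 1.0 / (1.0 + 10 ** ((tb - ta) / 400.0))


ea = expected_score(1200, 1200)
eb = 1 - ea
print(ea, eb)  # 0.5 0.5 (equal averages give a coin flip)

# A 400-point edge in average rating works out to 10/11, about a 91% chance.
print(round(expected_score(1600, 1200), 3))  # 0.909
```

The 400 sets the scale of the curve: every 400 points of rating difference multiplies the favorite's odds (win chance divided by loss chance) by 10.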

    Post-game, ratings will be adjusted based on your odds and a scaling factor K. With a K-value of 30 and a 50% chance to win, every player on the winning team would gain 15 rating, and every player on the losing team would lose 15. With equal-size teams it’s a zero-sum system: the rating you gain for a win is proportional to your chance to lose, K(1 - E), and the rating you lose for a loss is proportional to your chance to win, K × E.

    K = scaling factor (can be adjusted)

    Sa = Actual result for Team A. 1 for a win, 0 for a loss

    Sb = Actual result for Team B. Same rules as above.

    Every player on Team A will receive a rating change of K(Sa - Ea)

    Every player on Team B will receive a rating change of K(Sb - Eb)
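
The update rule as a small Python sketch (`rating_change` is a hypothetical helper name). Because Sb = 1 - Sa and Eb = 1 - Ea, Team B's change K(Sb - Eb) is always exactly the negative of Team A's, which is where the zero-sum property comes from.

```python
def rating_change(k: float, actual: int, expected: float) -> float:
    """Change applied to every player on a team: K(S - E).

    actual is 1 for a win and 0 for a loss; expected is the team's pre-game odds.
    """
    return k * (actual - expected)


K = 30
ea = 0.5
eb = 1 - ea
# Team A wins (Sa = 1, Sb = 0) in a 50/50 game.
print(rating_change(K, 1, ea), rating_change(K, 0, eb))  # 15.0 -15.0
```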

    Assume the following teams play a round:

    Team A

    2000

    1700

    1300

    1100

    500

    Team B

    1800

    1550

    1350

    900

    700

    Ta = (2000 + 1700 + 1300 + 1100 + 500) / 5 = 1320

    Tb = (1800 + 1550 + 1350 + 900 + 700) / 5 = 1260

    Ea = 1 / (1 + 10^((1260-1320)/400)) = ~0.585, roughly a 60% chance for Team A to win

    Eb = 1 - Ea = ~40% chance for Team B to win
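
The same numbers in Python, with the odds function repeated so the snippet runs on its own. Keeping the unrounded odds doesn't change the whole-point results: 30 × 0.415 ≈ 12.4 rounds to the +12 below, and 30 × 0.585 ≈ 17.6 rounds to the +18 for an upset.

```python
def expected_score(ta: float, tb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((tb - ta) / 400.0))


team_a = [2000, 1700, 1300, 1100, 500]
team_b = [1800, 1550, 1350, 900, 700]

ta = sum(team_a) / len(team_a)  # average rating of Team A
tb = sum(team_b) / len(team_b)  # average rating of Team B
ea = expected_score(ta, tb)
eb = 1 - ea
print(ta, tb, round(ea, 2), round(eb, 2))  # 1320.0 1260.0 0.59 0.41
```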

    If Team A wins, their ratings would be adjusted as follows:

    K(Sa - Ea) = 30(1-.6) = +12

    If Team A’s players gain 12 rating, Team B’s players must lose 12 rating

    Team A

    2000 + 12 = 2012

    1700 + 12 = 1712

    1300 + 12 = 1312

    1100 + 12 = 1112

    500 + 12 = 512

    Team B

    1800 - 12 = 1788

    1550 - 12 = 1538

    1350 - 12 = 1338

    900 - 12 = 888

    700 - 12 = 688

    If Team B (our underdog) won, their rating change would have been K(Sb - Eb) = 30(1-.4) = +18 and Team A’s would have been -18.
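
A whole round end to end, as a rough sketch. Two assumptions not spelled out in the post: changes are rounded to whole points (which matches the numbers above), and Team B's players get the exact negative of Team A's rounded change. `play_round` is a hypothetical name.

```python
def expected_score(ta: float, tb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((tb - ta) / 400.0))


def play_round(team_a, team_b, a_won, k=30):
    """Return (new_team_a, new_team_b, delta_a) after one round.

    Every player on a team gets the same change. The change is rounded to whole
    points, and Team B's players get the exact negative of Team A's change.
    """
    ta = sum(team_a) / len(team_a)
    tb = sum(team_b) / len(team_b)
    ea = expected_score(ta, tb)
    sa = 1 if a_won else 0
    delta_a = round(k * (sa - ea))
    return [r + delta_a for r in team_a], [r - delta_a for r in team_b], delta_a


team_a = [2000, 1700, 1300, 1100, 500]
team_b = [1800, 1550, 1350, 900, 700]

new_a, new_b, delta = play_round(team_a, team_b, a_won=True)
print(delta, new_a, new_b)
# 12 [2012, 1712, 1312, 1112, 512] [1788, 1538, 1338, 888, 688]

upset_a, upset_b, upset_delta = play_round(team_a, team_b, a_won=False)
print(upset_delta)  # -18 (each Team B player gains 18)
```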

    That is basically the core of this system, which has been proven to work well across many games. I can testify that it even works fairly well in dead zones with 50 players. It is very simple.

    In the case of subs:

    Recalculate the odds to win at the end of the game, weighting each roster slot’s rating by the minutes each player spent in it.

    Let’s say Team A’s 500 rating player played for 5 minutes, then got subbed out for a 1300 rating player who played for 10 minutes. In the recalculation, the subbed slot would have a rating of (5/15 * 500) + (10/15 * 1300) = 1033

    Team A’s average rating would be increased to (2000 + 1700 + 1300 + 1100 + 1033) / 5 = 1427
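
The slot weighting is a minutes-weighted average. A small sketch, assuming the bot can log how many minutes each player spent in each slot; `slot_rating` is a hypothetical helper.

```python
def slot_rating(stints):
    """Time-weighted rating of one roster slot.

    stints: (rating, minutes_played) for each player who filled the slot.
    """
    total_minutes = sum(minutes for _, minutes in stints)
    return sum(rating * minutes for rating, minutes in stints) / total_minutes


slot = slot_rating([(500, 5), (1300, 10)])
team_a_avg = (2000 + 1700 + 1300 + 1100 + slot) / 5
print(round(slot), round(team_a_avg))  # 1033 1427
```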

    Team A’s new odds to win would be 1 / (1 + 10^((1260-1427)/400)) = ~72%

    Rating adjustment for Team A would then be K(Sa - Ea) = 30(1-.72) = +8

    Team A

    2000 + 8 = 2008

    1700 + 8 = 1708

    1300 + 8 = 1308

    1100 + 8 = 1108

    500 + 3 = 503 (5/15 of the slot’s +8, rounded)

    1300 + 5 = 1305 (10/15 of the slot’s +8, rounded)

    Team B

    1800 - 8 = 1792

    1550 - 8 = 1542

    1350 - 8 = 1342

    900 - 8 = 892

    700 - 8 = 692

    If Team B (our underdog) won, their rating change would have been K(Sb - Eb) = 30(1-.28) = +22 and Team A’s would have been -22.
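
Putting the sub handling together, a rough sketch. Assumptions not in the post: the game lasts 15 minutes (inferred from 5 + 10), each slot's change is rounded to whole points before it is split, and when the per-player shares don't round cleanly the last player in the slot absorbs the leftover point so no rating is created or destroyed. That last rule is just one way to keep things zero-sum. All function names are hypothetical.

```python
def expected_score(ta: float, tb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((tb - ta) / 400.0))


def slot_rating(stints):
    total = sum(m for _, m in stints)
    return sum(r * m for r, m in stints) / total


def split_by_minutes(delta, stints):
    """Split one slot's whole-point change across its stints in proportion to minutes.

    All but the last share are rounded; the last stint takes the remainder so the
    shares always add back up to delta.
    """
    total = sum(m for _, m in stints)
    shares = [round(delta * m / total) for _, m in stints[:-1]]
    shares.append(delta - sum(shares))
    return shares


def settle(team_a, team_b, a_won, k=30):
    """team_a, team_b: lists of slots; each slot is a list of (rating, minutes) stints.

    Returns (delta_a, per-stint changes for Team A, per-stint changes for Team B).
    """
    ta = sum(slot_rating(s) for s in team_a) / len(team_a)
    tb = sum(slot_rating(s) for s in team_b) / len(team_b)
    delta_a = round(k * ((1 if a_won else 0) - expected_score(ta, tb)))
    shares_a = [split_by_minutes(delta_a, s) for s in team_a]
    shares_b = [split_by_minutes(-delta_a, s) for s in team_b]
    return delta_a, shares_a, shares_b


# 15-minute game; the 500-rated player is subbed out for a 1300 after 5 minutes.
team_a = [[(2000, 15)], [(1700, 15)], [(1300, 15)], [(1100, 15)], [(500, 5), (1300, 10)]]
team_b = [[(1800, 15)], [(1550, 15)], [(1350, 15)], [(900, 15)], [(700, 15)]]

delta, shares_a, shares_b = settle(team_a, team_b, a_won=True)
print(delta, shares_a[-1], shares_b[0])  # 8 [3, 5] [-8]
```

With the example rosters this reproduces the +8 split into 3 and 5 and the -8 for each Team B player; with `a_won=False` the team change comes out to -22.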

    That’s my attempt at handling subbing. I don’t know if there is any perfect way of doing it, but I think this is acceptable and worth testing. I am all ears if anyone has a better idea.

    Regarding team creation: as I see it, there are 3 options:
    1. Captains select teams
    2. Bot attempts to create even teams
    3. Bot randomizes teams
    Keep in mind that team creation has a negligible effect on ratings in the long term. I do, however, believe that games are most enjoyable when the teams are both even and balanced. Unfortunately, with a low population it’s just way too much to ask for both of these things. There will be big skill gaps, and games will sometimes be miserable, just like they are in TWDT. But there is a greater purpose here, so I don't mind it personally.
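
For option 2, brute force is enough at this population: with 10 players there are only 126 distinct ways to split them into two teams of five (fixing one player on Team A halves the 252 combinations), so a bot could try them all and keep the split with the closest averages. A sketch; `even_teams` is a hypothetical name, and it only evens out the averages, not the "balanced" part mentioned above.

```python
from itertools import combinations


def even_teams(ratings):
    """Split an even number of ratings into two equal-size teams whose totals
    (and therefore averages) are as close as possible, by exhaustive search."""
    n = len(ratings)
    if n % 2:
        raise ValueError("need an even number of players")
    total = sum(ratings)
    best_gap, best_a = None, None
    # Keep player 0 on Team A so each split (and its mirror image) is checked once.
    for rest in combinations(range(1, n), n // 2 - 1):
        idx_a = (0,) + rest
        gap = abs(total - 2 * sum(ratings[i] for i in idx_a))  # |sum_A - sum_B|
        if best_gap is None or gap < best_gap:
            best_gap, best_a = gap, set(idx_a)
    team_a = [ratings[i] for i in range(n) if i in best_a]
    team_b = [ratings[i] for i in range(n) if i not in best_a]
    return team_a, team_b


players = [2000, 1700, 1300, 1100, 500, 1800, 1550, 1350, 900, 700]
team_a, team_b = even_teams(players)
print(sum(team_a) / 5, sum(team_b) / 5)  # 1290.0 1290.0
```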

    Regarding players getting to a high rating, refusing to play and sitting on their rating - who cares. You aren’t getting medals for this, it’s not intended to be a league.

    The beauty of this system is that you can stack all day with 90% chance to win every round. But once that 10% chance to lose hits (and it will hit) the harsh penalty will negate all of your gains. If you instead choose to carry as an underdog, you can be handsomely rewarded for it.
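
The stacking claim can be checked with a little exact arithmetic, assuming the bot's 90% figure matches how often the stack really wins. Each win pays K(1 - 0.9) = 3 and each loss costs K × 0.9 = 27, so nine wins and one loss net exactly zero. More generally, the average change per round is p·K(1 - p) - (1 - p)·K·p = 0 for any accurate win chance p.

```python
from fractions import Fraction

K = 30


def expected_change(p_win):
    """Average rating change per round if the predicted win chance p_win is accurate."""
    gain_on_win = K * (1 - p_win)
    cost_on_loss = K * p_win
    return p_win * gain_on_win - (1 - p_win) * cost_on_loss


p = Fraction(9, 10)
print(K * (1 - p), K * p)           # 3 27
print(9 * K * (1 - p) - 1 * K * p)  # 0
print(all(expected_change(Fraction(n, 100)) == 0 for n in range(1, 100)))  # True
```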

    I think if this system gains traction we'll have taken a significant step in shifting the focus away from k/d and putting a huge focus on doing absolutely whatever it takes to win: getting out of your comfort zone, playing with urgency, and actually thinking about strategy. There is too much of this relaxed bullshit gameplay and people being content just being alive.

    P.S. Before you troll me (and you know who you are): if you do interviews, write guides, or turn on your PC with your lips, you're not allowed to bicker at me.
    Last edited by saiyan; 11-10-2021, 11:55 PM.

  • #2
    I suppose with an ELO system similar to this one we would need to switch to one round matches or alternatively have it adjust player ratings after completion of each round. Imagine playing only one round, winning it, then having to leave and your team loses the other two rounds.

    Nevertheless, I agree that the capability to consistently win and close out games is far more important than pulling off impressive scores. I wouldn't mind something like this, although it probably still needs some number tweaking.



    • #3
      Yep, a simple W/L team-based Elo system (similar to what happens with TWD squads already) would be enough. The problems come in when humans try to guess at rating based on performance. In the end, consistent wins are the only thing that actually matter. Straight Elo shows your truest form of skill: whether, on average (regardless of team), you win more than you lose. Not whether you made a brilliant play that almost won. If you win the battle and lose the war, you've still lost the war.

      Your sub solution seems mathematically sound. From a cursory read I'm not certain I can say people wouldn't find a way to game it, but it would definitely be worth a try.

      With a rating based on actual Elo it's not difficult to create balanced teams automatically once enough games have been played. Even with a small population, run it long enough and you'll get increasingly balanced teams.

      As this continues, eventually everyone enters "Elo hell" at which point their rating floats around the same small range because the system has accurately rated their level of skill. Once this happens, the season usually resets so that people can once again enjoy the fun of having their skill level slowly and accurately assessed over time. The algo abides.

      Most modern games employ qualification rounds. In these, you're pitted against other people also in their qualifying rounds. Basically, your rating moves in much larger increments at this time, as you're slowly assigned a very rough initial approximation of your skill level. Ideally, qual game #1 players play other #1s, #2s play #2s, etc. With a small population that's probably not possible, but at least having 5 (or even 3) quals where all play all would help place people more accurately early on.
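
One common way to get the "larger increments" during qualification is a bigger K for a player's first few games. A sketch with made-up numbers (5 qualifying games at K = 60, then K = 30); none of these values come from the thread. One side effect: if teammates use different K values, a round is no longer exactly zero-sum.

```python
def k_factor(games_played, qual_games=5, qual_k=60, base_k=30):
    """Hypothetical K schedule: bigger steps while qualifying, the normal K afterwards."""
    return qual_k if games_played < qual_games else base_k


def player_change(games_played, actual, expected):
    """Per-player K(S - E), with K chosen from the player's own game count."""
    return k_factor(games_played) * (actual - expected)


print([k_factor(g) for g in range(7)])  # [60, 60, 60, 60, 60, 30, 30]
# A new player and a veteran winning the same 50/50 game:
print(player_change(0, 1, 0.5), player_change(40, 1, 0.5))  # 30.0 15.0
```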

      We've talked about doing this many times over the past few years. Mostly manpower is the issue. It's a serious project, but it could also be the one that carries TW for the next 5 years.
      "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
      -Dostoevsky's Crime and Punishment



      • #4
        This is a replica of what some people built in SVS Chaos League. It keeps that small population going, and it incentivizes playing for the team, targeting players, etc. With a faster-paced TW, it would be great.

        We've seen this work elsewhere (Chaos). I take that to mean it's more likely to work here.

        Good shit with the numbers breakdown. I think it helps with the feasibility of the project, even if it's a larger-scale one. I think a lot of us would want something like this. I know I would. The way stats are recorded now encourages players to hold on to their lives recklessly, instead of doing what their team needs at that time (killing players with higher deaths).



        • #5
          Saiyan, you should get this going in wbduel and javduel. People would eat it up
          3:Steadman> ive been a leader in every league of legends and basketball game ive ever played in



          • #6
            Make this a reality and I might even play again just to see people's ratings. It's the only new thing I have seen proposed in probably 10 years.



            • #7
              def down to test this out in wbduel/javduel if it's doable
              7:Amnesti <TW>> racka is nestled in Candlekeep. hes nestled atop the cliffs that rise from the sword-coast.
              Amnesti> but i guess i havent tried so much yet, im only 19. when im 20+ i will on purpose make myself look as good as i can, and go to places where girls are
              1:Sika> "ezor" is #1

