Elim formula for ladder ratings is bad and should be changed


  • Elim formula for ladder ratings is bad and should be changed

    I'd like to submit that the formula used for elim ratings is really bad and has major flaws. The proof that it is broken is abundantly clear now that it has run continuously for many months: the current ratings ladder makes very little sense and shows many inconsistencies. The more you play, the more it penalizes you. It is impossible to move up in the rankings if you play a lot, even when you do well. Can we discuss how it is set up? It seems really broken, and this Adjusted Rating stuff seems to be the main culprit. I'm posting the top 100 so you can see how random and nonsensical the results of this formula have been. It needs fixing if a new elim season is going to be put into play, because this is showing MAJOR FLAWS.








    TWDT-J CHAMPION POWER 2018
    TWDT-B CHAMPION POWER 2018
    TWDT TRIPLE CROWN MEMBER POWER 2018
    TSL TRIPLE CROWN FINALIST 2018
    TSLD CHAMPION 2018
    TSLB CHAMPION 2018

  • #2
    If you are having trouble reading it, use ctrl and + to zoom in on your computer view. I hate that screenshots sometimes squeeze the text. I can't make any rhyme or reason out of this ladder though lol.
    TWDT-J CHAMPION POWER 2018
    TWDT-B CHAMPION POWER 2018
    TWDT TRIPLE CROWN MEMBER POWER 2018
    TSL TRIPLE CROWN FINALIST 2018
    TSLD CHAMPION 2018
    TSLB CHAMPION 2018



    • #3
      Here's the spreadsheet used to create the formula: https://docs.google.com/spreadsheets...k_U/edit#gid=0 You can play with the values in the box if you like. This link is also available in a command on the elim bot and explains how ratings work. Can't remember the name offhand but it should be in the !help.

      Adjusted rating is just "confidence" multiplied by rating. Once you reach 100% confidence, your adjusted rating equals your full rating. This prevents people from playing 10 fluke games (resetting constantly) and retiring from elim with their name at the top. If we removed confidence/adjusted rating, it would be the same as everyone having 100% confidence. So adjusted rating/confidence are not at fault here.
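A minimal sketch of the relationship described above, for anyone who wants to try numbers without the spreadsheet. The only things taken from the thread are "adjusted rating = confidence x rating" and the 100-game figure mentioned later; the linear ramp from 0% to 100% and the function names are assumptions, not the bot's actual code.

```python
def confidence(games_played: int, games_for_full: int = 100) -> float:
    """Share of the raw rating that counts toward the ladder.

    Assumption: confidence grows linearly with games played and is
    capped at 1.0 (100%) once games_for_full is reached.
    """
    if games_for_full <= 0:
        raise ValueError("games_for_full must be positive")
    return min(max(games_played, 0) / games_for_full, 1.0)


def adjusted_rating(rating: float, games_played: int,
                    games_for_full: int = 100) -> float:
    """Adjusted rating = confidence multiplied by raw rating."""
    return confidence(games_played, games_for_full) * rating


# A player with a raw rating of 400 after 50 games shows as 200.0;
# after 100 or more games the adjusted rating equals the raw rating.
half_way = adjusted_rating(400, 50)
full = adjusted_rating(400, 100)
```

Under this sketch, removing confidence is the same as fixing `confidence()` at 1.0 for everyone, which is the point made in the post.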

      The reason your rating keeps climbing when you first start is because your confidence is also rising. Once you hit 100%, your rating stops rising as a result of confidence, because it's just 1 x rating (100%) rather than .5 x rating (50%). At that point you get stuck in what is called "Elo hell." This is common to every game which uses some sort of rating system. In order for your rating to rise at this point, you need to consistently play better than your average, over and over again. For most people, this just doesn't happen. Your rating goes down when you play worse than your average and goes up when you play better than it. Just as with the old formula.


      I think it would be nice to run a new season of elim to give a ratings reset and a chance to feel that ratings climb as you approach toward 100% confidence. So far nobody's shown much interest in it, though. There's nothing particularly new that would need to be done, though if we wanted to run another Elim World Championship, an end condition better than "one elim game that decides the winner after almost a year of playing" would be necessary. Maybe a series of 5-10 games with the highest average rating over all games determining the winner, rather than the one who wins a single elimination match. That was incredibly underwhelming, to the point where the past elim runner was so demoralized that he didn't even bother to finish creating the A1 graphics for the final season or the winner of the championship.
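The "series of 5-10 games, highest average rating wins" idea could look roughly like the sketch below. The data layout (a dict of per-game ratings for each finalist) and the function name are hypothetical; how each game's rating is actually computed is left out.

```python
from statistics import mean
from typing import Dict, List


def series_winner(scores_by_player: Dict[str, List[float]]) -> str:
    """Return the finalist with the highest average per-game rating.

    Assumption: every finalist has played at least one game in the
    series; ties go to whoever appears first in the dict.
    """
    return max(scores_by_player, key=lambda name: mean(scores_by_player[name]))


final_series = {
    "player_a": [300, 500, 400],  # one big game, one poor game
    "player_b": [450, 450, 450],  # consistent across the series
}
champion = series_winner(final_series)  # "player_b"
```

Averaging over the whole series means one lucky (or unlucky) game cannot decide the championship on its own.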
      "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
      -Dostoevsky's Crime and Punishment



      • #4
        Can we just do away with this adjusted rating / confidence stuff? It is obviously bad and plays into the hands of inactives who only play large populated elims. The player Luka is a prime example of how it is skewed. They played 20 games and are ranked above vet players like absent, ravage and burnt. This makes no sense.

        Why not have set dates for elim season and just do a flat out rating spread for games played and results given. Use the old formula for set dates that matter. It would eliminate the major weaknesses with the "confidence / adjusted ratings formula" being shown with months of stats piling up.

        I also noticed people were gaming this setup last elim by playing just a few elims with large crowds and then sitting on placement. It is obvious the more you play the more it hurts you. It should be the exact opposite of this if we are trying to get activity up in the zone.
        TWDT-J CHAMPION POWER 2018
        TWDT-B CHAMPION POWER 2018
        TWDT TRIPLE CROWN MEMBER POWER 2018
        TSL TRIPLE CROWN FINALIST 2018
        TSLD CHAMPION 2018
        TSLB CHAMPION 2018



        • #5
          You're misunderstanding confidence. Luka's score would be far higher without confidence. With 20 games played, adjusted rating is far lower than rating. Confidence lower than 100% adjusts rating downward, not upward.

          Regardless, with or without confidence, if you play well in your early games and your performance in later games is worse than your average, your rating will go down. Again, this is how it works with every game that relies on a floating rating based on performance, unless you're constantly improving. Rating is meant to represent skill, not your willingness to play more games. One way to require more play in order to gain a higher rating is to raise the number of games required to reach 100% confidence. Right now it's at 100 games. It could be 125, 150 or even 200.
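To see what raising the confidence threshold does, here is a rough calculation of how many games a hot newcomer needs before their adjusted rating can match a veteran who is already at 100% confidence. It assumes confidence ramps linearly (games played divided by the threshold) and that ratings are positive; the function name and the example ratings are made up for illustration.

```python
import math
from typing import Optional


def games_to_overtake(newcomer_rating: float, veteran_adjusted: float,
                      games_for_full: int) -> Optional[int]:
    """Fewest games at which the newcomer's adjusted rating reaches the veteran's.

    Solves (n / games_for_full) * newcomer_rating >= veteran_adjusted for n.
    Returns None if the newcomer's raw rating is too low to ever get there.
    """
    if newcomer_rating <= 0 or newcomer_rating < veteran_adjusted:
        return None
    return math.ceil(games_for_full * veteran_adjusted / newcomer_rating)


# Newcomer holding a raw 500 against a veteran sitting at an adjusted 400:
for threshold in (100, 125, 150, 200):
    print(threshold, games_to_overtake(500, 400, threshold))
```

With these example numbers the newcomer needs 80 games at a 100-game threshold and 160 games at a 200-game threshold, so a higher threshold directly raises the amount of play needed to place high.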

          The main issue is that we haven't had a season reset since January, though. This is because, again, nobody has yet been willing to step up and do the very small amount of work required to run either regular seasons (2 month intervals seemed to work just fine) or another EWC. The work involved is advertising the seasons, reminding me or another dev member to add seasons when they're close to ending, and creating (or getting someone to create) the graphic for A1. Kim was interested in this at one point but I'm not sure what he's up to now. Tiny was also interested, if I recall. Other people who are interested in elim and also understand how it works (or are just willing to keep things as they are) would also be welcome candidates to run it. Right now nobody's expressed interest, though.

          Another piece that could use adjustment is the amount that the average rating of players killed affects your final score. If you're playing against players with lower rating but who have skill that is higher than the skill reflected by their rating, you're not getting a fair return when playing. The formula could be tweaked slightly to make rating less important, or seasons could simply be reset more regularly. (Easiest fix.)
          "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
          -Dostoevsky's Crime and Punishment



          • #6
            Random thought... I would like to see a rating decay for people who rank and then stop. People who continuously play and keep a high rating are better than people who cherry-pick big games and the best times and then stop playing.
            1:waven> u challenge
            1:waven> if i challenge it looks too scary

            Originally posted by MHz
            Hope you contract ebola from your, no doubt cheap, Easter Egg, you fucking shit-jav, pug-faced cunt.



            • #7
              Decay would be interesting. We worked out a way to do it but it was never implemented. Check out the 3rd sheet on the link. You can play with the values in green.
              "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
              -Dostoevsky's Crime and Punishment



              • #8
                I went over the sheet and the full equation is not really clear at all, as the mathematics is written in words and many values seem to be missing. The Test Values seem to have no equation total on page 1 either. I feel confidence being part of the ratings calculation is causing the biggest problems and is more trouble than it is worth.

                I feel a fair ratings equation can be made much simpler without adjusted ratings and confidence being part of any input. How about just using these game bonuses I suggest?

                1) Win Bonus

                2) Kill Bonuses. In games to 5, a bonus is received when you get 6 or more kills. In games to 10, a bonus is received when you get 11 or more kills. For each kill over the set death elimination (5 or 10), add an equal bonus FOR EACH KILL OVER THE ALLOTTED DEATHS ALLOWED. Simple addition of a value per kill over the base death rate of 5 or 10.

                3) Streak Bonuses starting at streaks of 3 kills made without dying, with the bonus improving thereafter for each consecutive kill in the streak.

                Then..

                4) Input the average rating of players killed into the final equation. I feel the values should all be based off a 1000 base rate like we use in TWD. If you are going negative on average, your rating will be under 1000. If you go positive in stats overall, you will be above 1000. Bonuses will then be added to this base rating through a formula made with preset values.
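The four bonuses proposed above could be sketched as follows. The post gives no actual values, so every constant here is a placeholder, and the way the average rating of players killed is folded in (as a small offset relative to the 1000 base) is one guess at what "input into the final equation" might mean.

```python
from dataclasses import dataclass, field
from typing import List

BASE_RATING = 1000      # TWD-style base, from the post
WIN_BONUS = 10          # placeholder value
KILL_BONUS = 2          # placeholder, per kill above the death limit
STREAK_BONUS = 1        # placeholder, per kill from the 3rd of a streak onward
OPPONENT_WEIGHT = 0.01  # placeholder weight on opponent strength


@dataclass
class GameResult:
    won: bool
    kills: int
    death_limit: int                 # 5 or 10
    avg_rating_killed: float
    streaks: List[int] = field(default_factory=list)  # kill runs without dying


def game_points(game: GameResult) -> float:
    """Points earned in one game under the proposed bonus scheme."""
    points = float(WIN_BONUS if game.won else 0)
    # Kill bonus: only kills beyond the death limit (6+ in a 5, 11+ in a 10).
    points += KILL_BONUS * max(game.kills - game.death_limit, 0)
    # Streak bonus: a streak of 3 earns 1 unit, 4 earns 2, and so on.
    points += STREAK_BONUS * sum(max(s - 2, 0) for s in game.streaks)
    # Opponent strength: reward kills on above-base players (assumed form).
    points += OPPONENT_WEIGHT * (game.avg_rating_killed - BASE_RATING)
    return points


example = GameResult(won=True, kills=12, death_limit=10,
                     avg_rating_killed=1100, streaks=[4, 3])
earned = game_points(example)  # 10 win + 4 kills + 3 streak + 1 opponent
```

Note that nothing in this sketch stops a player from posting a high total after a handful of games, which is why a minimum-games rule comes up later in the thread.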



                Things we DO NOT NEED in ratings equations are..

                1) KO's .. no need for a bonus here at all. The whole point of elim is to KO players. No bonus for this.

                2) Games of 5 or 10 being different in ratings. We ALL play under same elimination value at start and it is pointless to play around with different values for games to 5's or 10's. The death limit is set so no need for ANY change in ratings between the elimination death count.

                I feel games under 5 players should be to 5 deaths and all other games with 6 or more players involved should go to 10 deaths. It should make no difference to the values determined on the ladder whether a game is to 5 or 10, though. We could just only allow games to 10 if an active season is in play. Use the SAME rating formula for elims to 5 or 10 deaths though, please. It is too confusing breaking it apart and making different values.

                3) Totally drop confidence. It is confusing and unneeded. It HURTS players who play more. The proof that it is broken is in the ladder. The formula we use now is bad and rewards inactivity and selective play. The system we are using now can easily be gamed once a season starts. That is bad. It is WAYY too complicated and unneeded. I feel there is ample proof in the current ladder that real problems exist with this formula/system.

                4) DO NOT square bonuses for number of players playing. That to me is the biggest flaw. We can make a much simpler formula that does not reward sitting around waiting only for large games.

                Should winning a big game give you a bigger rating change.. YES IT SHOULD but to square it is silly. Just add a base value for every player who participates. If you have 25 players in, add 25 base values EQUALLY to the ratings.
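To put numbers on the difference between adding one base value per player and squaring the player count, here is a small comparison. Both functions and their unit values are illustrative only; as clarified in later replies, the squared term on the sheet applies to the pubbux reward rather than to rating.

```python
def linear_size_bonus(players: int, per_player: float = 1.0) -> float:
    """One equal base value for every participant (the proposal above)."""
    return per_player * players


def squared_size_bonus(players: int, scale: float = 1.0) -> float:
    """Bonus that grows with the square of the number of participants."""
    return scale * players ** 2


for n in (5, 10, 25):
    print(n, linear_size_bonus(n), squared_size_bonus(n))
```

Going from 5 to 25 players multiplies the linear bonus by 5 but the squared bonus by 25, which is the extra incentive to wait for large games that the post objects to.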

                Below is a quick dissection of the confusion I see in the current system. There is no clear formula that I can see written, and some values appear to be left blank. I could not manipulate this sheet to test anything either. The elim RoboRef does not have a link or anything that clearly explains the formula. We need a CLEAR MATHEMATICAL EQUATION with a key clearly showing the algebraic value of each input to the ratings totals. I get what the old one was suggesting and trying to do. It is not all bad, but it has values being inputted that are breaking the rating system and are totally unneeded.

                I did a quick critique below from what I saw. Honestly, the way it is laid out is not very clear. The hard work obviously is tweaking the formula to make the equation, but it can be made much simpler if my suggestions are followed.



                Last edited by Jessup; 10-17-2020, 04:28 PM.
                TWDT-J CHAMPION POWER 2018
                TWDT-B CHAMPION POWER 2018
                TWDT TRIPLE CROWN MEMBER POWER 2018
                TSL TRIPLE CROWN FINALIST 2018
                TSLD CHAMPION 2018
                TSLB CHAMPION 2018



                • #9
                  Jessup, before saying "you've got all the answers" and "just do everything the way I say and it'll be great" (please forgive me if I am extremely tired of hearing this from dozens of people over the years who then retreat when their first iteration of an idea doesn't work, rather than tweaking it) ... you need to at least understand what's going on currently.

                  > "A bonus is given" -> What is the bonus? Is it 1?
                  Yes, look at the value to the left. Those are the variables you are tweaking. So clearly, yes, it's 1 (%). Default should be current values used in the formula, though they may be a bit outdated.

                  > Where is result of test value?
                  It's up in the beige square, which you can see because as you modify test values, the rating value modifies. This is simulating the result of a single game using specific rules for the formula.

                  >Where is the equation for Math.log?
                  This is the log function in mathematics. https://en.wikipedia.org/wiki/Logarithm

                  >"A bonus sum is applied over this sum: number of players"
                  This is the pubbux reward given for wins. You can effectively ignore the second sheet. It's not what's currently used anyhow, and it has nothing to do with rating. Note that ratings adjustments for size of game or # of deaths are no longer applied, as noted on the sheet.

                  >"Formula for confidence with 50 games."
                  It's marked as 100 on the sheet. I think we're currently using 100 games. I don't update this sheet every time we make a minor change. I could update it to reflect current values.

                  >Totally drop confidence. It is confusing and unneeded. It HURTS players who play more.
                  You haven't understood what I've said if you believe this is the case. This is not what hurts players who play more. Please read what I've written again. Then read it again. Then, perhaps consider reading what I've written a third time. Until you understand this most basic point, we've reached a standstill as to what can be accomplished with this discussion. You're entirely wrong on this point. Eliminating confidence makes the problem you are addressing far worse, which if you understand what it's doing, you would see. It returns us to the point where you can play 5 games and be at the top of the scoreboard. Games required for adjusted rating to = rating (which is called 100% confidence, or 100% rating, where you get the maximum value of rating possible) could be expanded for the new season if we believe this effect is too pronounced. Other measures could be implemented, such as expanding the minimum number of games before you are allowed to place on the scoreboard. Just doing away with confidence and changing nothing else would be disastrous, unless we want every elim season to be decided by who can cherry-pick the best 5-10 consecutive games and continue resetting their rating until they get it, like the old days.
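A rough sketch of the two safeguards mentioned above working together: confidence scaling plus a minimum number of games before a player appears on the scoreboard. The linear confidence ramp, the 20-game minimum, and the tuple layout are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

Player = Tuple[str, float, int]  # (name, raw rating, games played)


def build_ladder(players: Iterable[Player], min_games: int = 20,
                 games_for_full: int = 100) -> List[Tuple[str, float]]:
    """Rank eligible players by adjusted rating, highest first.

    Players below min_games are left off entirely; everyone else is
    ranked by confidence (assumed linear, capped at 1.0) times rating.
    """
    ranked = [
        (name, min(games / games_for_full, 1.0) * rating)
        for name, rating, games in players
        if games >= min_games
    ]
    return sorted(ranked, key=lambda entry: entry[1], reverse=True)


season = [
    ("fluke", 500, 5),      # five great games, then stopped
    ("veteran", 400, 100),  # full confidence
    ("regular", 450, 50),   # half confidence
]
ladder = build_ladder(season)
```

Here "fluke" never appears, "veteran" ranks first at 400, and "regular" shows at 225 until more games are played. Setting `games_for_full` to 1 and `min_games` to 0 reproduces the "play 5 games and sit at the top" situation described in the post.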
                  "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
                  -Dostoevsky's Crime and Punishment



                  • #10
                    Qan, all I am doing is trying to help get a better and simpler rating system while sharing my opinion on what is wrong. If you see that as me saying "I've got all the answers" or "just do everything my way", you have misunderstood my last post. I didn't even lay out the actual values, just a basis to work from. I just feel the system is too complicated when it doesn't need to be. I see wild upward rating swings early on while confidence builds with this setup, and then it becomes quite static quickly once confidence is met and the elim rating for everyone virtually freezes. I disagree with many of the functions that are contributing to ratings too, like KO and making large games count exponentially more. I'm not convinced logarithmic math is the best approach either. Algebraic can work quite well if certain parameters are set beforehand that integrate into the TWD style 1000 points as a base value for player ratings.

                    Definitely should have minimum number of games needed to be played before being placed on an official ladder too with my approach.

                    Also I was not saying to only do away with confidence and change nothing else. I wanted to remove KO bonuses. I wanted to remove bonus % for # of players being squared for final reward in large games. I suggested perhaps only doing games to 10 too to keep it simple.

                    I'd be willing to help too.. I am taking time to post this so obviously I care and am willing to help.
                    TWDT-J CHAMPION POWER 2018
                    TWDT-B CHAMPION POWER 2018
                    TWDT TRIPLE CROWN MEMBER POWER 2018
                    TSL TRIPLE CROWN FINALIST 2018
                    TSLD CHAMPION 2018
                    TSLB CHAMPION 2018



                    • #11
                      >I wanted to remove bonus % for # of players being squared for final reward in large games
                      I again think you didn't read my post, where I already explained this has nothing to do with rating but rather just pubbux. You're frequently becoming exasperating to correspond with.

                      Originally posted by Jessup
                      I see wild upward rating swings early on when confidence builds with this setup and then it becomes quite static quickly when confidence is met and elim rating for everyone virtually freeze
                      This is exactly how every rating system works, at least those that represent skill as accurately as possible. The longer you play, the closer the rating gets to your actual skill level. Rating does not keep increasing. The end state of this rating system, without a reset, is essentially 0 change to your rating the longer you play. This is "Elo hell", in which rating rises and falls with games but changes very little once the point of accurate estimation has been reached.
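The flattening effect can be illustrated with a plain running average. This is only a stand-in: it assumes the rating behaves like the mean of per-game scores, which matches the "goes up when you play better than your average" description but is not confirmed to be the exact formula on the sheet.

```python
def update_running_average(rating: float, games_played: int,
                           game_score: float) -> float:
    """Fold one more game score into an average over all games so far.

    The shift caused by a single game is (game_score - rating) / (n + 1),
    so the same performance moves the rating less with every game played.
    """
    return rating + (game_score - rating) / (games_played + 1)


# The same 400-point game, played by someone currently rated 300:
after_10th_game = update_running_average(300.0, 9, 400.0)    # 310.0
after_100th_game = update_running_average(300.0, 99, 400.0)  # 301.0
```

A game that is 100 points above average moves the rating by 10 points at game 10 but only 1 point at game 100, which is why a reset is the only way to get the early swings back.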

                      This isn't fun over the long term. This is why season resets exist, so everyone gets a fresh start and you get a new chance (to ride the elevator up to being evaluated at your correct rating, at which point it again stagnates).

                      Without completely abandoning that premise and instead, perhaps, increasing rating continually the more you play (which is completely non-competitive), every system will have this effect. Every one that attempts to accurately gauge rating, which starts at 300 in this system, but certainly could also start at 1000.

                      You don't like the game loop of a long season where rating "zeroes out" and stops changing/going up. This system works just fine as is with simply more frequent season resets. I do not have time to run the season or make changes to the algorithm right now. If anyone would like to run it, we can start running elim again. No technical skill is required. If we can also get a developer who is interested in modifying the algorithm to tweak it more, which could certainly use doing in places (but could just as easily make it worse -- it's all iterative, you just try and then fix), we can do that as well, but again, I'm not interested personally, especially when my 30ish years of amateur gamedev experience tells me that resetting the season accomplishes almost all of the goal. Small tweaks that are just changing variables or not counting certain things in rating I can do, but it does need to be more than 1 person supporting the idea. Change in a random direction is not necessarily progress, and is quite often its opposite.
                      "You're a gentleman," they used to say to him. "You shouldn't have gone murdering people with a hatchet; that's no occupation for a gentleman."
                      -Dostoevsky's Crime and Punishment



                      • #12
                        The rating system is fine. It's just the implementation of said system. Tiny's already got the solution: decay. Make it vicious so as to reward activity in elim, as it is the natural extension of the public arena. Lose points daily if you don't play; it will encourage participation and get people to log in. Not a huge amount, just enough so that if you have a great 5-10 game win streak and log off for the rest of the season you won't be champion. Combine that with regular resets to encourage people who are very active but stuck at low-elo ratings.
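A daily inactivity decay along these lines could be as simple as the sketch below. It is not the decay worked out on the third sheet of the spreadsheet (that formula is not shown in the thread); the flat per-day loss, the optional grace period, and the floor are all hypothetical parameters.

```python
from datetime import date


def decayed_rating(rating: float, last_played: date, today: date,
                   daily_loss: float = 1.0, grace_days: int = 0,
                   floor: float = 0.0) -> float:
    """Subtract a fixed number of points for every idle day.

    Days inside the grace period cost nothing, and the rating never
    drops below floor.
    """
    idle_days = max((today - last_played).days - grace_days, 0)
    return max(rating - daily_loss * idle_days, floor)


# Ten idle days at 1 point per day takes a 400 rating down to 390.0.
idle_example = decayed_rating(400.0, date(2020, 10, 1), date(2020, 10, 11))
```

Tuning `daily_loss` against the typical gap between top players would decide whether a short win streak followed by inactivity can still hold first place at season end.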



                        • #13
                          Qan, I'm not trying to make this elim rating stuff an attack on you personally, so I'm unsure why you seem a little huffy towards me. I read all you write, but as I posted above, I never saw the actual equation posted anywhere on the spreadsheet, so I was confused by what I saw. I understand now that the % rating for # of players squared is for the pub bux system. That does explain my confusion there, and sorry if it took you explaining it twice for me to sort that out, but the page 2 I displayed nowhere said it was for pub bux payout. You went above and beyond the call of duty to implement a pub bux system (if it was you, which I assume it was). I personally don't care about pub bux being a part of elim, but adding it was a very nice feature which I don't have a problem with. It is a definite incentive for pubbers to come play, which is a great thing!! The pub bux reward seems fine to me! I just was unsure of what page 2's values were because nowhere on the spreadsheet did it say it was for pub bux payout.

                          I agree that regular resets make the current confidence system work better and smoother in smaller windows of time too. The issue I have with logarithms and this Elo system is that it is based on trending rating curves rather than hard numbers per game. Algebraic math is more factual on a per game basis and gives better hard results for statistics. It just needs a minimum number of games played to stop high ratings for players who do well in 10 easy games. Algebraic ratings imo would work much better, though, if you have games at set times and dates that collect hard data. It is a much better control in a ladder system.

                          Let's do new elim season either way.. If we can tweak it some that would be great!
                          TWDT-J CHAMPION POWER 2018
                          TWDT-B CHAMPION POWER 2018
                          TWDT TRIPLE CROWN MEMBER POWER 2018
                          TSL TRIPLE CROWN FINALIST 2018
                          TSLD CHAMPION 2018
                          TSLB CHAMPION 2018



                          • #14
                            There's nothing wrong with the formulas for elim lol. Far from broken. Compared to what they were, it's 100% better. Perfect? What is in life? It works and does a great job I think. Can't wait for the next season to start!



                            • #15
                              System is good, just needs decay. I want to look at that formula, but I also don't... someone good with elim/formulas, please adjust the formula to include a good and fair decay for inactivity, post it for qan, and let's move on with next season.

                              please and thank you ahead of time.
                              1:waven> u challenge
                              1:waven> if i challenge it looks too scary

                              Originally posted by MHz
                              Hope you contract ebola from your, no doubt cheap, Easter Egg, you fucking shit-jav, pug-faced cunt.

