Trench Wars Repairs!

This topic is closed.

  • #16
    Originally posted by Pure_Luck
    Been many code changes since then. New banc stuff, new stuff added to BWJS bots.
    A few extra bots can't make an SQL server buckle under the load. I agree the new BWJS bots and the new elimination bots are putting extra strain on the database, but I doubt they are the root of the problem here.

    Originally posted by Pure_Luck
    I looked at the exception logs for the last month, haven't seen one from matchbots. TWD can run by itself for decades without going down. I think we both know that.
    I clean up the logs almost every week now, as they grow so rapidly in size that they become impossible to handle. If you had viewed last month's logs, you would have seen loads of errors from MatchBot as it was trying to load the rules.
    TWD ran "for decades" when it was on RoboQueen, but that was on Priitk's server with code from 2000. That is no benchmark at all. If you want to run TWD like that, I will get you a copy of the source code from back then. Fixing bugs is all up to you as well, then.

    Originally posted by Pure_Luck
    Dock's trying to help; I don't think he would make up that it's connecting over 700 times at once. So I think it's been proven that it's happening.
    I'm not saying he's making it up, I'm asking what proof YOU have. I haven't talked to Dock> myself, so whatever you quote from him is pointless to me (plus you still haven't given me anything to work on).


    It's incredible how you think you know everything. If you want to solve the issues, please go ahead, I won't bother anymore.


    PS. You are still missing my point: I want YOU TO COMMUNICATE. It seems like you don't know how to at all. Like I've said before, a simple ?message would've prevented all of this.
    Last edited by Maverick; 01-28-2010, 05:32 AM.
    Maverick
    Retired SSCU Trench Wars Super Moderator
    Retired SSCU Trench Wars Bot Coordinator
    Retired Trench Wars Core Administrator
    Subspace Statistics Administrator
    Former Mervbot plugin developer



    • #17
      Being illiterate in Java, I can't really do much to help on that front. However, if a new server would help things, I'd be happy to be one of the ones donating to make that happen. When everything gets figured out, I'd appreciate it if PL/HMS or someone else posts a "summary" thread of the problems and what's required to fix them.
      5:royst> i was junior athlete of the year in my school! then i got a girlfriend
      5:the_paul> calculus is not a girlfriend
      5:royst> i wish it was calculus

      1:royst> did you all gangbang my gf or something

      1:fermata> why dont you get money fuck bitches instead



      • #18
        Guys... for the sake of the uninformed public, that for the most part should STAY uninformed... can we take this to emails/in game pms/staff chat/etc?

        I mean... This is all fine and dandy to most forum users, but you don't need new players coming in and seeing the bickering. It just isn't very professional. While I'm almost positive we all appreciate the publicity of the inner workings of upper staff (you guys are in general far too secretive), in this case it probably isn't needed.

        If a new server and a donation fund are set up, then it is time to open the can of worms and give everyone a good reason to spend their money. If not, then this thread shouldn't be needed.


        Needless to say, we all appreciate everything you guys are doing.
        RaCka> imagine standing out as a retard on subspace
        RaCka> mad impressive



        • #19
          I didn't want to drag this out here either. I only made one sarcastic comment because of the way things have been going recently. Pure_Luck responded in this forum. I'm more than happy to continue this discussion in the staff forum.
          Maverick



          • #20
            Thanks, Stargazer.



            • #21
              This thread is nonetheless much needed, as the public has been in the dark long enough that they've become fearful of things to come. Now that someone has turned on the light at the end of the tunnel, we begin to wonder when that light will be turned off due to insufficient funds towards the electric bill. ;P



              • #22
                AMD Sempron 3000+ (I believe that's a 1.6 GHz single core)
                1 GB RAM
                This machine should easily be able to run the Java bots and the TW websites.

                I don't think throwing blame around at this time is helpful. If it was working fine for a long time, it's unlikely anything happened in the code that suddenly broke it, and the server hardware is more than adequate. It's likely the server crash corrupted some part of the database, or some important optimisation was lost when it was restored.

                I suspect the problem comes down to optimisation issues with MySQL and Apache and has very little to do with the bots at all (apart from the fact that they probably should be caching/reusing MySQL connections; in an ideal world there should be one connection per core).
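Flabby's aside about caching and reusing connections is the standard connection-pool pattern. Below is a minimal sketch that assumes nothing about how TWCore actually pools its connections: `ReusePool` is a hypothetical name, and the generic `T` stands in for `java.sql.Connection` so the example runs without a database. It keeps idle connections in a queue, opens a new one only while under a fixed cap, and otherwise makes the caller wait for one to be handed back.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: a fixed-size pool that hands back idle connections
// instead of opening a new one per query. T stands in for java.sql.Connection.
final class ReusePool<T> {
    private final int maxSize;                 // hard cap, e.g. one per core
    private final Supplier<T> factory;         // opens a brand-new connection
    private final BlockingQueue<T> idle;       // connections waiting to be reused
    private final AtomicInteger created = new AtomicInteger();

    ReusePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    // Reuse an idle connection if there is one, open a new one while under
    // the cap, otherwise wait up to timeoutMs for someone to give one back.
    T borrow(long timeoutMs) {
        T c = idle.poll();
        if (c != null) return c;
        for (int n = created.get(); n < maxSize; n = created.get()) {
            if (created.compareAndSet(n, n + 1)) {   // reserve a slot before opening
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    created.decrementAndGet();       // opening failed: free the slot
                    throw e;
                }
            }
        }
        try {
            c = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        if (c == null) throw new IllegalStateException("pool exhausted");
        return c;
    }

    // Hand the connection back so the next borrow() reuses it.
    void giveBack(T c) {
        idle.offer(c);
    }

    int createdCount() {
        return created.get();
    }
}
```

In a real bot the factory would wrap `DriverManager.getConnection(...)` and rethrow its checked `SQLException` as an unchecked exception, and the cap could be `Runtime.getRuntime().availableProcessors()` to follow the one-connection-per-core rule of thumb. Every `borrow` has to be paired with a `giveBack`, or the pool just becomes a slower way to run out of connections.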

                If you guys need any help/assistance, let me know, because I have quite a bit of experience in this area (5 years at my previous job administering Linux Apache/MySQL and Windows 2000/2003 Server IIS/MSSQL). Not to mention I run my own server.
                Rediscover online gaming. Get Subspace

                Mantra-Slider> you like it rough
                Kitty> true

                I girl with BooBiez> OH I GET IT U PRETEND TO BE A MAN


                Flabby.tv - The Offical Flabby Website



                • #23
                  i admittedly don't know a whole lot about server loads with regard to what hardware you need to handle whatever apps you're running. i also don't know what kind of system resources the bots eat up.

                  i do know that if space (edit: and doc flabby, who i know has been involved in the ss dev community at large for quite some time now) thinks the hardware's up to snuff, that's good enough for me. 1GB of ram seems a bit on the low side, but again, that's just my (only somewhat informed) opinion.

                  sounds like we need to address programming issues on the double before we begin to bother with anything else.

                  so i'm going to leave it at that. if i can help by throwing a little cash into the community pot, please let me know.
                  jasonofabitch loves!!!!



                  • #24
                    This feels like I am at work: the classic software guy vs. hardware guy. After 18 years of managing firmware engineers fighting with electrical engineers, take my word for it: defensiveness is a waste of time and effort. Obviously the common ground and common objective is to resolve the issue. Where the root cause ends up being is not relevant (in terms of blame); troubleshooting it efficiently and correctly is. Even if you think the other team members are idiots, hold your tongue and appreciate that they care and are in this for the same purpose. I can’t tell you how many times I’ve seen the moronic theory actually become significant in troubleshooting hard issues. Want to feel really stupid? Try calling out the idiot with the stupidest theory you have ever seen, only to have it end up resolving the issue.
                    P_L’s approach is applicable, systematic, and comprehensive for trying to eliminate possible software issues. Mav is absolutely correct in pointing out that hardware could very well be the issue, and P_L’s troubleshooting approach does not take this into account. The even more obvious, and likely, troubleshooting shortcoming is that the issue could be the result of both the software and the hardware.

                    You can’t manage what you can’t measure. PL is contributing to the effort by collecting software data. What can be done to collect and measure hardware information? Are other hardware resources available to test with? Team-wise, instead of the current diffusion of responsibility, perhaps assign a single person to lead the team (not either of the primary troubleshooting resources) and act as the voice to the rest of the community?

                    But whatever happens in this typical stressful situation, try to consider the common ground you have with the other guy and be tolerant of any thoughts or ideas, no matter how far off base you might think they are.



                    • #25
                      Although this kinda turned into staff airing their dirty laundry in public, it's still good to let the greater community know what's going on. I've always said communication should be more open and in public.

                      But eph is right: don't forget what the greater goal is here, fixing the stability of the zone. Don't let personal pride and assigning blame get in the way.

                      Surely the same server would have been under more strain when more ppl played in the past :P But then again, hardware does fail over time... so test what you can and try to find the root of the problem without getting personal. We all want the same thing in the end, and ppl appreciate the time and effort that staff & coders have devoted to this game.


                      Anyhow, would it be possible to give the code to Doc Flabby to test on his own server, to see if his server throws up these errors etc.? I don't know if that is feasible :P
                      In my world,
                      I am King




                      • #26
                        That would be a brilliant idea, but I really don't think they would want all those settings being filed up and sent off like that.



                        • #27
                          Originally posted by Pressure Drop
                          anyhow would it be possible to give the code to Doc flabby to test on his own server to see if his server throws up these errors etc ? i don't know if that is feasible :P
                          He can get the code at www.twcore.org.
                          Most of the bots are downloadable, with the exception of a few: MatchBot, PubHub, PubBot and some others.
                          These are not publicly available, as knowledge of how the bots work can be used to cheat (not very likely, though). If you want to get those bots, you have to ask permission from a Trench Wars sysop.
                          Maverick



                          • #28
                            So the bot server is actually crashing? I'm not a software guy, but seeing as the same bots and websites have been working great for years, I'm pointing my finger at the 7-year-old server. The recent hard drive crash should be proof enough that the hardware is getting old. It would be nice to see the bots running from the Flabby server.



                            • #29
                              There's nothing wrong with the server hardware we have. We've been running approximately the same hardware since back in the day when we were worried that we'd hit the theoretical maximum of 1024 players online per server.

                              Here's what we've got.

                              AMD Sempron 3000+ 1.8 GHz
                              Dual 160 GB SATA drives
                              1 GB RAM

                              Sounds skimpy, yes, but this is actually quite suitable for our needs. Sure, we could throw more hardware at this, but I'm fairly certain it would still perform poorly; it would just do so faster.

                              Here are some little things we've done to optimize it as best we can. Keep in mind this is way better than it was before the crash.

                              Ubuntu 9.10 Karmic Server 64-bit, based on Linux 2.6.31
                              Dedicated drive just for MySQL, short-stroked to 36 GB of 160 GB for maximum IOPS
                              Swapped database for MariaDB 5.1, OurDelta build (as contributed to by Google, Facebook, etc., not the crappy stock one)
                              Switched all tables to InnoDB to prevent big table locks from holding up other queries
                              Added a few indexes here and there - most tables already have good indexes.
                              Apache has keepalives turned off, and is limited in number of spawned processes for predictable memory usage

                              And how does it perform?

                              Pretty good! CPU usage is fairly low most of the time; the server's nominal CPU usage is just a few percent. Disk transactions per second for MySQL are very low on average compared with the ceiling. The only time it gets bogged down is when there are suboptimal queries in play. There is always free memory, though some inactive stuff is swapped out. There is no swap churn. Basically normal.

                              What does this mean?

                              It means the hardware is sufficient for the load that's placed on it. It means there's not a hell of a lot we could do better in optimizing this beast. While the hardware may seem like a joke in today's terms, just a few years ago it was still pretty good!

                              So what are the symptoms of the problem?

                              Bots disconnect fairly regularly, and the MySQL database has literally thousands of active connections going to it (the limit should be around 30 per core). This morning I woke up to find 1970 open, persistent connections to the database, which is just a retardedly high amount.

                              What are the causes of the problem? There are three main causes that I can see, with my somewhat distant point of view.

                              The first one is the sheer number of MySQL database connections. I can't tell whether this is a bug in the Java-MySQL connector, the bot core's connection queuing pool library (certainly possible), or the bots themselves (highly likely), but this needs to be addressed first. I should be able to limit this to 100 and have it not go berserk.
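One way to tell the connector, the pool library and the bots apart is to make every bot account for the connections it opens. The sketch below is a hypothetical helper (`ConnectionLedger`, with made-up bot names in the test), not part of TWCore: it counts open connections per bot, refuses new ones once a global cap such as the 100 mentioned above is reached, and produces a snapshot showing which bot is holding the most.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: per-bot accounting of open database connections,
// with a global cap so a leaking bot fails fast instead of exhausting MySQL.
final class ConnectionLedger {
    private final int globalCap;
    private final AtomicInteger total = new AtomicInteger();
    private final ConcurrentHashMap<String, AtomicInteger> perBot = new ConcurrentHashMap<>();

    ConnectionLedger(int globalCap) {
        this.globalCap = globalCap;
    }

    // Call before opening a connection; false means the cap is reached, so don't open one.
    boolean tryOpen(String bot) {
        for (int n = total.get(); ; n = total.get()) {
            if (n >= globalCap) return false;
            if (total.compareAndSet(n, n + 1)) break;
        }
        perBot.computeIfAbsent(bot, k -> new AtomicInteger()).incrementAndGet();
        return true;
    }

    // Call after closing a connection that tryOpen() allowed.
    void closed(String bot) {
        AtomicInteger count = perBot.get(bot);
        if (count == null || count.decrementAndGet() < 0) {
            if (count != null) count.incrementAndGet();   // undo the bogus decrement
            throw new IllegalStateException(bot + " closed a connection it never opened");
        }
        total.decrementAndGet();
    }

    int total() {
        return total.get();
    }

    // Open connections per bot (bots at zero omitted), sorted by name for a status command.
    Map<String, Integer> snapshot() {
        Map<String, Integer> out = new TreeMap<>();
        perBot.forEach((bot, n) -> {
            int open = n.get();
            if (open > 0) out.put(bot, open);
        });
        return out;
    }
}
```

A bot whose count only ever climbs is leaking connections; with the cap in place it gets refused early instead of helping push the server toward thousands of open connections.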

                              The second one is the total number of UDP packets per second that the bots send to the server. I believe the limit is set to tolerate a little burstiness, but if the number of UDP packets exceeds a reasonable threshold, the server will defend itself and disconnect any connection coming from that IP address, basically disconnecting all the bots. We had this problem earlier, and it was the reason we put RoboQueen on the server itself. This allowed us to use the local IP address pool of 127.0.0.0/8 to give every bot its own IP address, and its own limits. This prevented the entire core from killing itself if a single mistake was made in a bot. Additionally, it allowed us to bypass limits on login rate, so we could spawn bots quickly and without delay.

                              However, since access was limited, RoboQueen bots were designed to simply never be changed. One thing we did have to do was create a synchronization system where data was transferred from the MySQL database on the server to the main Trench Wars database, which was external. Also, very few people had access to RoboQueen, and the server had extremely strict permissions set for access, making working conditions difficult, so at some point in the past this practice was stopped. That's OK, but all the bots these days must behave nicely so that problems do not occur.
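A sustained packet limit that tolerates a little burstiness is the kind of policy a token bucket models, so a bot could throttle itself before the server does it for it. This is a hedged sketch, not the server's actual algorithm: `PacketBucket` is a hypothetical class, the rate and burst values are placeholders because the real threshold isn't given here, and the clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a token bucket a bot consults before sending each UDP packet.
// Tokens refill at ratePerSec up to `burst`; each packet spends one token.
final class PacketBucket {
    private final double ratePerSec;   // sustained packets per second (placeholder value)
    private final double burst;        // bucket capacity: largest allowed burst
    private final LongSupplier clock;  // nanosecond clock, e.g. System::nanoTime
    private double tokens;
    private long last;

    PacketBucket(double ratePerSec, double burst, LongSupplier clock) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.clock = clock;
        this.tokens = burst;           // start full, allowing an initial burst
        this.last = clock.getAsLong();
    }

    // True if a packet may go out now; false means queue it and try again later.
    synchronized boolean trySend() {
        long now = clock.getAsLong();
        tokens = Math.min(burst, tokens + (now - last) / 1e9 * ratePerSec);
        last = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Outside of tests the clock would be `System::nanoTime`, and a bot would hold back any packet for which `trySend()` returns false rather than firing it anyway.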

                              The third one is lack of maintenance of the database, and lack of maintenance on the bots. While the majority of the large queries have been eliminated, there are still a few left that completely trash the MySQL server, causing it to slow down. This causes the bots in the game to slow down as well. These need to be eliminated. Sure, a faster server would plow through them faster, but really it shouldn't be happening in the first place. Also, until very recently nobody seemed to be doing any work to figure out what the problems with the bots are. I had proposed a series of simple strategies to determine what's wrong with the bots, but for months not enough was accomplished.
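Finding the remaining queries that trash the server starts with timing them. MySQL can also log slow queries itself, but a bot-side wrapper ties each slow call to the bot code that issued it. The sketch below is hypothetical (`SlowQueryLog` and the query labels are made up): it times each call through an injectable clock and records the ones at or over a threshold. It takes a `Supplier` to keep the example free of checked exceptions, so real JDBC calls would need their `SQLException` wrapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: time each database call and remember the slow ones.
final class SlowQueryLog {
    private final long thresholdNanos;
    private final LongSupplier clock;          // nanosecond clock, e.g. System::nanoTime
    private final List<String> slow = new ArrayList<>();

    SlowQueryLog(long thresholdMillis, LongSupplier clock) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
        this.clock = clock;
    }

    // Runs the query and returns its result; a call taking at least the
    // threshold is recorded, even if it throws.
    <T> T time(String label, Supplier<T> query) {
        long start = clock.getAsLong();
        try {
            return query.get();
        } finally {
            long elapsed = clock.getAsLong() - start;
            if (elapsed >= thresholdNanos) {
                record(label + " took " + elapsed / 1_000_000L + " ms");
            }
        }
    }

    private synchronized void record(String entry) {
        slow.add(entry);
    }

    synchronized List<String> slowQueries() {
        return new ArrayList<>(slow);
    }
}
```

Dumping `slowQueries()` periodically, or from a bot command, would give a short list of the worst offenders to rewrite or index.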

                              So that's where we stand. I hope everyone can work together to find the solution, but this requires that people stop pointing fingers at each other and get down to business. Don't waste time, go do stuff. Talk to each other, get this done.
                              Last edited by DoCk>; 01-28-2010, 02:12 PM.
                              TWSites.com - TWSites.com Web Hosting Services
                              qan> dock's raw animal magnetism and sheer ability to reboot bot cores inspires lust in all genders :P
                              3:wadi> no yawning on the internet.



                              • #30
                                Originally posted by Maverick
                                He can get the code at www.twcore.org.
                                Most of the bots are downloadable, with the exception of a few: MatchBot, PubHub, PubBot and some others.
                                These are not publicly available, as knowledge of how the bots work can be used to cheat (not very likely, though). If you want to get those bots, you have to ask permission from a Trench Wars sysop.
                                I don't think the problem is entirely coding-related; I think it's down to MySQL/Apache/Java settings.
                                Flabby.tv - The Offical Flabby Website

