How To Deal With a Big High Score Table?
It all started with a task on ActiveCollab, a “simple” bug fix on high score table. It’s funny how everything looks simple in the eyes of users, clients or even project managers. Well, to be honest, I’ve underestimated this specific problem too, and I’m a developer.
The problem was with high score table. For some unknown reason, it takes over 15 seconds to load if you’re lucky, it usually takes even more. So, I’ve rolled up my sleeves and started going trough undocumented code previously written by one developer and then altered/upgraded by another (it was a hard read). Couple of minutes later, the problem was found.
Did you predict thousands of users to swarm the high score table?
The developer who made the initial code didn’t test or predict how his code will behave with a large number of users. My guess is he didn’t even have all requirement specifications when he started and just had to “wing it”. More specifically, at first, he probably just had to load high score table based on user score, and later came the project manager and said that he needs to add additional score to users for every sent and accepted invite. The requirements like that usually happen in the middle of the project or even near the deadline without any extensions because “Hey, it’s just this simple feature, you can do it in 5 minutes, right?”. And it doesn’t end there. At some point, he also had to make position sharing for users with the same score, BUT the ones who got that score first, had to be at the top. At first, high score worked just fine. If tested on 50-100 users, it would render in no time, but as the high score table gets more and more populated, it takes more time to render, culminating at 7k+ users with 15-20 seconds delay.
Cheaper & “expensive” development solution
In order to optimize, we have some choices, rebuild the whole feature with high number of users in mind which can be slow and painful and, at most times, too expensive solution. A cheaper solution is to think of a workaround and just tweak a bit here and there.
Cheaper solution
In a case of an upcoming deadline or if we’re unable to invest enough time to make a complex solution, we can optimize the existing code. The first thing we can do is decrease number of database queries. The current code gets all players ordered by score, and then for each player makes a check with all other players to compare scores and save current position, checking time when the score is altered to reposition users. By eliminating position share feature, we don’t need to calculate position of every user, we can use current player position from array of users, and make a time check only on players who have the same score.
This solution, even though not perfect, decreased rendering time from 15-20 to 2-3 seconds which is good enough for a temporary fix.
Expensive (complex) solution
More complex solutions would include storing a fraction of high score table in memory, make all necessary alterations and writing to database when data is ready.
For example, if position sharing feature is required, we need to alter positions of players who are in the range. Let’s say a user in position 15 has a score of 120, he gets 10 more points and now has more than players in positions 14, 13 and 12. In this case, we would only need to alter players from 12-15, a recursive function that would check current player score vs the next one and switching positions of both players if needed. Unavoidable problem pops up again when we have the same score. It can be a lot faster if position is not shared with 2 or more players, because now, let’s say the player in position 12 has the same score as player who was previously in position 15. They need to share position 12, and we have to go back (or down on the high score table) and alter every position beneath the position 12 (also now we have a hole in the table). Let’s visualize that a bit:
position – score
12 – 130
13 – 128
14 – 124
15 – 120 (+10)
16 – 118
———————
12 – 130
13 – 128
14 – 130
15 – 124
———————
12 – 130
13 – 130
14 – 128
15 – 124
———————
Without position sharing, this works pretty fast. Now, when the position has to be shared, we get this problem:
12 – 130
12 – 130
-empty-
14 – 128
15 – 124
———————
So, now we need to remove that empty position by affecting almost the entire table. Changing position 14 to 13, 15 to 14, 16 to 15 etc. Now imagine if you have million players.
Position sharing feature, is it worth?
I don’t think position sharing feature is worth the resource abuse. Also, let’s not forget that in the case when there is no position sharing, we do not need to check the timestamp when the score was achieved, we only switch positions IF the score is greater, not equal.