Since coming back, I've often heard that new item releases cause the site to slow down, or even to go down altogether!
Some attributed this to badly created CI or bundles, while others thought it might have to do with large inventories.
Last night the problem came back. The items database went down, taking the whole site along with it! We had to disable avatar inventory until the website came back.
But this time, our operations engineer helped us locate the slow database call that caused the slowdown. (Thanks, James!)
Finally, we were able to track down the "bug" that had been causing the site to go down.
In other news, a few more inventory problems were tackled. It looks like the front-end code needs a lot of optimization, especially now that global stacking is enabled. More about that later.
I wrote up a report on the site outage bug for the other developers, and thought it might be interesting to share with you guys too:
Site outage bug
Our inventory reader is responsible for bringing up everyone's inventories, whether for the marketplace, trading, avatar dress-up, or item arranging. The reader has two modes:
- regular query join - this mode reads from the user inventory table, then reads the gaia_items table for item properties such as item names. These queries are big and slow, but the data is cached on a page-by-page basis.
- mashup mode - this mode reads and stores all of the user's inventory info in memory, then performs filtering and pagination on the web server instead of the database server. This mode is great for lowering the number of queries to the database. It takes up a lot of web server memory, so we only use it for inventories under 1000 items.
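To make the mashup idea concrete, here is a minimal sketch of filtering and paginating an already-loaded inventory on the web server. The `InventoryItem` fields, the function names, and the 0-based page numbering are all assumptions made for illustration, not the actual reader code; only the 1000-item cutoff comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoff taken from the post: mashup mode is used for inventories under 1000 items.
MASHUP_LIMIT = 1000


@dataclass(frozen=True)
class InventoryItem:
    # Hypothetical shape of one inventory row held in web server memory.
    item_id: int
    name: str
    group: str  # e.g. "tops", "bottoms", "housing"


def use_mashup(item_count: int) -> bool:
    """Pick mashup mode only for inventories small enough to keep in memory."""
    return item_count < MASHUP_LIMIT


def filter_and_paginate(
    items: List[InventoryItem],
    group: Optional[str],
    page: int,
    page_size: int,
) -> List[InventoryItem]:
    """Filter by group (None means no filter) and return one 0-based page.

    Everything happens on the in-memory list, so no database query is needed.
    """
    matching = [it for it in items if group is None or it.group == group]
    start = page * page_size
    return matching[start:start + page_size]
```

The trade-off is the one described above: each page view costs no extra database query, but the whole inventory has to sit in memory first.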
As it turns out, mashup mode needs to store a ton of metadata so it can filter items, such as filtering by housing items, tops, bottoms, etc. To have this ability, the script needs to collect and cache item_ids for every item group type (we have 15), reading through the item database table with 250K rows 15 times. Creating this filtering data takes about 2 minutes, and it's refreshed every hour.
Now here's the kicker - this 2-minute task can be triggered by any user who brings up an inventory. So while one user waits, if 10 more users bring up their inventories, then we have 10 more update requests happening at the same time!
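One common way to avoid this kind of pile-up is to let only one caller rebuild the cached data while everyone else waits for it or reuses the old copy. The sketch below shows that idea with a lock. The class name `FilterCache` and the `rebuild` callback are hypothetical, and it only coordinates threads inside a single process, so it illustrates the technique and is not a drop-in fix for a fleet of web servers.

```python
import threading
import time
from typing import Any, Callable, Optional


class FilterCache:
    """Caches the result of an expensive rebuild; only one caller rebuilds at a time."""

    def __init__(
        self,
        rebuild: Callable[[], Any],
        ttl_seconds: float = 3600.0,  # the post mentions an hourly refresh
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rebuild = rebuild
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._data: Any = None
        self._built_at: Optional[float] = None

    def _is_stale(self) -> bool:
        return self._built_at is None or self._clock() - self._built_at >= self._ttl

    def get(self) -> Any:
        # Fast path: data exists and is still fresh.
        if self._data is not None and not self._is_stale():
            return self._data
        # On the very first build callers must wait, since there is nothing to serve.
        # On a later refresh, callers that lose the race keep serving the old data.
        must_wait = self._data is None
        if self._lock.acquire(blocking=must_wait):
            try:
                # Re-check under the lock: another caller may have just rebuilt.
                if self._data is None or self._is_stale():
                    self._data = self._rebuild()
                    self._built_at = self._clock()
            finally:
                self._lock.release()
        return self._data
```

With this shape, 10 users arriving during a rebuild cause one rebuild instead of 11.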
There are several fixes.
- First of all, the refreshing could happen less often.
- The database could have a better indexing scheme to speed up the process.
- The process itself can be optimized so it doesn't run through the table 15 times to get the desired result.
- The actual update task needs to be run by one scheduler, not by hundreds of users at the same time.
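As an illustration of the third fix, the per-group item_id lists can be collected in a single pass over the item rows instead of one pass per group. This is only a sketch: the `(item_id, group)` row shape and the function name are assumptions, not the real gaia_items schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_group_index(rows: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group item_ids by item group type in one pass over the rows.

    `rows` is assumed to yield (item_id, group) pairs, for example from a
    single SELECT over the items table.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for item_id, group in rows:
        index[group].append(item_id)
    return dict(index)
```

Run from a single scheduled job, a function like this would read the table once per refresh, regardless of how many group types or users there are.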
But at the end of the day, I did some performance testing, and it looks like mashup mode does not give us that big of a speed saving. I ended up disabling it, and nobody should really notice the 0.1 second difference in load time.
I'm not 100% sure that's the only thing causing servers to go down, but I'll be sure to keep an eye on things starting tomorrow.
I have another item server optimization job to do. After that, we will no longer need to close the avatar builder from time to time!
Lanzer's Journal