The final stretch
Sorry it's been such a long time since the last update. For the past few weeks I've been living under a rock, doing nothing but coding for this month's server migration. In case it wasn't mentioned before, we are moving all of Gaia's servers to cloud computing systems.
Today Gaia runs on more than 300 servers. We purchased and ran those servers in a "colocation facility," which is a giant air-conditioned warehouse full of computers from different companies. We pay the service provider a hefty rental fee on top of a power usage fee, and every week we deal with maintaining the computers. Remember how an old computer's fan could get noisy and die of old age, or its hard drive could start getting corrupted? With hundreds of machines, parts die and need replacing on a weekly basis. Maintaining all this hardware ourselves is just too much.
Over the years we also wrote our own software to monitor the servers, make sure they run smoothly, and alert us if things go wrong.
But with the advent of cloud computing, companies such as Amazon have built a huge suite of tools that make it easy for companies like Gaia to set up and maintain a large number of servers. Whether it's setting up a system that adds or removes servers depending on usage, or adding more backup or security, the new system will save us a lot of the time we'd otherwise spend doing everything ourselves.
The idea of moving everything to the cloud is great. The actual work, however, is a massive undertaking, mainly because we have so many different systems that have been built since 2004. Gaia is much more than a bunch of web servers; we have:
- 30 database clusters, each with its own master/slave replication
- API servers for handling games such as Gaia Towns
- MMORPG servers for handling chatting and positioning info in Towns and zOMG (sushi, smartfox)
- memory cache servers for speeding up data retrieval
- search engines for searching through forums/guilds/avatars
- file system servers for managing terabytes of avatar data
- a host of dynamic web servers
- load balancing servers to distribute load between all web and game servers
- static server for loading images
- avatar sprite server for avatar compilation
- internal wiki and bug reporting server
- intranet servers for remote developers/artists to access internal data
- versioning servers for backing up program source code and item sprites
- management servers for accessing databases and server health monitoring
- email server
Then there is software that works in conjunction with the existing hardware and needs to be changed for the new system:
- system to deploy new code to all web servers when changes are made
- system to edit or upload new item sprites
- search-related applications need to use the new search engine system
- all applications involving file storage require modifications
My mind is a bit of a blur, so I'm leaving out a lot of the migration details. But we reserved three months of our time for the migration effort, and we're using up every single minute of it. After two and a half months, we're getting close: all our data is uploaded, the content delivery networks are set up, the databases are replicated, the web server farm is created, and most of the applications have been configured to work with the new systems and their locations.
Of course, for such a large project, the last two weeks are going to be even more hectic as we patch up all the bugs from the many massive changes we've made. Some of you have already seen bugs popping up here and there over the past few weeks, and unfortunately there may be more to come, especially in smaller features such as our flash games.
I can't wait to complete this migration so we can have a stable system, less downtime, and more time to work on features you love. I'm really looking forward to features like image uploading, which we should have had years ago.
Time to jump back to work! I hope I'll be less stressed the next time I write in my journal. :)