I admit it: I play Pokémon GO. I was already walking 3-6 miles a day anyway, so it just gave me something to break the monotony of long hauls in the wee hours of the morning. That being said, it’s also been a great case study in scalable systems engineering and massively deployed client systems. So far Pokémon GO has been one of the fastest-launching and fastest-scaling applications in recent history, quite possibly even of the internet age.
Between iOS and Android devices, #PokmeonGO is downloaded 81 times per second every day. [credit] @bargainfox https://t.co/nVl1gJ3jso
8/4/16, 3:30 PM
Still, it hasn’t exactly been the smoothest launch. Server uptime issues, app bugs, and a lack of transparency from Niantic have stirred the community up quite a bit, as can be seen on Reddit and Twitter. Having watched the ecosystem roll out and blow up the internet from about Day 2, here are my takeaways on lessons learned regarding the launch of something of this scale (if I ever get the opportunity to do so).
Assume it’s gonna take off. Scale accordingly.
The single biggest problem Pokémon GO has suffered has been server outages. Granted, there have been asshat-ish DDoS attacks on the cluster, and some irresponsibly built “radar” apps that locate Pokémon by repeatedly hammering the servers from fake accounts. I’m not sure what Niantic spec’d out for their initial cluster, but it clearly wasn’t enough, and it didn’t scale fast enough for the load. Lessons learned:
- Make sure your cluster either starts off big enough that you can scale it manually without sacrificing uptime, or can autoscale to keep up with load.
- Try to anticipate services that third parties might try to build, and provide that service yourself in a controlled manner, or provide official throttled APIs for vendors to leverage instead of them hijacking your app channels.
- Make sure you build your service in a manner that supports automated scaling solutions and anticipates burst usage.
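To make the “official throttled API” idea a bit more concrete, here is a minimal sketch of a per-client token bucket, one common way to allow short bursts while capping the sustained request rate. I have no idea what Niantic actually runs; the class name, the parameters, and the numbers in the usage note are all my own assumptions for illustration.

```python
import time


class TokenBucket:
    """Per-client token bucket: allows short bursts, caps the sustained rate.

    Hypothetical helper for illustration only, not anything from Niantic's stack.
    """

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens added back per second
        self.capacity = float(burst)      # largest burst a client may send
        self.tokens = float(burst)        # start with a full bucket
        self.clock = clock                # injectable so the sketch is testable
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A server would keep one bucket per API key, something like `TokenBucket(rate_per_sec=2, burst=3)`, and answer with a “slow down” response (HTTP 429, for example) whenever `allow()` returns False. That way a third-party radar app gets a predictable, limited slice of capacity instead of free rein over the same channels the game client uses.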
Make your client tolerant of server outages.
The only reason the server downtimes are such an issue to Pokémon GO users is – well, the app doesn’t work when the server goes down. Tightly coupling every action in the app to a server response might seem like a good idea at the time, but if your servers can’t keep up, then you just wind up with frustrated customers. Lessons learned:
- Try to offload as much logic as possible to the app.
- There is no reason a server query in the middle of an interaction should hang or kill an application. Timeouts and sane use of optimistic models (update the app right away, reconcile with the server afterwards) can limit the service coupling.
- Many reverse engineering attempts of Pokémon GO (due to lack of a public API, see above) have identified that the app pins itself to a single server at login. When that server rolls over, your app hangs. Period. Have some reasonable fallback strategy (e.g. heartbeat or timeout fails, query the DHT service for a new server).
- Remember that under the CAP theorem, a distributed system can fully guarantee only two of Consistency, Availability, and Partition Tolerance at a time. In a game app where lives and fortunes are not on the line, I’d say focus on Availability and Partition Tolerance, and err on the side of the customer for Consistency.
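The timeout, fallback, and optimistic-model points above can be sketched together in a few lines. This is only an illustration under my own assumptions: `FailoverClient`, `ServerUnavailable`, and the `send` callable are hypothetical names, the server list is assumed to come from some discovery service, and the real Pokémon GO protocol is not public.

```python
class ServerUnavailable(Exception):
    """Raised by the transport when a server times out or refuses the request."""


class FailoverClient:
    """Hypothetical client: pins to one server, but re-pins when it stops answering."""

    def __init__(self, servers, send, timeout=2.0):
        self.servers = list(servers)  # assumed to come from a discovery service
        self.send = send              # callable(server, payload, timeout) -> response
        self.timeout = timeout        # seconds before a request is given up on
        self.current = 0              # index of the server we are pinned to
        self.pending = []             # actions applied locally, not yet confirmed

    def request(self, payload):
        """Try the pinned server first, then rotate through the others."""
        for attempt in range(len(self.servers)):
            index = (self.current + attempt) % len(self.servers)
            try:
                response = self.send(self.servers[index], payload, self.timeout)
            except ServerUnavailable:
                continue              # this one is down, try the next
            self.current = index      # re-pin to the server that answered
            return response
        raise ServerUnavailable("no server reachable")

    def act(self, action):
        """Optimistic model: accept the action now, sync it when we can."""
        self.pending.append(action)
        self.flush()

    def flush(self):
        """Send queued actions in order; keep them queued if every server is down."""
        while self.pending:
            try:
                self.request(self.pending[0])
            except ServerUnavailable:
                return False          # the app stays usable; retry later
            self.pending.pop(0)
        return True
```

The design choice here is the Availability-over-Consistency trade: the player’s action is accepted locally and queued, and the server gets the final say once it is reachable again. A real game would still need rules for reconciling a queued action the server later rejects, which this sketch leaves out.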
Manage your community.
If Pokémon GO had been a startup in the app economy, you can bet they would have had a dedicated Community Manager as one of their first hires pre-launch. With an established brand and an almost cult-like following of that brand, this should have been a no-brainer for Niantic. After all, they even had one for Ingress, which was its own product. Lessons learned:
- If you’re going to release something that could potentially have or tap a large community and has a social aspect to it (Teams Mystic, Valor and Instinct anyone?) you need to have official channels to support the community. Community fosters growth, adoption, and happy customers.
- Communication is key. I understand there is a lot of proprietary black-box stuff going on in the Niantic system, but communicating clearly and regularly with your community about what you’re doing and why is important. Otherwise you risk alienation, false rumors, and speculation which is then beyond your control. After the fact is often after it matters.
- Make sure your service supports your entire community, not just a fraction of it. Players of Pokémon GO outside dense urban areas are often frustrated with the lack of critters, stops and gyms within any reasonable distance. These players will most likely drop the game out of frustration once the novelty wears off, which is a lost revenue opportunity.
Be mindful of your platform.
So, can anyone out there play Pokémon GO for more than an hour or so before having to plug in to recharge? Unity is without a doubt an incredible game engine, but it is also incredibly power hungry. If you sit and play a traditional 3D video game for an hour on your phone, you’re most likely in the minority, and even then you don’t see your battery level drop like a BASE jumper. Pokémon GO is intended to be kept running for hours at a time, and as such the game engine is forcing people to strap on battery packs, carry multiple devices, or launch-kill-launch the app in cycles to conserve power in between key locations. Lessons learned:
- Be mindful of your platform, and what its resource limits are. And if you do offer a “battery optimization” mode, please make sure it works and doesn’t sacrifice app effectiveness.
- Be aware of how your app is going to be used – a lot, a little, in the foreground or as a background service. Are long-running CPU-intensive tasks required for the app to provide its features?
- Be flexible. Give your customers the option to throttle the app’s resource usage in a way that best matches their system (phone, tablet, etc).
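As a rough sketch of what “let the customer throttle the app” could look like, the snippet below maps a user-selected mode to a handful of resource knobs and steps down to the cheapest profile when the battery runs low. The profile names, every number in them, and the 20% threshold are assumptions I made up for illustration; they are not measured values or real Pokémon GO settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerProfile:
    frame_rate: int         # target frames per second
    gps_interval_s: float   # seconds between location polls
    render_3d: bool         # whether to draw the full 3D map


# Illustrative values only; a real app would tune these per device.
PROFILES = {
    "performance": PowerProfile(frame_rate=60, gps_interval_s=1.0, render_3d=True),
    "balanced": PowerProfile(frame_rate=30, gps_interval_s=5.0, render_3d=True),
    "saver": PowerProfile(frame_rate=15, gps_interval_s=15.0, render_3d=False),
}


def choose_profile(user_choice, battery_pct, low_battery_pct=20):
    """Honor the user's choice, but fall back to 'saver' on a low battery."""
    if battery_pct <= low_battery_pct:
        return PROFILES["saver"]
    # Unknown choices fall back to the middle-of-the-road profile.
    return PROFILES.get(user_choice, PROFILES["balanced"])
```

For example, `choose_profile("performance", battery_pct=80)` hands back the 60 fps profile, while the same call at 10% battery returns the saver profile. Whether the app should override the player like that is a product decision; an opt-out for the automatic step-down would be just as reasonable.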
I love the game, and I wish Niantic the best in continuing to roll it out to more markets and in adding the new features that customers are clamoring for. The platform offers a business opportunity only (barely) limited by the global map, and it is one of the first MMORPGs in the augmented reality space to go truly massive in such a short time. All that being said, it has also been a pioneer in launching at this scale, so I am in no way slamming the company; I am leveraging the opportunity to learn.