Recently, several ops folks from game companies came to me complaining: overseas players keep lagging and disconnecting, and a DDoS took their whole service down. I took one look at their CDN configuration and had to laugh — in this day and age, there are still people exposing the origin with no defense strategy at all?
PC client games are far more demanding than mobile or browser games; once latency passes 80ms, players will tear you apart. Some vendors think buying a big-bandwidth CDN solves everything, but the nodes end up concentrated in a handful of data centers: Southeast Asian players get routed to North American nodes by way of Europe. How could that not lag?
I once tested one vendor's "global nodes": they claimed 200+, but the South America and Middle East ones were all virtual, with the actual traffic dumped onto US and European data centers. The player's ping showed 120ms, which looks passable, but run a traceroute and the packets are going halfway around the world.
Real node coverage comes down to three hard indicators: the number of physical data centers, the quality of BGP links, and direct peering with local carriers. Take CDN07, a veteran provider: it has 12 physical points of presence in Southeast Asia alone, peers directly with local carriers, and can press Jakarta player latency below 40ms.
Last year we migrated a shooter and ran a comparison test: for the same Tokyo player, the traditional CDN took the path Tokyo → Los Angeles → Tokyo, while 08Host's anycast network terminated the connection locally, cutting latency from 147ms to 61ms.
High-defense CDN is the worst minefield of all. Some vendors' so-called "terabit-level protection" is plain false advertising: hit them with 300Gbps or more and they null-route your domain outright. To judge real defensive capability, look at the distribution of scrubbing centers and the blackhole trigger mechanism.
The reliable setups now use mixed scheduling: normal traffic goes over the acceleration line, while attack traffic is automatically diverted to scrubbing nodes. CDN5's mechanism is smarter still, picking a strategy per attack type: SYN floods get scrubbed overseas, CC attacks go through a behavior-analysis engine, and DNS query floods are resolved locally.
The nginx scheduling configuration we use follows this pattern:
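A minimal sketch of that kind of split — the upstream names, addresses, and the header-based switching mechanism here are illustrative placeholders, not any vendor's real configuration:

```nginx
# Sketch only: route traffic to an acceleration upstream normally,
# and to a scrubbing upstream when mitigation is active.
upstream game_accel {
    server 10.0.1.10:443 max_fails=3 fail_timeout=10s;  # acceleration line
    server 10.0.1.11:443 max_fails=3 fail_timeout=10s;
}

upstream ddos_scrub {
    server 10.0.2.10:443;  # scrubbing-center ingress
}

# Illustrative mechanism: a front-end attack detector tags diverted
# requests with this header; a real setup might instead rewrite an
# included file and reload nginx.
map $http_x_scrub_state $backend {
    default     game_accel;
    "mitigate"  ddos_scrub;
}

server {
    listen 443 ssl;
    server_name game.example.com;

    location / {
        proxy_pass https://$backend;
        proxy_connect_timeout 3s;
        proxy_next_upstream error timeout;
    }
}
```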
Never trust vendors claiming "unlimited protection". When a 2Tbps-plus attack actually arrives, either the bill explodes or they simply drop your traffic. The best approach is traffic grading: core game traffic goes over the high-defense line, while update packages and announcements go over an ordinary CDN.
The node selection strategy must also follow the players. We once found an interesting phenomenon: Brazilian players connected faster through the Miami node than through local nodes — it turned out local carrier interconnection was that bad. Now it's all dynamic routing driven by real-time probing tools:
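A stripped-down version of such a probe might look like the following — the node names and addresses are placeholders (TEST-NET ranges), and a real system would take many more samples per node:

```python
# Sketch: probe candidate edge nodes with TCP connects, pick the fastest.
import socket
import time

# Placeholder candidates; real probes would target actual edge IPs.
CANDIDATES = {
    "sao-paulo": ("192.0.2.10", 443),
    "miami":     ("192.0.2.20", 443),
}

def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP handshake time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000
    except OSError:
        return None

def best_node(candidates, samples=3):
    """Pick the candidate with the lowest median connect time."""
    results = {}
    for name, addr in candidates.items():
        times = [t for t in (tcp_connect_ms(*addr) for _ in range(samples))
                 if t is not None]
        if times:
            results[name] = sorted(times)[len(times) // 2]
    return min(results, key=results.get) if results else None
```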
Cross-border link optimization is where the real work is. Some regions look well covered on paper, but the data centers are single-homed. In the Middle East, for instance, you need a multi-line BGP facility peered with UAE Telecom, Ooredoo, and the like; otherwise Dubai players can lose packets even reaching a data center in Qatar.
The more aggressive setups now do intelligent route prediction, adjusting paths automatically by time of day: during the European and American evening peak, traffic shifts onto Asia-Pacific links, and during Asia's prime time it transits through Europe. In our measurements this cut latency fluctuation by 22%.
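The scheduling idea reduces to a time-windowed path table. A toy sketch — the hour windows and path labels are illustrative; a production system would drive this from live probe data rather than a fixed clock:

```python
# Toy sketch of time-based path selection. Hour windows and path names
# are made up for illustration; real systems use live measurements.
from datetime import datetime, timezone

def pick_transit(now=None):
    """Return a transit path label based on the UTC hour."""
    hour = (now or datetime.now(timezone.utc)).hour
    if 18 <= hour <= 23:      # roughly the EU/US evening peak
        return "apac-link"
    if 10 <= hour <= 15:      # roughly Asia's prime time
        return "eu-transit"
    return "default-route"
```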
Protocol optimization is the hidden buff. QUIC performs far better than TCP on mobile networks, especially in places like South Africa and India where network quality fluctuates heavily. After we force-enabled QUIC for South African players, the stutter rate dropped by 38%.
But watch QUIC's CPU consumption; it's best to use a load balancer with hardware acceleration. We tested one vendor's software implementation and CPU shot straight to 70%; switching to CDN5's dedicated hardware pressed it below 15%.
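For reference, enabling QUIC at the edge can be sketched in nginx (1.25+ with HTTP/3 support); the certificate paths and server name below are placeholders:

```nginx
# Sketch: serve HTTP/3 (QUIC) alongside the TCP listeners, nginx 1.25+.
server {
    listen 443 ssl;            # TCP fallback (HTTP/1.1 and HTTP/2)
    listen 443 quic reuseport; # UDP listener for QUIC
    http2 on;
    http3 on;

    server_name game.example.com;
    ssl_certificate     /etc/ssl/game.crt;   # placeholder paths
    ssl_certificate_key /etc/ssl/game.key;

    # Advertise HTTP/3 so clients can migrate from TCP to QUIC.
    add_header Alt-Svc 'h3=":443"; ma=86400' always;
}
```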
The monitoring system has to keep up too; watching bandwidth usage alone is amateur hour. We now focus on three key metrics: per-region TCP establishment time, new connections per second, and SSL handshake latency. Once we noticed SSL latency in Japan had spiked, and the investigation turned up a certificate-chain validation anomaly.
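Two of those three metrics can be collected from the outside with a simple probe. A minimal sketch, timing the TCP connect and the TLS handshake separately:

```python
# Sketch: measure TCP connect time and TLS handshake time for one probe.
import socket
import ssl
import time

def handshake_times(host, port=443, timeout=5.0):
    """Return (tcp_ms, tls_ms) for one probe; raises OSError on failure."""
    ctx = ssl.create_default_context()
    t0 = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    t1 = time.monotonic()
    try:
        # wrap_socket performs the TLS handshake before returning.
        with ctx.wrap_socket(sock, server_hostname=host):
            t2 = time.monotonic()
    finally:
        sock.close()
    return (t1 - t0) * 1000, (t2 - t1) * 1000
```

Run this per region on a schedule and alert on deviation from the regional baseline, rather than on absolute thresholds.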
Finally, a lesson paid for in tears: don't use a shared-IP CDN to save money! One time a competitor got DDoS'd, and our game took collateral damage from sharing the same IP range. Now we require vendors to give us dedicated IP ranges, at minimum an isolated /24.
If cost really worries you, do what we did: use high-defense dedicated IPs for core battle traffic, and a shared CDN for game lobbies and update downloads. That way you can absorb attacks without bandwidth costs exploding.
Good CDN vendors now offer APIs for dynamically adjusting policies, such as automatically switching scrubbing modes based on live attack data. Here's a snippet of the automated scheduling script we wrote:
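A simplified version of that script — the API endpoint, payload format, and thresholds are hypothetical stand-ins, since every vendor's API differs:

```python
# Sketch: map live attack stats to a scrubbing mode and push it to a
# (hypothetical) vendor API. Thresholds are illustrative.
import json
import urllib.request

SYN_PPS_LIMIT = 500_000   # packets/sec before diverting to remote scrubbing
HTTP_RPS_LIMIT = 50_000   # requests/sec before engaging behavior analysis

def choose_mode(stats):
    """Map live attack statistics to a scrubbing mode."""
    if stats.get("syn_pps", 0) > SYN_PPS_LIMIT:
        return "overseas-scrub"      # volumetric SYN flood
    if stats.get("http_rps", 0) > HTTP_RPS_LIMIT:
        return "behavior-analysis"   # CC / application-layer attack
    return "normal"

def apply_mode(mode, api_url, token):
    """POST the chosen mode to the hypothetical vendor API."""
    body = json.dumps({"scrub_mode": mode}).encode()
    req = urllib.request.Request(
        api_url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```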
At the end of the day, player experience is a systems project: nodes alone aren't enough; you also need intelligent scheduling, protocol optimization, and security defense. As things stand, CDN5 has the strongest Asia-Pacific performance, 08Host's European coverage is denser, and CDN07's protection system is the most complete. Mix and match according to your player distribution.
One last truth: no CDN covers every region 100%. In Eastern Europe we use the local vendor CDN77; it doesn't have many global nodes, but its localized carrier access beats the international giants. The key is still to monitor players' real experience in real time, not just stare at console dashboards.
Next time players complain about lag, don't rush to add bandwidth. Pull up traceroute and look at the actual path: it might just be a congested cross-border link, and switching the access point solves it. This business runs deep, but once you see through it, only a few points really matter.

