[Essay] Game Security: A Brief Discourse on Data Mining

@codewiz · April 29, 2013 · 7 min read

One trend sweeping through the game security industry these days is data mining. It ties into today's hottest keywords, such as big data and cloud computing, and academia has been pushing the method consistently for several years, so it has drawn considerable attention in the industry.

But is data mining truly effective? Frankly, I am somewhat skeptical about the current level of data mining. As I mentioned in a previous post, the data mining techniques that we currently attempt or claim to use have inherent limitations due to their origins. However, there seems to be a perception in the game industry these days that data mining approaches are much more fundamental problem-solving methods than the existing game security solutions, so I would like to add a few thoughts on this matter.

#0. Cost

For example, NCSoft has reported that games like Lineage, Aion, and Blade & Soul generate about 2-3TB of game log data per day, and that they have built a data mining system capable of storing this data for about two months. Through this system, they identify and weed out users who engage in illicit acts such as botting or running illegal gold-farming workshops. Storing 2-3TB of data per day for 60 days requires 120-180TB of storage space on its own. Allowing for failures and redundancy, you would need even more capacity. Building and maintaining a system to reliably store and manage this data is no small feat.
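The storage arithmetic is easy to check. The snippet below is only a back-of-the-envelope sketch: the function name and the `overhead` multiplier (standing in for replication, indexes, and headroom) are my own assumptions, not figures reported by NCSoft.

```python
def retention_storage_tb(daily_tb: float, retention_days: int, overhead: float = 1.0) -> float:
    """Capacity in TB needed to keep `retention_days` of logs.

    `overhead` is a hypothetical multiplier for replication, indexes,
    and spare headroom; 1.0 means raw log volume only.
    """
    return daily_tb * retention_days * overhead


low = retention_storage_tb(2, 60)                        # 2TB/day for 60 days
high = retention_storage_tb(3, 60)                       # 3TB/day for 60 days
replicated = retention_storage_tb(3, 60, overhead=3.0)   # assumed three copies of everything
print(f"{low:.0f}-{high:.0f} TB raw, up to {replicated:.0f} TB with 3x replication")
```

Even with a modest overhead factor, the requirement moves well past the raw figure, which is the point of the paragraph above.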

The real issue, however, is not data storage. Extracting meaningful data from the stored records is another formidable challenge. Even if an open-source platform is used to build it, considering the shortage of experts in the field, we can surmise that building a data mining system is costlier than expected. In addition, as there are no companies offering tailored solutions for mining game data, everything has to be built from scratch.

The conclusion follows. Data mining may make sense for leading companies that treat it as an investment or want much closer monitoring of their users, but most game companies will find it hard to adopt because of the cost.

#1. Speed

Data mining, just by the sound of it, seems elegant and impressive. However, for data mining to exert its power, data must accumulate. Without data, the system is just an empty shell. The need for data accumulation implies that players must persistently engage in activities that meet the system's detection criteria. Only after significant time has passed and the systematic abuse shows up in the mining results will there be enough evidence to sanction the offenders. In its current form, data mining is slow and can only react very late.

The only sanction available to companies using data mining is to block accounts. It's worth thinking about the effects of account blocking, which largely depend on how the game charges its players: through microtransactions or through a subscription fee.

For microtransaction-based games, account blocking is virtually worthless because creating a new account is trivial. The situation differs slightly in countries like South Korea, where account creation is tied to a scarce unique identifier such as the resident registration number. However, few countries have such a system, so account blocking is less effective internationally than it is domestically. Although modern game security solutions like XIGNCODE3 offer auxiliary tools to aid in account blocking, a fundamental issue remains: data mining cannot block offenders at the moment of infraction.

Next are subscription-based games. Here, a blocked user loses the subscription fee, so the effectiveness of blocking depends on how significant that fee is to the hacker. Suppose it takes two days to identify cheating behavior. If the cheater can make more money from hacking in that time than the cost of a two-day subscription, the system inherently encourages continuous cheating.
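The same reasoning can be written down as a tiny calculation. This is a sketch under my own assumptions: the function names are made up for illustration, and the income and fee figures are hypothetical units, not data from any real game.

```python
def net_gain_before_ban(daily_income: float, days_until_ban: float, fee_at_risk: float) -> float:
    """Cheater's net gain on one account that is blocked after `days_until_ban` days."""
    return daily_income * days_until_ban - fee_at_risk


def break_even_days(daily_income: float, fee_at_risk: float) -> float:
    """Detection has to be faster than this for a block to cost the cheater money."""
    return fee_at_risk / daily_income


# Hypothetical numbers: 10 units of income per day, a 15-unit monthly fee.
monthly_fee = 15.0
prorated_two_days = monthly_fee * 2 / 30                 # cost of a two-day subscription

print(net_gain_before_ban(10.0, 2, prorated_two_days))   # 19.0
print(net_gain_before_ban(10.0, 2, monthly_fee))         # 5.0, even if the whole month is written off
print(break_even_days(10.0, monthly_fee))                # 1.5
```

With these made-up numbers, a two-day detection delay leaves the cheater in profit either way. Detection would have to land in under a day and a half before a block starts to hurt.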

The critical issue here is the Return on Effort (ROE). Regrettably, "gold farmers" can often recover their subscription costs and generate significant extra value within the time it takes to be detected. If they concluded it wasn't profitable, the gold farming operations would cease. Sadly, they usually disappear not because of the gaming companies' actions but due to declining game popularity or saturation of the black market with items, driving prices down.

#2. Judgment Criteria

Suppose that achieving 100 consecutive wins within an hour is considered practically impossible in a game. The game company sets this as a benchmark and tells the mining system to flag users who reach it. However, every rule comes with an allowed error margin, because a legitimate user could hit the benchmark by chance. First-time offenders or those who hit the benchmark only occasionally are overlooked, and consistent offenders are blocked.

One might think this sounds reasonable. But what do hackers do? They won't keep groping for the rule forever while all their accounts get blocked. Instead, they learn that if they stop at 99 wins they won't be caught, and that fact spreads quickly through the cheating community. Ironically, the hidden rule is often uncovered by the users themselves. A popular game has a broad user base trying exploits in many different ways, and that collective intelligence exposes the company's undisclosed sanction criteria simply and swiftly. Human psychology accelerates the sharing. Hackers who have not yet been blocked want to boast, so they divulge the secret in forums or small circles as proof of their knowledge, and the method becomes public.
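The rule, and the way around it, fits in a few lines. This is a minimal sketch, not any company's actual detection logic: the three-strike tolerance, the function name, and the sample log are all assumptions made up for illustration.

```python
from collections import Counter

STREAK_LIMIT = 100     # consecutive wins per hour treated as the benchmark
STRIKES_TO_BLOCK = 3   # hypothetical error margin for lucky legitimate players


def accounts_to_block(hourly_streaks, limit=STREAK_LIMIT, strikes=STRIKES_TO_BLOCK):
    """hourly_streaks: iterable of (account_id, consecutive_wins_in_that_hour)."""
    # Count how many times each account reached the benchmark.
    hits = Counter(acct for acct, wins in hourly_streaks if wins >= limit)
    # Only accounts that reach it repeatedly are blocked.
    return sorted(acct for acct, n in hits.items() if n >= strikes)


log = [
    ("bot_a", 100), ("bot_a", 104), ("bot_a", 101),  # consistent offender
    ("lucky", 100),                                   # one-off hit, overlooked
    ("bot_b", 99), ("bot_b", 99), ("bot_b", 99),      # stops just under the limit
]
print(accounts_to_block(log))  # ['bot_a']
```

The bot that parks itself at 99 wins never registers a single hit, no matter how long it runs.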

So the game company tightens its criteria, lowering the threshold to 90 wins and widening the error margin. The hackers adapt again and again. Eventually, the standard may drop so low that average players can reach it, making the results noisy and useless.

#3. The Critical Minority Report

False positives are always problematic, but in data mining, they are devastating. When a typical security product misjudges a legitimate user, the problem can usually be resolved by using a different computer or reformatting. However, a legitimate user blocked by data mining faces considerable penalties. They can potentially appeal and restore their account, but most won't bother, for emotional reasons: why keep playing and paying for a game that has wrongly banned you?

Data mining blocks may be a last resort, yet they can end up doing nothing to protect the game's concurrent user count. Hackers, when blocked, quickly move on to new accounts, while legitimate users, driven by emotion, abandon the game. The result can be a barren user base overrun with bots and, eventually, an eerily empty game world. Last year, an online game that boasted of detecting tens of thousands of bot users per day quickly lost its initial popularity, a reminder that bravado often ends as just that.
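To see why even a small false positive rate matters at scale, here is a rough base-rate calculation. Every number in it is hypothetical (population, bot share, detection rate, false positive rate), and the function name is my own. It is a sketch of the arithmetic, not a measurement from any game.

```python
def ban_breakdown(players: int, bot_share: float, detect_rate: float, false_positive_rate: float):
    """Return (bots_banned, legit_banned, precision) for one detection pass."""
    bots = players * bot_share
    legit = players - bots
    bots_banned = bots * detect_rate                 # bots correctly blocked
    legit_banned = legit * false_positive_rate       # legitimate players wrongly blocked
    precision = bots_banned / (bots_banned + legit_banned)
    return bots_banned, legit_banned, precision


# Hypothetical inputs: 100,000 players, 5% bots, 90% of bots caught, 1% false positives.
bots_banned, legit_banned, precision = ban_breakdown(
    players=100_000, bot_share=0.05, detect_rate=0.9, false_positive_rate=0.01
)
print(f"{bots_banned:.0f} bots banned, {legit_banned:.0f} legitimate players banned, "
      f"precision {precision:.1%}")
```

Under these assumptions, roughly one in six blocked accounts belongs to a legitimate player. The blocked bots come back under new accounts, while many of the wrongly blocked players do not.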

#4. Is Data Mining Useless, Then?

Despite its weaknesses, current data mining isn't entirely futile. Its major strength lies in its ability to confront unknown hacking tools effectively. Even used solely for monitoring, it has value because knowing the extent of hacking tool usage changes everything. For instance, if monitoring identifies specific users, more focused analyses can reveal the nature of the undetected tools and potentially enable the application of more stringent security policies for these users.

Years ago, we ambitiously embarked on a white list policy—aiming to filter all clean binary files through Wellbia.com. We had to shift directions due to inter-company conflicts and the overwhelming quantity of clean binaries. However, limited environments could still find such policies meaningful. For example, we recently received an inquiry from a game publisher about maintaining heightened security on servers designated for online game tournaments. With restricted user access and no need to tolerate unrelated malware, a white list policy fits perfectly. Similarly, synergizing data mining results and game security solutions may yield more effective penalties and maintain higher levels of game security service.

It's said the world operates on 0.1% pioneers, 0.9% imitators, and 99% surplus. So game companies with the capacity should keep researching in this direction, not necessarily for the current generation but for future ones, who may get to enjoy a more pleasant and fair online gaming environment from their early years.

Note) The XIGNCODE3 development team is very interested in alternative game security solutions like data mining. We are looking for people with backend system development experience or skills in Python and web programming. Reading this, we sound like we're searching for a MacGyver. But isn't that what all small companies look for? ㅠㅜ~ Anyways, if you are interested in such work, please contact codewiz at wellbia.com.

Hackers avoid it, gamers want it...

Reputation isn't built overnight.

XIGNCODE3 is the easiest way to make your game fair.

@codewiz
Looking back, there were good days and bad days. I record all of my little everyday experiences and learnings here. Everything written here is from my personal perspective and opinion, and it has absolutely nothing to do with the organization I am a part of.
(C) 2001 YoungJin Shin