[AntiCheat] The Unending War: Non-Client Bots

@codewiz · July 31, 2011 · 22 min read

Summary

Like the universe that evolves over the eons, the battle between the sword and shield of security is structured to be unending. Here, we take a look at one such issue in this eternal struggle—the war against non-client bots. We'll explore the definition and structural characteristics of non-client bots, as well as review the strategies that have been discussed to counter such bots.

About the Author

Shin YoungJin, http://www.shinyoungjin.com

I founded Wellbia.com Co., Ltd. where I am developing game security products, XIGNCODE and XIGNCODE3. I have a keen interest in system programming and have participated in the development of numerous PC security programs. I am active as a Microsoft Visual C++ MVP and the moderator for the Visual C++ section on Devpia. I am a huge fan of Steve Barakatt and Grey's Anatomy, and there was a time when I was obsessed with WoW. In a nutshell, I'm an eccentric.

Author's Notes

I wrestled with the content for a while after being asked to write a post on the topic of "security programming." I did not want to cover the somewhat clichéd topics that anyone would think of when it comes to security programming, such as the risks of integer operations, format string bugs, buffer overflows, and the like. On a different note, perhaps I was afraid to write about "security" at all, despite having worked for nearly a decade at a company specializing in PC client security. After all, the word "security" covers far too vast a territory.

Hoping to avoid stale topics and believing that writing about the unknown is plain foolish, I decided to pull from the concerns I have pondered over the years. The non-client bots I'm about to explain have been a serious issue in the past, continue to be a problem in the present, and astonishingly, there still isn't a perfect strategy to counteract them even at the time of this writing. I start this post with the hope that maybe, just maybe, it will stir some inspiration in the readers to come up with innovative new countermeasures that would be tremendously meaningful.

For readers who may be interested in learning about general "security programming," as previously mentioned, I highly recommend these three books: Writing Secure Code 2/e, Windows Vista Security Programming, and Secure Coding in C and C++. I confidently assert that no other books can provide more inspiration regarding the perspective of general security programming.

Introduction

This problem is similar to the well-known Turing Test. Think about your girlfriend or boyfriend, or, if you're not currently in a relationship, your parents instead. Imagine there's a wall, and on the other side sits your beloved person. You can ask questions, and they can answer you. However, the conversation occurs through an intermediary, who can manipulate the answers and questions to make the person behind the wall appear to be the one you know. The crux of the issue is whether, under these circumstances, you could discern whether the hidden respondent is truly the person you know or not. Of course, you can continue asking questions until you are certain of their identity.

Most people would first think to ask something only they and the other person would know—the location of their first date, physical characteristics, or preferences. Yet, if there's a chance that an unknown entity behind the wall has taken your loved one hostage, even these personal details become meaningless as they could be forced out through threats. If we consider such extreme scenarios, what questions should we ask to solve this problem? And can such questions truly verify the identity of the unseen individual behind the wall?

Though briefly explained, this issue has long been a concern in the security industry in relation to remote client authentication. If you immediately think of usernames and passwords upon hearing 'authentication,' you're off the mark. Here, it's not the authentication of remote users, but rather the authentication of the remote client programs themselves that is critical. Why is the authentication of a remote client so vital?

The gaming industry has highlighted this challenge prominently. In games, it’s possible for either a person or a machine to play. Consider this: you’re playing a game of Go online with Chulsoo, but what if you found out that it wasn’t Chulsoo playing, but his computer, especially if you lost the game? If that doesn't bother you, then consider the popular MMORPGs. These games involve character development through actions within a virtual world, which invariably requires time and effort. If you invested your own time and effort to develop your character, how would you feel to find out that Chulsoo's computer played on his behalf and achieved those same results automatically?

For those who don't play games, these scenarios might just seem like unfortunate events. However, the real problem surfaces when money enters the equation. In online games like MMORPGs, the virtual currency used in the game world can be exchanged for real-world money—one gold in the game might be traded for 100 won in real life. In fact, some popular games have their own exchange rates akin to currency markets. The acquisition of gold in these games is usually through what is essentially a simple, repetitive task—hunting monsters. If a machine performs these repetitive tasks and the operator profits, is that fair? And what if these machines reduce the resources available for human players in the game world?

There are two significant issues at stake here. One is that such illicit gaming practices negatively impact the operation of the game. If everyone resorts to cheating, who would honestly want to play the game by the rules? Eventually, the game would be abandoned, populated solely by machines, and then by no one at all. The other issue is the distortion of the item trading market. In a normal environment, the resources available to the market would be limited. But with bots flooding the market with easily-obtained resources, these can be sold cheaply, effectively devaluing the efforts of honest players and rendering the market unfair.

The automation of gameplay by illegitimate client programs harms not just the gaming companies but also the players. Hence, blocking these illicit programs is an urgent and important task for game developers.

Non-Client Bots

In the gaming industry, there are malicious programs like the ones previously discussed that sit across the wall, pretending to be real people we know. These programs are known as Non-client Bots. More precisely, a Non-client Bot imitates game protocols, enabling a separate client, not the actual game client, to connect to the game server and perform actions just like a player would. In other words, 'Non-client' means there is no actual client, and 'Bot' signifies that the game is played automatically.

Non-client Bots go to the trouble of imitating such complex protocols because they do not need the heavy graphical resources a game client loads, so many Non-client Bot instances can run smoothly side by side on a single PC. More instances running at once means more game resources acquired per unit of time. For this reason, Non-client Bots are often run covertly in operations known as 'farms,' which makes the actual running programs hard to find.

Figure 1 shows the communication structure between a normal game client and the server, where C represents the game client, and S stands for the game server. The illustration conveys that communication occurs through the server sending a packet Q, to which the client responds with R. Figure 2 presents the structure of a Non-client program that has hijacked this system. Though the client is modified or newly created and labeled as NC, it imitates the communication structure so well that from the game server's perspective, it's impossible to distinguish whether this client is a legitimate game client or not. The core technology in implementing a Non-client Bot lies in appropriating this communication structure between the game server and client.

Figure 1 Normal Client Communication Structure

Figure 2 Non-client Communication Structure

Protocol Modification

The first response developers took against the emergence of non-client bots was protocol modification. Figure 3 depicts the change in communication methods. The original values Q and R have been changed to the new values Qa and Ra. As a result, older non-client bots can no longer communicate with the server.

Figure 3 Communication Structure of Protocol Tampering

At first glance, this approach appears highly effective: it wipes out existing non-client bots and saddles their creators with the burden of analyzing the new protocol. However, this is a serious misconception. The main reason is that redesigning the entire protocol structure of an existing server and client is no small task; in fact, attempting it is close to madness. For a game already in live service, the most that can realistically be done is to add new packets or change the constant values of existing ones. Yet for a bot creator who has already analyzed the entire protocol, such changes are hardly difficult, since they only need to track the parts that changed. In short, this method demands significant effort from the defenders but little from the attackers.
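To illustrate why renumbering constants barely slows a bot down, here is a minimal sketch. The opcode values, the `Action` enum, and `EncodeHeader` are all hypothetical; the point is that a bot can keep the protocol constants in one table, so a patch that changes them only requires updating that table while the rest of the bot stays untouched.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcode values for two client builds. A "protocol change"
   in a live service often amounts to renumbering constants like these. */
enum { OP_MOVE_V1 = 0x10, OP_ATTACK_V1 = 0x11, OP_LOOT_V1 = 0x12 };
enum { OP_MOVE_V2 = 0x4A, OP_ATTACK_V2 = 0x4B, OP_LOOT_V2 = 0x7F };

/* The bot's own, build-independent action identifiers. */
typedef enum { ACT_MOVE, ACT_ATTACK, ACT_LOOT, ACT_COUNT } Action;

/* After a patch, the bot author diffs the client and updates only this table. */
static const uint8_t kOpcodeTable[2][ACT_COUNT] = {
    { OP_MOVE_V1, OP_ATTACK_V1, OP_LOOT_V1 },  /* build 1 */
    { OP_MOVE_V2, OP_ATTACK_V2, OP_LOOT_V2 },  /* build 2 */
};

/* Writes the one-byte packet header for `act` under client build `build`.
   Returns the number of bytes written, or 0 on invalid input. */
size_t EncodeHeader(int build, Action act, uint8_t *out, size_t cap)
{
    if (build < 0 || build > 1 || act >= ACT_COUNT || cap < 1)
        return 0;
    out[0] = kOpcodeTable[build][act];
    return 1;
}
```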

The usual first reaction to this is to ask, "Then why not encrypt the packets?" This question misses the point of the discussion so far. Non-client bot creators do not work from raw packets alone; they know nearly all of the game's code, so encryption is hardly a major obstacle for them. The decryption code inevitably has to be present in the client. Against an exceptionally skilled hacker, the effectiveness of packet encryption is practically zero. Of course, considering the many mediocre hackers in the real world, encryption is better than nothing. However, it is unlikely to have much impact on anyone skilled enough to build a non-client bot.

Key Authentication

The first idea that surfaced once security companies became actively involved was key authentication: an exchange of secret questions and answers that only the client and the server know. The server transmits a specific key to the client, the client performs a highly specialized operation on that key, and the client sends the result back to the server. The communication structure of this method is shown in Figure 4. The server and the client both possess a prearranged, highly secret operation E. The server sends Q to the client, which processes it with E to produce the result R and sends R back. The server then performs the same operation itself and checks whether the client's R matches its own answer, thereby determining whether the client is genuine.
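The exchange in Figure 4 can be sketched as follows. The body of `E` is an arbitrary stand-in assumed for illustration, not the operation of any real product; what matters is that both sides run the same function and the server compares the results.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical secret operation E shared by server and client.
   Any deterministic function works; this mixing step is illustrative only. */
static uint32_t E(uint32_t q)
{
    uint32_t x = q ^ 0xA5C3F00Du;        /* assumed secret constant */
    x = (x << 7) | (x >> 25);            /* rotate left by 7 bits */
    return x * 2654435761u + 12345u;     /* wraps modulo 2^32 */
}

/* Client side: answer the server's challenge Q with R = E(Q). */
uint32_t ClientRespond(uint32_t q) { return E(q); }

/* Server side: recompute E(Q) and accept only if it matches the client's R. */
bool ServerVerify(uint32_t q, uint32_t r) { return E(q) == r; }
```

In practice the server would draw a fresh random Q for each challenge. Note that nothing in this exchange proves which program computed R: anyone holding a copy of `E` passes.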

Figure 4 Key Authentication Communication Structure

For a better understanding, picture the security cards that banks issue, applied to game clients. The Q sent by the server corresponds to an index on the security card, E to the act of looking up the value at that index, and R to the four-digit code printed on the card. Because the two schemes are so similar, this method shares the same weakness as a bank security card: if the card itself is lost, it becomes useless. In other words, if the highly confidential operation E is stolen, hacking tools can disguise themselves as legitimate clients. Figure 5 shows the communication structure of a non-client bot that has stolen the operation E.

Figure 5 Non-Client Communication Structure Using Stolen Key Authentication Function

What does it mean to steal an operation? In a computer, an operation ultimately takes the form of code that is executed; in other words, it is a function. Suppose that Q and R in Figure 4 are 4-byte integers and that E is a simple XOR with a fixed secret value. Without knowing the rule, the exchange may look opaque, but once the rule is known it is trivial to exploit. Of course, non-client bot creators do not deduce the rules by observing raw packets; instead, they examine the game client's code and use that code directly.
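A minimal sketch of the XOR case, assuming a hypothetical secret constant `SECRET_K`: once the rule is known to be XOR, a single observed (Q, R) pair is enough to recover the key, because Q XOR R = Q XOR (Q XOR K) = K.

```c
#include <stdint.h>

/* Hypothetical secret constant embedded in the client; illustrative only. */
#define SECRET_K 0x5EC12E7Au

/* The "secret" operation E from the example: R = Q XOR K. */
uint32_t E_xor(uint32_t q) { return q ^ SECRET_K; }

/* Knowing the rule, one observed pair reveals K: Q ^ R == K. */
uint32_t RecoverKey(uint32_t observed_q, uint32_t observed_r)
{
    return observed_q ^ observed_r;
}

/* The bot can now answer any later challenge without the client. */
uint32_t BotRespond(uint32_t q, uint32_t recovered_k) { return q ^ recovered_k; }
```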

Listing 1 shows an operation called SomeFantasticFunction, which we can imagine was used in the original client. It simply performs an arithmetic operation and returns the result. Once this code is compiled, the corresponding machine code exists somewhere in the client executable, and the non-client bot creator's task is to locate it.

Listing 1 Original Operation Function

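#include <windows.h>  /* for the ULONG typedef */

/* The client's "secret" operation E: R = (Q >> 13) * 34 + 1573. */
ULONG SomeFantasticFunction(ULONG Q)
{
    ULONG R;
    R = (Q >> 13) * 34 + 1573;
    return R;
}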

Figure 6 is a screen capture of a static analysis tool such as IDA being used to find this code. On the right, we see the assembly code, and on the left, the byte code. A skilled non-client bot creator would understand what the function computes, but in fact that is not even necessary. Once they confirm that the function runs at a particular moment, they can simply scrape its byte code and reuse it. Listing 2 shows such stolen function code: it embeds the byte code, copies it into newly allocated executable memory at run time, and calls it there. This code is presented only to demonstrate the concept; in reality, there are many easier ways to steal code without writing anything this elaborate. The upshot is that once someone determines what a piece of code does or when it runs, stealing it is, as the Korean saying goes, as easy as eating cold porridge.

Figure 6 SomeFantasticFunction Examined with IDA

Listing 2 Stolen Function Code

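#include <windows.h>
#include <string.h>

/* Same signature and default (__cdecl) calling convention as the original. */
typedef ULONG (*SFFT)(ULONG Q);

/* Bot-side replacement: executes the byte code scraped from the client
   (32-bit x86, MSVC debug build) instead of reimplementing it. */
ULONG SomeFantasticFunction(ULONG Q)
{
    UCHAR code[] =  "\x55\x8b\xec\x81\xec\xcc\x00\x00\x00"
                    "\x53\x56\x57\x8d\xbd\x34\xff\xff\xff"  /* lea edi,[ebp-0CCh] */
                    "\xb9\x33\x00\x00\x00\xb8\xcc\xcc\xcc\xcc"
                    "\xf3\xab\x8b\x45\x08\xc1\xe8\x0d\x6b\xc0\x22"
                    "\x05\x25\x06\x00\x00\x89\x45\xf8\x8b\x45\xf8"
                    "\x5f\x5e\x5b\x8b\xe5\x5d\xc3";

    /* Allocate executable memory and copy the stolen bytes into it. */
    PVOID cptr = VirtualAlloc(NULL
                                , sizeof(code)
                                , MEM_COMMIT | MEM_RESERVE
                                , PAGE_EXECUTE_READWRITE);
    if(cptr)
    {
        memcpy(cptr, code, sizeof(code));
        SFFT sfft = (SFFT) cptr;
        ULONG R = sfft(Q);                  /* run the client's own code */
        VirtualFree(cptr, 0, MEM_RELEASE);  /* MEM_RELEASE requires size 0 */
        return R;
    }

    return 0;
}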

Integrity Checks

The security industry's counteroffensive to the theft of E was the integrity check. As the name implies, it examines the contents of the client to determine whether the client is authentic and whether it has been tampered with. If key authentication was like asking about a memory shared only between loved ones, this is akin to asking about a loved one's unique physical characteristics.

Figure 7 illustrates the communication structure in this case. The server sends a random value Q, and the client feeds this value, together with the client program itself, into the hash function H. H computes a hash over the part of the client selected by Q and stores the result in R. The server uses this hash value to determine whether the client has been tampered with.
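The following is a minimal sketch of the client side of Figure 7. It assumes a hypothetical challenge format in which Q names an offset and a length within the client image plus a random salt, and it uses 32-bit FNV-1a purely as a stand-in for H; a real scheme would want a cryptographic hash.

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a update step; a non-cryptographic stand-in for H. */
static uint32_t Fnv1aUpdate(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;   /* FNV prime */
    }
    return h;
}

/* Hypothetical challenge Q: which bytes of the client image to hash,
   plus a random salt so that old answers cannot simply be replayed. */
typedef struct { uint32_t offset, length, salt; } Challenge;

/* Computes R = H(salt || image[offset .. offset + length)).
   Returns 1 on success, or 0 (leaving *r untouched) if the range is invalid. */
int IntegrityResponse(const uint8_t *image, size_t image_size,
                      Challenge q, uint32_t *r)
{
    if (q.offset > image_size || q.length > image_size - q.offset)
        return 0;
    const uint8_t salt[4] = {
        (uint8_t)q.salt, (uint8_t)(q.salt >> 8),
        (uint8_t)(q.salt >> 16), (uint8_t)(q.salt >> 24)
    };
    uint32_t h = Fnv1aUpdate(2166136261u, salt, sizeof salt);  /* offset basis */
    *r = Fnv1aUpdate(h, image + q.offset, q.length);
    return 1;
}
```

The function only sees a byte buffer. It cannot tell whether that buffer is the running process or a pristine copy of the client loaded from disk, which is exactly the gap a non-client bot exploits.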

Figure 7 Client Integrity Check Communication Structure

The most distinctive feature of this method is that the client program itself is used as input for the calculation H. Therefore, even minor modifications made to the client can be detected. However, was this able to prevent non-client bots? Unfortunately, the answer is no. As with the previous key authentication, if a non-client bot steals the client program itself, the method becomes futile. Figure 8 shows the communication structure of a non-client bot that has stolen the game client program C and the hash function H.

Figure 8 Non-client Bot Communication Structure Using Stolen Client and Hash Function

How a function is stolen has already been explained. So what does it mean to steal the game client itself? The answer lies in Listing 3 and Listing 4. Listing 3 shows the response function of a normal game client, and Listing 4 shows that of a non-client bot. The difference is that the normal client feeds the currently running program to H, whereas the non-client bot feeds it the stolen client program. In other words, lifting the Reply function wholesale, as in the earlier examples of function theft, would not work here, because inside the bot GetModuleHandle(NULL) refers to the bot's own executable. With the small modification shown in Listing 4, however, the bot can still communicate with the server.

Listing 3 Original Response Function Code

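ULONG Reply(PVOID received_data)
{
    /* GetModuleHandle(NULL) returns the base address of the running
       executable, i.e. the genuine game client. */
    ULONG R = H(GetModuleHandle(NULL), received_data);
    SendToServer(R);  /* hypothetical helper that sends R to the server */
    return R;
}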

Listing 4 Non-client Bot Response Function Code

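/* Stolen copy of the game client, loaded once at start-up.
   InitBot is a hypothetical start-up routine of the bot. */
static PVOID GameClient;

void InitBot(void) { GameClient = LoadGameToMemory(); }

ULONG Reply(PVOID received_data)
{
    /* Hash the stolen copy instead of the bot's own executable. */
    ULONG R = H(GameClient, received_data);
    SendToServer(R);  /* hypothetical helper that sends R to the server */
    return R;
}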

The key to bypassing this method for non-client bots lies in the LoadGameToMemory function showcased in Listing 4. This function has the role of loading the game client. Since the structure of Windows executables, known as PE files, has been well analyzed, the task of loading a separate executable file into memory is quite simple. Of course, self-modifying code or techniques like polymorphism or metamorphism can be applied to the executable file so that merely loading it would result in a different operational state than that of the actual running client. Yet, this too can be simply resolved by using a VM environment. A VM environment refers to the process of emulating game client code using virtual memory space and a virtual CPU, as opposed to running it on an actual CPU.
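To give a sense of how little work a basic PE loader takes, here is a simplified sketch of what a function like LoadGameToMemory might do internally. `MapPeImage` is a hypothetical name; it uses the documented PE header offsets to copy the headers and each section to its virtual address, and it deliberately skips relocations, imports, and TLS, which a working loader would also have to handle.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Little-endian field readers; PE files are always little-endian. */
static uint16_t Rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t Rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Maps a PE file (already read into `file`) into a zero-filled buffer of
   SizeOfImage bytes. Returns NULL on malformed input; caller frees result. */
uint8_t *MapPeImage(const uint8_t *file, size_t size, uint32_t *image_size)
{
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z') return NULL;
    uint32_t nt = Rd32(file + 0x3C);                      /* e_lfanew */
    if (nt > size || size - nt < 24 || memcmp(file + nt, "PE\0\0", 4) != 0)
        return NULL;

    const uint8_t *coff = file + nt + 4;                  /* IMAGE_FILE_HEADER */
    uint16_t nsec = Rd16(coff + 2);                       /* NumberOfSections */
    uint16_t opt_size = Rd16(coff + 16);                  /* SizeOfOptionalHeader */
    const uint8_t *opt = coff + 20;                       /* optional header */
    size_t sec_off = (size_t)nt + 24 + opt_size;          /* section table */
    if (opt_size < 64 || sec_off > size || (size - sec_off) / 40 < nsec)
        return NULL;

    uint32_t img_size = Rd32(opt + 56);                   /* SizeOfImage */
    uint32_t hdr_size = Rd32(opt + 60);                   /* SizeOfHeaders */
    if (hdr_size > size || hdr_size > img_size) return NULL;

    uint8_t *image = calloc(img_size, 1);
    if (!image) return NULL;
    memcpy(image, file, hdr_size);

    for (uint16_t i = 0; i < nsec; i++) {
        const uint8_t *sec = file + sec_off + (size_t)i * 40;  /* section header */
        uint32_t va = Rd32(sec + 12);                     /* VirtualAddress */
        uint32_t raw_size = Rd32(sec + 16);               /* SizeOfRawData */
        uint32_t raw_ptr = Rd32(sec + 20);                /* PointerToRawData */
        if (raw_ptr > size || raw_size > size - raw_ptr ||
            va > img_size || raw_size > img_size - va) {
            free(image);
            return NULL;
        }
        memcpy(image + va, file + raw_ptr, raw_size);
    }
    *image_size = img_size;
    return image;
}
```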

Timing Check

This method often fails in practice, but it is a solution that appears frequently in academic papers. As shown in Figure 9, the core idea is for the client to also report how long a given operation took. In the figure, t is the time taken to perform operation E. The value of t can, of course, also be hidden inside R.

Figure 9 Timing Check

The biggest problem with this method is that general-purpose operating systems (OS) are usually not Real-Time Operating Systems (RTOS), meaning that time measurement may not be accurate. All threads or processes can be preempted by other threads or processes, which means that even in a normal environment, actual operation time can be delayed. Another issue is the diversity of the client's operating environment. Some users might run the same game on a Celeron CPU while others might use an i7 quad-core computer. The execution time between these two will obviously differ greatly.

Moreover, as discussed in the integrity check section, operation E can be not only stolen but also tampered with. So if t is expected to stay roughly constant in a normal environment, a bot can simply report that expected value regardless of how long the computation actually took, which makes this check easy to bypass.
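The tension between tolerance and forgery can be sketched like this. The window bounds `T_MIN_US` and `T_MAX_US` are assumed values, not measurements: the window has to be wide enough for slow and fast CPUs and for OS preemption, and any value inside it is equally acceptable to the server, including one the bot simply makes up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed acceptance window for the reported time t, in microseconds.
   It must cover a slow Celeron, a fast i7, and OS preemption delays. */
#define T_MIN_US   200u
#define T_MAX_US 50000u

/* Server side: accept only if R is correct and t looks plausible. */
bool ServerAcceptsTiming(uint32_t expected_r, uint32_t r, uint32_t t_us)
{
    return r == expected_r && t_us >= T_MIN_US && t_us <= T_MAX_US;
}

/* Bot side: compute R with the stolen E however it likes, then report a
   fixed, plausible t instead of the real measurement. */
uint32_t ForgedTiming(void) { return (T_MIN_US + T_MAX_US) / 2; }
```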

Despite its shortcomings in practice, this method is worth introducing because, unlike the previous methods, it tries to identify non-client bots through indirect values rather than direct ones. Whereas the earlier methods look for a yes-or-no answer to a closed question, this approach is closer to an open question that calls for a subjective answer. There may be indirect variables other than time that could distinguish a legitimate client through such open questions. If so, this technique could become an effective countermeasure against non-client bots.

CAPTCHA

Until now, we have framed the problem as distinguishing non-clients from normal clients. CAPTCHA, however, approaches it from a completely different angle: separating machines from humans. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is widely used on the internet to prevent automated sign-ups; in essence, it tests whether the user is a person or a machine. A typical example, shown in Figure 10, displays a distorted string of characters and asks the user to type it back exactly. In fact, these are often hard to read even for people.

Figure 10 The challenging CAPTCHA that even humans struggle to understand

Applying this mechanism to the non-client bot problem is straightforward. The distorted image acts as the question Q, the string the user types becomes the response R, and the routine that accepts the input serves as the computation E. Because CAPTCHA is designed so that machines cannot work out the answer automatically, human intervention is, in theory, required. The input can therefore no longer be automated, which renders non-client bots ineffective.

Of course, determined non-client bot creators still have ways to automate this process. They would have to collect every image stored on the server along with its answer and then build a mapping function over those values. This approach is illustrated in Figure 11. By learning the response R for every possible question Q and building a mapping function M that supplies the answer automatically, a bot could in theory respond without human input. As with protocol changes, however, the work is lopsided, only this time in the defenders' favor: collecting the data is hard for bot creators, while server operators can easily swap out the challenges. Replacing the server-side data instantly neutralizes every non-client bot, while their creators must gather all the question and answer data anew. Moreover, the server can automate these replacements, which makes building and maintaining a mapping function M nearly futile for bot creators.
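A minimal sketch of the mapping function M from Figure 11, plus a hypothetical server-side rule for rotating the image pool. `Pair`, `M`, `ServerImageId`, and the numbering scheme are illustrative assumptions. After the server moves to a new pool generation, every lookup in the bot's harvested table misses.

```c
#include <stddef.h>

/* One harvested (Q, R) pair: a CAPTCHA image identifier and its answer. */
typedef struct { unsigned image_id; const char *answer; } Pair;

/* The bot's mapping function M: return the harvested answer for image_id,
   or NULL if the server shows an image the bot has never seen. */
const char *M(const Pair *table, size_t n, unsigned image_id)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].image_id == image_id)
            return table[i].answer;
    return NULL;
}

/* Hypothetical server-side numbering: each rotation ("generation") of the
   pool gets a fresh id range, so, assuming fewer than 1000 images per pool,
   previously harvested ids never recur. */
unsigned ServerImageId(unsigned generation, unsigned index)
{
    return generation * 1000u + index;
}
```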

Figure 11 Non-client bot using mapping functions

The problem with CAPTCHA is not its ability to counter non-client bots but the inconvenience it causes. If players had to go through the tedious process of typing these characters again and again during gameplay, who would keep playing? Even on websites, people tend to avoid signing up where CAPTCHA is required unless they have to. More importantly, these tests are hard to decipher for machines and humans alike, and even people frequently get them wrong. Integrating such measures into games or other client programs therefore faces practical limits.

Could a CAPTCHA that doesn't annoy users be the answer? One example is Game Masters (GMs) striking up conversations with players to check for abnormal behavior. Players who don't respond appropriately are assumed to be using automation tools and are removed from the game. This sounds plausible, but today's non-client bots come prepared with canned answers to GM questions, or are programmed to move elsewhere automatically when a GM approaches. Non-intrusive CAPTCHA styles like this are therefore unlikely to be effective.

Nevertheless, this method matters because if a new form of CAPTCHA emerges that players cannot skip, that does not bother them (and might even intrigue them), and that humans almost never answer incorrectly, we would have a very powerful means of easily incapacitating non-client bots.

Data Mining

The last method to introduce is data mining. It approaches the non-client bot problem as one of distinguishing legitimate players from illegitimate ones. Put simply, the server monitors each player's activity and applies data mining to determine whether that player's behavior falls within the normal or the abnormal range.

In theory, this seems like a clean solution, but in practice it has several drawbacks. For one, it requires recording every action each player takes or making judgments in real time, which inevitably demands significant additional resources. Building an automatic function to make these judgments is no easy task either. This is especially true in MMORPGs, where non-client bots are a particular problem, because many legitimate players play in ways that resemble non-client bots.

However, even if all these issues are overcome, one final weakness can render this method ineffective. If non-client bots play within the normal range, this method cannot identify them. Some may argue that this is not a problem, but from the standpoint of fairness, a machine playing the game automatically in place of a human is still fundamentally inequitable.

To make this concrete, suppose the method contains a function P that separates normal from abnormal behavior. The more narrowly P defines normality, the more likely legitimate players are to be blocked as well. Conversely, the more broadly P defines normality, the more likely illegitimate players are to slip through. Crucially, if non-client bot developers can work out P's criteria, they can tune their bots to operate right up to the edge of that range, rendering this solution useless. So wherever P draws the line, non-client bots can always pass through, which makes this technique difficult to rely on as an effective countermeasure.
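A toy version of the predicate P makes the trade-off concrete. The single feature (kills per hour), the `Policy` struct, `BotPace`, and the threshold are all hypothetical; real systems combine many features, but the same dilemma applies to each threshold.

```c
#include <stdbool.h>

/* Hypothetical criterion P: a session counts as "normal" if the player
   kills at most max_kills_per_hour monsters. */
typedef struct { double max_kills_per_hour; } Policy;

bool P(Policy p, double kills_per_hour)
{
    return kills_per_hour <= p.max_kills_per_hour;
}

/* A bot that has inferred the threshold paces itself a fixed margin below
   it, staying "normal" while farming as fast as the rule allows. */
double BotPace(Policy inferred, double margin)
{
    return inferred.max_kills_per_hour * (1.0 - margin);
}
```

With a threshold of 300, a bot pacing itself 5% below the limit passes, while an unusually dedicated human at 320 kills per hour is flagged.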

There Is No End

This covers what the game industry, the security industry, and academia have come up with so far in response to non-client bots. As explained in each section, however, no method is without drawbacks, and none can detect non-client bots completely. Perhaps that is what makes this problem so intriguing.

Let's revisit the question posed at the very beginning of the introduction: can you really determine the identity of the person on the other side of the wall through questions alone? The answer is no. The very mechanism of questions and answers is predetermined: the moment you ask a question, its answer already exists. If your opponent knows all the questions in advance, they can easily mislead you.

But there's no need to be disheartened by this theoretical conclusion. Real-world non-client bot creators are not the omniscient beings we have hypothesized. They are human and cannot grasp all of the code at once. What matters is how effectively we can harass them with minimal resources.

Whenever this discussion draws to a close, people ask why security companies can't block these threats perfectly, often hoping that if they think hard enough, they might find a magical 'silver bullet' that solves everything. If this fight were against monkeys, dogs, or cats, such a silver bullet might exist. But this battle is against opponents whose intelligence equals or exceeds our own. If we could devise a perfect blocking method, they could just as well devise a perfect way around it. In the end, the struggle is like a Möbius strip: an endlessly repeating structure.

@codewiz
Looking back, there were good days and bad days. I record all of my little everyday experiences and learnings here. Everything written here is from my personal perspective and opinion, and it has absolutely nothing to do with the organization I am a part of.
(C) 2001 YoungJin Shin