Cross-Platform Threaded Game Development


Address: http://www.intel.com/cd/ids/developer/apac/zho/325610.htm

 

Computer game technology is undergoing a major conceptual shift: the move to multi-threaded engines running on multi-core processors. Multi-core processors power the next generation of PCs and game consoles, and game developers need to target more platforms than ever. Unfortunately, although threading and cross-platform support are both crucial, many developers find them difficult to apply to their own code. This article explores both through a simple demo application, with the aim of easing that transition. With a solid understanding of these techniques, game developers can apply them to the projects they are responsible for.

The demo application accompanies this article. It includes a Microsoft Visual Studio* 2005 solution file for building and running on Windows and a makefile for building on Linux. When run, it opens a window and draws an OpenGL scene (Figure 1). The demo application and its code are used for all the examples in this article.

Figure 1: The demo application at startup

Multi-threaded software design has long been a subject of in-depth research, and its basic principles are easy to understand. Traditionally, single-threaded software executes all of its code serially. In a game, all game tasks (processing input, updating the game world, rendering, and so on) are executed by one central loop, once per frame rendered to the screen. This serial execution model is adequate for single-core processors, but on a multi-core processor it leaves some processing resources unused. Those resources can be put to work by turning game tasks into independent threads that can execute on any logical core of a multi-core processor. This is the parallel, or threaded, execution model. Parallel execution is the key to a high-performance gaming experience on multi-core processors.
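The difference between the two execution models can be sketched in a few lines. This is only an illustration, assuming C++11 standard threads rather than the demo's own classes; Counters, update_world, render_frame, run_serial, and run_threaded are hypothetical names that do not come from the demo.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for game tasks; each just counts its calls.
struct Counters {
    std::atomic<int> updates{0};
    std::atomic<int> renders{0};
};

void update_world(Counters& c) { ++c.updates; }
void render_frame(Counters& c) { ++c.renders; }

// Serial model: one central loop runs every task once per frame.
void run_serial(Counters& c, int frames) {
    for (int i = 0; i < frames; ++i) {
        update_world(c);
        render_frame(c);
    }
}

// Threaded model: each task loops on its own thread, so the
// scheduler is free to place the two loops on different cores.
void run_threaded(Counters& c, int frames) {
    std::thread updater([&c, frames] {
        for (int i = 0; i < frames; ++i) update_world(c);
    });
    std::thread renderer([&c, frames] {
        for (int i = 0; i < frames; ++i) render_frame(c);
    });
    updater.join();
    renderer.join();
}
```

In the serial version the two tasks always run at the same rate; in the threaded version each loop proceeds at its own pace.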

The demo application has three threads: one runs the basic event loop, one updates the positions of the game objects, and one renders the game to the window. The display at the bottom of the window shows how often each thread executes its task, measured in calls per second (CPS) (Figure 2). For the rendering task this unit is equivalent to frames per second (FPS), but it is a more general name that applies to any thread that executes a task repeatedly. The display also shows whether the tasks are executing serially or in parallel. Press the Tab key to toggle this setting and see its effect on each task's CPS.
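A calls-per-second figure of this kind can be computed by counting calls and dividing by the elapsed time. The sketch below is one possible way to do it, not the demo's actual code; the CallsPerSecond class and its tick and sample methods are hypothetical names. The clock reading is passed in so the arithmetic is easy to follow.

```cpp
#include <chrono>

// Hypothetical CPS counter: tick() once per task call, sample()
// whenever the on-screen figure should be refreshed.
class CallsPerSecond {
public:
    using Clock = std::chrono::steady_clock;

    explicit CallsPerSecond(Clock::time_point start) : start_(start) {}

    void tick() { ++calls_; }

    // Returns the call rate since the previous sample, then resets.
    double sample(Clock::time_point now) {
        const std::chrono::duration<double> elapsed = now - start_;
        const double cps =
            elapsed.count() > 0.0 ? calls_ / elapsed.count() : 0.0;
        calls_ = 0;
        start_ = now;
        return cps;
    }

private:
    Clock::time_point start_;
    long calls_ = 0;
};
```

For example, 120 calls recorded over a two-second interval give a CPS of 60.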

The workloads of the rendering and update tasks can be increased or decreased interactively with separate keys: Z and X for the rendering task, and . (period) and / (slash) for the update task. By adjusting these workloads, you can simulate games that are graphics-bound or compute-bound.

Some developers are uncomfortable with giving up control over the execution rate of every task. Does it make sense to process events faster than the game updates? This question is closely tied to the question of how many threads a game needs, and the answer varies from game to game and from one operating environment to another. A high-performance game should use as many logical cores as are available, yet it should not slow down when run on a single-core processor. That may mean checking the number of logical cores before deciding how many threads to use. With modern thread schedulers, however, the practical benefit of this optimization rarely outweighs the benefit of simply breaking the processing into independent tasks. The demo application illustrates this by letting the user run all tasks serially in a single thread. On a single-core processor, the overhead of having more threads than logical cores is insignificant.
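Checking the number of logical cores might look like the following sketch. It assumes the C++11 standard library rather than any platform API the demo may use; logical_cores and choose_thread_count are hypothetical helper names.

```cpp
#include <algorithm>
#include <thread>

// std::thread::hardware_concurrency() is only a hint and may return 0
// when the count cannot be determined, so fall back to one core.
unsigned logical_cores() {
    const unsigned reported = std::thread::hardware_concurrency();
    return reported == 0 ? 1u : reported;
}

// Hypothetical policy: never create more threads than there are tasks
// or logical cores, and always create at least one.
unsigned choose_thread_count(unsigned task_count, unsigned cores) {
    return std::max(1u, std::min(task_count, cores));
}
```

With three game tasks, this policy yields one thread on a single-core machine, two on a dual-core machine, and three on anything larger.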

Figure 2: The demo application under increased workload

Implementation Details

The demo consists of four C++ classes and some glue code that ties them together. The classes are:

ThreadManager: This class manages a thread pool, a group of threads that are created when the program starts and are handed tasks throughout the life of the program. Creating the threads once avoids the overhead of creating and destroying threads repeatedly during execution. This thread pool assigns each thread one game task; once started, the thread calls the same function over and over. This simple scheme makes it easy to start or stop a thread and to compute its calls per second. The Future Work section discusses other thread pool management schemes.
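The idea of a long-lived thread that calls the same function repeatedly can be sketched as follows. This is not the ThreadManager interface, which is not shown in this article; it is a minimal illustration that assumes C++11 standard threads, and RepeatingThread is a hypothetical name.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical sketch: a thread created once that calls the same
// task over and over until it is told to stop.
class RepeatingThread {
public:
    explicit RepeatingThread(std::function<void()> task)
        : task_(std::move(task)),
          running_(true),
          thread_([this] { loop(); }) {}

    ~RepeatingThread() { stop(); }

    RepeatingThread(const RepeatingThread&) = delete;
    RepeatingThread& operator=(const RepeatingThread&) = delete;

    // Asks the loop to finish and waits for the thread to exit.
    void stop() {
        running_ = false;
        if (thread_.joinable()) thread_.join();
    }

    // Number of completed calls; useful for a calls-per-second figure.
    long calls() const { return calls_.load(); }

private:
    void loop() {
        while (running_) {
            task_();
            ++calls_;
            std::this_thread::yield();  // give other threads a chance
        }
    }

    std::function<void()> task_;
    std::atomic<bool> running_;
    std::atomic<long> calls_{0};
    std::thread thread_;  // declared last: starts after the other members exist
};
```

A game would create one such object per task at startup, for example one for updating and one for rendering, and stop them at shutdown.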

The ThreadManager class also provides convenience methods for thread-related chores. It gives each thread thread-local storage (variables associated with the thread itself), so that code can behave differently depending on which thread is executing it. The demo uses this to make sure the rendering function keeps working when it runs serially with the other tasks on the main thread, or when the rendering context changes (switching between windowed and full-screen mode). The ThreadManager class also defines methods that let a thread sleep for a period of time or yield to other threads that are waiting to run.
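Thread-local storage can be illustrated with the C++11 thread_local keyword. The demo predates that keyword and presumably uses a platform or library facility instead, so treat this purely as a sketch; ThreadRole, current_role, and on_render_thread are hypothetical names.

```cpp
#include <thread>

// Hypothetical roles a thread can take on in a game.
enum class ThreadRole { Main, Update, Render };

// Each thread gets its own copy of this variable.
thread_local ThreadRole current_role = ThreadRole::Main;

// Code can branch on the role of whichever thread is executing it.
bool on_render_thread() {
    return current_role == ThreadRole::Render;
}
```

A thread that sets current_role to ThreadRole::Render changes only its own copy; every other thread still sees its own value.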

The demo application uses a subclass of ThreadManager called ThreadManagerSerial. The subclass adds methods that demonstrate how an application can move tasks between dedicated threads and the main thread.

CriticalSection: This helper class creates and manages critical sections, which ThreadManager and the demo application use to prevent multiple threads from reading or modifying shared data at the same time.
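A critical section helper of this kind is often a thin wrapper around a mutex. The sketch below assumes std::mutex from C++11; the enter and leave method names, the ScopedLock guard, and count_with_lock are hypothetical and are not taken from the demo's CriticalSection class.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical critical-section wrapper around a standard mutex.
class CriticalSection {
public:
    void enter() { mutex_.lock(); }
    void leave() { mutex_.unlock(); }

private:
    std::mutex mutex_;
};

// Enters on construction and leaves on destruction, so the section
// is released even if the protected code returns early or throws.
class ScopedLock {
public:
    explicit ScopedLock(CriticalSection& cs) : cs_(cs) { cs_.enter(); }
    ~ScopedLock() { cs_.leave(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    CriticalSection& cs_;
};

// Several threads increment one shared counter; the critical section
// ensures that no increment is lost.
long count_with_lock(int threads, int increments) {
    CriticalSection cs;
    long shared = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                ScopedLock lock(cs);
                ++shared;
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared;
}
```

Without the lock, concurrent increments of the shared counter would be a data race and the total could come out short.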

OpenGLWindow: This class uses the SDL (Simple DirectMedia Layer) library [1] to provide a cross-platform, fixed-resolution rendering context. It hides some of the quirks of OpenGL rendering. Rendering can take place in a window or in full-screen mode. When the window is re-created (for example, when switching to full-screen mode), the OpenGL rendering context becomes invalid. To handle this, the class provides a function that lets a thread determine whether its rendering context is still valid. This is necessary because OpenGL requires each rendering thread to be associated with a single rendering context and does not automatically notify a thread when that context becomes invalid.

World: This class is specific to the demo application. It manages a static background of triangles and a foreground set of moving points, which the demo's thread tasks update and render. The background triangles provide the workload for the rendering task. The foreground points model the N-body problem, which illustrates the law of gravitation. Imagine that each point is a planet or asteroid in space: every body attracts every other body, and bodies merge when they collide. At the center of space there is also an invisible black hole that absorbs any matter that collides with it, but eventually "overflows" and releases new bodies. The N-body problem provides the workload for the update task.

Demonstration Experiments

The demo can simulate the performance of games with common run-time characteristics. Some games spend most of their time rendering complex scenes. Others spend it computing complex changes to the game state. By adjusting the workloads of the demo tasks, you can simulate these conditions and evaluate the benefit of threading in each case.

The key factor in the demo's output is whether it runs on a multi-core processor. A threaded application still runs on a single-core computer. In most cases, however, without a multi-core processor a program that runs its tasks in parallel threads will be no faster than one that runs the same tasks serially. A common exception to this rule is a workload that blocks heavily on network or disk I/O. Even on a single-core processor, threading such tasks lets other computation proceed while the data is being retrieved.

The following scenarios list the expected results on single-core and multi-core machines. Note that on a single core, changing the workload of any task affects the CPS of all tasks. On multiple cores, changing the workload of one task has little or no effect on the CPS of the others.

When the demo starts, it displays 10,000 triangles in the background and 50 points in the foreground, and runs in windowed mode. All of the following scenarios assume this as the starting state. The Z and X keys adjust the number of triangles by 10,000 at a time, and the . (period) and / (slash) keys adjust the number of objects by 50 at a time. Press the Tab key to toggle between serial and parallel task execution.

Scenario 1: Compute-bound execution. Press the / (slash) key to add objects to the N-body problem and so load the update task. Add enough objects that the update CPS is significantly lower than the render CPS, but not below 5 (if possible). This is hard to achieve on a single-core computer, because increasing the load on the update task also lowers the CPS of the other two tasks. On a multi-core computer, this state is easy to reach. In this state the objects appear to move in waves, a side effect of the render and update tasks working on the same data at the same time. Their interaction is not thread-safe, so the render task displays partial updates in each frame. The Future Work section describes thread-safe rendering approaches.

Note the CPS of the update thread, then press the Tab key to switch to serial execution. On a single core, all tasks drop to about the same low CPS (perhaps slightly higher). On multiple cores, all tasks drop to an even lower number. Why? On a single core, all tasks were already running on the same logical core. After switching to serial execution, the only change is that the fast tasks now run only as often as the slow task, so the slow task competes with less work. On multiple cores, the slow task may have had a logical core to itself (see Future Work), with no competition from other tasks on that core. Once the tasks are serialized, the slow task shares a core with the other tasks and therefore runs more slowly. The conclusion: on multiple cores, running a game threaded can speed up the execution of all tasks.

Scenario 2: Graphics-bound execution. Press the X key to add triangles to the background and so load the rendering task. Add enough triangles that the render CPS is between 5 and 10 while the update CPS stays considerably higher (on a single core, it may not be very high). The point is that although rendering is slow, the update task that simulates the N-body problem is called frequently, so the simulation stays accurate. Even at a low render CPS/FPS, you can follow the path of each object.

Press the Tab key to switch to serial execution. On both single-core and multi-core computers, the render task keeps about the same CPS, but the update task drops to the same low CPS. The update task is now called less often, which reduces the accuracy of the N-body simulation. As a result, even though the application renders the same number of frames per second, the action on screen is more chaotic and harder to follow. An object heading into the black hole accelerates but, instead of being absorbed, "tunnels" through it and carries on at high speed. This tunneling behavior is a symptom of slow game updates. In other kinds of games, the same problem might appear as shots passing through enemies or players walking through solid walls. The conclusion: graphics-bound games can also benefit from more frequent world/physics updates.

Conclusion

Computer games have always been a high-performance business. With processor technology undergoing a major change, games need good threading to take full advantage of the host platform. Targeting multiple platforms also widens a game's market. Cross-platform development is relatively straightforward once the critical parts of the code are properly abstracted. Threading and cross-platform development offer significant opportunities across the games market, from high-end titles to homebrew games. The techniques introduced here can be applied to an existing code base or used to start development of your next game.

Future Work

The demo straddles the line between an experimental tool and a viable starting point for developing a game with modern techniques. Future work could focus on either of these.

Add more platforms: Mac OS* is an obvious choice and would be easy to add. Other platforms, such as consoles and handheld devices, could also be added. However, the broader the platform strategy, the more attention issues such as interfaces and controls will need.

Create synchronization primitives for ordering tasks: In a compute-bound project, there is no need to render duplicate frames. The rendering thread could wait on a condition variable (another commonly implemented threading API feature) that the update thread sets. This feature would be abstracted in the ThreadManager class so that it is always available across platforms.

Provide a fixed update frequency for the game world: The N-body problem is a sensitive physics simulation, and its accuracy is affected by the length of the update interval. Fixing the update frequency of the game task would eliminate variation in accuracy. After each update, the task would wait until the next update is due.
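One way to hold an update task to a fixed frequency is to sleep until a precomputed deadline after each update. The sketch below assumes C++11 chrono and thread facilities; run_fixed_rate and its parameters are hypothetical, and a real game loop would run until shutdown rather than for a fixed number of iterations.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls update(dt) `iterations` times at a fixed
// rate of `hz` calls per second.
void run_fixed_rate(const std::function<void(double)>& update,
                    int hz, int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto period = std::chrono::duration_cast<Clock::duration>(
        std::chrono::nanoseconds(1000000000LL / hz));
    const double dt = 1.0 / hz;  // the simulation always advances by the same step

    auto next = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        update(dt);
        next += period;                       // deadline of the next update
        std::this_thread::sleep_until(next);  // wait out the rest of the interval
    }
}
```

Because every call receives the same dt, the simulation's accuracy no longer depends on how fast the machine happens to run the update. If an update overruns its interval, sleep_until returns immediately and the loop simply falls behind.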

Double-buffer game world updates for thread-safe rendering while increasing FPS: If the project is compute-bound, the game state can be split into a static part and a dynamic part. With two copies of the dynamic part, the game world thread can update one while the rendering thread renders the other (along with the static world data). Many commercial games use this approach to speed up rendering on multi-core processors.
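Double buffering of the dynamic state can be sketched as follows. This is one possible arrangement under stated assumptions, not the approach of any particular game: it assumes exactly one update thread and one render thread, and Body and DoubleBufferedWorld are hypothetical names.

```cpp
#include <array>
#include <mutex>
#include <vector>

struct Body {
    float x;
    float y;
};

// Hypothetical double-buffered dynamic state. Assumes exactly one
// update thread and one render thread.
class DoubleBufferedWorld {
public:
    explicit DoubleBufferedWorld(const std::vector<Body>& initial) {
        buffers_[0] = initial;
        buffers_[1] = initial;
    }

    // Update thread only: fn reads the current state and writes the
    // next state into the back buffer; then the buffers are swapped.
    template <class UpdateFn>
    void update(UpdateFn&& fn) {
        const std::vector<Body>& current = buffers_[front_];
        std::vector<Body>& next = buffers_[1 - front_];
        fn(current, next);
        std::lock_guard<std::mutex> lock(mutex_);  // waits for an in-progress render
        front_ = 1 - front_;
    }

    // Render thread only: draws the most recently completed state.
    template <class DrawFn>
    void render(DrawFn&& draw) const {
        std::lock_guard<std::mutex> lock(mutex_);
        draw(buffers_[front_]);
    }

private:
    std::array<std::vector<Body>, 2> buffers_;
    int front_ = 0;  // index of the buffer the renderer may read
    mutable std::mutex mutex_;
};
```

The lock is held only for the swap and for the duration of a draw, so the expensive update computation runs in parallel with rendering, and the renderer never sees a half-updated state.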

Use platform-specific thread scheduling APIs: Threads can be assigned to any available logical core, but there is no guarantee that any two threads will execute at the same time. Each platform has its own policies for how threads are assigned to cores and how total running time is divided among them. Some platforms provide APIs to control these assignment and scheduling policies. These APIs could be abstracted and exposed through the ThreadManager class.

Add more thread tasks: AI loops, dynamic content generation, audio, and networking are common features of game projects. Additional thread tasks would make better use of future multi-core processors with more than two logical cores. Tasks could also be modified to exploit data parallelism (using threads to break a large task into smaller tasks that run in parallel). Data parallelism gives games with only a few tasks a way to fully use multi-core processors. As multi-core technology develops and the number of logical cores comes to exceed the number of parallel tasks, data parallelism will become key to using processing resources effectively.
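Data parallelism can be illustrated with a small helper that splits an index range into contiguous chunks and processes each chunk on its own thread. This is a sketch assuming C++11 standard threads; parallel_for is a hypothetical helper, and it creates new threads on every call, which a production engine would avoid by reusing a pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls fn(i) for every i in [0, count), with the
// range split into contiguous chunks, one chunk per thread.
// fn must be safe to call concurrently for different indices.
template <class Fn>
void parallel_for(std::size_t count, unsigned thread_count, Fn fn) {
    if (thread_count == 0) thread_count = 1;
    const std::size_t chunk = (count + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) fn(i);
        });
    }
    for (auto& w : workers) w.join();
}
```

In an N-body update, for instance, each index could stand for one body whose new position is computed from a read-only copy of the previous state, so that no two threads write to the same element.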

Drop serialization and use ThreadManager in place of ThreadManagerSerial: This would let tasks use more efficient blocking calls where appropriate. For example, on Windows the runwindowloop function could call WaitMessage instead of PeekMessage, blocking until there is an event to process. The rendering function could call glFinish instead of glFlush, blocking until all drawing is complete. Alternatively, ThreadManager could be modified to use a one-shot task thread pool strategy (a producer-consumer queue), matching the number of threads exactly to the number of logical cores and, in theory, achieving the absolute minimum threading overhead.

Appendix: Building the Demo Application

Unpack the tcpgd.zip archive file, which creates the tcpgd directory.

Windows: Start Microsoft Visual Studio* 2005 and open the tcpgd.sln solution file in the tcpgd directory. Select the Release configuration, then build and run the solution. The glf project [2] may produce some warnings during the build, but they do not affect building or running the application.

Linux: Change to the tcpgd directory. To build the demo application, build the glf and main projects separately. Run the following commands to build and run the program:

cd glf
make
cd ../main
make
./main

Acknowledgments:

    1. SDL*, the Simple DirectMedia Layer library: http://www.libsdl.org. SDL is used to create the window and the OpenGL rendering context in a cross-platform manner.
    2. glf*, an OpenGL font rendering library: http://www.forexseek.com/glf/. glf is used to display text in the demo application.
    3. OpenGL Programming Guide (Fifth Edition), Addison-Wesley, published August 1, 2005. This "Red Book" is an invaluable resource for OpenGL development at all levels.

 
