[Hotspot recommendation] Use C Programs in JavaScript

Source: Internet
Author: User
Tags emscripten


JavaScript is a flexible scripting language that handles business logic easily. When data needs to be transmitted, most of us choose JSON or XML.

However, when the size budget is very tight, text protocols are inefficient and a binary format is required.

Around this time last year, we ran into exactly this problem while building a WAF that combined front-end and back-end components.

The front-end script has to collect a lot of data and ultimately write it into a cookie, where only a few dozen bytes are available.

If we used JSON without a second thought, just marking one field as {"enableXX": true} would take up half of that space. In binary, a true/false flag needs only a single bit, which saves more than a hundredfold.
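Here is a small C sketch that makes the size comparison concrete. The helper names json_flag_bits and pack_flags are illustrative and do not come from the original project. The JSON text {"enableXX": true} is 18 bytes, or 144 bits. A packed representation spends one bit per flag, so eight flags fit in a single byte.

```c
#include <stdint.h>
#include <string.h>

/* Size of one boolean flag carried as JSON text, in bits. */
static size_t json_flag_bits(void) {
    return strlen("{\"enableXX\": true}") * 8; /* 18 characters = 144 bits */
}

/* Packs up to 8 booleans into one byte, one bit per flag
   (flag i goes to bit i). Flags beyond the eighth are ignored. */
static uint8_t pack_flags(const int *flags, size_t n) {
    uint8_t byte = 0;
    for (size_t i = 0; i < n && i < 8; i++) {
        if (flags[i]) {
            byte |= (uint8_t)(1u << i);
        }
    }
    return byte;
}
```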

The data also has to be verified, encrypted, and so on, and these algorithms are only convenient to apply to data in binary form.

Elegant implementation

However, JavaScript does not support binary data.

Here, "not supported" does not mean "impossible to implement" but "impossible to implement elegantly". Programming languages were invented to solve problems elegantly; without them, people could still write programs in machine instructions.

If you insist on doing binary operations in JavaScript, the code ends up looking like this:

var flags = +enableXX1 << 16 | +enableXX2 << 15 | ...

It works, but it is ugly: hard-coded shifts and bit operations everywhere.

In a language that supports binary data, the same thing looks very elegant:

 
 
    #include <stdint.h>

    union {
        struct {
            unsigned int enableXX1 : 1;
            unsigned int enableXX2 : 1;
            ...
        };
        int16_t value;
    } flags;

    flags.enableXX1 = enableXX1;
    flags.enableXX2 = enableXX2;

The developer only writes a layout description. When using the fields, there is no need to worry about their bit offsets or how to read and write them.
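Here is a runnable sketch of this idea: a bit-field struct with a hypothetical layout. The fields (month, day, and two flags) and their names are ours, chosen to echo the examples in this article. User code only assigns and reads fields, and the compiler works out every shift and mask.

```c
/* Hypothetical layout, declared once; the compiler derives each field's
   offset and mask from these widths. */
typedef struct {
    unsigned int month     : 4; /* 1..12 fits in 4 bits */
    unsigned int day       : 5; /* 1..31 fits in 5 bits */
    unsigned int enableXX1 : 1;
    unsigned int enableXX2 : 1;
} Flags;

/* Builds a Flags value; no shifts or masks appear in user code. */
static Flags make_flags(unsigned month, unsigned day, int xx1, int xx2) {
    Flags f = {0};
    f.month = month;
    f.day = day;
    f.enableXX1 = xx1 != 0;
    f.enableXX2 = xx2 != 0;
    return f;
}
```

One caveat applies if the packed value is sent over the wire. The C standard leaves the order of bit-fields within a storage unit implementation-defined. The layout therefore stays consistent only as long as every compiler involved lays the fields out the same way.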

To get a similar result, we first wrapped a struct in JS:

 
 
    // Initial scheme: wrap a struct in JS
    var s = new Struct([
        {name: 'month', bit: 4, signed: false},
        ...
    ]);
    s.set('month', 12);
    s.get('month');

The details are hidden and it looks more elegant.
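The article does not show the JS Struct helper itself. To give a rough idea of what a descriptor-driven helper like it does, here is a C sketch under our own assumptions: fields are unsigned only, and all of them are packed LSB-first into one 32-bit word. The names FieldDesc, struct_set, and struct_get are hypothetical. At run time, each field's offset is computed by summing the widths of the fields before it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field descriptor, mirroring {name: 'month', bit: 4, signed: false}. */
typedef struct {
    const char *name;
    unsigned bits;
} FieldDesc;

static const FieldDesc fields[] = {
    {"month", 4},
    {"day", 5},
    {"enableXX1", 1},
};

/* Finds a field and computes its bit offset by summing earlier widths. */
static int find_field(const char *name, unsigned *offset, unsigned *width) {
    unsigned off = 0;
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (strcmp(fields[i].name, name) == 0) {
            *offset = off;
            *width = fields[i].bits;
            return 1;
        }
        off += fields[i].bits;
    }
    return 0;
}

/* Writes value into the named field, leaving the other bits untouched. */
static void struct_set(uint32_t *word, const char *name, uint32_t value) {
    unsigned off, width;
    if (!find_field(name, &off, &width)) return;
    uint32_t mask = ((1u << width) - 1u) << off;
    *word = (*word & ~mask) | ((value << off) & mask);
}

/* Reads the named field back out of the packed word. */
static uint32_t struct_get(uint32_t word, const char *name) {
    unsigned off, width;
    if (!find_field(name, &off, &width)) return 0;
    return (word >> off) & ((1u << width) - 1u);
}
```

Every get or set walks the descriptor table and computes shifts at run time. That is the extra running code that compiler-generated bit-fields avoid.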

Elegant but not perfect

However, this is still not perfect. Struct, which ought to be provided by the language, is now implemented in extra code that also has to run.

In addition, the back-end decoder is written in C, so two code bases must be maintained. Whenever the data structure or an algorithm changes, updating the JS and C versions together is a real hassle.

So I wondered: could the front end and back end share a single set of C code?

That would require compiling the C code into JS so it can run in the browser.

Understanding emscripten

There are many tools that can compile C into JS, and emscripten is the most mature of them.

Emscripten is easy to use. It works like a traditional C compiler, except that it generates JS code.

 
 
    ./emcc hello.c -o hello.html

    // hello.c
    #include <stdio.h>
    #include <time.h>

    int main() {
        time_t now;
        time(&now);
        printf("Hello World: %s", ctime(&now));
        return 0;
    }

After compilation, open the generated hello.html in a browser to run it.

It is fun to play with, so give it a try; I will not go into it further here.

Practical Defects

However, what we care about is not fun but practicality.

In fact, even the JavaScript compiled from Hello World runs to tens of thousands of lines and hundreds of KB. Even after minification and gzip, it is still dozens of KB.

In addition, emscripten follows the asm.js specification, and memory access is implemented with TypedArray.

This means that users on IE versions earlier than 10 cannot run it, which is also unacceptable.

Therefore, we had to make the following improvements:

  • Reduce size

  • Improve compatibility

First, we tried tuning emscripten's options to see whether that alone would achieve our goal.

After several attempts, it did not work out, so we had to implement it ourselves.

Reduce size

Why is the final script so big, and what does it contain? Analysis shows the following parts:

  • Auxiliary Functions

  • Interface Simulation

  • Initialization

  • Runtime functions

  • Program Logic

Auxiliary Functions

For example, string/binary conversion and callback wrapping. These are mostly unnecessary; we can write a dedicated callback function ourselves.

Interface Simulation

Emulation of interfaces such as files, terminals, networking, and rendering. I have seen client-side games ported with emscripten before, and a great many interfaces appear to be emulated.

Initialization

Initialization of global memory, the runtime, and the various modules.

Runtime functions

Pure C can only perform simple calculations; much functionality depends on runtime functions.

Some common functions, however, have complicated implementations. malloc and free alone take nearly 2,000 lines of JS!

Program Logic

This is the JS code that actually corresponds to the C program. The logic may be hard to recognize because LLVM optimizes it during compilation.

This part of the code is small and is what we really want.

In fact, as long as the program does not use special runtime functions, the logic functions can run on their own!

Since our C program is very simple, a quick and crude extraction is good enough.

The JS logic corresponding to the C program sits between the // EMSCRIPTEN_START_FUNCS and // EMSCRIPTEN_END_FUNCS markers. Filter out the runtime functions, and what remains is 100% logic code.
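The extraction step can be sketched in C as below. The sketch assumes, as described above, that the compiled output contains the two comment markers verbatim. extract_funcs is a hypothetical helper that only cuts out the region between the markers. Filtering the runtime functions out of that region is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Copies the text between the emscripten function markers into out.
   Returns the number of bytes copied, or -1 if a marker is missing or
   out is too small. */
static long extract_funcs(const char *js, char *out, size_t out_size) {
    static const char start_marker[] = "// EMSCRIPTEN_START_FUNCS";
    static const char end_marker[] = "// EMSCRIPTEN_END_FUNCS";

    const char *start = strstr(js, start_marker);
    if (start == NULL) return -1;
    start += sizeof start_marker - 1;            /* skip the marker itself */

    const char *end = strstr(start, end_marker); /* first end marker after it */
    if (end == NULL) return -1;

    size_t len = (size_t)(end - start);
    if (len + 1 > out_size) return -1;           /* room for the terminator */
    memcpy(out, start, len);
    out[len] = '\0';
    return (long)len;
}
```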

Improve compatibility

Next, we solve the compatibility problem of memory access.

First, let's see why TypedArray is used.

Emscripten allocates one large ArrayBuffer to emulate memory and then creates several HEAP views over it (HEAP8, HEAP16, HEAP32, and so on).

These different types of HEAP share the same memory, which enables efficient pointer operations.
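Here is a C analogue of several typed views sharing one block of memory, using a union. Heap, load16, load32, and store8 are hypothetical names. The sketch also follows the asm.js-style addressing in which a byte address ptr becomes index ptr >> 2 in the 32-bit view.

```c
#include <stdint.h>

/* One 16-byte block seen as 8-, 16- and 32-bit elements at once, loosely
   analogous to HEAP8/HEAP16/HEAP32 views over one ArrayBuffer. Reading a
   union member other than the one last written reinterprets the bytes. */
typedef union {
    int8_t  i8[16];
    int16_t i16[8];
    int32_t i32[4];
} Heap;

/* A byte address ptr maps to index ptr >> 1 in the 16-bit view and
   ptr >> 2 in the 32-bit view (ptr must be suitably aligned). */
static int16_t load16(const Heap *h, uint32_t ptr) { return h->i16[ptr >> 1]; }
static int32_t load32(const Heap *h, uint32_t ptr) { return h->i32[ptr >> 2]; }
static void store8(Heap *h, uint32_t ptr, int8_t v) { h->i8[ptr] = v; }
```

Writing four 0xFF bytes through the 8-bit view makes the overlapping 32-bit element read as -1, whatever the byte order.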

However, browsers that do not support TypedArray obviously cannot run this code, so a polyfill is needed.

After some analysis, however, this turned out to be almost impossible, because a TypedArray, like an ordinary array, is accessed through indexes:

 
 
    var buf = new Uint8Array(100);
    buf[0] = 123;     // set
    alert(buf[0]);    // get

However, the [] operator cannot be overloaded in JS, so index access is hard to turn into setter and getter calls. Besides, the old IE versions that lack TypedArray also lack ES6 features, so ES6 facilities such as Proxy are not an option either.

So I considered IE's proprietary interfaces, for example using the onpropertychange event to emulate a setter. However, this is extremely inefficient, and getters are still hard to implement.

After some thought, I decided not to use hooks but to solve the problem at the source: modify the syntax!

We use regular expressions to find assignments to the heap in the source code:

    HEAP[index] = val;

and replace them with:

    HEAP_SET(index, val);
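The article does not give the exact regular expressions. Here is a minimal sketch of the store rewrite. It assumes POSIX <regex.h> and a pattern of our own: heap names start with HEAP, statements are handled one at a time, and the index contains no nested brackets. rewrite_store is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Rewrites one statement "NAME[index] = value;" (NAME = HEAP, HEAP8,
   HEAP32, ...) into "NAME_SET(index, value);". Returns 1 on success,
   0 if there is no match or out is too small. */
static int rewrite_store(const char *stmt, char *out, size_t out_size) {
    regex_t re;
    regmatch_t m[4]; /* m[1]: heap name, m[2]: index, m[3]: value */
    /* The value group must start with a non-space so the match is unambiguous. */
    if (regcomp(&re, "^(HEAP[A-Z0-9]*)\\[([^]]*)] *= *([^; ][^;]*);",
                REG_EXTENDED) != 0)
        return 0;
    int matched = regexec(&re, stmt, 4, m, 0) == 0;
    regfree(&re);
    if (!matched) return 0;
    int n = snprintf(out, out_size, "%.*s_SET(%.*s, %.*s);",
                     (int)(m[1].rm_eo - m[1].rm_so), stmt + m[1].rm_so,
                     (int)(m[2].rm_eo - m[2].rm_so), stmt + m[2].rm_so,
                     (int)(m[3].rm_eo - m[3].rm_so), stmt + m[3].rm_so);
    return n >= 0 && (size_t)n < out_size;
}
```

The pattern is compiled on every call for simplicity. A real build step would compile it once and loop over all statements.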

Similarly, a read operation:

    HEAP[index]

is replaced with:

    HEAP_GET(index)

This turns each index operation into a function call, so we take over all memory reads and writes without any compatibility issues!
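Reads can be handled the same way. The sketch below again uses POSIX <regex.h>, with a hypothetical rewrite_loads helper and a pattern of our own. It replaces every NAME[index] in a string with NAME_GET(index). It assumes stores have already been rewritten, because this pattern would otherwise also match the left-hand side of an assignment. It does not handle nested subscripts such as HEAP32[HEAP32[p >> 2] >> 2].

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Replaces every "NAME[index]" (NAME starting with HEAP) in src with
   "NAME_GET(index)", writing the result to out. Returns 1 on success,
   0 if the pattern fails to compile or out is too small. */
static int rewrite_loads(const char *src, char *out, size_t out_size) {
    regex_t re;
    regmatch_t m[3]; /* m[0]: whole access, m[1]: heap name, m[2]: index */
    if (regcomp(&re, "(HEAP[A-Z0-9]*)\\[([^]]*)]", REG_EXTENDED) != 0)
        return 0;

    const char *p = src;
    size_t used = 0;
    int ok = 1;
    while (ok && regexec(&re, p, 3, m, 0) == 0) {
        /* copy the text before the match, then emit NAME_GET(index) */
        int n = snprintf(out + used, out_size - used, "%.*s%.*s_GET(%.*s)",
                         (int)m[0].rm_so, p,
                         (int)(m[1].rm_eo - m[1].rm_so), p + m[1].rm_so,
                         (int)(m[2].rm_eo - m[2].rm_so), p + m[2].rm_so);
        if (n < 0 || (size_t)n >= out_size - used) {
            ok = 0;
        } else {
            used += (size_t)n;
            p += m[0].rm_eo; /* continue after the closing bracket */
        }
    }
    regfree(&re);

    if (ok) { /* copy whatever follows the last match */
        int n = snprintf(out + used, out_size - used, "%s", p);
        ok = n >= 0 && (size_t)n < out_size - used;
    }
    return ok;
}
```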

Next, we implemented the signed 8-, 16-, and 32-bit versions, which are easy to emulate with a plain JS Array.
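The article does not show its emulation code. Here is one possible shape for it, sketched in C under our own assumptions. Memory is an array with one element per byte, as a JS Array of numbers 0..255 would be. Multi-byte values are stored little-endian, and every getter sign-extends the way Int8Array, Int16Array, and Int32Array reads do. Because all widths go through the same bytes, a value written as 32 bits can be read back as 8 or 16 bits, just as with shared TypedArray views. The heap_get/heap_set names are hypothetical.

```c
#include <stdint.h>

/* Emulated heap: one element per byte, multi-byte values little-endian. */
#define HEAP_SIZE 1024
static uint8_t heap[HEAP_SIZE];

static void heap_set8(uint32_t addr, int32_t v) {
    heap[addr] = (uint8_t)v; /* keep the low 8 bits */
}

static void heap_set16(uint32_t addr, int32_t v) {
    uint32_t u = (uint32_t)v;
    heap[addr]     = (uint8_t)u;
    heap[addr + 1] = (uint8_t)(u >> 8);
}

static void heap_set32(uint32_t addr, int32_t v) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++) heap[addr + i] = (uint8_t)(u >> (8 * i));
}

/* Getters sign-extend, as signed TypedArray reads do. */
static int32_t heap_get8(uint32_t addr) {
    int32_t v = heap[addr];
    return v >= 0x80 ? v - 0x100 : v;
}

static int32_t heap_get16(uint32_t addr) {
    int32_t v = heap[addr] | (heap[addr + 1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}

static int32_t heap_get32(uint32_t addr) {
    uint32_t u = (uint32_t)heap[addr] | ((uint32_t)heap[addr + 1] << 8)
               | ((uint32_t)heap[addr + 2] << 16) | ((uint32_t)heap[addr + 3] << 24);
    /* Convert to signed without relying on an implementation-defined cast. */
    return u <= INT32_MAX ? (int32_t)u : (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}
```

These functions take byte addresses. Since emscripten indexes HEAP32 with ptr >> 2, a HEAP32_GET(index) wrapper would pass index << 2, and a 16-bit wrapper would pass index << 1. There is no bounds checking here.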

The tricky part is emulating the two floating-point types, Float32 and Float64. However, this C program does not use floating point, so we have not implemented them for now.

At this point, the compatibility problem is solved.

Success

With these defects fixed, we can happily use the C logic in JS.

The script itself only has to care about collecting data, which keeps the JS code very elegant.

Low-level data operations such as storage, encryption, and encoding are all implemented in C.

We compile with the -Os flag to optimize for size. After final JavaScript obfuscation and minification, the result is under 2 KB: small and compact.

Better still, we only need to maintain one code base to build both the front-end and back-end versions.

This makes developing the "front-end WAF" much easier.

All data structures and algorithms are implemented in C. For the front end they are compiled into JS code, and for the back end into a Lua module for nginx-lua.

The front-end and back-end scripts only need to focus on business functions and never touch data-level details.

Test version

In fact, there is a third version, the local version.

Because all the logic lives in one C code base, it is easy to write a test program for it.

This way, there is no need to start a web server or open a browser: just simulate some data and run the program directly, which is very lightweight.

It is also easier to debug in an IDE.
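Here is an example of such a local test as a C sketch. It uses a hypothetical record (month, day, one flag) packed with explicit shifts. encode_record, decode_record, and run_self_test are our own names, not the project's. The sketch feeds simulated data through encode and decode and counts mismatches, with no browser or web server involved.

```c
#include <stdint.h>

/* Hypothetical record shared by the front end and back end. */
typedef struct {
    unsigned month, day, enableXX1;
} Record;

/* Packs month (4 bits), day (5 bits) and one flag, LSB first. */
static uint16_t encode_record(const Record *r) {
    return (uint16_t)((r->month & 0xFu)
                    | ((r->day & 0x1Fu) << 4)
                    | ((r->enableXX1 & 0x1u) << 9));
}

static Record decode_record(uint16_t v) {
    Record r;
    r.month = v & 0xFu;
    r.day = (v >> 4) & 0x1Fu;
    r.enableXX1 = (v >> 9) & 0x1u;
    return r;
}

/* Runs simulated data through encode/decode; returns the mismatch count. */
static int run_self_test(void) {
    int failures = 0;
    for (unsigned m = 1; m <= 12; m++)
        for (unsigned d = 1; d <= 31; d++)
            for (unsigned f = 0; f <= 1; f++) {
                Record in = { m, d, f };
                Record out = decode_record(encode_record(&in));
                if (out.month != m || out.day != d || out.enableXX1 != f)
                    failures++;
            }
    return failures;
}
```

A local test program would simply call run_self_test() from main and report the result. The same encode/decode functions could then be compiled for the front end and back end unchanged.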

Summary

Each language has its own strengths and weaknesses. By combining the strengths of different languages, a program can become more elegant and complete.


