Kingsoft PowerWord's Screen Word-Capture Technology (Discussion Draft)

Screen Word-Capture Technology Series (1)
Many people are interested in this topic, because the technology is remarkable and has real commercial value.
At present, Kingsoft PowerWord holds an overwhelming share of the dictionary market, so in my view there is no future in making yet another dictionary. That is why, even though I have mastered this technology, I have not developed dictionary software. I only wrote a PowerWord-like program for my own use. Its word list was "stolen" and small, which is fine for personal use, but what good is a program that can capture words when its dictionary is that small? Besides, PowerWord already has a shareware version.
Still, since many people want to understand this technology, I will not keep it to myself. I plan to explain every detail of it over several installments, about one or two a week. Anyone interested, check back often!
I. Basic Knowledge
First of all, writing such a program requires some background knowledge:
Visual C++, both 16-bit and 32-bit.
A thorough command of the Windows API, especially GDI and KERNEL.
Assembly language, so that you can debug with SoftICE; SoftICE is the best tool for debugging this kind of program.
II. Basic Principles
In the Windows 3.x era, Windows provided only a few character-output functions:
TextOut
ExtTextOut
DrawText
......
DrawText is implemented using ExtTextOut, so all character output in Windows comes down to calls to TextOut and ExtTextOut. If you modify the entry points of these two functions so that a program calls your own function before the system's character output runs, you can obtain every character Windows draws. In the Windows 95 era the principle has not changed, but 95 is more complex than 3.x. At first, some word-capture software written for Windows 3.x could still be used on it. Then Internet Explorer 4 was released, and many dictionary programs were eliminated because they did not support IE4; this in turn created an opportunity for others, such as Kingsoft. In fact IE4 is not complicated: it outputs Unicode characters, through TextOutW and ExtTextOutW. Once you know this, you only need to intercept those functions, although the implementation is more complex; I will explain it in detail later. Now there is IE5 as well, and it turns out the old approach no longer works well with it. Microsoft is really #^@#$%$*&^&#@#@..........
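The interception idea above can be modeled in portable C. This is not Windows code: the function pointer ext_text_out stands in for the patched API entry point, draw_text plays the role of DrawText layered on ExtTextOut, and every name here is hypothetical. The sketch shows why hooking the lower-level function is enough: text drawn through the higher-level routine still passes through the hook, which records it and then forwards to the original.

```c
#include <assert.h>
#include <string.h>

/* Portable model only; all names are hypothetical stand-ins. */
typedef int (*TextOutFn)(int x, int y, const char *s, int len);

static char g_screen[256];   /* what the "system" actually drew */
static char g_captured[256]; /* what the hook saw */

/* Stand-in for the real low-level output routine (ExtTextOut). */
static int ext_text_out_impl(int x, int y, const char *s, int len)
{
    (void)x;
    (void)y;
    assert(len >= 0 && (size_t)len < sizeof g_screen);
    memcpy(g_screen, s, (size_t)len);
    g_screen[len] = '\0';
    return 1;
}

/* The "entry point" every caller goes through; the hook replaces it. */
static TextOutFn ext_text_out = ext_text_out_impl;
static TextOutFn original_ext_text_out = NULL;

/* Higher-level routine built on ext_text_out, like DrawText. */
int draw_text(int x, int y, const char *s)
{
    return ext_text_out(x, y, s, (int)strlen(s));
}

/* Replacement: record the text, then call the original routine. */
int my_ext_text_out(int x, int y, const char *s, int len)
{
    if (len >= 0 && (size_t)len < sizeof g_captured) {
        memcpy(g_captured, s, (size_t)len);
        g_captured[len] = '\0';
    }
    return original_ext_text_out(x, y, s, len);
}

void hook_on(void)
{
    if (original_ext_text_out == NULL) {
        original_ext_text_out = ext_text_out;
        ext_text_out = my_ext_text_out;
    }
}

void hook_off(void)
{
    if (original_ext_text_out != NULL) {
        ext_text_out = original_ext_text_out;
        original_ext_text_out = NULL;
    }
}
```

In the real technique the redirection is done by overwriting the first bytes of the API itself rather than by swapping a pointer, which is what lets it catch every caller, including code that already holds the entry address.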
After some research I found a solution, but it still has problems and sometimes captures the wrong text; I am still working on it and hope we can discuss it together. There is also Windows NT: the principle is the same, but the implementation is completely different from that on Windows 95.
III. Key Technical Points
The following technical problems must be solved to implement word capture.
1. Intercept API entry points to obtain the API parameters.
2. Hook into Windows safely, compatibly with every version of Windows.
3. Work out the word and the letter under the mouse cursor.
4. For 32-bit programs on Windows 95, Windows 32/16-bit mixed programming is also involved.
That's all for today! You had better get a copy of SoftICE for 95/98 and Kingsoft PowerWord, so that we can first analyze how others do it. Contact me:
E-mail: yeedong@163.net
Screen Word-Capture Technology Series (2)
Author: yidong. Sorry to have kept everyone waiting!
Reading the replies, I found that many people are still not clear about how word capture works.
First, let me explain the hook question. PowerWord does use hooks, of two kinds. The first is the standard Windows hook: it calls SetWindowsHookEx to install a mouse hook callback, which lets it respond promptly to mouse messages but has little to do with capturing the word itself.
The second kind is the API hook, which is the core of word capture. It writes a JMP instruction at the beginning of a function such as TextOut so that execution jumps to its own code.
You will not see this jump just by looking in SoftICE, because it exists only at the moment a word is being captured; the rest of the time it is not there.
Instead, set a memory read/write breakpoint at the start of TextOut:
BPM textout
Then you will find the code PowerWord uses to write its hook.
/**********************************
That is why I keep stressing that to learn this technology you must know assembly language and be proficient with SoftICE.
**********************************/
The undocumented functions in the dump from cjktl95 are related to Windows 32/16-bit mixed programming; I will cover them later. First, let me describe the word-capture process:
0. Determine whether the mouse has stayed in one place for a while.
1. Get the current mouse position
2. Generate a rectangle centered on the mouse position
3. Hook the API.
4. Generate a repaint message for this rectangle.
5. Capture the characters output inside the hook.
6. Work out which word the mouse is on and save that word.
7. If a word is obtained, remove the API hook. After a certain time, remove the API hook whether or not a word was obtained.
8. Look up the word in the dictionary and display the explanation box.
Many of these steps are hard to implement, which is why only a few people in China can build a complete screen word-capture dictionary. Steps 0, 1, 2, 7 and 8 are relatively simple; the rest are the hard part.
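Steps 0 and 2 are simple enough to sketch in portable C. This is an illustration under assumptions only: the Point, Rect and DwellState types, the function names, and the tolerance and delay values are hypothetical, since the article gives none of them. In a real program the position would come from the mouse hook or GetCursorPos, and the rectangle would be handed to InvalidateRect for step 4.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; the article does not specify any. */
typedef struct { int x, y; } Point;
typedef struct { int left, top, right, bottom; } Rect;

typedef struct {
    Point last;             /* position where the mouse came to rest */
    unsigned long since_ms; /* time it arrived there */
    int fired;              /* already triggered for this stop? */
} DwellState;

/* Step 0: returns 1 once the mouse has stayed within 'tolerance' pixels
   of the same spot for at least 'delay_ms', and only once per stop. */
int mouse_dwelled(DwellState *st, Point now, unsigned long now_ms,
                  int tolerance, unsigned long delay_ms)
{
    assert(st != NULL);
    if (abs(now.x - st->last.x) > tolerance ||
        abs(now.y - st->last.y) > tolerance) {
        st->last = now; /* the mouse moved: restart the timer */
        st->since_ms = now_ms;
        st->fired = 0;
        return 0;
    }
    if (!st->fired && now_ms - st->since_ms >= delay_ms) {
        st->fired = 1;
        return 1;
    }
    return 0;
}

/* Step 2: a rectangle centred on the mouse position. */
Rect capture_rect(Point mouse, int half_w, int half_h)
{
    Rect r;
    r.left = mouse.x - half_w;
    r.top = mouse.y - half_h;
    r.right = mouse.x + half_w;
    r.bottom = mouse.y + half_h;
    return r;
}
```

Keeping the rectangle small matters: only the text inside it is redrawn, so the hook sees little output besides the words near the cursor.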
Let's start with the hook. A so-called hook is simply a far JMP xxxx:xxxx instruction written at a Windows API entry point, so that the call jumps to your own code. The procedure is as follows:
1. Use GetProcAddress to obtain the address of the Windows API entry point.
2. Save the first five bytes at the API entry, because a far JMP is the opcode 0xEA followed by a 4-byte address.
3. Write the jump instruction.
This step is the most complex
Windows code segments cannot be written to, but Microsoft left itself a back door:
an undocumented function, AllocCStoDSAlias:
UINT WINAPI AllocCStoDSAlias(UINT);
Get this function's entry point and pass it the selector of the API's code segment (if you do not know what a selector is, learn protected-mode programming first); it returns a writable data-segment selector, which you must free once you are done with it. Combine the new selector with the offset of the API entry into a far pointer, and you can write to the Windows code segment. This is the core of word-capture technology, and not only of word capture: full-screen Chinese add-on platforms rely on it too. Now do you see why so few people know these few simple sentences? Because too many products use it, and too many companies make money from it.
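Steps 2 and 3 amount to swapping five bytes. The portable sketch below models them on an ordinary byte buffer standing in for the code segment reached through the writable alias; the Hook type and the function names are hypothetical, and the selector and offset values in the test are made up. The byte order matches the trick of storing a far function pointer as a DWORD after the 0xEA byte: on x86 a 16:16 far pointer stored little-endian puts the offset first and the selector second, which is exactly the operand layout of JMP FAR.

```c
#include <assert.h>
#include <string.h>

/* Encode JMP FAR seg:off: opcode 0xEA, then the 16-bit offset and the
   16-bit selector, each little-endian. */
void encode_far_jmp(unsigned char out[5], unsigned short seg, unsigned short off)
{
    assert(out != NULL);
    out[0] = 0xEA;
    out[1] = (unsigned char)(off & 0xFF);
    out[2] = (unsigned char)(off >> 8);
    out[3] = (unsigned char)(seg & 0xFF);
    out[4] = (unsigned char)(seg >> 8);
}

/* A 16:16 far pointer as a 32-bit value: selector in the high word,
   offset in the low word (what MK_FP builds in 16-bit C). */
unsigned long make_far(unsigned short seg, unsigned short off)
{
    return ((unsigned long)seg << 16) | off;
}

/* Hypothetical bookkeeping for one patched entry point. */
typedef struct {
    unsigned char *entry;   /* writable view of the API entry */
    unsigned char saved[5]; /* original first five bytes */
    unsigned char patch[5]; /* the JMP we write */
    int on;
} Hook;

void hook_init(Hook *h, unsigned char *entry, unsigned short seg, unsigned short off)
{
    h->entry = entry;
    memcpy(h->saved, entry, 5); /* step 2: save the original bytes */
    encode_far_jmp(h->patch, seg, off);
    h->on = 0;
}

/* Step 3 and its undo: write the JMP, or restore the saved bytes. */
void hook_set(Hook *h, int on)
{
    if (h->on != on) {
        memcpy(h->entry, on ? h->patch : h->saved, 5);
        h->on = on;
    }
}
```

Only the first five bytes change, so restoring them exactly is enough to put the API back in its original state.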
The companies and products that use it include Chinese Star, Sitong Lifang, Antarctic Star (NJStar), Kingsoft, Shida Mingtai's Dongdong Express, RoboWord, Yundiantong, Real-time Chinese Expert and more, plus at least 20 small companies. Their specific implementations differ, but the general principle is the same. I am writing this off the top of my head, with no outline; if I get the chance I will tidy it up. Enjoy! Hehe...
Screen Word-Capture Technology Series (3)
I'm sorry: my hard drive died some time ago while I was busy with work. Now I'm back at it. Let's take TextOut as an example. The code is as follows:

// Intercepting TextOut (16-bit Windows, large memory model)
#include <windows.h>
#include <dos.h>                          // FP_SEG, FP_OFF, MK_FP

typedef UINT (WINAPI *ALLOCCSTODSALIAS)(UINT);
ALLOCCSTODSALIAS pfnAllocCStoDSAlias;     // undocumented KERNEL function

BYTE newValue[5];                         // new entry code (far JMP)
BYTE oldValue[5];                         // original API entry code
unsigned char FAR *address = NULL;        // writable API entry address
UINT dsSelector = 0;                      // writable selector aliasing the API code segment
WORD offsetEntry = 0;                     // API offset within its segment
BOOL bHookAlready = FALSE;                // whether the hook is currently installed

BOOL HookOn(void);
BOOL HookOff(void);
BOOL WINAPI MyTextOut(HDC hdc, int nXStart, int nYStart, LPCSTR lpszString, int cbString);

BOOL InitHook(void)
{
    HMODULE hKernel, hGdi;
    FARPROC entry;

    hKernel = GetModuleHandle("KERNEL");
    if (hKernel == NULL)
        return FALSE;
    // This is an undocumented API, so its address has to be obtained like this
    pfnAllocCStoDSAlias = (ALLOCCSTODSALIAS)GetProcAddress(hKernel, "AllocCStoDSAlias");
    if (pfnAllocCStoDSAlias == NULL)
        return FALSE;
    hGdi = GetModuleHandle("GDI");
    if (hGdi == NULL)
        return FALSE;
    entry = GetProcAddress(hGdi, "TextOut");
    if (entry == NULL)
        return FALSE;

    offsetEntry = FP_OFF(entry);                          // offset of the API entry
    dsSelector = pfnAllocCStoDSAlias(FP_SEG(entry));      // allocate an equivalent writable selector
    if (dsSelector == 0)
        return FALSE;
    address = (unsigned char FAR *)MK_FP(dsSelector, offsetEntry);  // combined writable address

    newValue[0] = 0xEA;                                   // JMP FAR opcode
    *(DWORD *)(newValue + 1) = (DWORD)MyTextOut;          // offset, then selector
    oldValue[0] = address[0];
    *(DWORD *)(oldValue + 1) = *(DWORD FAR *)(address + 1);
    return TRUE;
}

BOOL ClearHook(void)
{
    if (bHookAlready)
        HookOff();
    FreeSelector(dsSelector);                             // release the alias selector
    return TRUE;
}

BOOL HookOn(void)
{
    int i;
    if (!bHookAlready) {
        for (i = 0; i < 5; i++)
            address[i] = newValue[i];                     // write the JMP
        bHookAlready = TRUE;
    }
    return TRUE;
}

BOOL HookOff(void)
{
    int i;
    if (bHookAlready) {
        for (i = 0; i < 5; i++)
            address[i] = oldValue[i];                     // restore the original bytes
        bHookAlready = FALSE;
    }
    return TRUE;
}

// The hook function must have the same parameters and calling convention as the API.
BOOL WINAPI MyTextOut(HDC hdc, int nXStart, int nYStart, LPCSTR lpszString, int cbString)
{
    BOOL ret;
    // The string and its position are available here (lpszString, cbString, nXStart, nYStart)
    HookOff();
    ret = TextOut(hdc, nXStart, nYStart, lpszString, cbString);  // call the original TextOut
    HookOn();
    return ret;
}

The code above is the simplest example of installing an API hook. Please note that I wrote it from memory because I lost my original code, and I have not compiled or tested it
because I don't have VC++, so it may contain errors. I recommend Borland C++ for 16-bit compilation.
If you use VC++ 1.52, you need to change its options: in the memory-model setting, select the large model and "DS != SS, DS loaded on function entry". Remember this, or the system will crash. If anything is unclear, write to me.
