Title: Re-posted: "Kingsoft PowerWord" Screen Word Extraction Technology (Discussion Draft) (17 thousand words)

Source: Internet
Author: User
Tags: DrawText, TextOut

Sender: 5,555
Time: 9:30:12
Details:

Screen Word Extraction Technology of Kingsoft PowerWord (Discussion Draft)

Topic: Screen Word Extraction Technology series (1)
Author: yidong

Many people are interested in this issue.
The reason is that this technology is impressive and commercially valuable.
At present, Kingsoft PowerWord dominates the dictionary market, so I think there is no future in writing yet another dictionary. That is why, even though I have mastered this technology, I never developed dictionary software. I only wrote a program similar to PowerWord for my own use. I had thought of sharing it, but my word library was "stolen" and contains few words, so I dropped the idea. With such a small word library, what use is a program that can only extract words? Besides, PowerWord has a shareware version.
But since many people want to understand this technology, I will not hold it back. I am going to explain all of its details over several installments,
about once or twice a week. Anyone who wants to know can check back often!

I. Basic Knowledge
First of all, writing such a program requires some background:
Ability to use VC++, both the 16-bit and 32-bit versions.
Proficiency with the Windows API, especially GDI and KERNEL.
Knowledge of assembly language and the ability to debug with SoftICE, because this kind of program is best debugged with SoftICE.

II. Basic Principles
In the Windows 3.x era, Windows provided only a few character output functions:
TextOut
ExtTextOut
DrawText
......
DrawText is implemented using ExtTextOut.

Therefore, all Windows character output ultimately goes through TextOut and ExtTextOut. If you can modify the entry points of these two functions so that a program calls your own function before the system's character output, you can capture every character that Windows outputs.

In the Windows 95 era the principle did not change, but Windows 95 is more complex than 3.x. At first, some word extraction software written for Windows 3.x still worked. Then Internet Explorer 4 was released, and many dictionary programs were eliminated because they did not support it; this in turn created opportunities for other software, such as Kingsoft's. In fact IE4 is not complicated: it simply outputs Unicode characters, through TextOutW and ExtTextOutW. Once you know this, you only need to intercept those two functions as well. The implementation is more involved, however, and will be explained in detail later. Now IE5 has come out, and the old methods turn out not to work with it. Microsoft is really # ^ @ # $ % $ * & ^ &#@#@..........
After some research I found a solution, but it still has problems and sometimes returns the wrong word. I am continuing to study it, and I hope we can discuss it together.

In addition, there is Windows NT. The principle is the same, but the implementation is completely different from that under Windows 95.

III. Key Technical Points
The following technical problems must be solved to achieve word extraction.
1. Intercept the API entry point and obtain the API parameters.
2. Hook into Windows safely and remain compatible with the various versions of Windows.
3. Calculate which word and letter the mouse is over.
4. For a 32-bit program under Windows 95, mixed 32/16-bit Windows programming is also involved.

That's all for today! It would be best to prepare a copy of SoftICE for Windows 95/98 and a copy of Kingsoft PowerWord, so that we can first analyze how others do it.

Contact me.
E-mail: yeedong@163.net

Guest 16:00:48
Can't I use the VC debugger? Why SoftICE? I have never used SoftICE. What's special about it?

Gourd 19:15:03
I am also interested in this issue. I previously studied how to intercept TextOut and ExtTextOut in the 16-bit version.

However, when I traced Kingsoft PowerWord with SoftICE, I found that it calls SetWindowsHookEx to install a mouse hook when the program loads, and that pausing or resuming word extraction merely toggles an internal variable. I did not find Kingsoft modifying TextOut and ExtTextOut the way the 16-bit version did.

Fly (555021552) 08:56:57
Would one of the experts here introduce SoftICE first?

Cockroaches 13:58:22
If you analyze Kingsoft's cjktl95.dll with tdump, you can see that it does not use hooks. Instead, it uses functions in kernel32.dll that are not included in the Win32 SDK, and uses them to replace several of the original GDI functions. They appear to be undocumented Win95 functions. Some people know about them, so why don't they say so? It is not necessarily so advanced!

Gourd 23:28:07
You probably have not studied its object code, so how can you assert that no hook is used? In fact, a mouse hook is installed when Kingsoft PowerWord is running.

Yidong 09:52:42
Not necessarily so advanced?
Do you know how people find out about these undocumented Win95 internals? Do you think Microsoft secretly told them? In fact, most of them trace Win95 with SoftICE, just as I do.

In addition, the undocumented functions that you dumped from cjktl95.dll have nothing to do with hooks; they are used for other purposes.
SetWindowsHook is in USER.

Yidong 16:16:14
The undocumented functions that tdump shows in cjktl95.dll are related to calls between 32-bit and 16-bit code.

Topic: Screen Word Extraction Technology series (2)
Author: yidong

Sorry to keep everyone waiting!
I read some of the replies and found that many people are still not clear about the principle of word extraction.
First, let me explain the hook question. PowerWord does use hooks, and it uses two kinds. One is the standard Windows hook: it calls SetWindowsHookEx to install a callback function, namely a mouse hook. This is used to respond to mouse messages promptly and has little to do with word extraction.
The other kind is the API hook, which is the core technology of word extraction. PowerWord writes a JMP instruction at the beginning of a function such as TextOut to jump to its own code.
You normally cannot see this jump instruction with SoftICE, because it exists only at the moment of word extraction and is absent the rest of the time.
You can set a read/write breakpoint at the beginning of TextOut:
BPM TextOut
Then you will find the code that PowerWord uses to write the hook.

/**********************************
That is why I stress that to learn this technology you must understand assembly language and be proficient with SoftICE.
**********************************/

The undocumented functions dumped from cjktl95.dll are related to mixed 32/16-bit Windows programming. I will come back to them later.

Let me first describe the word extraction process:

0. Determine whether the mouse has stayed in one place for a period of time.
1. Get the current mouse position.
2. Generate a rectangle centered on the mouse position.
3. Install the API hook.
4. Invalidate this rectangle so that it produces a repaint message.
5. Intercept the characters being output inside the hook.
6. Calculate which word the mouse is over and save that word.
7. If a word is obtained, remove the API hook. After a certain period of time, remove the API hook whether or not a word was obtained.
8. Look the word up in the dictionary and display the explanation box.
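
Step 6 can be sketched in portable C. This is only an illustration under a simplifying assumption: every character has the same width, whereas a real hook would have to measure each character on the device context it was given. The function word_at_point and all of its parameters are hypothetical names, not part of PowerWord or of the Windows API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the string passed to a TextOut-style call,
 * the x coordinate where drawing starts, a fixed character width (an
 * assumption; real code would measure the glyphs), and the mouse x
 * coordinate, copy the word under the mouse into out.
 * Returns the word length, or 0 if the mouse is not over a letter. */
size_t word_at_point(const char *text, size_t len, int x_start,
                     int char_width, int mouse_x,
                     char *out, size_t out_size)
{
    size_t idx, begin, end, n;

    assert(text != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    if (char_width <= 0 || mouse_x < x_start)
        return 0;

    /* Index of the character cell that contains the mouse. */
    idx = (size_t)((mouse_x - x_start) / char_width);
    if (idx >= len || !isalpha((unsigned char)text[idx]))
        return 0;

    /* Grow the range left and right until a non-letter is reached. */
    begin = idx;
    while (begin > 0 && isalpha((unsigned char)text[begin - 1]))
        begin--;
    end = idx + 1;
    while (end < len && isalpha((unsigned char)text[end]))
        end++;

    /* Copy the word, truncating it if the output buffer is too small. */
    n = end - begin;
    if (n >= out_size)
        n = out_size - 1;
    memcpy(out, text + begin, n);
    out[n] = '\0';
    return n;
}
```

With "hello world" drawn from x = 100 at 8 pixels per character, a mouse x of 156 falls in the cell of the 'o' in "world", so the function copies "world" into the buffer.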

Several of these steps are difficult to implement, which is why only a few people in China can write a flawless word extraction dictionary.

Steps 0, 1, 2, 7, and 8 are relatively simple.
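
Step 0, for instance, comes down to a small state machine driven by a timer. The following is a minimal sketch, not PowerWord's actual code: DwellState, dwell_init and dwell_update are hypothetical names, and the threshold is left to the caller because the thread does not give a figure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dwell detector for step 0. The caller feeds it the mouse
 * position and a millisecond clock on every timer tick. */
typedef struct {
    int x, y;             /* last observed mouse position */
    unsigned long since;  /* time at which the mouse arrived there */
    int fired;            /* nonzero once this dwell has been reported */
} DwellState;

void dwell_init(DwellState *s, int x, int y, unsigned long now_ms)
{
    assert(s != NULL);
    s->x = x;
    s->y = y;
    s->since = now_ms;
    s->fired = 0;
}

/* Returns 1 exactly once when the mouse has rested at the same point for
 * at least threshold_ms, and 0 in every other case. */
int dwell_update(DwellState *s, int x, int y,
                 unsigned long now_ms, unsigned long threshold_ms)
{
    assert(s != NULL);
    if (x != s->x || y != s->y) {   /* the mouse moved: restart the clock */
        s->x = x;
        s->y = y;
        s->since = now_ms;
        s->fired = 0;
        return 0;
    }
    if (!s->fired && now_ms - s->since >= threshold_ms) {
        s->fired = 1;               /* report each dwell only once */
        return 1;
    }
    return 0;
}
```

Reporting each dwell only once matters here, because every report triggers a hook installation and a forced repaint.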

First, let's talk about how to install the hook:
The so-called hook is simply a JMP XXXX:XXXX instruction written at the entry point of a Windows API, jumping to your own code.

The procedure is as follows:
1. Use GetProcAddress to obtain the entry point of the Windows API.
2. Save the first five bytes of the API entry, because the far JMP opcode is 0xEA (one byte) and the address takes 4 bytes.
3. Write the jump instruction.
This step is the most complex.
Windows code segments cannot be written to, but Microsoft left a backdoor for itself.
It is an undocumented function, AllocCStoDSAlias:
UINT WINAPI AllocCStoDSAlias(UINT);
Obtain the entry of this function and pass it the selector of the API's code segment (if you do not know what a selector is, study protected-mode programming first). It returns a writable data segment selector. Free this selector when you have finished with it. Combine the new selector and the offset of the API entry into a pointer, and you can write to the Windows code segment.
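
The save-and-patch part of this procedure can be tried out without touching any system code. The sketch below works on an ordinary byte buffer that stands in for the API entry. make_far_jmp, patch_entry and restore_entry are hypothetical names; the byte layout (opcode 0xEA, then the 16-bit offset, then the 16-bit segment or selector, each low byte first) is the x86 encoding of a far JMP in 16-bit code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATCH_SIZE 5  /* 1 opcode byte + 2 offset bytes + 2 segment bytes */

/* Build the 5-byte encoding of a 16-bit far jump: JMP seg:off. */
void make_far_jmp(unsigned char patch[PATCH_SIZE],
                  unsigned short seg, unsigned short off)
{
    patch[0] = 0xEA;                         /* far JMP opcode */
    patch[1] = (unsigned char)(off & 0xFF);  /* offset, low byte first */
    patch[2] = (unsigned char)(off >> 8);
    patch[3] = (unsigned char)(seg & 0xFF);  /* then segment or selector */
    patch[4] = (unsigned char)(seg >> 8);
}

/* Save the original bytes at the entry, then overwrite them. */
void patch_entry(unsigned char *entry, unsigned char saved[PATCH_SIZE],
                 const unsigned char patch[PATCH_SIZE])
{
    assert(entry != NULL);
    memcpy(saved, entry, PATCH_SIZE);
    memcpy(entry, patch, PATCH_SIZE);
}

/* Put the saved bytes back, which removes the jump. */
void restore_entry(unsigned char *entry,
                   const unsigned char saved[PATCH_SIZE])
{
    assert(entry != NULL);
    memcpy(entry, saved, PATCH_SIZE);
}
```

In the real 16-bit case the entry pointer would be the writable alias built from the selector returned by AllocCStoDSAlias; here it is just an array, so nothing outside the program is modified.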

This is the core of word extraction technology. It is used not only for word extraction but also for full-screen Chinese localization and add-on Chinese platforms. Now you see why so few people know these few simple facts: too many products use it, and too many companies rely on it to make money.
These companies and products include: Star of the Chinese language, sitong lifang, Antarctic star, Kingsoft, Dongdong express of Shida mingtai, roboword, yundiantong, real-time Chinese experts .... There are at least 20 such companies, large and small. Their specific implementations differ, but the general principle is the same.

I am writing this off the top of my head, without an outline or anything else. If I get the chance, I will organize it properly. Make do with it for now! Hee hee...

Gourd 14:52:30
The technology you describe is 16-bit. Kingsoft PowerWord III, at least, does not use the 16-bit AllocCStoDSAlias function, nor does it change the first instruction of TextOut into JMP XXXXXXXX. I am very proficient with SoftICE myself, so I cannot be wrong!

Gourd 15:38:40
Who is impersonating me and ruining my reputation?! I hereby solemnly declare that the post above was not written by me!
After tracing and analysis, I found that Kingsoft PowerWord III does modify the TextOut entry into a JMP XXXXXXXX instruction that redirects to its own code. However, it does not call AllocCStoDSAlias to do so, and the modification is not made only at the moment a word is extracted; it is made frequently (presumably from a timer).

Yidong 17:25:23
Did I say that PowerWord uses AllocCStoDSAlias?
I said that AllocCStoDSAlias can be used. PowerWord actually achieves the same thing through DPMI INT 31h. You can see it with SoftICE:
BPINT 31 sets an interrupt breakpoint.

In fact, I know at least five ways to write to Windows code segments.
I introduced AllocCStoDSAlias because it is the simplest; the other methods are much more troublesome.
PowerWord uses INT 31h for compatibility reasons:
AllocCStoDSAlias is not a public function, whereas DPMI is a standard.

When the mouse stays at one point for more than a set number of milliseconds, PowerWord extracts the word, so it has a timer that frequently modifies the API entry.

It seems that the "fake" Gourd only thinks he is very proficient with SoftICE.

Mao 19:29:28
There is a sample program in Microsoft's MSDN that comes with full source code. Its name seems to be "inject" or "stealth" or something like that; I forget. It provides a complete facility for hooking any Windows function. It works on Win16 and on Win95, but not on NT.

It is worth looking for this example.

In addition, Microsoft's "Advanced Windows", published in Chinese by Tsinghua, also describes specific methods.

I suggest that yidong simply release the source code, so that interested friends can more easily use and further develop applications of word extraction technology!

Yidong 09:40:56
I do not advocate "promoting the application of word extraction technology", which is why I kept this technology to myself for more than a year after mastering it. I do not want everyone to write dictionary software. On the contrary, I hope no more people write dictionaries; too many are doing so already. As soon as some kind of software is seen to be profitable, everyone swarms to it. This vicious competition is extremely harmful to the development of China's software industry. My aim in publishing it is to improve everyone's programming skills. You will find that your programming level and your understanding of Windows improve greatly while you study this technology; I benefited a great deal from it myself.

This code is indeed available in MSDN.
There is even a library called prochook.dll that provides functions such as SetProcAddress, but without source code. The source code appeared in MSJ in 1994 and is hard to obtain.
I will release the source code in several parts, with detailed explanations each time.

If you want the source code, prepare a copy of VC++ 1.52 or Borland C++. Having SoftICE is best. Next time, I will provide a piece of code that shows how to modify Windows code.

Laoku (555036) 11:56:46
It would be a big mistake to think that capturing words on the screen is only good for making dictionaries. In fact, this material about the Windows API was already published in "Undocumented Windows" back in 1993.

In fact, Adobe Type Manager used it to render beautiful fonts back in the Windows 3.0 era. (With TTF, ATM is no longer needed.)

The material on MSDN comes with full source and a sample, which I have compiled. It was on an MSDN Level 2 disc from the summer of 1996. I no longer know where I put it. If you are interested, look for it yourself.

It is better to release the complete source openly. Many friends only want to use this technology and do not care about the details, do they?

Yidong 13:28:19
This technology is used either for full-screen dictionaries or for add-on language platforms. Since Wang Zhidong first used it, I have not seen it used to write any other kind of software. Maybe it has been, but I do not know of it.

So there were already examples and technical material on this in so many books and other sources. I was ignorant not to know of them, and here I am telling you things that are common knowledge. When I get back, I want to take a good look and see where rasir Dex's PowerWord was copied from.

I do not know whether the details of a given technology are important. But if many people only wanted to use it and did not want to write it themselves, how could there be more than 50 replies below?

Laoku 13:54:18
Well, you have to keep the lectures going; otherwise those 50-plus repliers will flatten me ...:)

I have not touched low-level technology for more than two years and have fallen far behind in this area... I cannot satisfy so many knowledge-hungry friends, so please do keep it up!

Everyone is welcome to exchange ideas with me!

P.S. Where is yidong working at the moment? What projects are you busy with? Let's keep in touch.

Yidong 17:46:06
There's nothing to do. It's everywhere!

Gourd 21:50:29
I have a different opinion on this. Do not be too clever and assume that AllocCStoDSAlias can be used. AllocCStoDSAlias is a 16-bit Windows function only, and 32-bit Win95 programs cannot use it. You can try it in VC 5.0 or 6.0.
In addition, INT 31h cannot simply be used whenever you like. There is no problem using it under Windows 3.x (I have published an article on this), but using it under Win95 produces a GP fault, mainly because 32-bit code does not support calling DPMI directly. I do not know whether Mr. yidong has studied this. I have published a lot of theory here! I would like to ask: how does a 32-bit program call 16-bit functions or DLLs?

Gourd 02:39:10
The post above was not meant to attack anyone. I just hope that everyone takes a serious attitude and does not pretend to understand what they do not. You should know at least this much: 32-bit programs cannot use INT 31h at all, and it is not easy to call AllocCStoDSAlias in the 16-bit KERNEL library.

Nn_zdm (555031742) 16:35:35
Hook functions are useful for more than dictionaries, full-screen Chinese localization, and add-on language platforms. You can also use hooks to debug programs. As you said, SoftICE itself uses hooks.

Nn_zdm (555031742) 16:42:05
In addition, hooks can be used in game modification tools, and I have developed such tools. The "whole person expert" program also uses this method. Of course, there are two other methods as well.

Yidong 16:10:59
What you say makes sense.
But there are two kinds of hooks. One is the standard Windows hook, which is installed through SetWindowsHook.
The other is non-standard. It is implemented by writing JMP XXXXXXXX at the API entry point.
SoftICE's hooks are more advanced. They are all installed in a VxD.
Calling a 16-bit DLL from 32-bit code is something I happen to know how to do.

Defeat mitd nationalism !!!

Guaguo 17:07:00
Who knows where to get SoftICE? I never used it before!

Gourd 21:39:46
I look forward to reading your later lectures.

SoftICE? It is on many CDs, in versions for DOS, for Windows 95, and for Windows NT.

Sun Wei (555031339) 11:08:35
Can I upload Si for NT to 10.82.46.33?
(Using FTP) User: haotao
Pass: haotao123

Tommy 11:34:11
It can be downloaded from http://www.swww.com.cn/htm/down/others/main.html

Yidong 14:01:19
I have been busy with anti-American activities recently and have had no time to write more. I will provide the source code later.
According to the latest news, a US Navy site was hacked:
http://www.nctsw.navy.mil/

Defeat mitd nationalism !!!

Golden Lion 13:19:32
I am very interested in the experts' discussion.

I have several questions:
1. Can AllocCStoDSAlias be called from 32-bit code by using a thunk?

2. Is there a similar 32-bit function?

3. What about the technique in Jeffrey Richter's "Advanced Windows", which uses a remote thread's stack to inject a DLL remotely?

4. I have the source code of prochook.dll from MSJ 1994-1, but I do not know how to adapt it for WinNT.

5. In short, I have no source code for API hooking under WinNT. If you have any, please let me know.

(I also have SoftICE 3.24 for Win95 and 3.25 for WinNT.)

-- This topic is good.

Mole 09:41:20
Please send me the source code of prochook.dll from MSJ 1994-1. I need it urgently. Thank you!!

Email address: yanshg@263.net

The following is an API hook program written by an Australian. It is based on VxD technology.
Molten Home Page:
http://ourworld.compuserve.com/homepages/molten

Golden Lion 14:34:54
On Win95 and NT, you can use WriteProcessMemory() to write to a code segment directly. (Previously, "Advanced Windows" called CreateRemoteThread() because WriteProcessMemory() can only write code segments and stack segments.)

A Tao 19:45:05
There is no longer any need to publish the prochook code step by step. The code for this program is easy to obtain; you only need to look for it on the MSJ site.

Yidong 21:44:39
Go ahead and fish for a needle in the sea of MSJ.

Under Win95, just try using WriteProcessMemory to write to KERNEL, USER, or GDI.

CreateRemoteThread is not called in order to write code segments. It serves another purpose: allocating memory in another process. Go back and read "Advanced Windows" carefully. Better still, write a test program and you will understand.

In fact, calling CreateRemoteThread is unnecessary under NT 4.0; it is done only for compatibility with NT 3.51.

Nn_zdm (555031742) 13:57:48
The CreateRemoteThread() function is very useful in WinNT 4.0. It is one of the three ways to break through process boundaries in WinNT. In WinNT it can be used for remote debugging and for modifying other programs' code.

Nn_zdm (555031742) 14:07:17
Writing code with WriteProcessMemory() works under WinNT.
I have never tried it under Win95, but according to MSDN it works there too. However, it may be useless there, because the CreateRemoteThread() function is only usable under WinNT.
For example:
WriteProcessMemory(..., "LoadLibrary(..., "mydll.dll", ..)
CreateRemoteThread(...)

Nn_zdm (555031742) 14:14:19
Something was missing above. It should be:
WriteProcessMemory(..., "LoadLibrary(..., "mydll.dll", ..)");
CreateRemoteThread(...);

Topic: Discussion of screen word extraction (3)
Author: yidong

Sorry for the long wait. I was busy with work for a while, and then my hard disk died. Such bad luck.

Now let's get down to the real thing.

Let's take TextOut as an example.

The code is as follows:

// Intercept TextOut (16-bit Windows, large memory model)

#include <windows.h>
#include <dos.h>    // FP_OFF, FP_SEG, MK_FP far-pointer macros (Borland C++)

typedef UINT (WINAPI *ALLOCCSTODSALIAS)(UINT);

ALLOCCSTODSALIAS pfnAllocCStoDSAlias;

BYTE NewValue[5];               // new entry code: far JMP to MyTextOut
BYTE OldValue[5];               // original API entry code
unsigned char *Address = NULL;  // writable address of the API entry
UINT DsSelector = 0;            // writable selector aliasing the API code segment
WORD OffsetEntry = 0;           // offset of the API entry within its segment

BOOL bHookAlready = FALSE;      // TRUE while the hook is installed

BOOL WINAPI MyTextOut(HDC hdc, int nXStart, int nYStart, LPCSTR lpszString, int cbString);
BOOL HookOn();
BOOL HookOff();

BOOL InitHook()
{
    HMODULE hKernel, hGdi;
    FARPROC Entry;

    hKernel = GetModuleHandle("KERNEL");
    if (hKernel == NULL)
        return FALSE;

    // AllocCStoDSAlias is an undocumented API.
    pfnAllocCStoDSAlias = (ALLOCCSTODSALIAS)GetProcAddress(hKernel, "AllocCStoDSAlias");
    if (pfnAllocCStoDSAlias == NULL)
        return FALSE;

    hGdi = GetModuleHandle("GDI");
    if (hGdi == NULL)
        return FALSE;

    Entry = GetProcAddress(hGdi, "TextOut");
    if (Entry == NULL)
        return FALSE;

    OffsetEntry = (WORD)FP_OFF(Entry);                // offset of the API entry
    DsSelector = pfnAllocCStoDSAlias(FP_SEG(Entry));  // allocate an equivalent writable selector
    Address = (unsigned char *)MK_FP(DsSelector, OffsetEntry); // combined writable address

    NewValue[0] = 0xEA;                               // far JMP opcode
    *(DWORD *)(NewValue + 1) = (DWORD)MyTextOut;      // segment:offset of the hook function

    OldValue[0] = Address[0];
    *(DWORD *)(OldValue + 1) = *(DWORD *)(Address + 1);
    return TRUE;
}

BOOL ClearHook()
{
    if (bHookAlready)
        HookOff();

    FreeSelector(DsSelector);   // release the alias selector
    return TRUE;
}

BOOL HookOn()
{
    int i;
    if (!bHookAlready) {
        for (i = 0; i < 5; i++) {
            Address[i] = NewValue[i];   // write the JMP over the API entry
        }
        bHookAlready = TRUE;
    }
    return TRUE;
}

BOOL HookOff()
{
    int i;
    if (bHookAlready) {
        for (i = 0; i < 5; i++) {
            Address[i] = OldValue[i];   // restore the original entry bytes
        }
        bHookAlready = FALSE;
    }
    return TRUE;
}

// The hook function must have the same parameters and declaration as the API.
BOOL WINAPI MyTextOut(HDC hdc, int nXStart, int nYStart, LPCSTR lpszString, int cbString)
{
    BOOL ret;
    HookOff();
    ret = TextOut(hdc, nXStart, nYStart, lpszString, cbString); // call the original TextOut
    HookOn();
    return ret;
}
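
The unhook, call, rehook pattern used by the hook function can be tried out portably by letting a function pointer stand in for the patched API entry. This is only a simulation under that assumption, not real API patching, and every name in it (g_entry, real_text_out, my_text_out, hook_on, hook_off) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*TextOutFn)(int x, int y, const char *s, int len);

static char g_screen[64];    /* stands in for the display */
static char g_captured[64];  /* the text seen by the hook */

/* Stand-in for the real TextOut: "draws" by copying into g_screen. */
static int real_text_out(int x, int y, const char *s, int len)
{
    (void)x;
    (void)y;
    if (len < 0 || len >= (int)sizeof g_screen)
        return 0;
    memcpy(g_screen, s, (size_t)len);
    g_screen[len] = '\0';
    return 1;
}

static int my_text_out(int x, int y, const char *s, int len);

/* Stand-in for the API entry. Callers always go through this pointer,
 * just as callers of a patched API always land on the JMP at its entry. */
static TextOutFn g_entry = real_text_out;

static void hook_on(void)  { g_entry = my_text_out; }
static void hook_off(void) { g_entry = real_text_out; }

/* Record the text, then unhook, call the original, and hook again. */
static int my_text_out(int x, int y, const char *s, int len)
{
    int ret;

    assert(s != NULL);
    if (len >= 0 && len < (int)sizeof g_captured) {
        memcpy(g_captured, s, (size_t)len);
        g_captured[len] = '\0';
    }
    hook_off();                  /* otherwise the call below would recurse */
    ret = g_entry(x, y, s, len);
    hook_on();
    return ret;
}
```

The hook has to be switched off around the inner call because the original function is reached through the very entry that was patched; leaving the hook on would send the call straight back into the hook.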

The code above is the simplest example of installing an API hook. I must warn you that I wrote it from memory: I lost my original code, and I have not compiled or tested it because I do not have VC++ at hand, so the code may contain errors.

I recommend compiling it as 16-bit code with Borland C++.
If you use VC++ 1.52, you need to change some options.

In the VC++ 1.52 options there is a memory model setting. Select the large model and "DS != SS, DS loaded on function entry". Remember this, otherwise the system will crash.

If there is anything you do not understand, write to me:
yeedong@163.net

Guest 22:20:47
This is the 16-bit addressing mode.
As you can see, MK_FP is not used in Win32.
And as for GetProcAddress(hKernel, "AllocCStoDSAlias"),
is this API of any use there?
Sorry

Guest 22:31:47
Yidong, I'd like to ask you a question.
Under Win32, each process has its own address space. Suppose process A obtains the handle of window C, which belongs to process B.
Is that handle value equal to the handle value that process B itself obtains for window C?
I think they should not be equal, but then how does the system convert between them? (For example, when process A sends a message to window C, how does the system know that the handle in process A refers to window C?)
Is DuplicateHandle() used?
(To be clear, I really don't know
