Inside Microsoft's .NET Search Engine (Repost)

Larry Jordan, developers Michael Ruggiero and Michael Stanton of Microsoft Search Development, and .NET Framework project manager Hari Sekhar quietly built a new version of the Microsoft Web site search engine based on .NET technology. To date, only the small number of outside developers who attended a special session at the Professional Developers Conference (PDC), held in Orlando in July this year, have been told some of the details. Now we can finally make the facts public.

If you frequently visit the Insider news site, you will know that the Microsoft Web group launched a new version of its Search engine before the Professional Developers Conference, held in July 2000. You know that this version introduced advanced synonym matching, extended Best Bets logic that returns the most relevant titled search results, and smart caching for the most common searches.

However, there is much more to this version than what appears on the surface.

We are excited, of course, because the rich features of this search version and the improved search results clearly give our customers a better search experience (see Search 2.5 technology Insider). However, most people did not realize that, behind the scenes, we were also porting the traditional Search version 2.5, which is based on ASP (Active Server Pages), to the new Microsoft .NET Framework.

This is the most cutting-edge development work a search group can do, because it takes us deep into the future of Internet services. At least, we hope so. Let's talk about the reasons for this.

Why migrate to .NET?
Obviously, we are entering the next phase of the Internet. We are moving beyond Web pages in the usual sense and developing powerful Web services. At this stage, organizing resources and information systematically is of paramount importance, so that we can offer them as services rather than keep them in a cluttered data warehouse.

Extensible Markup Language (XML) is a means of moving many kinds of data sets between highly distributed systems. It also enables developers to aggregate and assemble data from various sources in more valuable and innovative ways, so that users benefit directly.

As far as Search is concerned, we designed the core functionality for finding information on Microsoft.com to serve a variety of custom and localized search versions. Our group's challenge was how to make data access both flexible and available. Before .NET appeared, customers could not program against our functionality without using DCOM (Distributed Component Object Model) over a secure port, or else they had to install multiple versions of our software on their own servers to get access to the code and COM objects.

Our team studied the upcoming .NET technology and realized that all of these remoting issues could be addressed by porting our code to the .NET Framework. As an unexpected bonus, we also gained ubiquitous connectivity over HTTP and SOAP. For the most part, it makes no difference whether someone inside Microsoft or somewhere else in the world uses our Web services to build in-house applications for completely different purposes. We support both scenarios, and we get the technical benefits for free.

The latest Search version, 2.5, currently runs on Site Server 3.0 and still uses COM to get results from the search catalog. All other aspects of the application are based on XML. Using XML to publish data (for example, the vocabulary and Best Bets) to the Web servers lets us scale out across Web servers easily.

We also implemented a scheme that caches the most commonly used queries and their results on the Web server, which enhances scalability and further improves performance. Because our core architecture is XML-based, porting to a model that takes advantage of .NET Framework Web services was indeed very simple. These Web services are built on the new ASP+ technology, in what are called Active Server Methods (ASMX) pages.
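The idea of keeping popular queries and their results on the Web server can be pictured with a minimal sketch. The article does not say how the production cache decided which queries to keep, so the class name `QueryCache`, the fixed capacity, and the least-recently-used eviction policy below are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a small in-memory cache of search results keyed by
// normalized query text. The least recently used entry is evicted when full.
public class QueryCache
{
    private readonly int capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, string>>> map =
        new Dictionary<string, LinkedListNode<KeyValuePair<string, string>>>();
    private readonly LinkedList<KeyValuePair<string, string>> order =
        new LinkedList<KeyValuePair<string, string>>();

    public QueryCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return map.Count; } }

    // Normalize so that "DirectX " and "directx" share one cache entry.
    private static string Normalize(string query)
    {
        return query.Trim().ToLowerInvariant();
    }

    // Returns the cached result, or runs the expensive search once and
    // remembers its answer.
    public string GetOrAdd(string query, Func<string, string> search)
    {
        string key = Normalize(query);
        LinkedListNode<KeyValuePair<string, string>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);   // move to the front: most recently used
            order.AddFirst(node);
            return node.Value.Value;
        }

        string result = search(key);
        if (map.Count >= capacity)
        {
            // Evict the entry at the back of the list (least recently used).
            LinkedListNode<KeyValuePair<string, string>> oldest = order.Last;
            order.RemoveLast();
            map.Remove(oldest.Value.Key);
        }
        node = new LinkedListNode<KeyValuePair<string, string>>(
            new KeyValuePair<string, string>(key, result));
        order.AddFirst(node);
        map[key] = node;
        return result;
    }
}

public static class QueryCacheDemo
{
    public static void Main()
    {
        int backendCalls = 0;
        QueryCache cache = new QueryCache(2);
        Func<string, string> search = q => { backendCalls++; return "<results query='" + q + "'/>"; };

        cache.GetOrAdd("DirectX", search);
        cache.GetOrAdd(" directx ", search);   // same normalized key: served from the cache
        Console.WriteLine(backendCalls);       // prints 1
    }
}
```

A real Web server cache would also need locking for concurrent requests and some way to expire stale results; both are left out here to keep the sketch short.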

Transformation
The Search architecture consists of three components:
Word Parsing and Vocabulary
Best Bets
Search Results
The .NET port of Search has the same architecture as the ASP version (see Figure 1). Let's take a closer look at each of these components.

Figure 1. After the user submits a query, the application (1) sends the query to the parser for word breaking and vocabulary lookup, (2) passes the display term of each matched item to Best Bets, (3) passes the preferred term of each matched item, along with the remaining terms, to Search Results, (4) transforms the generated XML document with an XSL stylesheet, and (5) returns HTML to the user's Web browser.

Word Parsing and Vocabulary: This is a Windows scripting component (scriptlet) that wraps a C++ COM object, which exposes the word breakers for all languages supported in Search. This design is necessary because the word-breaker interfaces are not easy to script and usually require a scriptable C++ wrapper (although there is a way around this, which is explained later). When porting to the .NET Framework, we ran the Type Library Importer (TLBIMP.EXE) on the C++ object and called it through .NET Interop, which lets you invoke existing COM objects.

The Vocabulary object runs XPath (the query language for XML documents) queries to map search terms to preferred terms. It also does more than word breaking: it produces a formatted data structure suitable for the Best Bets and Search Results components to consume. An important result is that this fairly complex scriptlet was ported to C#, and we can still invoke legacy objects from it. The following is a short code sample from the Vocabulary object:


// We return an array of VocabularyObjects after parsing the user's search
// text. This ability to create simple typed structures in C# vastly improves
// our code modularity and self-documentation. This is the definition of
// VocabularyObject:
public struct VocabularyObject {
    public string preferredTerm;   // Structure members
    public string displayTerm;
    public bool found;
    public string origPhrase;
    public bool multiTerm;
    public bool multiWord;

    // Constructor
    public VocabularyObject(string preferredTerm, bool found, string origPhrase,
                            bool multiTerm, bool multiWord, string displayTerm) {
        this.preferredTerm = preferredTerm;
        this.found = found;
        this.origPhrase = origPhrase;
        this.multiTerm = multiTerm;
        this.multiWord = multiWord;
        this.displayTerm = displayTerm;
    }
}

// Example usage. Because the parameters to the object's constructor are
// typed, we'll get a compiler error message if we pass an integer
// where a string is expected, for example. This is a very nice feature
// over traditional scripting environments!
VocabularyObject vo = new VocabularyObject("Microsoft DirectX", true, "DX", false, false, "DirectX");
One of the advantages of the .NET environment is that you can create typed data structures for use throughout your code. The last line above is an example statement showing how the VocabularyObject structure is used.
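To make the term-mapping step more concrete, here is a minimal sketch of how an XPath query might map a user's phrase to a preferred term and fill in a typed structure. The vocabulary XML layout, the `VocabularyEntry` struct (a trimmed-down stand-in for VocabularyObject), and the `VocabularyLookup` class are hypothetical; the article does not show the real schema or the real query. The sketch uses the `System.Xml` classes available in current .NET rather than the beta-era classes the authors worked with.

```csharp
using System;
using System.Xml;

// Hypothetical, reduced stand-in for the article's VocabularyObject.
public struct VocabularyEntry
{
    public string PreferredTerm;
    public string DisplayTerm;
    public bool Found;
    public string OrigPhrase;
}

public static class VocabularyLookup
{
    // Assumed vocabulary format: each term lists the synonyms that map to it.
    public const string SampleXml =
        "<vocabulary>" +
        "<term preferred='Microsoft DirectX' display='DirectX'>" +
        "<synonym>dx</synonym><synonym>directx</synonym>" +
        "</term>" +
        "</vocabulary>";

    public static VocabularyEntry Map(XmlDocument vocabulary, string phrase)
    {
        VocabularyEntry entry = new VocabularyEntry();
        entry.OrigPhrase = phrase;

        string key = phrase.Trim().ToLowerInvariant();
        // Quote characters would break the XPath string literal below, so
        // phrases containing them are simply treated as unmatched here.
        bool safe = key.IndexOf('\'') < 0 && key.IndexOf('"') < 0;
        XmlElement term = safe
            ? (XmlElement)vocabulary.SelectSingleNode("/vocabulary/term[synonym='" + key + "']")
            : null;

        if (term != null)
        {
            entry.PreferredTerm = term.GetAttribute("preferred");
            entry.DisplayTerm = term.GetAttribute("display");
            entry.Found = true;
        }
        else
        {
            // No vocabulary match: pass the phrase through unchanged.
            entry.PreferredTerm = phrase;
            entry.DisplayTerm = phrase;
            entry.Found = false;
        }
        return entry;
    }

    public static void Main()
    {
        XmlDocument vocabulary = new XmlDocument();
        vocabulary.LoadXml(SampleXml);
        VocabularyEntry entry = Map(vocabulary, "DX");
        Console.WriteLine(entry.PreferredTerm);   // prints Microsoft DirectX
    }
}
```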

Best Bets: This is a scriptlet that runs XPath queries against localized XML documents and produces titled URL links. The XML documents are loaded into application scope for each Search application instance and are tightly coupled with the methods of the Vocabulary object. The ported scriptlet is a 100% conversion to the .NET Framework and takes advantage of the System.IO and XML DataNavigator classes (System.NewXml namespace).

This was the simplest component to port. It was almost a line-for-line conversion from JScript to C#. We changed the code in only a few places to take advantage of the new XML DataNavigator class, the part of the .NET common language runtime used to query and update XML documents.
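A Best Bets lookup of this kind might look roughly like the sketch below. It is not the authors' code: the XML layout, the `BestBetsSketch` class, and the example.com URLs are invented for illustration, and it uses `XPathNavigator` from today's `System.Xml.XPath` namespace in place of the beta-era DataNavigator class mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class BestBetsSketch
{
    // Assumed Best Bets document: one <bet> element per recommended link.
    public const string SampleXml =
        "<bestbets>" +
        "<bet term='DirectX' title='DirectX Developer Center' url='http://example.com/directx'/>" +
        "<bet term='DirectX' title='DirectX Downloads' url='http://example.com/directx/downloads'/>" +
        "<bet term='XML' title='XML Developer Center' url='http://example.com/xml'/>" +
        "</bestbets>";

    // Returns (title, url) pairs for every bet whose term matches the display term.
    public static List<KeyValuePair<string, string>> Find(string bestBetsXml, string displayTerm)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(bestBetsXml)).CreateNavigator();
        List<KeyValuePair<string, string>> links = new List<KeyValuePair<string, string>>();

        // Select every bet with XPath, then compare terms in C# so that user
        // input never has to be spliced into the XPath expression.
        XPathNodeIterator bets = nav.Select("/bestbets/bet");
        while (bets.MoveNext())
        {
            XPathNavigator bet = bets.Current;
            if (string.Equals(bet.GetAttribute("term", ""), displayTerm,
                              StringComparison.OrdinalIgnoreCase))
            {
                links.Add(new KeyValuePair<string, string>(
                    bet.GetAttribute("title", ""), bet.GetAttribute("url", "")));
            }
        }
        return links;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> link in Find(SampleXml, "directx"))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```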

Search Results: This complex component connects to Site Server 3.0 to obtain the actual page descriptions and links that match the customer's search query. It also includes a sophisticated caching algorithm.
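Once the components have produced their XML, the final rendering step (step 4 in Figure 1) applies an XSL stylesheet to turn that XML into HTML. The following sketch shows the general technique only. The `<results>`/`<hit>` format, the stylesheet, and the `ResultRenderer` class are assumptions, and `XslCompiledTransform` is the transform class in current .NET, not necessarily what the 2000 beta offered.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ResultRenderer
{
    // Hypothetical stylesheet: renders each <hit> as a list item with a link.
    public const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html' indent='no'/>" +
        "<xsl:template match='/results'>" +
        "<ul><xsl:for-each select='hit'>" +
        "<li><a href='{@url}'><xsl:value-of select='@title'/></a></li>" +
        "</xsl:for-each></ul>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string ToHtml(string resultsXml)
    {
        // Compile the stylesheet, then stream the XML through it into a string.
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(sheet);
        }

        StringWriter html = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(resultsXml)))
        {
            xslt.Transform(input, null, html);
        }
        return html.ToString();
    }

    public static void Main()
    {
        string xml =
            "<results>" +
            "<hit title='DirectX Home' url='http://example.com/directx'/>" +
            "</results>";
        Console.WriteLine(ToHtml(xml));
    }
}
```

In a real page the compiled stylesheet would normally be loaded once and reused across requests rather than recompiled on every call.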

Building a parallel solution
The biggest challenge we faced was that we were developing Search 2.5 while also porting the entire search application to the ASP+ technology of the .NET Framework. Because the application had to be both launched and ported to .NET before the PDC, the turnaround time was tight, so we decided to develop both versions in parallel and release them at the same time. Obviously, this was a daunting task: we had to manage the new version, learn all the features of the new .NET Framework and its new language constructs, build servers with a variety of software platform services, and so on.

There's an interesting story about how we approached this project. To ensure that both versions (Search 2.5 and the .NET Framework port) would be available at the same time, we determined in the project planning phase which components would not change, which components would change the most during development, and which components were suited to which technology and language.

We also set goals early and tried to break down the application and migrate it the way a customer might. Because we at microsoft.com take seriously the problems customers face when making technical decisions and weighing ROI, we broke the porting process into many parts, each as close as possible to what a customer might do. We wanted to cover every level of effort, from the simplest port (scriptlets ported to JScript classes) up to the largest investment of time and the greatest technical benefit: a complete migration to the .NET Framework using the C# programming language (100% managed code).

Here are some of the steps we have taken to address this challenge:
First, we converted the main ASP page to ASP+. Initially, we invoked the scriptlets through .NET Reflection, which let us call legacy COM objects at run time by querying their type libraries.
Important: We started from the ASP programming model, where data, business logic, and presentation are all mixed together, then adopted the fully object-oriented approach of ASP+, and ended up with data, program logic, and UI separated.
Second, we took the simplest scriptlet and migrated it. Best Bets is the simplest component and does not depend on COM components. We decided to migrate it as a DLL using System.IO, the XML DataNavigator, and the C# programming language. We wanted to move this component completely into the managed environment and have it take full advantage of the XML DataNavigator.
Important: We learned the NewXml namespace. We also removed .NET Reflection when we ported the component, which lets us call it natively.
We then handled the Vocabulary scriptlet in the same way. In complexity and lines of code, this component sits in the middle of the application. It consists of a scriptlet containing the business and text-parsing rules for Search, plus calls to the C++ component that we created to wrap the underlying COM calls to the word breakers. This component gained the most from moving to managed code. The whole component was ported to the .NET Framework and the C# programming language. That took some skill, because it contains more complex function logic and needs to use a custom COM object, but it was not too hard. The next step is to discard the C++ wrapper and call these interfaces directly.
Important: We changed functions and logic to benefit from key C# advantages such as type safety. In JScript, developers must keep the type of each variable (integer, string) in their heads. C# does that for you: every variable's type is fixed at declaration, and the compiler checks your work to make sure no type boundaries are crossed. This helps a lot when working with complex code. Note: In the next version of JScript, programmers will be able to declare variable types explicitly.
Finally, we ported the last component: SearchResults. Initially, we called this component through .NET Reflection, and it worked well. Because the code is large and complex, and because we made some fundamental changes to it just before launching Search 2.5, the port is still under way. It is not in the .NET Beta, but the work has made significant progress. The updated version will be released later in October.
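The difference between calling a component through Reflection and calling it natively, which runs through the steps above, can be sketched as follows. In the article, Reflection was used to reach legacy COM scriptlets. Here a plain managed class, `BestBetsComponent`, stands in for the component so the example stays self-contained, and all names are hypothetical.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a ported component.
public class BestBetsComponent
{
    public string Lookup(string term)
    {
        return "best bets for " + term;
    }
}

public static class BindingDemo
{
    // Late bound: the type and method are found by name at run time, so a
    // misspelled name or wrong argument type only fails when the call executes.
    public static string CallLateBound(string typeName, string methodName, string argument)
    {
        Type type = Assembly.GetExecutingAssembly().GetType(typeName, true);
        object instance = Activator.CreateInstance(type);
        MethodInfo method = type.GetMethod(methodName, new Type[] { typeof(string) });
        if (method == null) throw new MissingMethodException(typeName, methodName);
        return (string)method.Invoke(instance, new object[] { argument });
    }

    // Early bound: the compiler verifies the type, the method, and the
    // argument types before the program ever runs.
    public static string CallDirect(string argument)
    {
        return new BestBetsComponent().Lookup(argument);
    }

    public static void Main()
    {
        Console.WriteLine(CallLateBound("BestBetsComponent", "Lookup", "DirectX"));
        Console.WriteLine(CallDirect("DirectX"));
    }
}
```

Both calls return the same string. The direct call is what becomes possible once a component has been fully ported, which is why removing Reflection was treated as a milestone for each component.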


In short, this architecture is a good showcase. We have some true C# .NET components, and all of our pages are ASMX pages. We also demonstrated that you can invoke custom COM objects through Interop and invoke scriptlets through .NET Reflection. Legacy objects (such as SearchResults) can consume data structures created by C# objects (such as Vocabulary), which is very good.

Before you look at the .NET Search Beta, it is worth mentioning that there is no user interface in the architecture. What you see is the default output of a Web service. We could have added a UI, but we left it this way so that you can see it as it really is.

