Not long ago, InfoQ recommended several new books on software architecture, which attracted wide interest from readers in China. One of them is The Architecture of Open Source Applications, in which authors from well-known open source projects describe the design of their software. By studying these successful architectures, software engineers can gain a thorough understanding of best practices and pitfalls. In response to reader demand, InfoQ China is summarizing the book's highlights on well-known open source architectures for the local development community to learn from. This installment covers the architecture of Selenium WebDriver, the well-known browser automation tool. This first part focuses on the history of Selenium WebDriver and the themes that shaped its architecture.
Selenium is a browser automation tool, typically used to write end-to-end tests of web applications. A browser automation tool does exactly what you would expect: it automates control of a browser so that repetitive tasks can be run automatically. It sounds like an easy problem to solve, but as we are about to see, a lot of work lies behind Selenium's success.
The expert who describes the Selenium WebDriver architecture is Simon Stewart of Google, a core Selenium contributor and the creator of WebDriver.
Simon Stewart begins with the components of Selenium:
Before describing the Selenium architecture, it helps to look at how the related pieces of the project fit together. At a high level, Selenium consists of three tools. The first, Selenium IDE, is a Firefox extension that lets users record and play back tests. The record/playback model has limitations and does not suit many users, so the second tool, Selenium WebDriver, provides APIs in a variety of languages that allow more control and let tests be written according to standard software development practices. The last tool, Selenium Grid, lets engineers use the Selenium APIs to control browser instances distributed across a set of machines, so that more tests can run concurrently. Within the project, they are referred to as "IDE", "WebDriver", and "Grid", respectively.
Selenium and WebDriver began as two independent projects. Simon Stewart explains the history:
Jason Huggins started the Selenium project in 2004 while working at ThoughtWorks on an in-house Time and Expenses (T&E) system that made extensive use of JavaScript. Although Internet Explorer was the dominant browser at the time, ThoughtWorks employees also used other browsers (particularly the Mozilla family) and filed bug reports when the T&E system did not work properly in their browser. The open source testing tools of the time either focused on a single browser (IE) or simulated a browser (such as HttpUnit). The cost of a commercial tool license would have exhausted the limited budget of a small in-house project, so commercial tools were not a viable testing option.
Where automation is difficult, it is common to fall back on manual testing. This approach does not scale when the development team is small or builds are very frequent. It is also a waste of human effort to step manually through scripts that could be automated. The more tedious and repetitive the task, the more slowly people work and the more mistakes they make compared with a machine. Manual testing was therefore not an option either.
Fortunately, every browser under test supported JavaScript, so it made sense for Jason and his team to write a testing tool in JavaScript to verify the behavior of the application. Inspired by FIT (Framework for Integrated Test), they layered a table-based syntax over raw JavaScript, which allowed people with limited programming experience to write tests in HTML files using a keyword-driven approach. The tool, originally called "Selenium" and later "Selenium Core", was released under the Apache 2 license in 2004.
The Selenium table format is similar to FIT's ActionFixture. Each row of the table is split into three columns. The first column gives the name of the command to execute, the second usually contains an element locator, and the third contains an optional value. For example, the following row types the string "Selenium WebDriver" into an element named "q":
type | name=q | Selenium WebDriver
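The three-column row described above can be modeled in a few lines of code. The following is only an illustrative sketch, not Selenium Core's actual parser; the names SeleneseStep, parse_row, and split_locator are hypothetical.

```python
from typing import NamedTuple


class SeleneseStep(NamedTuple):
    """One table row: command name, element locator, optional value."""
    command: str
    target: str
    value: str


def parse_row(cells):
    """Build a step from a row's cells; missing columns become empty strings."""
    padded = [cell.strip() for cell in cells] + ["", "", ""]
    return SeleneseStep(*padded[:3])


def split_locator(target):
    """Split a locator such as "name=q" into (strategy, argument)."""
    strategy, separator, argument = target.partition("=")
    if not separator:
        # No explicit strategy given; return the whole text as the argument.
        return ("", target)
    return (strategy, argument)


step = parse_row(["type", "name=q", "Selenium WebDriver"])
print(step.command, split_locator(step.target), step.value)
```

A row with fewer than three cells, such as a command that takes no value, simply ends up with an empty string in the unused columns.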
Because Selenium was originally written in pure JavaScript, its initial design required developers to host Selenium Core and their test scripts on the same server as the application under test, to avoid falling foul of the browser's security rules and the JavaScript sandbox. In practice, this requirement could not always be met. Worse, while a developer's IDE helps them manipulate code quickly and navigate a large codebase, no comparable tool existed for HTML. It quickly became clear that maintaining even a medium-sized test suite was a clumsy and painful process.
To solve this and other problems, an HTTP proxy was written so that every HTTP request could be intercepted by Selenium. Using a proxy sidestepped many of the constraints of the "same-origin" policy (under which the browser does not allow JavaScript to call anything other than the server from which the current page was served), mitigating the first weakness. This design also made it possible to write Selenium bindings in multiple languages: they only needed to send an HTTP request to a particular URL. The wire format was closely modeled on Selenium Core's table-based syntax, known as "Selenese". Because the language bindings controlled the browser remotely, the tool was called "Selenium Remote Control", or "Selenium RC".
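To make the idea of "sending an HTTP request to a particular URL" concrete, here is a minimal sketch of how a language binding might encode one Selenese row as a request URL. The server address, path, and query parameter names are assumptions chosen for illustration, and command_url and parse_command are hypothetical helpers; the sketch builds and decodes the URL without contacting any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint, for illustration only.
SERVER_URL = "http://localhost:4444/selenium-server/driver/"


def command_url(command, target="", value="", session_id=None):
    """Encode one command (the three table columns) as a request URL."""
    params = [("cmd", command), ("1", target), ("2", value)]
    if session_id is not None:
        params.append(("sessionId", session_id))
    return SERVER_URL + "?" + urlencode(params)


def parse_command(url):
    """Decode the query string again, as a receiving server might."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in query.items()}


url = command_url("type", "name=q", "Selenium WebDriver")
print(url)
print(parse_command(url))
```

Because the request is plain HTTP with string parameters, a binding in any language that can issue HTTP requests could produce it, which is the property the text attributes to this design.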
While Selenium was being developed, another browser automation framework, WebDriver, was brewing at ThoughtWorks. WebDriver's initial code was released in early 2007. The project grew out of efforts to insulate end-to-end tests from the underlying testing tool, an insulation that is typically achieved with the Adapter pattern. WebDriver emerged from applying this approach on many projects. It began as a wrapper around HtmlUnit, and support for Internet Explorer and Firefox followed soon after release.
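The Adapter pattern mentioned above can be sketched as follows. This is a generic illustration of the pattern, not WebDriver's code: BrowserAdapter, FakeHeadlessTool, HeadlessAdapter, and check_home_page are all hypothetical names, and the "tool" is an in-memory stand-in rather than a real browser.

```python
from abc import ABC, abstractmethod


class BrowserAdapter(ABC):
    """The small interface that end-to-end tests are written against."""

    @abstractmethod
    def open(self, url):
        ...

    @abstractmethod
    def title(self):
        ...


class FakeHeadlessTool:
    """Stand-in for an underlying testing tool with its own, different API."""

    def __init__(self):
        self.current = None

    def load_page(self, address):
        self.current = address

    def page_title(self):
        return "Title of " + self.current


class HeadlessAdapter(BrowserAdapter):
    """Translates the test-facing interface into the tool's own calls."""

    def __init__(self, tool):
        self._tool = tool

    def open(self, url):
        self._tool.load_page(url)

    def title(self):
        return self._tool.page_title()


def check_home_page(browser):
    """A test that knows only the adapter interface, not the tool behind it."""
    browser.open("http://example.test/")
    return browser.title()


print(check_home_page(HeadlessAdapter(FakeHeadlessTool())))
```

Swapping in a different underlying tool would mean writing another adapter class, while tests such as check_home_page stay unchanged.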
When WebDriver was first released, it differed significantly from Selenium RC, even though both were browser automation APIs. The most obvious difference for users was that Selenium RC had a dictionary-based API with all methods exposed on a single class, whereas the WebDriver API was more object-oriented. In addition, WebDriver supported only Java, while Selenium RC offered support for a wide range of languages. The technical differences were also pronounced: Selenium Core (the foundation of RC) was essentially a JavaScript application running inside the browser's security sandbox, whereas WebDriver attempted to bind natively to the browser, sidestepping the browser's security model at the cost of significantly more development effort for the framework itself.
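The contrast in API style can be illustrated with a small sketch. It shows one reading of "all methods on a single class" versus "more object-oriented", using an in-memory stand-in for a page; FakePage, SingleClassClient, ObjectStyleClient, and Element are hypothetical names and do not reproduce either tool's real API.

```python
class FakePage:
    """In-memory stand-in for a page holding one text field."""

    def __init__(self):
        self.fields = {"name=q": ""}


class SingleClassClient:
    """Single-class style: every command is a method taking a locator string."""

    def __init__(self, page):
        self._page = page

    def type(self, locator, text):
        self._page.fields[locator] = text

    def get_value(self, locator):
        return self._page.fields[locator]


class Element:
    """Object-oriented style: an element object carries its own operations."""

    def __init__(self, page, locator):
        self._page = page
        self._locator = locator

    def send_keys(self, text):
        self._page.fields[self._locator] = text

    @property
    def value(self):
        return self._page.fields[self._locator]


class ObjectStyleClient:
    """Finding an element returns an object that is then acted upon directly."""

    def __init__(self, page):
        self._page = page

    def find_element(self, locator):
        return Element(self._page, locator)


rc = SingleClassClient(FakePage())
rc.type("name=q", "Selenium WebDriver")
print(rc.get_value("name=q"))

box = ObjectStyleClient(FakePage()).find_element("name=q")
box.send_keys("Selenium WebDriver")
print(box.value)
```

In the first style the locator string is repeated on every call; in the second, the element is looked up once and subsequent operations are methods on the returned object.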
In August 2009 it was announced that the two projects would merge, and Selenium WebDriver is the result. Currently, WebDriver has language bindings for Java, C#, Python, and Ruby. It supports Chrome, Firefox, Opera, and the Android and iPhone browsers. There are also sister projects that are not kept in the same source repository but work closely with the main project (Selenium WebDriver), providing Perl bindings, support for the BlackBerry browser, and support for headless WebKit, which is useful when tests run on a continuous integration server that has no proper display. The original Selenium RC mechanism is still maintained, and it allows WebDriver to support browsers that it would otherwise not support.
Before describing the Selenium WebDriver architecture itself, Simon Stewart discusses the key themes that run through the architecture and the development of the project. In summary:
Keep costs low.
Emulate the user.
Prove the drivers work ...
... but you shouldn't need to know all the details.
Improve the bus factor.
Prefer JavaScript implementations.
Every method call is an RPC call.
We are an open source project.
Specifically:
Keep costs low
Supporting X browsers on Y platforms is inherently expensive, in both initial development and maintenance. If we can find a way to keep the quality of the product high without violating too many of the other principles, it is worth adopting. The clearest expression of this idea is that we use JavaScript wherever possible, as you will see later.
Emulate the user
WebDriver is designed to accurately simulate the way a user interacts with a web application. A common way to simulate user input is to use JavaScript to synthesize and fire the series of events that the application would see if a real user performed the same interaction. This "synthesized events" approach is fraught with difficulty, because each browser, and sometimes different versions of the same browser, fires slightly different events with slightly different values. To complicate matters further, most browsers, for security reasons, do not allow user interaction with certain form elements, such as file inputs, in this way.
Wherever possible, WebDriver instead fires events at the operating system level. Because these "native events" are not generated by the browser, this approach avoids the security restrictions placed on synthesized events, and because they are specific to the operating system, once they work for one browser on a platform, reusing the code for another browser on that platform is relatively easy. The difficulty is that this approach is only possible when two conditions are met: WebDriver must be tightly bound to the browser, and the development team must have worked out how to send native events without the browser window needing focus (Selenium tests take a long time to run, so it is useful to be able to use the machine for other tasks at the same time). Currently, native events are available on Linux and Windows, but not on Mac OS X.
However WebDriver simulates user input, we try to mimic user behavior as closely as possible. RC is the opposite: it provides APIs that operate at a much lower level than a user's actions.
Prove the drivers work
It may sound like motherhood and apple pie, but I believe there is no point in writing code that doesn't work. On the Selenium project, we prove that the drivers work (a driver is a specific implementation of the WebDriver API; Firefox and Internet Explorer, for example, each have their own driver) with an extensive set of automated test cases. These are usually "integration tests", which require the code to be compiled and use a browser interacting with a web server, but where possible we write "unit tests", which, unlike integration tests, can run without a full recompilation. At present there are about 500 integration tests and about 250 unit tests covering all browsers. We add tests as we fix defects and write new code, and our focus is shifting toward writing more unit tests.
Not every test is run against every browser. Some tests target specific features that some browsers do not support, or features that different browsers handle differently. For example, some tests cover new HTML5 features that not all browsers support. Nonetheless, every mainstream browser is thoroughly tested. As you can imagine, running more than 500 tests per browser across different platforms is a significant challenge, and one we continue to work on.
You shouldn't need to know all the details
Very few developers are proficient in every language and technology we use. Our architecture therefore needs to let developers apply their talents where they can do the most good, without having to work on pieces of code they are not comfortable with.
Improve the bus factor
Software development has an (informal) concept called the "bus factor". It is the number of key developers who, if something unfortunate happened to them (being hit by a bus, for example), would leave the project unable to continue. Something as complex as browser automation is especially exposed to this risk, so many of our architectural decisions are made to raise this number as high as possible.
Prefer JavaScript implementations
WebDriver falls back to driving the browser with pure JavaScript when there is no other way to control it. This means that any API we add should be sympathetic to a JavaScript implementation. As a concrete example, HTML5 introduces LocalStorage, an API for storing structured data on the client side. Browsers usually implement it using SQLite. A natural implementation would have been to provide a database connection to the underlying data store, using something like JDBC. In the end, we settled on an API that closely models the underlying JavaScript API, because an API modeled on typical database access would not have fit a JavaScript implementation well.
Every method call is an RPC call
WebDriver controls browsers that run in other processes. An easily overlooked consequence is that every API call is an RPC call, so the performance of the framework is at the mercy of network latency. In normal operation this may not be noticeable, since most operating systems optimize routing to localhost, but as the network latency between the browser and the test code increases, calls that once seemed efficient become less so for both API designers and users.
This puts some pressure on the design of the API. A coarser API with larger functions can reduce latency by collapsing multiple calls into one, but this has to be balanced against keeping the API expressive and easy to use. For example, it is sometimes necessary to check whether an element is visible to the end user. Not only do we need to consider various CSS properties (which may have to be inferred by looking at parent elements), we should probably also check the dimensions of the element. A minimalist API would require each of these checks to be made separately. WebDriver combines them into a single method, isDisplayed.
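The trade-off between fine-grained and coarse-grained calls can be sketched by counting round trips. The class below is a hypothetical in-memory stand-in for an element living in another process, not WebDriver's implementation, and the visibility rule is deliberately simplified (it ignores parent elements, for instance).

```python
class FakeRemoteElement:
    """Stand-in for a remote element; every public call counts as one round trip."""

    def __init__(self, css, width, height):
        self._css = css
        self._size = (width, height)
        self.round_trips = 0

    def get_css_value(self, name):
        self.round_trips += 1
        return self._css.get(name, "")

    def get_size(self):
        self.round_trips += 1
        return self._size

    def is_displayed(self):
        # Coarse-grained call: all checks run on the remote side in one trip.
        self.round_trips += 1
        width, height = self._size
        return (self._css.get("display") != "none"
                and self._css.get("visibility") != "hidden"
                and width > 0 and height > 0)


def is_displayed_fine_grained(element):
    """Client-side check built from individual calls, one round trip each."""
    if element.get_css_value("display") == "none":
        return False
    if element.get_css_value("visibility") == "hidden":
        return False
    width, height = element.get_size()
    return width > 0 and height > 0


css = {"display": "block", "visibility": "visible"}
fine = FakeRemoteElement(css, 10, 5)
coarse = FakeRemoteElement(css, 10, 5)
print(is_displayed_fine_grained(fine), fine.round_trips)
print(coarse.is_displayed(), coarse.round_trips)
```

For a visible element, the client-side version needs three calls where the combined method needs one, and the gap widens as latency between the test code and the browser grows.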
We are an open source project
Although this is not strictly an architectural point, it is important to stress that Selenium is an open source project. All of the themes above are tied together by one idea: we want to make it as easy as possible for new developers to contribute. Measures that lower the barrier to participation include keeping the depth of knowledge required as shallow as possible, using as few languages as necessary, and relying on automated tests to verify that nothing has broken.
Initially the project was split into a series of modules, each representing a particular browser, with additional modules for common code and support code. The source tree for each language binding was stored under these modules. This arrangement made sense for languages such as Java and C#, but was painful for Ruby and Python developers. As a direct result, participation was limited, with only a handful of people working on the Python and Ruby bindings. To address this, the source code was reorganized in October and November 2010, and the Ruby and Python code was moved into a single top-level directory per language. This layout matched the expectations of open source developers in those languages, and community contributions increased almost immediately.
The second part of this article covers the architectural design and implementation of Selenium WebDriver in detail. Interested readers can also read the online version of the book.
The Architecture of Open Source Applications: Selenium WebDriver (Part 1)