Session in Web Development

 

In web development, the session is a very important concept. To many developers of dynamic websites, a session is just a variable that behaves like a black hole: you put things into it at the right time and take them out when you need them. That is the most intuitive impression developers have of the session, but what does the inside of the black hole look like? In other words, how does a session work internally? When I ask colleagues and friends for further details, the answers tend to be either vague or based on guesswork.

This makes me think that many developers, myself included, are so wrapped up in frameworks and even secondary development platforms that we know little or nothing about the core and the foundations, and are powerless or even indifferent toward them. Every time I think of this, I am ashamed of my lack of curiosity about how things really work. I once implemented a simple HTTP server, but for lack of knowledge and time I did not consider sessions. Recently, however, I have read some documents and done some related experiments. In the spirit of sharing, I will present my personal understanding of sessions in this article as fully as I can, and discuss some related background along the way. I hope the reader will come away knowing a little more about sessions.

What is a session?

 

The Oxford Dictionary defines a session as a continuous period of time. The word has similar but not identical meanings at different levels. For a user of a web application, opening a browser, visiting an e-commerce website, logging in, and shopping until the browser is closed is one session. For a web application developer, the data structure created to store a user's login information is also called a session. So pay attention to the context when talking about sessions. This article is about a mechanism, built on top of HTTP, that extends the capabilities of web applications. It does not refer to any specific dynamic page technology. The capability it provides is the maintenance of state, so it can also be called session persistence.

Why session?

 

Sessions are generally discussed in the context of web applications. Web applications are built on HTTP, and HTTP is a stateless protocol. That is to say, when a user moves from page A to page B, a new HTTP request is sent, and when the server returns the response it has no way of knowing what the user did before requesting page B.

The RFC does not explain why HTTP is stateless, but some reasons can be inferred from the history and application scenarios of HTTP:

1. HTTP was originally designed to provide a way to publish and receive HTML pages. At that time there was no dynamic page technology, only static HTML pages, so there was no need for the protocol to maintain state;

2. After receiving a response, a user usually spends some time reading the page. If the connection between the client and the server were kept open, it would be idle most of the time, which wastes resources. The original HTTP design therefore used short connections by default: the client and the server close the TCP connection once a request and its response are complete. The server cannot predict the client's next action and does not even know whether the user will visit again, so there was no point in having the protocol keep track of the user's access state;

3. Moving part of the complexity to the technologies built on top of HTTP keeps HTTP simple at the protocol level, and this simplicity gives HTTP greater extensibility. In fact, session technology is essentially an extension of the HTTP protocol.

All in all, the statelessness of HTTP was determined by its historical mission. However, as network technology developed rapidly, people were no longer satisfied with rigid, static HTML. They wanted web applications to come alive, so scripting and the DOM appeared on the client, forms were added to HTML, and CGI and other dynamic technologies appeared on the server.

This demand for a dynamic web posed a challenge to HTTP: how can a stateless protocol associate two consecutive requests with each other? In other words, how can a stateless protocol meet stateful requirements?

By then, statefulness had become an inevitable trend, while the statelessness of the protocol was already set in stone. Some scheme was needed to resolve this conflict and maintain state across HTTP requests, and so cookies and sessions appeared.

Readers may have some questions about this part. I will address two points here:

1. Statelessness and persistent connections

Someone may ask: HTTP/1.1, which is now widely used, enables persistent connections by default. Is it still stateless?

The connection mode and statefulness are completely unrelated. In a sense, state is data, and the connection mode only determines how data is transmitted, not what the data is. Persistent connections are a reasonable performance optimization made possible by improvements in computer performance and network conditions. Web servers generally limit the number of persistent connections to avoid excessive resource consumption.

2. Statelessness and sessions

A session is stateful, while HTTP is stateless. Do the two conflict?

Sessions and HTTP belong to different layers. HTTP belongs to the top layer, the application layer, of the OSI seven-layer model. A session is not part of HTTP: it is implemented by a specific dynamic page technology, but it is built on top of HTTP. Later in this article I analyze the session mechanism in Servlet/JSP technology, which should make this clearer.

Cookie and session

 

As mentioned above, the schemes for working around the statelessness of HTTP are cookies and sessions. Both can record state. A cookie stores the state data on the client, while a session stores it on the server.

First, let's look at how cookies work. This requires some basic knowledge of HTTP.

Cookies were first specified in RFC 2109 (obsolete, replaced by RFC 2965, which has in turn been replaced by RFC 6265). The specification asks each client to support at least 300 cookies in total, at least 20 cookies per domain name (browsers generally allow more; Firefox, for example, allows 50), and at least 4 KB per cookie. In practice these figures are the limits to plan for, although each browser has its own implementation. The most important thing when using cookies is to keep them small: do not store useless information or too much information.

No matter what server technology is used, as long as the HTTP response contains a header of the following form, the server is asking the client to set a cookie:

Set-Cookie: name=value; expires=date; path=path; domain=domain

Every browser that supports cookies responds to this header by creating and saving a cookie file (or an in-memory cookie). Each time the user sends a request, the browser checks which of its current cookies are still valid (based on the expires attribute) and which match the request (based on the path attribute). If any qualify, the browser adds a header of the following form to the request and sends it to the server:

Cookie: name="ZJ"; Path="/linkage"

The dynamic script on the server analyzes the header and handles it accordingly. It may also choose to ignore it.
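The exchange described above can be sketched in Java with the JDK's java.net.HttpCookie class. This is only an illustration under stated assumptions, not how any particular browser works: the class CookieRoundTrip and its two helper methods are made up for this example, Max-Age is used instead of expires so that the example does not depend on a date, and the path check is a plain prefix test, which is cruder than the matching rule in the RFC.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the client side of the cookie exchange.
class CookieRoundTrip {

    // Parse a Set-Cookie response header into cookie objects (the "cookie jar").
    static List<HttpCookie> store(String setCookieHeader) {
        return HttpCookie.parse(setCookieHeader);
    }

    // Build the Cookie request header for a request path, skipping cookies
    // that have expired or whose path does not match (simplified prefix test).
    static String cookieHeaderFor(List<HttpCookie> jar, String requestPath) {
        List<String> pairs = new ArrayList<>();
        for (HttpCookie c : jar) {
            String path = (c.getPath() == null) ? "/" : c.getPath();
            if (!c.hasExpired() && requestPath.startsWith(path)) {
                pairs.add(c.getName() + "=" + c.getValue());
            }
        }
        return pairs.isEmpty() ? null : "Cookie: " + String.join("; ", pairs);
    }

    public static void main(String[] args) {
        List<HttpCookie> jar = store("Set-Cookie: name=ZJ; Max-Age=3600; Path=/linkage");
        // The cookie is sent for a path under /linkage ...
        System.out.println(cookieHeaderFor(jar, "/linkage/index.jsp"));
        // ... and withheld for an unrelated path (the helper returns null).
        System.out.println(cookieHeaderFor(jar, "/other"));
    }
}
```

Note that the decision about which cookies to send is made entirely on the client; the server only sees the resulting Cookie header.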

This touches on the relationship between a specification (or protocol) and its implementations. Simply put, the specification states what must be done, and implementations must follow it in order to be compatible with one another. How each implementation does so is not restricted, and an implementation may also go beyond the specification, which is called an extension. Any browser that wants to offer cookie support must implement it according to the corresponding RFC. That is why the server only sends the Set-Cookie header and leaves the rest to the client, which is itself a manifestation of the statelessness of HTTP.

Note that users can disable cookies in the browser for security reasons.

Now let's look at how sessions work:

I did not find a related RFC, because the session is not something defined at the protocol level. The basic principle is that the server maintains a set of session data for each session, and the client and the server refer to that data by a globally unique identifier. When a user accesses a web application, the server program decides when to create a session. Creating a session involves the following steps:

1. Generate a globally unique identifier (the session ID);

2. Allocate storage for the session data. Usually the corresponding data structure is created in memory, but then all session data is lost if the system loses power, which can have serious consequences for something like an e-commerce website. The data can also be written to a file or stored in a database. This increases I/O overhead, but the session is then persistent to some degree and easier to share;

3. Send the globally unique identifier of the session to the client.
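The three steps can be sketched in Java as follows. This is a minimal in-memory illustration, assuming a random 128-bit identifier and a cookie as the transport; the class SessionStoreSketch and its methods are hypothetical, and a real container such as Tomcat generates identifiers and manages storage in its own way.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of server-side session creation.
class SessionStoreSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Step 2: in-memory storage, one attribute map per session id.
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Step 1: generate a hard-to-guess identifier (16 random bytes as 32 hex characters).
    static String newSessionId() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Steps 1 to 3: create the session and return the response header
    // that carries its identifier to the client.
    String createSession() {
        String id = newSessionId();
        sessions.put(id, new ConcurrentHashMap<>());
        return "Set-Cookie: JSESSIONID=" + id + "; Path=/";
    }

    // Look up the data for an identifier sent back by the client (null if unknown).
    Map<String, Object> lookup(String id) {
        return sessions.get(id);
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        System.out.println(store.createSession());
    }
}
```

Because the map lives in memory, everything in it disappears when the process stops, which is exactly the trade-off described in step 2.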

The key question is how the server sends the session's unique identifier to the client. In terms of HTTP, data can be placed in the request line, the header fields, or the body. Based on this, there are two common methods: cookies and URL rewriting.

1. Cookie

The reader has probably guessed it already: the server only needs to set the Set-Cookie header to send the session identifier to the client, and the client then includes the identifier in every subsequent request. Because a cookie can carry an expiration time, the cookie holding the session identifier generally has its expiration set to 0, meaning it is valid only for the lifetime of the browser process. Each browser handles this 0 in its own way, but the differences are small (they usually show up when a new browser window is created);

2. URL rewriting

URL rewriting, as the name implies, means rewriting URLs. Imagine that, before a page is returned to the user, the session identifier is appended to every URL in the page as a GET parameter (or added to the path info part of the URL). Then, whichever link the user clicks or form the user submits after receiving the response, the request carries the session identifier, and the session is maintained. Readers may find this approach cumbersome, and it is. However, if the client has disabled cookies, URL rewriting is the first choice.
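As a rough sketch of the path-parameter variant of URL rewriting, the helper below inserts the identifier after the path and before any query string or fragment. The class UrlRewriteSketch and the method encodeUrl are hypothetical names for this example; in a servlet container the same job is done by HttpServletResponse.encodeURL, whose actual behavior depends on the container.

```java
// Hypothetical sketch of URL rewriting with a ";jsessionid=" path parameter.
class UrlRewriteSketch {

    // Insert ";jsessionid=<id>" after the path, before any "?query" or "#fragment".
    static String encodeUrl(String url, String sessionId) {
        if (sessionId == null) {
            return url; // no session, nothing to rewrite
        }
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        System.out.println(encodeUrl("index.jsp", "ABC"));       // index.jsp;jsessionid=ABC
        System.out.println(encodeUrl("list.jsp?page=2", "ABC")); // list.jsp;jsessionid=ABC?page=2
    }
}
```

Every link and form action in the page has to pass through such a helper, which is why the approach feels cumbersome compared with cookies.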

By now the reader should understand why the session is regarded as an extension of HTTP. The following two figures are screenshots from the Firebug plug-in for Firefox. When I first accessed index.jsp, the response contained a Set-Cookie header, and the request contained no Cookie header. When I refreshed the page, Figure 2 shows that the response no longer contained a Set-Cookie header, but the request now contained a Cookie header. Note the cookie name: JSESSIONID, as the name suggests, is the session identifier. The value of JSESSIONID is the same in both figures, which needs no further explanation. The reader may also have seen URLs on some websites with something like jsessionid=xxx appended. That is a session implemented with URL rewriting.

 

Figure 1: The first request for index.jsp

Figure 2: Requesting index.jsp again

 

Cookies and sessions are implemented differently, so each has its own advantages, disadvantages, and application scenarios:

1. Application scenarios

A typical application scenario for cookies is the "remember me" service: the user's account information is saved on the client in a cookie, and when the user later requests a matching URL, the account information is sent to the server and handed to the corresponding program for automatic login and similar functions. Cookies can also hold other client-side information, such as page layout and search history.

A typical application scenario for sessions is login: after a user logs in to a website, the login information is put into the session, and on each subsequent request the server can look it up to confirm that the user is valid. Shopping carts are another classic scenario;

2. Security

A cookie stores information on the client. Without encryption, private information is exposed and security is poor. Sensitive information is generally encrypted before being stored in a cookie, but a cookie is still easy to steal. A session stores information only on the server. If it is kept in files or a database it can also be stolen, but this is much less likely than with cookies.

The most prominent session security issue is session hijacking. This is a real security threat and is described in more detail later in this article. In general, sessions are more secure than cookies;

3. Performance

Cookies are stored on the client and consume the client's I/O and memory, while sessions are stored on the server and consume the server's resources. Sessions concentrate the load on the server, whereas cookies spread resource consumption across clients. In this respect, cookies are superior to sessions;

4. Lifetime

A cookie can be set to remain valid on the client for a long time, while a session generally has only a short lifetime (it ends when the user actively destroys it, when the browser is closed, or when it times out);

5. Others

Cookies are less convenient to handle during development. In addition, the number and size of cookies on the client are limited, while the size of a session is limited only by hardware, so a session can store far more data.

Sessions in Servlet/JSP

 

After the explanation above, the reader should have a general understanding of sessions. But how is a session implemented by a specific dynamic page technology? Next, I analyze the implementation of sessions in Servlet/JSP technology at the source code level, following the lifecycle of a session. The code excerpts are taken from Tomcat 6.0.20.

Create

 

Most of the Java web developers I have asked give the same answer about when a session is created: "When I request a page, the session is created." This statement is vague, because a request is obviously a prerequisite for creating a session. But is a session created no matter what kind of request is sent? No. Let's look at an example.

As we all know, JSP is the inverse of servlet technology. At development time we see JSP pages, but at runtime each JSP page is translated into a servlet class. For example, a JSP page is translated into servlet code like the following:

......
response.setContentType("text/html; charset=ISO-8859-1");
pageContext = _jspxFactory.getPageContext(this, request, response,
        null, true, 8192, true);
_jspx_page_context = pageContext;
application = pageContext.getServletContext();
config = pageContext.getServletConfig();
session = pageContext.getSession();
out = pageContext.getOut();
_jspx_out = out;

out.write("\r\n");
out.write("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\r\n");
out.write("<html>\r\n");
......

We can see an explicit statement that obtains the session. Where does it come from? Looking at the corresponding JSP page, the page directive contains session="true", which enables the session on this page. Since JSP is a dynamic technology, it is reasonable that this attribute defaults to true; it is written out here only for emphasis. Clearly the two are related. I found the evidence in the source code of the JSP-to-servlet translator (package org.apache.jasper.compiler):

......
if (pageInfo.isSession())
    out.printil("session = pageContext.getSession();");
out.printil("out = pageContext.getOut();");
out.printil("_jspx_out = out;");
......

The code snippet above shows that if session="true" is set on the page, the statement that obtains the session is added to the generated servlet source code. This only explains the condition under which a session is created; it does not explain how the session is created. In the spirit of getting to the bottom of things, let's keep exploring.

Anyone with servlet development experience will remember that the current session object is obtained through the getSession method of HttpServletRequest:

public HttpSession getSession(boolean create);
public HttpSession getSession();

The difference between the two is that getSession without arguments uses true for create by default. That is:

public HttpSession getSession() {
    return (getSession(true));
}

So what does this parameter mean? By tracing the calls layer by layer, I finally worked out the whole picture. Because the relationships between the functions are complex, I recommend reading the relevant parts of the Tomcat source code yourself if you want to understand the internals in depth. Here I describe only the general process:

1. The user requests a JSP page on which session="true" is set;

2. The Servlet/JSP container translates the page into a servlet, then loads and executes the servlet;

3. When the Servlet/JSP container wraps the HttpServletRequest object, it decides, based on whether a JSESSIONID is present in the cookie or the URL, whether to bind an existing session to the request or to create a new session object (the JSESSIONID is found and recorded in the request parsing phase, and the session is bound in the request object creation phase);

4. The program reads and writes session data as needed;

5. If the session is newly created, the container adds a Set-Cookie header to the response to tell the browser to hold on to the session (or uses URL rewriting to present new links to the user).

After the description above, the reader should know when a session is created. To summarize from the servlet's point of view: when the servlet requested by the user calls getSession, a session is obtained, and whether a new one is created depends on whether the current request is already bound to a session. If the client includes a JSESSIONID in the request and the servlet container finds the corresponding session object, that session is bound to the request object. If the request carries no JSESSIONID, or the session corresponding to the JSESSIONID has expired, no session can be bound, and a new session must be created. In that case a Set-Cookie header is also sent to tell the client to start holding the new session.
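The decision described in this summary can be sketched as follows. This is a simplified model rather than Tomcat's code: the class GetSessionSketch, its fields, and the use of a UUID as the identifier are all assumptions made for illustration, and a plain map stands in for the session object.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of one request and its getSession(boolean create) logic.
class GetSessionSketch {
    // All live sessions, keyed by id (stand-in for the container's session manager).
    static final Map<String, Map<String, Object>> SESSIONS = new ConcurrentHashMap<>();

    final String requestedId; // id parsed from the cookie or URL, or null
    String boundId;           // id of the session bound to this request
    boolean isNew;            // true if a Set-Cookie header must be sent

    GetSessionSketch(String requestedId) {
        this.requestedId = requestedId;
    }

    Map<String, Object> getSession(boolean create) {
        if (boundId != null) {
            return SESSIONS.get(boundId); // already bound during this request
        }
        if (requestedId != null && SESSIONS.containsKey(requestedId)) {
            boundId = requestedId;        // the client sent a valid id: bind it
            return SESSIONS.get(boundId);
        }
        if (!create) {
            return null;                  // no session, and the caller does not want one
        }
        boundId = UUID.randomUUID().toString(); // missing or stale id: start a new session
        isNew = true;
        SESSIONS.put(boundId, new ConcurrentHashMap<>());
        return SESSIONS.get(boundId);
    }

    // Same default as HttpServletRequest.getSession(): create if necessary.
    Map<String, Object> getSession() {
        return getSession(true);
    }
}
```

A request that never calls getSession never creates a session in this model, which is the point of the example: a page request by itself is not enough.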

Persistence

 

Once you understand how a session is created, it is easy to understand how it is maintained between the client and the server. After a session has been created, the client sends the session identifier to the server with each subsequent request. As long as the server program calls getSession whenever it needs the session, the server can bind the corresponding session to the current request, and state is maintained. Of course, this requires cooperation from the client: if cookies are disabled and URL rewriting is not used, the session cannot be maintained.

If no servlet calls getSession over several requests (or the requests are simply for static pages), is the session interrupted? No. The client simply sends every valid cookie value to the server; it does not care, and has no way of caring, what the server does with the cookie. After a session is created, the client always sends the session identifier to the server, whether the requested resource is a dynamic page, a static page, or even an image.

Destroy

 

Destruction here means that the session is invalidated. Whether the data structure that stores the session information is recycled or released directly is not our concern. A session is destroyed in one of two ways: by timeout or by manual destruction.

Because HTTP is stateless, the server cannot know when a session object will be used again. A session may never be accessed again after it is created. Since keeping sessions consumes server resources, it is not acceptable to keep creating sessions without reclaiming the useless ones, so a timeout mechanism is introduced. The Tomcat timeout is configured in web.xml as follows:

<session-config>
    <session-timeout>30</session-timeout>
</session-config>

The configuration above means that a session is destroyed if it is not used for 30 minutes. How does Tomcat measure these 30 minutes? The answer lies in getSession: each time a session is obtained through getSession, its access method is called to update lastAccessedTime. When Tomcat decides whether to destroy the session, it compares the difference between the current time and lastAccessedTime with the timeout.
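The timing rule can be sketched like this. It is a simplified model, not Tomcat's implementation: the class SessionTimeoutSketch is hypothetical, and the current time is passed in as a parameter (in milliseconds) so that the behavior is easy to follow and to test.

```java
// Hypothetical sketch of idle-timeout bookkeeping for one session.
class SessionTimeoutSketch {
    final int maxInactiveSeconds; // e.g. 30 * 60 for a 30-minute timeout
    long lastAccessedTime;        // milliseconds, updated on every access

    SessionTimeoutSketch(int maxInactiveSeconds, long nowMillis) {
        this.maxInactiveSeconds = maxInactiveSeconds;
        this.lastAccessedTime = nowMillis;
    }

    // Called whenever the session is obtained for a request.
    void access(long nowMillis) {
        lastAccessedTime = nowMillis;
    }

    // The session is considered expired once it has been idle for the full timeout.
    boolean isExpired(long nowMillis) {
        long idleSeconds = (nowMillis - lastAccessedTime) / 1000L;
        return idleSeconds >= maxInactiveSeconds;
    }

    public static void main(String[] args) {
        SessionTimeoutSketch s = new SessionTimeoutSketch(30 * 60, 0L);
        System.out.println(s.isExpired(29 * 60 * 1000L)); // false: idle for 29 minutes
        System.out.println(s.isExpired(30 * 60 * 1000L)); // true: idle for 30 minutes
    }
}
```

Each access resets the idle clock, so a session that is used at least once every 30 minutes never times out.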

Manual destruction means calling the session's invalidate method directly, which in turn calls the expire method to mark the session as timed out.

When a session is destroyed manually, the client has no way of knowing that the session on the server is gone, and it keeps sending the old session identifier. If the client then requests a servlet that calls getSession, the server cannot find a session object for the old identifier. It therefore creates a new session, assigns it a new identifier, and notifies the client to update its session identifier and start holding the new session.

Session Data Structure

 

In Servlet/JSP, what data structure does the container use to store session-related variables? Let's guess. First, it must be synchronized, because sessions are shared among threads and web servers are generally multi-threaded (thread pools are also used to improve performance). Second, the data structure must be easy to use, preferably with traditional key-value access.

Let's look at a single session object first. Besides storing its own information, such as its ID, a Tomcat session also gives programmers an interface for storing other information (in the class org.apache.catalina.session.StandardSession):

public void setAttribute(String name, Object value, boolean notify)

From here we can trace the data structure it uses:

protected Map attributes = new ConcurrentHashMap();

This makes it clear that Tomcat uses a ConcurrentHashMap to store the data. ConcurrentHashMap is a class in Java's concurrency package, and it satisfies exactly the two requirements we guessed: synchronization and ease of use.

So what data structure does Tomcat use to store all the session objects? Again a ConcurrentHashMap (in org.apache.catalina.session.ManagerBase, the class that manages sessions):

protected Map<String, Session> sessions = new ConcurrentHashMap<String, Session>();

The reasons need no further explanation. Other web servers presumably take the same two points into account in their own implementations.
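To see why a concurrent map suits shared session data, here is a small sketch in which several threads update the same attribute map. The class SharedAttributesSketch and the method hammer are hypothetical names for this example, and the threads stand in for request-handling threads that touch the same session.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: many threads updating one session's attribute map.
class SharedAttributesSketch {

    // Start `threads` workers that each increment "counter" `perThread` times,
    // then return the final counter value.
    static int hammer(int threads, int perThread) {
        Map<String, Integer> attributes = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge performs the read-modify-write atomically
                    attributes.merge("counter", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return attributes.get("counter");
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 1000)); // 8000
    }
}
```

The map keeps individual operations safe, but a separate get followed by a set is still two operations, so application code that reads an attribute and writes it back can lose updates under concurrent requests.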

Session hijack

 

Session hijacking is a serious and widespread security threat. In session technology, the client and the server maintain a session by passing the session identifier back and forth, but this identifier can easily be sniffed and exploited by others. This is a kind of man-in-the-middle attack.

This section uses an example to demonstrate what session hijacking is. The example should also help the reader understand the nature of sessions.

First, I wrote the following page:

<%@ page language="java" pageEncoding="ISO-8859-1" session="true"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>index.jsp</title>
</head>
<body>
This is index.jsp page.
<br>
<%
    Object o = session.getAttribute("counter");
    if (o == null) {
        session.setAttribute("counter", 1);
    } else {
        Integer i = Integer.parseInt(o.toString());
        session.setAttribute("counter", i + 1);
    }
    out.println(session.getAttribute("counter"));
%>
<a href="<%=response.encodeRedirectURL("index.jsp")%>">index</a>
</body>
</html>

The page puts a counter in the session. The first time the page is accessed, the counter is initialized to 1, and each later access increments it by 1. The counter value is printed on the page. To keep the simulation simple, I disabled cookies in the client (Firefox 3.0) so that URL rewriting is used instead, because copying a link is much easier than forging a cookie.

Next, open Firefox and access the page. The counter value is 1.

Click the index link to increase the counter. Be sure not to refresh the current page: because cookies are not in use, the JSESSIONID can only be carried in the URL, and at this point the URL in the address bar does not yet contain it. I clicked the link until the counter reached 20.

Now comes the most important part. Copy the address from the Firefox address bar (in my case http://localhost:8080/session/index.jsp;jsessionid=1380d9f60bce9c30c3a7cbf59454d0a5), then open another browser. Cookies do not need to be disabled in this one. I opened Apple's Safari 3, pasted the address into its address bar, and pressed Enter.

Strangely, the counter goes straight to 21. I did this on a single computer, but the result would be the same with two computers. If you now click the index link in both browsers, you will find that they are manipulating the same counter. There is no need to be surprised. Safari has stolen the key that maintains the session between Firefox and Tomcat, namely the JSESSIONID. This is one form of session hijacking. From Tomcat's point of view, Safari presented a JSESSIONID. Because HTTP is stateless, Tomcat cannot know that this JSESSIONID was "hijacked" from Firefox, so it looks up the corresponding session and performs the computation as usual. Firefox, for its part, cannot know that its session has been "hijacked".
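The server's view of this experiment can be reduced to a few lines. In the sketch below, the class HijackSketch and the method visit are hypothetical, and a map from session identifier to counter stands in for Tomcat's session storage. The only thing that identifies a caller is the identifier it presents.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the server keys everything on the presented session id.
class HijackSketch {
    static final Map<String, Integer> COUNTERS = new ConcurrentHashMap<>();

    // Server side of the counter page: increment and return the counter
    // that belongs to whatever session id the request carries.
    static int visit(String jsessionid) {
        return COUNTERS.merge(jsessionid, 1, Integer::sum);
    }

    public static void main(String[] args) {
        String id = "1380d9f60bce9c30c3a7cbf59454d0a5";
        int last = 0;
        for (int i = 0; i < 20; i++) {
            last = visit(id);          // "Firefox" clicks the link 20 times
        }
        System.out.println(last);      // 20
        System.out.println(visit(id)); // "Safari" presents the copied id: 21
    }
}
```

Nothing in visit distinguishes the two browsers, which is why anyone who obtains the identifier can continue the session.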

Conclusion

 

At this point, the reader should have a deeper understanding of sessions. My own knowledge and perspective are limited, so the article certainly has shortcomings. The whole article describes the session mechanism in Servlet/JSP, but the mechanisms on other development platforms, however varied, rest on the same principles. If you think carefully, you will always find some cause-and-effect relationships. As software grows in scale, we deal more and more with frameworks and components, and they can blind us as programmers. Yet amid the constant stream of new frameworks, components, and versions, many things stay relatively unchanged: specifications, protocols, patterns, algorithms, and so on. What truly improves a person is the underlying, supporting technology. Think more, learn to reason by analogy, and technical problems become much easier to solve.

When reprinting, please keep the source: shoru.cnblogs.com Jin brother's private money

 
