Session principle Data structure

Original article. When reproducing it, please retain the source: shoru.cnblogs.com Jin Brother's money

Introduction

In web development, the session is a very important concept. To many developers of dynamic websites, the session is just a variable that behaves like a black hole: they put things into it at the right time and take them out again when needed. That is the most intuitive impression developers have of the session, but what actually goes on inside the black hole? When I ask colleagues or friends for further details, many are either vague or simply guess; they know that it works, but not why.

I think many developers, myself included, are often caught up in frameworks and even in platforms built on top of other platforms, while knowing very little about the core fundamentals beneath them. We are either unable or unwilling to look deeper, and we lack the spirit of tracing things to their source. Whenever I think of this, I feel ashamed. I once implemented a simple HTTP server, but for lack of knowledge and time I left out session support. Recently, at work, I read some material on the subject and did some experiments, and learned a little from it. In the spirit of sharing, this article presents my understanding of the session as comprehensively as I can and discusses some related background along the way, so that readers can take away something more general while learning about the session: teaching people to fish rather than giving them a fish.

What is a session?

The Oxford Dictionary defines a session as a continuous period of time devoted to an activity. Seen from different levels, the word has similar but not identical meanings. From the point of view of a web application's user, opening a browser, visiting an e-commerce site, logging in, shopping, and finally closing the browser is one session. From the point of view of a web application developer, the data structure created to store the user's login information is also called a session. So pay attention to context when talking about sessions. This article is about a mechanism, built on the HTTP protocol, for enhancing the capabilities of web applications. It is not tied to any particular dynamic page technology, and its purpose is to keep state, or in other words to maintain a conversation.

Why do we need a session?

As for the context of web applications: web applications are built on the HTTP protocol, and HTTP is a stateless protocol. That is, when a user goes from page A to page B, a new HTTP request is sent, and when the server returns the response it has no way of knowing what the user did before requesting page B.

There is no official explanation of why HTTP is stateless, but some reasons can be inferred from its history and its typical use:

1. HTTP was originally designed to provide a way to publish and receive HTML pages. At that time there were no dynamic page technologies, only static HTML pages, so the protocol had no need to maintain state;

2. After receiving a response, the user usually spends some time reading the page. If the connection between client and server were kept open, it would sit idle most of the time, which is a needless waste of resources. HTTP was therefore originally designed around short connections: the TCP connection is closed once a request and its response are complete. The server cannot predict the client's next action and does not even know whether the user will come back, so having the HTTP protocol itself maintain the user's access state would be pointless;

3. Pushing part of the complexity onto the technologies built on top of HTTP keeps the protocol itself relatively simple, and that simplicity makes HTTP easier to extend. In fact, session technology is essentially an extension of the HTTP protocol.

In summary, HTTP's statelessness was determined by its historical mission. But as network technology developed, people were no longer satisfied with dull static HTML and wanted web applications to be dynamic. So scripting and the DOM appeared on the client, forms were added to HTML, and the server gained CGI and other dynamic technologies.

It is this demand for dynamic web applications that challenges the HTTP protocol: how can a stateless protocol associate two successive requests? In other words, how can a stateless protocol satisfy a stateful requirement?

At this point the need for state is unavoidable, and the statelessness of the protocol is a fait accompli, so some scheme is needed to resolve the contradiction and keep state across HTTP requests. This is where cookies and sessions come in.

Readers may have some questions about this part, so I will address two points first:

1. Statelessness and persistent connections

Some may ask: HTTP 1.1, which is now widely used, defaults to persistent (long) connections, so is it still stateless?

Connection mode and statefulness are completely unrelated. State is, in a sense, data, while the connection mode only determines how data is transmitted, not what the data is. Persistent connections are a reasonable performance optimization made possible by improvements in hardware and networks. In general, a web server limits the number of persistent connections to avoid consuming too many resources.

2. Statelessness and sessions

Sessions are stateful and the HTTP protocol is stateless. Is that a contradiction?

Sessions and the HTTP protocol are things at different levels. HTTP belongs to the application layer, the top layer of the OSI seven-layer model. Sessions are not part of HTTP; they are implemented by a specific dynamic page technology, but they are built on top of HTTP. A later section analyzes the session mechanism in Servlet/JSP technology, which should make this clearer.

Cookies and sessions

Cookies and sessions exist to work around the statelessness of the HTTP protocol. Both can record state: a cookie stores the state data on the client, while a session stores it on the server.

First let us look at how cookies work, which requires some basic knowledge of the HTTP protocol.

Cookies were first described in RFC 2109 (obsoleted by RFC 2965, which was in turn obsoleted by RFC 6265). The RFC asks each client to support at least 300 cookies in total, at least 20 cookies per domain name, and at least 4 KB per cookie. These minimums are often treated as practical limits, although browsers have their own implementations and most now allow more (Firefox, for example, allows 50 per domain). When using cookies, the most important thing is to keep them small: do not store useless information, and do not store too much.

Regardless of the server-side technology, whenever the HTTP response contains a header of the following form, the server is asking for a cookie to be set:

Set-Cookie: name=value; expires=date; path=path; domain=domain

A browser that supports cookies responds by creating a cookie file and saving it (or by keeping the cookie in memory). On every subsequent request, the browser checks whether each stored cookie is still valid (judged by the expires attribute) and whether it matches the request (judged by the path attribute). Matching cookies are added to the request headers sent to the server in the following form:

Cookie: name="ZJ"; Path="/linkage"

The dynamic script on the server parses the header and handles it accordingly, though it can also choose to ignore it.
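As a rough illustration of the two header forms above, here is a minimal Java sketch that formats a Set-Cookie value and parses a Cookie request header. The class `CookieHeaderSketch` and its methods are hypothetical helpers written for this article, not part of the servlet API, and the parsing ignores quoting and other details that a real server handles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any servlet API.
class CookieHeaderSketch {

    // Builds the value of a Set-Cookie response header. No expires attribute
    // is written, so the browser would treat it as an in-memory cookie.
    static String buildSetCookie(String name, String value, String path) {
        return name + "=" + value + "; Path=" + path;
    }

    // Parses the value of a Cookie request header ("a=1; b=2") into
    // name/value pairs, keeping the order in which they appear.
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null || header.isEmpty()) {
            return cookies;
        }
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Prints: Set-Cookie: name=ZJ; Path=/linkage
        System.out.println("Set-Cookie: " + buildSetCookie("name", "ZJ", "/linkage"));
        // Prints: {name=ZJ, theme=dark}
        System.out.println(parseCookieHeader("name=ZJ; theme=dark"));
    }
}
```

The `theme` cookie in `main` is made up for the demonstration; only the `name`/`ZJ` pair comes from the example header above.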

This touches on the relationship between a specification (or protocol) and its implementations. Put simply, the specification says what must be done, and implementations must follow it in order to be compatible with one another. How they do it is not constrained, and an implementation may also go beyond the specification, which is called an extension. Any browser that wants to offer cookie functionality must implement it according to the relevant RFC. So here the server only sends the Set-Cookie header field and does not track what the client does with it, which is itself a reflection of HTTP's statelessness.

It is important to note that for security reasons, cookies can be disabled by the browser.

Now let us look at how sessions work:

I did not find a relevant RFC, because the session is not a protocol-level concept. The basic principle is that the server maintains a piece of session data for each session, and the client and server use a globally unique identifier to refer to that data. When a user accesses a web application, the server-side program decides when to create the session. Creating it can be summed up in three steps:

1. Generate a globally unique identifier (the session ID);

2. Allocate storage for the data. Usually the corresponding data structure is created in memory, but then all session data is lost if the system loses power, which can have serious consequences for an e-commerce site. The data can also be written to a file or stored in a database. This increases I/O overhead, but it gives the session some degree of persistence and makes sessions easier to share;

3. Send the session's globally unique identifier to the client.
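The three steps can be sketched in Java as follows. This is a minimal in-memory illustration under my own assumptions: the class `SessionStoreSketch`, the 16-byte random identifier, and the nested map layout are all hypothetical and are not taken from any particular server.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory session store; all names are illustrative.
class SessionStoreSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Step 2: storage space, keyed by session identifier.
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Step 1: generate a hard-to-guess identifier
    // (16 random bytes rendered as 32 hexadecimal characters).
    static String newSessionId() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(32);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Creates the session data and returns the identifier. Step 3, sending
    // the identifier to the client, is left to the caller.
    String createSession() {
        String id = newSessionId();
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Looks up the session data for an identifier; returns null if unknown.
    Map<String, Object> find(String id) {
        return id == null ? null : sessions.get(id);
    }
}
```

Because the data lives only in memory here, it would be lost on a restart, which is exactly the trade-off described in step 2.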

The key question is how the server sends the session's unique identifier. As far as the HTTP protocol is concerned, data can be placed in the request line, in a header field, or in the body. Based on this, there are two common approaches: cookies and URL rewriting.

1. Cookies

Readers have probably guessed it already: the server can send the session identifier to the client simply by setting a Set-Cookie header, and the client will include the identifier in every subsequent request. A cookie can have an expiration time, and the cookie that carries the session identifier is normally given an expiration of 0, which means it is valid only for the lifetime of the browser process. How exactly this 0 is handled varies between browsers, but the differences are small (they mostly show up when a new browser window is opened);

2. URL Rewriting

URL rewriting, as the name implies, means rewriting URLs. Before a requested page is returned to the user, every URL in the page has the session identifier appended as a GET parameter (or placed in the path info, and so on). Whichever link the user clicks or form the user submits, the session identifier is sent back, and the conversation is maintained. This may seem cumbersome, and it is, but if the client has disabled cookies, URL rewriting is the first choice.
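To make the idea concrete, the following Java sketch appends a session identifier to a URL as a `;jsessionid=` path parameter, placed before any query string or fragment. The class `UrlRewriteSketch` is hypothetical; a real container does this through `HttpServletResponse.encodeURL` and performs additional checks that are omitted here.

```java
// Hypothetical helper for illustration; real containers do more checks.
class UrlRewriteSketch {

    // Inserts ";jsessionid=<id>" after the path and before any
    // query string ("?...") or fragment ("#...").
    static String encodeUrl(String url, String sessionId) {
        if (sessionId == null) {
            return url; // no session, nothing to append
        }
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // Prints: index.jsp;jsessionid=ABC
        System.out.println(encodeUrl("index.jsp", "ABC"));
        // Prints: list.jsp;jsessionid=ABC?page=2
        System.out.println(encodeUrl("list.jsp?page=2", "ABC"));
    }
}
```

The identifier `ABC` and the page `list.jsp` are placeholders chosen for the demonstration.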

By now the reader should understand why I said the session can be counted as an extension of HTTP. The following two figures were captured with the Firebug plug-in for Firefox. When I first visited index.jsp, the response headers contained a Set-Cookie header and the request headers contained no Cookie header. When I refreshed the page, as Figure 2 shows, the response no longer contained a Set-Cookie header, but the request headers contained a Cookie header. Note the name of the cookie, JSESSIONID: as the name implies, it is the session identifier. The value of JSESSIONID is the same in both figures, for reasons that need no further explanation. Readers may also have seen URLs on some websites that end with something like jsessionid=xxx; that is a session implemented with URL rewriting.

(Figure 1: first request for index.jsp)

(Figure 2: requesting index.jsp again)

Because of how they are implemented, cookies and sessions each have their own advantages and disadvantages, and their own application scenarios:

1. Application Scenarios

A typical application of cookies is a "Remember Me" feature: the user's account information is stored on the client as a cookie, and when the user requests a matching URL again, the account information is sent to the server, where the corresponding program performs an automatic login. Cookies can also store client-side information such as page layout and search history.

A typical application of sessions is login: after a user logs in to a website, the login information is put into the session, and each subsequent request looks up that information to make sure the user is legitimate. Shopping carts are another classic scenario;

2. Security

Cookies keep information on the client. Unencrypted, they undoubtedly expose private information. Sensitive information is generally encrypted before being stored in a cookie, but the cookie itself can still easily be stolen. A session stores information only on the server. If it is stored in a file or a database it can also be stolen, but this is far less likely than with a cookie.

The most prominent security problem with sessions is session hijacking, which is described in more detail below. Generally speaking, sessions are more secure than cookies;

3. Performance

Cookies are stored on the client and consume the client's I/O and memory, while sessions are stored on the server and consume server resources. The load that sessions place on the server is concentrated, whereas cookies spread resource consumption across clients. In this respect, cookies are better than sessions;

4. Timeliness

A cookie can be set to remain on the client for a long time, whereas a session generally has a relatively short lifetime (it ends when the user actively destroys it, or when it times out after the browser is closed);

5. Other

In development, cookies are less convenient to handle than sessions. The number and size of cookies on the client are limited, whereas the size of a session is limited only by hardware, so a session can undoubtedly store much more data.

Sessions in Servlet/JSP

With the explanation above, the reader should have a general understanding of the session. But how is the session implemented in a specific dynamic page technology? Following the session's life cycle, I will now analyze at the source code level how sessions are implemented in Servlet/JSP technology. The code analysis uses Tomcat 6.0.20 as a reference.

Create

When I asked some Java web developers when a session is created, most answered: the session is created when I request a page. This is a very vague answer. A request is indeed necessary for a session to be created, but is a session created for every request, whatever it is? No. Let us look at an example.

As we all know, JSP technology is the inverse of servlet technology: during development we see a JSP page, but at run time the JSP page is "translated" into a servlet class and executed. For example, suppose we have the following JSP page:

<%@ page language="java" pageEncoding="ISO-8859-1" session="true"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>index.jsp</title>
<body>
    This is index.jsp page.
    <br>
</body>

After the page is requested for the first time, the corresponding Java class can be found in the corresponding work directory. For reasons of length, only the more important part is shown here; interested readers can try it themselves:

response.setContentType("text/html;charset=ISO-8859-1");
pageContext = _jspxFactory.getPageContext(this, request, response,
        null, true, 8192, true);
_jspx_page_context = pageContext;
application = pageContext.getServletContext();
config = pageContext.getServletConfig();
session = pageContext.getSession();
out = pageContext.getOut();
_jspx_out = out;

out.write("\r\n");
out.write("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\r\n");
out.write("
...

Here you can see a statement that explicitly obtains the session. Where does it come from? Looking back at the JSP page, its page directive contains session="true", which enables the session for this page. In fact this attribute defaults to true, which is quite reasonable for a dynamic page technology; it is written out here only for emphasis. Clearly the two are connected. I found the evidence in the source code of the JSP translator (org.apache.jasper.compiler):

......

if (pageInfo.isSession())
    out.printil("session = pageContext.getSession();");
out.printil("out = pageContext.getOut();");
out.printil("_jspx_out = out;");

......

The code above means that if session="true" is set in the page, a statement that obtains the session is added to the generated servlet source. This explains only the condition under which the session is obtained, not how it is created, so in the spirit of tracing things to their source, let us keep exploring.

Anyone with servlet development experience will remember that the current session object is obtained with the getSession method of HttpServletRequest:

public HttpSession getSession(boolean create);
public HttpSession getSession();

The only difference between the two is that the no-argument getSession() defaults create to true. That is:

public HttpSession getSession() {
    return (getSession(true));
}

So what does this parameter really mean? After tracing through many layers, I finally worked out the flow. The relationships between the functions are fairly complex, so readers who want to understand the internals in detail are advised to read the relevant parts of the Tomcat source themselves. Here I describe only the general process:

1. The user requests a JSP page that sets session="true";

2. The Servlet/JSP container translates it into a servlet, then loads and executes the servlet;

3. When wrapping the request in an HttpServletRequest object, the Servlet/JSP container decides, based on whether a JSESSIONID is present in the cookie or the URL, whether to bind an existing session to the request or to create a new session object (the JSESSIONID is discovered and recorded during the request parsing phase, and the session is bound when the request object is created);

4. The program reads and writes the session as needed;

5. If the session is newly created, the container adds a Set-Cookie header to the response to tell the browser to keep the session (or presents the new links to the user via URL rewriting).

From the description above, the reader should understand when the session is created. To summarize at the servlet level: when the servlet handling a user request calls getSession, it gets a session, and whether a new session is created depends on whether the current request is bound to a session. When the client includes a JSESSIONID in the request and the servlet container finds the corresponding session object, that session is bound to the request object. When the client request carries no JSESSIONID, or the session for that JSESSIONID has expired, binding cannot be completed and a new session must be created. A Set-Cookie header is also sent to tell the client to start keeping the new session.
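The lookup-or-create decision summarized above can be sketched as follows. This is not Tomcat's code: `GetSessionSketch`, its `newlyIssuedId` field, and the use of `UUID` for the identifier are assumptions made purely to illustrate the role of the `create` parameter.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the container's logic; not Tomcat's implementation.
class GetSessionSketch {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Set when the last call created a session, so that the caller knows
    // which identifier to send back in a Set-Cookie header; null otherwise.
    String newlyIssuedId;

    // requestedId is the JSESSIONID sent by the client, or null if none was sent.
    Map<String, Object> getSession(String requestedId, boolean create) {
        newlyIssuedId = null;
        if (requestedId != null) {
            Map<String, Object> existing = sessions.get(requestedId);
            if (existing != null) {
                return existing; // bind the existing session to this request
            }
        }
        if (!create) {
            return null; // no valid session, and the caller did not ask for one
        }
        // No identifier, or an unknown or expired one: create a new session.
        String id = UUID.randomUUID().toString();
        Map<String, Object> created = new ConcurrentHashMap<>();
        sessions.put(id, created);
        newlyIssuedId = id;
        return created;
    }
}
```

A request that carries a stale identifier therefore ends up with a fresh session and a new identifier, which is the behavior described in the paragraph above.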

Keep

Once session creation is understood, it is easy to understand how the conversation is maintained between client and server. After the session is first created, the client sends the session identifier to the server in subsequent requests, and the server-side program only needs to call getSession when it needs the session; the server then binds the corresponding session to the current request, and state is maintained. Of course, this requires the client's cooperation: if cookies are disabled and URL rewriting is not used, the session cannot be maintained.

If no servlet calls getSession across several requests (or only static pages are requested), is the session interrupted? No. The client simply sends every valid cookie to the server; it does not care, and cannot know, what the server does with it. Once the session is established, the client always sends the session identifier to the server, whether the requested resource is a dynamic page, a static page, or even an image.

Destroy

Destruction here means discarding the session; whether the data structure that stores the session information is recycled or its memory is freed directly does not concern us. A session is destroyed in two cases: timeout and manual destruction.

Because HTTP is stateless, the server cannot know when a session object will be used again; the user may open a session and never come back. Keeping a session costs server resources, so sessions cannot be created blindly without recycling the useless ones. This is why a timeout mechanism is introduced. In Tomcat, the timeout is configured in web.xml as follows:

<session-config>

<session-timeout>30</session-timeout>

</session-config>

This configuration means that a session is destroyed if it has not been used for 30 minutes. How does Tomcat measure the 30 minutes? After getSession, the session's access method is called, which updates lastAccessedTime; when deciding whether to destroy a session, Tomcat compares the current time with lastAccessedTime.

Manual destruction means calling the session's invalidate method directly; this method in fact calls the expire method, forcing the session to expire.
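A simplified sketch of both destruction paths is shown below. It is not Tomcat's StandardSession: the class `ExpiringSessionSketch` is hypothetical, and the current time is passed in explicitly (in milliseconds) so that the behavior is easy to follow and to test.

```java
// Hypothetical sketch of timeout and manual invalidation; not Tomcat's code.
class ExpiringSessionSketch {
    private final long maxInactiveMillis;
    private volatile long lastAccessedTime;
    private volatile boolean valid = true;

    ExpiringSessionSketch(long maxInactiveMillis, long now) {
        this.maxInactiveMillis = maxInactiveMillis;
        this.lastAccessedTime = now;
    }

    // Called whenever the session is used; restarts the idle timer.
    void access(long now) {
        lastAccessedTime = now;
    }

    // Manual destruction: the session is discarded immediately.
    void invalidate() {
        valid = false;
    }

    // Timeout check: the session expires once it has been idle for at least
    // maxInactiveMillis, measured from the last access.
    boolean isValid(long now) {
        if (valid && now - lastAccessedTime >= maxInactiveMillis) {
            valid = false;
        }
        return valid;
    }
}
```

With a 30-minute limit, a session last accessed at minute 29 is still valid at minute 58 and expires at minute 59, since each access restarts the idle timer.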

When a session is destroyed manually, the client has no way of knowing that the server-side session is gone, and it will keep sending the old session identifier to the server. If a servlet that calls getSession is then requested, the server cannot find a session for the old identifier, so it creates a new session, assigns a new identifier, and tells the client to replace the old identifier and start keeping the new session.

Data structure of the session

In Servlet/JSP, what data structure does the container use to store session-related variables? Let us guess. First, it must be thread-safe, because in a multithreaded environment the session is shared between threads, and web servers are generally multithreaded (they also use pooling to improve performance). Second, the data structure must be easy to operate on, preferably with the traditional key-value access pattern.

Let us start with a single session object. Besides storing its own information, such as its ID, Tomcat's session provides an interface for the programmer to store other information (in the class org.apache.catalina.session.StandardSession):

public void setAttribute(String name, Object value, boolean notify)

From here we can trace which data structure it uses:

protected Map attributes = new ConcurrentHashMap();

Clearly Tomcat uses a ConcurrentHashMap object to store the data; this class is in the java.util.concurrent package. It meets exactly the two requirements we guessed: thread safety and ease of use.

And what data structure does Tomcat use to store all the session objects? Sure enough, ConcurrentHashMap again (in org.apache.catalina.session.ManagerBase, the class that manages sessions):

protected Map<String, Session> sessions = new ConcurrentHashMap<String, Session>();

The reasons need no further explanation. Other web server implementations should take the same two points into account.
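The following small experiment illustrates why a thread-safe map matters when several request threads touch the same session at once. The class `ConcurrentAttributesSketch` is hypothetical; it relies on `ConcurrentHashMap.merge`, which performs each update atomically, so no increments are lost.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demonstration; the attribute name "counter" is illustrative.
class ConcurrentAttributesSketch {

    // Starts `threads` workers that each add 1 to the "counter" attribute
    // `perThread` times, waits for them, and returns the final value.
    static int countConcurrently(int threads, int perThread) {
        Map<String, Integer> attributes = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge is atomic on ConcurrentHashMap
                    attributes.merge("counter", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return attributes.getOrDefault("counter", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 1000)); // prints 8000
    }
}
```

Note that only the single `merge` call is atomic. A separate read followed by a write, as in a get-then-set sequence, could still lose updates even on a ConcurrentHashMap.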

Session hijacking

Session hijacking is a serious and widespread security threat. In session technology, the client and server maintain the session by transmitting the session identifier, but this identifier can easily be sniffed and used by someone else. It is a kind of man-in-the-middle attack.

This section uses an example to show what session hijacking is; through it, the reader can also better understand the nature of the session.

First, I wrote the following page:

<%@ page language="java" pageEncoding="ISO-8859-1" session="true"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>index.jsp</title>
<body>
This is index.jsp page.
<br>
<%
    // Read the counter from the session; it is absent on the first visit.
    Object o = session.getAttribute("counter");
    if (o == null) {
        session.setAttribute("counter", 1);
    } else {
        Integer i = Integer.parseInt(o.toString());
        session.setAttribute("counter", i + 1);
    }
    out.println(session.getAttribute("counter"));
%>
<a href="<%=response.encodeRedirectURL("index.jsp")%>">index</a>
</body>

The page puts a counter in the session. The first time the page is accessed, the counter is initialized to 1, and each later access adds 1 to it. The counter's value is printed on the page. To make the simulation simpler, I disabled cookies in the client (Firefox 3.0) and used URL rewriting instead, because copying a link is more convenient than forging a cookie.

Now open Firefox and visit the page; the counter shows 1:

(Figure 3)

Then click the index link to increment the counter. Be careful not to refresh the current page: since cookies are not used, the jsessionid is carried only in the URL, and at this point the URL in the address bar does not contain it. As Figure 4 shows, I clicked until the counter reached 20.

(Figure 4)

Now comes the most critical step: copy the address from the Firefox address bar (mine is http://localhost:8080/session/index.jsp;jsessionid=1380D9F60BCE9C30C3A7CBF59454D0A5), then open another browser, in which cookies need not be disabled. I opened Apple Safari 3, pasted the address into its address bar, and pressed Enter, with the following result:

(Figure 5)

Strangely, the counter went straight to 21. This example was done on a single computer, but the result is the same with two different machines. If you now click the index link alternately in the two browsers, you will find that they are manipulating the same counter. This is not actually surprising: Safari stole the key that maintains the conversation between Firefox and Tomcat, the jsessionid, and that is session hijacking. From Tomcat's point of view, Safari handed it a jsessionid. Because HTTP is stateless, Tomcat cannot know that this jsessionid was "hijacked" from Firefox, so it finds the corresponding session and performs the computation as usual. Firefox, for its part, is unaware that its session has been hijacked.
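The experiment can be reduced to a few lines of Java. In this hypothetical sketch (`HijackSketch` is not part of Tomcat), the only thing the server ever sees of a caller is the session identifier, so two different callers presenting the same identifier share one counter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the counter page; not Tomcat's implementation.
class HijackSketch {
    // One counter per session identifier.
    private final Map<String, Integer> counters = new ConcurrentHashMap<>();

    // Handles one page request. The session identifier is the only input
    // that tells the server who is calling.
    int visit(String sessionId) {
        return counters.merge(sessionId, 1, Integer::sum);
    }

    public static void main(String[] args) {
        HijackSketch server = new HijackSketch();
        String id = "1380D9F60BCE9C30C3A7CBF59454D0A5"; // identifier from the example URL

        int firefox = 0;
        for (int i = 0; i < 20; i++) {
            firefox = server.visit(id); // the original browser clicks 20 times
        }
        int safari = server.visit(id);  // a second browser reuses the copied identifier

        System.out.println(firefox + " then " + safari); // prints "20 then 21"
    }
}
```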

Conclusion

By now the reader should have a deeper understanding of the session. Given my limited skill and perspective, the text surely contains some imprecise statements. The whole article describes the session mechanism in Servlet/JSP, but the mechanisms of other development platforms do not stray far from the same principles. If you think carefully, you will find that there is always cause and effect behind these things. As software grows in scale, we deal more and more with frameworks and components, and they can blind the programmer's eyes. While frameworks and components keep appearing and versions keep changing, many things stay relatively constant: specifications, protocols, patterns, algorithms, and so on. What really makes a person better is the underlying supporting technology. After enough thinking, similar explorations turn into confirmation of what you already expect. Doing technology is like the cook carving up the ox: only by knowing the ribs and bones can one work with ease.
