If you have just read the title, you are probably confused: what are "relationships between web application pages"? Let me explain step by step.
In existing web applications, the number of distinct pages that can be displayed is typically greater than the number of JSP files. The reason is that when handling events or simple page jumps, the URL carries parameters, for example:
http://XXXXXXXX:9090/linghao/buy.jsp?Action=add&IID=2
The page that handles such a URL is not necessarily a separate JSP file; the request may be processed inside the same file. That is why there can be more displayable pages than JSP files. Today we focus on these parameters. Different numbers of parameters and different parameter values send the current page to different pages, and the information about these page relationships is often hidden in the Java code of the JSP. Short of searching for the parameters by hand, the only way to find them is through program analysis.
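To make the point concrete, here is a minimal sketch of how one file can stand for several pages. It is plain Java rather than real JSP code, and the class name BuyPageSketch, the page texts, and the fallback branch are all hypothetical; only the parameter names Action and IID come from the URL above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Java code inside buy.jsp: one file, several "pages".
class BuyPageSketch {

    // Split "a=1&b=2" into name/value pairs (no URL decoding, for brevity).
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Plays the role of request.getParameter(...) plus the branching that follows it.
    static String render(Map<String, String> params) {
        String action = params.get("Action");
        if (action == null) {
            return "cart overview";                        // hypothetical default page
        }
        if (action.equals("add")) {
            int iid = Integer.parseInt(params.get("IID")); // IID is used as a number
            return "added item " + iid;                    // hypothetical page
        }
        return "unknown action: " + action;                // hypothetical fallback page
    }

    public static void main(String[] args) {
        System.out.println(render(parseQuery("Action=add&IID=2")));
        System.out.println(render(parseQuery("")));
    }
}
```

The same file produces three different results depending on the parameters, and nothing outside the Java code reveals which parameter names and values matter. That is exactly the information a program analysis has to recover.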
There is existing work that performs a similar analysis: "Improving Test Case Generation for Web Applications Using Automated Interface Discovery".
That work also obtains domain information by analyzing servlets, that is, information about the parameters that follow the URL. It was designed for testing, with the goal of covering execution paths, but the problem we want to solve belongs to the same category. First, let's look at their solution.
They first define the interface of a servlet (IP stands for input parameter):
    interface = IP*
    IP = name, domain information
    name = <string>
    domain information = domain type, relevant value*
    domain type = ANY | NUMERIC
    relevant value = <string> | <number>
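The grammar above maps naturally onto a few small types. The following is only a sketch of one possible encoding: the class names are hypothetical, and relevant values are stored as text even when they are numbers, which is a simplification made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InterfaceModelSketch {

    // domain type = ANY | NUMERIC
    enum DomainType { ANY, NUMERIC }

    // domain information = domain type, relevant value*
    static final class DomainInfo {
        final DomainType type;
        final List<String> relevantValues;   // strings and numbers both kept as text here

        DomainInfo(DomainType type, String... relevantValues) {
            this.type = type;
            this.relevantValues = new ArrayList<>(Arrays.asList(relevantValues));
        }
    }

    // IP = name, domain information
    static final class IP {
        final String name;
        final DomainInfo domainInfo;

        IP(String name, DomainInfo domainInfo) {
            this.name = name;
            this.domainInfo = domainInfo;
        }

        @Override
        public String toString() {
            return name + ":" + domainInfo.type + domainInfo.relevantValues;
        }
    }

    // interface = IP*
    static String describe(List<IP> iface) {
        StringBuilder sb = new StringBuilder();
        for (IP ip : iface) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(ip);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<IP> iface = Arrays.asList(
            new IP("Action", new DomainInfo(DomainType.ANY, "add")),
            new IP("IID", new DomainInfo(DomainType.NUMERIC)));
        System.out.println(describe(iface));
    }
}
```

For the example URL, one interface of buy.jsp would contain two IPs: Action, of type ANY with the relevant value "add", and IID, of type NUMERIC with no relevant values recorded.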
The paper presents an algorithm that consists of two steps:
Step 1: find the type of each parameter (string or number) ......
Algorithm 1 - Phase 1

/* GetDomainInfo */
Input:  servlets: set of servlets in the web application   // the Java code
Output: ICFG: ICFG for the servlets, annotated with domain information   // interprocedural control-flow graph, annotated with parameter information
begin
    ICFG ← ICFG for the web application   // build the interprocedural control-flow graph
    compute data-dependence information for the web application
    for each PF call in the servlets do   // every occurrence of request.getParameter()
        PFnode ← ICFG's node representing PF
        PFvar ← LHS of the PF call statement   // the variable that receives the return value of request.getParameter()
        newannot ← new annotation   // create an annotation
        newannot.IPnode ← PFnode   // the node the annotation refers to
        newannot.type ← ANY   // the type is initialized to ANY
        newannot.values ← {}   // the set of possible values starts out empty
        associate newannot with PFnode   // attach the annotation to the node
        GDI(DUchain(PFvar, PFnode), PFvar, PFnode, {})   // call GDI, defined below; DUchain is the definition-use chain
    end for
    return ICFG   /* returns annotated ICFG */
end

/* GDI */
Input:  node: current node to examine   // parameter 1: the node to be analyzed
        IPvar: variable storing the IP value and used at node   // parameter 2: the variable that holds the parameter value at node
        root node: node to be annotated   // parameter 3: the node whose annotation is updated
        visited nodes: nodes visited along current path   // parameter 4: nodes already analyzed, so that no node is analyzed twice
begin
    if node ∉ visited nodes then   // node has not been analyzed yet
        if node is an exit node then   // node is a method exit (return) statement
            returnsites ← possible return sites for node's method   // the nodes control may return to, i.e. the callers of this method
            for each retsite ∈ returnsites do
                retvar ← variable defined at retsite   // the variable assigned at the call site once the call returns
                newannot ← root node's annotation   // copy the annotation of the root node
                associate newannot with node retsite   // attach the copy to the return site
                GDI(DUchain(retvar, retsite), retvar, retsite, visited nodes ∪ {node})   // recurse along the uses of retvar
            end for
        else
            if node represents a comparison with a constant then
                compval ← value used in the comparison   // record the constant being compared against
                addValueToAnnotation(root node, compval)   // add it to the root node's annotation
            else if node is a type cast/conversion of IPvar then
                casttype ← target type of the cast operation   // record the target type
                setDomainTypeInAnnotation(root node, casttype)   // set the annotation's type field to the target type
            end if
            if node contains a definition of a variable then   // the value is passed on to another variable
                var ← variable defined at node
                for each n ∈ DUchain(var, node) do   // keep tracking the new variable to find the type and possible values of the parameter
                    GDI(n, var, root node, visited nodes ∪ {node})
                end for
            end if
        end if
    end if
end
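The core of GDI can be tried out on a toy model. The sketch below is not the paper's implementation: it assumes a hand-built definition-use graph instead of a real ICFG, it leaves out exit nodes and return sites entirely, and all class, enum, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DomainInfoSketch {

    enum Kind { PF_CALL, ASSIGN, COMPARE_CONST, CAST_NUMERIC }

    static final class Node {
        final Kind kind;
        final String constant;                       // set only for COMPARE_CONST
        final List<Node> uses = new ArrayList<>();   // uses of the variable defined at this node

        Node(Kind kind, String constant) {
            this.kind = kind;
            this.constant = constant;
        }

        Node use(Node n) {
            uses.add(n);
            return this;
        }
    }

    static final class Annotation {
        String type = "ANY";                               // initial domain type
        final Set<String> values = new LinkedHashSet<>();  // relevant values found so far
    }

    // Simplified GDI: follows definition-use edges only, no interprocedural part.
    static void gdi(Node node, Annotation root, Set<Node> visited) {
        if (visited.contains(node)) {
            return;                                  // already seen on this path
        }
        if (node.kind == Kind.COMPARE_CONST) {
            root.values.add(node.constant);          // comparison with a constant
        } else if (node.kind == Kind.CAST_NUMERIC) {
            root.type = "NUMERIC";                   // cast or conversion to a number
        }
        Set<Node> next = new HashSet<>(visited);
        next.add(node);
        for (Node use : node.uses) {                 // empty unless the node defines a variable
            gdi(use, root, next);
        }
    }

    // Simplified GetDomainInfo for a single PF call node.
    static Annotation getDomainInfo(Node pfCall) {
        Annotation annotation = new Annotation();
        for (Node use : pfCall.uses) {
            gdi(use, annotation, new HashSet<Node>());
        }
        return annotation;
    }

    public static void main(String[] args) {
        // action = request.getParameter("Action"); action compared with "add";
        // copy = action; copy compared with "remove"
        Node copy = new Node(Kind.ASSIGN, null).use(new Node(Kind.COMPARE_CONST, "remove"));
        Node action = new Node(Kind.PF_CALL, null)
            .use(new Node(Kind.COMPARE_CONST, "add"))
            .use(copy);
        Annotation a = getDomainInfo(action);
        System.out.println(a.type + " " + a.values);
    }
}
```

In this toy graph the value of Action is compared with "add" directly and with "remove" after being copied to another variable, so both constants end up in the annotation while the type stays ANY. A parameter that flows into a numeric conversion would have its type switched to NUMERIC instead.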
In essence, this algorithm determines the type of each parameter in the URL and its possible values. Its output is the ICFG annotated with that information.
Step 2: obtain the possible parameter combinations
Algorithm 2 - Phase 2

/* ExtractInterfaces */
Input:  ICFG: annotated ICFG produced by GetDomainInfo   // the interprocedural control-flow graph obtained in step 1
Output: interfaces[]: interfaces exposed by each of the servlets
begin
    CG ← call graph for the web application   // build the call graph; its nodes are methods
    SCC ← set of strongly connected components in CG   // find the strongly connected components
    singletons ← set of singleton sets, one for each node in CG that is not part of a strongly connected component   // nodes outside every strongly connected component
    CC ← SCC ∪ singletons   // all node sets
    for each mset ∈ CC, in reverse topological order do   // visit callees first, so that every method a method calls already has a summary when it is analyzed
        SummarizeMethod(mset)   // defined below
    end for
    return interfaces of each servlet's root method
end

/* SummarizeMethod */
Input:  methodset ⊂ CG nodes: singleton set or set of strongly connected methods in the call graph   // a single method or a set of strongly connected methods
begin
    N ← ⋃ (m ∈ methodset) nodes in m's CFG
    worklist ← {}
    for each n ∈ N do   // every node in the control-flow graphs of the methods m
        In[n] ← {}   // the set flowing into n is initially empty
        if n corresponds to a PF call then   // n is a call such as request.getParameter()
            newIP ← new IP
            newIP.node ← n
            newIP.name ← parameter of the PF call   // the argument of request.getParameter(), i.e. the parameter name passed in the URL
            if n's annotation has domain information dominfo then
                newIP.domaininfo ← dominfo   // n carries domain information (the result of step 1), so copy it into the new IP
            else
                newIP.domaininfo ← null
            end if
            Gen[n] ← {newIP}   // a new IP is generated at this node, so it goes into Gen[n]
            add nodes in succ(n) to worklist   // queue the successors of n for analysis
        else if n is a callsite AND target(n) has summary s then   // n calls another method that already has a summary
            Gen[n] ← map(n, s)   // map the callee's summary onto call site n
            for each interface ∈ Gen[n] do   // visit each interface in the Gen set
                for each IP ∈ interface do
                    annot ← annotation associated with n's return site
                    if IP.node = annot.IPnode AND annot has domain information dominfo then
                        IP.domaininfo ← dominfo
                    end if
                end for
            end for
            add nodes in succ(n) to worklist
        else if n is a method entry point then   // a method entry point generates no IP
            Gen[n] ← {{}}
            add nodes in succ(n) to worklist
        else
            Gen[n] ← ∅
        end if
        Out[n] ← Gen[n]
    end for
    while |worklist| ≠ 0 do
        n ← first element in worklist
        In[n] ← ⋃ (p ∈ pred(n)) Out[p]
        Out'[n] ← {}
        for each i ∈ In[n] do
            for each g ∈ Gen[n] do
                Out'[n] ← Out'[n] ∪ {i ∪ g}   // enumerate the possible IP combinations
            end for
        end for
        if Out'[n] ≠ Out[n] then
            Out[n] ← Out'[n]
            if n is a callsite AND target(n) ∈ methodset then
                add target(n)'s entry node to worklist
            else
                add nodes in succ(n) to worklist
            end if
        end if
    end while
    for each m ∈ methodset do
        summary ← Out[m's exit node]   // the IP sets at the exit node of m are the summary we want
        associate summary to method m
        for each interface ∈ summary do
            for each IP ∈ interface such that IP.name is not a concrete value do
                IP.name ← resolve(IP)   // replace the formal parameter with the corresponding actual parameter
            end for
        end for
    end for
end
In short, the second step traverses the methods of each set in the call graph and generates, for every method, a summary that contains the possible combinations of IPs.
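The combination step at the heart of the summary computation, Out'[n] = { i ∪ g : i ∈ In[n], g ∈ Gen[n] }, can be sketched on its own. The code below is a simplification under several assumptions: an IP is represented only by its name, the control-flow graph is a hand-written branch without loops or calls, and nodes that read no parameter are given the Gen set {{}} so that combining leaves the incoming sets unchanged. All names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class InterfaceCombineSketch {

    // Out'[n] = { i ∪ g : i ∈ In[n], g ∈ Gen[n] }
    static Set<Set<String>> combine(Set<Set<String>> in, Set<Set<String>> gen) {
        Set<Set<String>> out = new LinkedHashSet<>();
        for (Set<String> i : in) {
            for (Set<String> g : gen) {
                Set<String> merged = new TreeSet<>(i);
                merged.addAll(g);
                out.add(merged);
            }
        }
        return out;
    }

    // In[n] for a node with two predecessors: the union of their Out sets.
    static Set<Set<String>> union(Set<Set<String>> a, Set<Set<String>> b) {
        Set<Set<String>> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    // Gen set holding one interface with the given names; no names yields {{}}.
    static Set<Set<String>> gen(String... names) {
        Set<String> iface = new TreeSet<>();
        for (String name : names) {
            iface.add(name);
        }
        Set<Set<String>> g = new LinkedHashSet<>();
        g.add(iface);
        return g;
    }

    // Toy method: read Action, then either read IID or do nothing, then exit.
    static Set<Set<String>> summarizeToyMethod() {
        Set<Set<String>> entry = gen();                               // {{}}
        Set<Set<String>> readAction = combine(entry, gen("Action"));
        Set<Set<String>> addBranch = combine(readAction, gen("IID"));
        Set<Set<String>> otherBranch = combine(readAction, gen());
        return union(addBranch, otherBranch);                         // sets reaching the exit node
    }

    public static void main(String[] args) {
        System.out.println(summarizeToyMethod());
    }
}
```

The toy method ends up with two possible interfaces, one that uses Action together with IID and one that uses Action alone, which corresponds to the two paths through the branch.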
Both algorithms are long, and both are described in the paper. It takes a good while to study and understand them, so I am writing them down here to consolidate my own memory and to help fellow students who want to solve the same problem. Time for bed ~