[Architecture] Design and comparison of branch predictors

Correlating predictor

An (m, n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor for a single branch. The attraction of this kind of correlating predictor is that it achieves a higher prediction rate than the plain 2-bit scheme while requiring only a small amount of additional hardware. The hardware is simple because the global history of the most recent m branches can be recorded in an m-bit shift register, where each bit records whether a branch was taken or not taken. The branch prediction buffer is then indexed by concatenating the low-order bits of the branch address with the m-bit global history. The original post illustrated this with a figure of a (2,2) predictor and how it is accessed.
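The concatenation described above can be sketched as follows. This is only an illustration, not the author's code: the function name `predictor_index` and its parameters are hypothetical, and placing the address bits above the history bits is an assumed layout (any fixed layout works as long as lookup and update agree).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build the prediction-buffer index for an (m, n) predictor
// by concatenating `addr_bits` low-order bits of the branch address with the
// m-bit global history. Assumed layout: [address bits | history bits].
std::uint32_t predictor_index(std::uint32_t branch_addr, std::uint32_t history,
                              unsigned addr_bits, unsigned m) {
    assert(addr_bits + m < 32);  // keep the shifts below well defined
    std::uint32_t addr_part = branch_addr & ((1u << addr_bits) - 1u);
    std::uint32_t hist_part = history & ((1u << m) - 1u);
    return (addr_part << m) | hist_part;
}
```

For a (2,2) predictor with 4 address bits, `predictor_index(0xB, 0x2, 4, 2)` gives binary 101110: the address bits 1011 followed by the history bits 10.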

The following is the (10,2) correlating predictor used in the experiment.

The (10,2) correlating predictor uses the behavior of the last 10 branches to select from 2^10 branch predictors, each of which is a 2-bit predictor for a single branch. The global history of the last 10 branches is recorded in a 10-bit shift register, where each bit records whether a branch was taken. The experiment limits predictor storage to 30 Kbit, that is, 2^10 × 2 × (number of entries selected by the branch address) = 30 K, which gives 15 entries. To simplify programming, this is rounded to 16 entries, so the low 4 bits of the branch address are used as the entry address. The branch prediction buffer is therefore indexed by concatenating the low 4 bits of the branch address with the 10-bit global history. The original post showed this design in a figure.
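The 10-bit shift register mentioned above can be sketched with `std::bitset`. The helper names `record_outcome` and `predictor_select` are hypothetical, and storing the newest outcome in bit 0 is an assumption made for this sketch.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper: shift the newest branch outcome into bit 0 of the history.
// The oldest outcome drops off the top because std::bitset discards shifted-out bits.
void record_outcome(std::bitset<10>& history, bool taken) {
    history <<= 1;
    history[0] = taken;
}

// Hypothetical helper: the history value selects one of the 2^10 predictors.
std::size_t predictor_select(const std::bitset<10>& history) {
    std::size_t index = history.to_ulong();
    assert(index < (1u << 10));
    return index;
}
```

After the outcomes taken, not taken, taken are recorded into an empty register, it holds binary 101, so `predictor_select` returns 5.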

A 2-bit saturating counter can represent four states. The counter is incremented by 1 when the branch is taken and decremented by 1 when it is not taken; it saturates at 00 and 11. The original post showed the state transitions of the 2-bit predictor in a figure.
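The counter behavior can be sketched as below. The struct name `TwoBitCounter` is hypothetical. The initial state of 00 and the rule "predict taken when the high bit is set" are assumptions taken from how the experiment code reads its counters, not from the paragraph above.

```cpp
#include <cassert>

// Hypothetical 2-bit saturating counter with states 0..3 (binary 00..11).
struct TwoBitCounter {
    unsigned state = 0;  // assumed initial state: 00

    // Assumed prediction rule: taken when the high bit is set (state 10 or 11).
    bool predict_taken() const { return state >= 2; }

    // Increment on a taken branch, decrement otherwise, saturating at 0 and 3.
    void update(bool taken) {
        if (taken) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
        assert(state <= 3);
    }
};
```

Starting from 00, two taken branches are needed before the counter predicts taken, and a single not-taken branch in state 11 does not flip the prediction.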

The size of the (10,2) correlating predictor is 2^10 × 2 × 16 = 32 Kbit.

Tournament predictor

The tournament predictor combines two predictors: a global predictor based on global information and a local predictor based on local information. A selector chooses between the local and the global prediction. The specific design is as follows:

Global predictor: Indexed by the outcomes of the last 12 branches, so it has 2^12 = 4 K entries, each of which is a standard 2-bit predictor.
Local predictor: Designed as two levels. The top level is a local history table indexed by the low 10 bits of the instruction address, giving 2^10 = 1 K entries of 10 bits each. Each entry records the outcomes of the 10 most recent executions of the branch that maps to it, so patterns of up to 10 branches can be recorded and predicted. The selected local history entry then indexes a second table of 1 K entries, each a 3-bit saturating counter, which provides the local prediction.
Selector: Indexed by the low 12 bits of the branch address, giving 2^12 = 4 K selectors. Each is a 2-bit counter that chooses between the local and the global prediction. The design uses the local predictor by default. When both predictors are correct or both are wrong, the counter is unchanged. When the global predictor is correct and the local predictor is wrong, the counter is incremented by 1; in the opposite case it is decremented by 1. The original post showed the state transitions in a figure.
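The selector rule above can be sketched as a small state machine. The struct name `Selector` is hypothetical, and mapping states 00 and 01 to the local predictor and 10 and 11 to the global predictor is an assumption consistent with the local predictor being the default.

```cpp
#include <cassert>

// Hypothetical selector: a 2-bit counter that picks between two predictions.
// Assumed mapping: states 0..1 choose the local predictor, 2..3 the global one.
struct Selector {
    unsigned state = 0;  // starts at 00, so the local predictor is the default

    bool use_global() const { return state >= 2; }

    // Move toward the predictor that was right, but only when exactly one was right.
    void update(bool local_pred, bool global_pred, bool taken) {
        bool local_ok = (local_pred == taken);
        bool global_ok = (global_pred == taken);
        if (local_ok == global_ok) return;  // both right or both wrong: no change
        if (global_ok) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
        assert(state <= 3);
    }
};
```

With this mapping the global predictor has to win twice in a row from the initial state before the selector switches to it.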

The size of the tournament predictor as designed is 2^12 × 2 + 2^10 × 10 + 2^10 × 3 + 2^12 × 2 = 29 Kbit.

* In addition, while researching the design, I found that in some references the selector is indexed by the global history, that is, the outcomes of the last 12 branches are used to index the selector's 2-bit counters. That design was also tried in the experiment, but the results show that the first scheme performs better.
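The storage sum can be checked at compile time. The function name `tournament_bits` is hypothetical; the table sizes are the ones given in the text.

```cpp
// Storage of the tournament predictor in bits, using the table sizes from the text.
constexpr unsigned long tournament_bits() {
    return (1ul << 12) * 2     // global predictor: 4 K two-bit counters
         + (1ul << 10) * 10    // local history table: 1 K ten-bit entries
         + (1ul << 10) * 3     // local predictor: 1 K three-bit counters
         + (1ul << 12) * 2;    // selector: 4 K two-bit counters
}

// 29 * 1024 = 29696 bits, which stays under the 30 Kbit limit.
static_assert(tournament_bits() == 29ul * 1024ul, "tournament predictor should be 29 Kbit");
```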

Branch history table predictor

The simplest dynamic branch prediction scheme is the branch history table. It is a small memory indexed by the low-order bits of the branch address, and each entry holds n prediction bits that indicate whether the branch was recently taken. In the experiment, a simple 2-bit branch history table predictor is used as the performance baseline. With the 30 Kbit limit, 2 × (number of entries) = 30 K, which gives 15 K entries. This is rounded to 16 K = 2^14 entries, so the low 14 bits of the instruction address are used as the prediction index.

The size of the branch history table predictor is 2^14 × 2 = 32 Kbit.
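The rounding from a bit budget to an index width can be sketched as follows. The helper name `index_bits_for_budget` is hypothetical, and rounding up to the next power of two is inferred from the 15 K to 16 K step in the text.

```cpp
#include <cassert>

// Hypothetical helper: number of index bits when the entry count implied by a
// bit budget is rounded up to the next power of two (15 K -> 16 K in the text).
unsigned index_bits_for_budget(unsigned long budget_bits, unsigned long bits_per_entry) {
    assert(bits_per_entry > 0);
    unsigned long entries = budget_bits / bits_per_entry;
    unsigned bits = 0;
    while ((1ul << bits) < entries) ++bits;  // smallest power of two >= entries
    return bits;
}
```

For the branch history table, `index_bits_for_budget(30 * 1024, 2)` returns 14. For the (10,2) correlating predictor, where each address-selected entry holds 2^10 two-bit counters, `index_bits_for_budget(30 * 1024, 2048)` returns 4. Because the count is rounded up, the resulting tables occupy 32 Kbit, slightly more than the 30 Kbit limit.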

Experiment code

// Branch Prediction Simulation
// Coded by Wei Lan (Student ID: 1201214149)
#include <bitset>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Masks for extracting the low-order bits of an address
#define BITS4  0x0000f
#define BITS10 0x003ff
#define BITS12 0x00fff
#define BITS14 0x03fff

// 2-bit branch history table predictor: 2^14 counters, all starting at 00
vector<bitset<2> > branch_history_table(1 << 14, bitset<2>());

// (10,2) correlating predictor
const int M = 10;              // bits of global history
const int N = 2;               // bits per counter
const int ADD_INDEX = 14 - M;  // bits of the branch address used (4)
bitset<M> latest_branch;       // global history shift register
vector<vector<bitset<N> > > correlating_predictor_table(
    1 << ADD_INDEX, vector<bitset<N> >(1 << M, bitset<N>()));

// Tournament predictor
vector<bitset<10> > local_history_table(1 << 10, bitset<10>());
vector<bitset<3> > local_predictor_table(1 << 10, bitset<3>());
vector<bitset<2> > global_predictor_table(1 << 12, bitset<2>());
bitset<12> global_history_table;
vector<bitset<2> > selectors(1 << 12, bitset<2>());

// Prediction functions: each returns true if the prediction was correct
bool correlating(const string& current_pc, const string& next_pc);
bool tournament(const string& current_pc, const string& next_pc);
bool bht(const string& current_pc, const string& next_pc);

// Saturating counter update: +1 if taken, -1 otherwise, clamped to [0, 2^W - 1]
template <size_t W>
void update_counter(bitset<W>& counter, bool taken) {
    unsigned long value = counter.to_ulong();
    const unsigned long max_value = (1ul << W) - 1;
    if (taken) {
        if (value < max_value) ++value;
    } else {
        if (value > 0) --value;
    }
    counter = bitset<W>(value);
}

// Shift the newest outcome into bit 0; the oldest outcome is discarded
template <size_t W>
void push_history(bitset<W>& history, bool taken) {
    history <<= 1;
    history[0] = taken;
}

// A branch counts as taken when the next PC is not the fall-through address.
// This assumes fixed 4-byte instructions in the trace.
bool is_taken(unsigned long current_add, unsigned long next_add) {
    return (next_add - current_add) != 4;
}

float correct_rate(int correct, int wrong) {
    int total = correct + wrong;
    return total == 0 ? 0.0f : (float)correct / (float)total;
}

int main() {
    const int FILE_COUNT = 7;
    string filenames[FILE_COUNT] = {"gcc.log", "compress.log", "crafty.log", "gzip.log",
                                    "mcf.log", "parser.log", "vpr.log"};
    double bht_average = 0.0, correlating_average = 0.0, tournament_average = 0.0;
    for (int i = 0; i < FILE_COUNT; i++) {
        cout << "Testing file: " << filenames[i] << endl;
        ifstream fin(filenames[i].c_str());
        if (!fin) {
            cerr << "Cannot open " << filenames[i] << endl;
            return 1;
        }
        int bht_predict_correct = 0, bht_predict_wrong = 0;
        int correlating_predict_correct = 0, correlating_predict_wrong = 0;
        int tournament_predict_correct = 0, tournament_predict_wrong = 0;
        string line;
        while (getline(fin, line)) {
            // Each trace line holds the current PC and the next PC in hexadecimal
            istringstream theline(line);
            string current_pc, next_pc;
            if (!(theline >> current_pc >> next_pc)) continue;  // skip malformed lines
            if (bht(current_pc, next_pc)) ++bht_predict_correct;
            else ++bht_predict_wrong;
            if (correlating(current_pc, next_pc)) ++correlating_predict_correct;
            else ++correlating_predict_wrong;
            if (tournament(current_pc, next_pc)) ++tournament_predict_correct;
            else ++tournament_predict_wrong;
        }
        float bht_correct_rate = correct_rate(bht_predict_correct, bht_predict_wrong);
        cout << "The correct rate for 2-bit branch history table predictor is: "
             << bht_correct_rate << endl;
        bht_average += bht_correct_rate;
        float correlating_correct_rate =
            correct_rate(correlating_predict_correct, correlating_predict_wrong);
        cout << "The correct rate for (" << M << "," << N << ") correlating predictor is: "
             << correlating_correct_rate << endl;
        correlating_average += correlating_correct_rate;
        float tournament_correct_rate =
            correct_rate(tournament_predict_correct, tournament_predict_wrong);
        cout << "The correct rate for tournament predictor is: "
             << tournament_correct_rate << endl;
        tournament_average += tournament_correct_rate;
        cout << endl;
    }
    bht_average /= FILE_COUNT;
    correlating_average /= FILE_COUNT;
    tournament_average /= FILE_COUNT;
    cout << "Average: " << bht_average << " " << correlating_average << " "
         << tournament_average << endl;
    return 0;
}

bool correlating(const string& current_pc, const string& next_pc) {
    unsigned long current_add = strtoul(current_pc.c_str(), NULL, 16);
    unsigned long next_add = strtoul(next_pc.c_str(), NULL, 16);
    // Low 4 address bits select the row; the 10-bit global history selects the counter
    bitset<N>& counter =
        correlating_predictor_table[current_add & BITS4][latest_branch.to_ulong()];
    bool jump_predict = counter.test(N - 1);  // high bit set means "predict taken"
    bool jump_actual = is_taken(current_add, next_add);
    // Update the counter first, then the global history
    update_counter(counter, jump_actual);
    push_history(latest_branch, jump_actual);
    return jump_actual == jump_predict;
}

bool tournament(const string& current_pc, const string& next_pc) {
    unsigned long current_add = strtoul(current_pc.c_str(), NULL, 16);
    unsigned long next_add = strtoul(next_pc.c_str(), NULL, 16);
    // Local prediction: address -> local history -> 3-bit counter
    unsigned long local_index = current_add & BITS10;
    bitset<3>& local_counter =
        local_predictor_table[local_history_table[local_index].to_ulong()];
    bool local_predict = local_counter.test(2);
    // Global prediction: global history -> 2-bit counter
    bitset<2>& global_counter = global_predictor_table[global_history_table.to_ulong()];
    bool global_predict = global_counter.test(1);
    // Selector high bit: 0 selects the local predictor, 1 selects the global predictor
    bitset<2>& selector = selectors[current_add & BITS12];
    bool jump_predict = selector.test(1) ? global_predict : local_predict;
    bool jump_actual = is_taken(current_add, next_add);
    // Update the local predictor and local history
    update_counter(local_counter, jump_actual);
    push_history(local_history_table[local_index], jump_actual);
    // Update the global predictor and global history
    update_counter(global_counter, jump_actual);
    push_history(global_history_table, jump_actual);
    // Update the selector only when the predictors disagree:
    // +1 if the global predictor was right, -1 if the local predictor was right
    if (local_predict != global_predict)
        update_counter(selector, global_predict == jump_actual);
    return jump_actual == jump_predict;
}

bool bht(const string& current_pc, const string& next_pc) {
    unsigned long current_add = strtoul(current_pc.c_str(), NULL, 16);
    unsigned long next_add = strtoul(next_pc.c_str(), NULL, 16);
    bool jump_actual = is_taken(current_add, next_add);
    // Low 14 address bits index the history table
    bitset<2>& counter = branch_history_table[current_add & BITS14];
    bool jump_predict = counter.test(1);
    // Update the history table
    update_counter(counter, jump_actual);
    return jump_actual == jump_predict;
}

(When reprinting, please credit the author and the source: http://blog.csdn.net/xiaowei_cqu. Commercial use is not permitted.)
