Testing the direct screen-writing speed of VGA mode 12h


File: vgaspeed.txt
Name: Test the speed of VGA mode 12h
Author: zyl910
Blog: http://blog.csdn.net/zyl910/
Version: V1.0
Updated: 2006-11-14


Introduction
~~~~
I have written a lot of direct screen-writing code under DOS, but I had never looked into one question: how fast are the different ways of accessing VGA memory? So I wrote a small program to measure VGA speed.
Graphics mode: VGA mode 12h, 640*480, 16 colors.
Three test items:
1. Read test. Using VGA read mode 0, video memory is copied to system memory scan line by scan line, one bit plane at a time.
2. Write test. Using VGA write mode 0, system memory is copied to video memory scan line by scan line, one bit plane at a time.
3. Write test synchronized with vertical retrace.
Four access methods: a C far pointer, movsb, movsw, and movsd.

 

Test Results
~~~~~~~~

CPU: AMD Athlon XP 1700+ (actual frequency: 1463 MHz (11x133))
Memory: DDR266 256 MB
Graphics: NVIDIA GeForce2 MX/MX 400 (AGP 4x)
Memory bandwidth: 125 MHz * 128 bit = 2000 MB/s
Operating System: Windows XP SP2
[FPS]
R_C: 11.7646
W_C: 51.5834
R_BYTE: 12.0000
W_BYTE: 86.4751
R_WORD: 23.6298
W_WORD: 124.8862
R_DWORD: 44.7459
W_DWORD: 156.8619
WaitW_B: 60.0298
WaitW_W: 59.9293
WaitW_D: 59.9293
124.8862/86.4751 = 144.42%
156.8619/124.8862 = 125.60%

 

CPU: AMD Athlon XP 1700+ (actual frequency: 1463 MHz (11x133))
Memory: DDR266 256 MB
Graphics: NVIDIA GeForce2 MX/MX 400 (AGP 4x)
Memory bandwidth: 125 MHz * 128 bit = 2000 MB/s
Operating System: Windows 98SE
[FPS]
R_C: 11.6641
W_C: 60.7337
R_BYTE: 11.9657
W_BYTE: 98.8431
R_WORD: 23.4287
W_WORD: 173.9558
R_DWORD: 44.4442
W_DWORD: 267.9724
WaitW_B: 59.9293
WaitW_W: 59.9293
WaitW_D: 60.0298
173.9558/98.8431 = 175.99%
267.9724/173.9558 = 154.05%

CPU: AMD Athlon XP 1700+ (actual frequency: 1463 MHz (11x133))
Memory: DDR266 256 MB
Graphics: NVIDIA GeForce2 MX/MX 400 (AGP 4x)
Memory bandwidth: 125 MHz * 128 bit = 2000 MB/s
Operating System: DOS real mode
[FPS]
R_C: 11.7646
W_C: 61.2365
R_BYTE: 12.0663
W_BYTE: 107.9934
R_WORD: 23.6298
W_WORD: 190.6475
R_DWORD: 44.9470
W_DWORD: 279.1337
WaitW_B: 60.0298
WaitW_W: 59.9293
WaitW_D: 60.0298
190.6475/107.9934 = 176.54%
279.1337/190.6475 = 146.41%

 

CPU: Intel Celeron 2.53 GHz
Memory: Dual DDR333 512 MB
Graphics: NVIDIA RIVA TNT2 Model 64 (AGP 4x)
Memory bandwidth: 110 MHz * 64 bit = 880 MB/s
Operating System: Windows XP SP2
[FPS]
R_C: 8.4464
W_C: 37.5061
R_BYTE: 8.4000
W_BYTE: 46.1536
R_WORD: 16.2895
W_WORD: 64.1525
R_DWORD: 31.1713
W_DWORD: 78.9337
WaitW_B: 29.8641
WaitW_W: 59.3260
WaitW_D: 59.5271
64.1525/46.1536 = 138.99%
78.9337/64.1525 = 123.04%

 

CPU: Intel Celeron, 1800 MHz (18x100)
Memory: DDR266 256 MB
Graphics: NVIDIA GeForce4 MX 440 (AGP 8x)
Memory bandwidth: 405 MHz * 64 bit = 3240 MB/s
Operating System: Windows XP SP2
[FPS]
R_C: 7.7425
W_C: 33.1823
R_BYTE: 7.8000
W_BYTE: 42.9359
R_WORD: 15.0829
W_WORD: 56.5105
R_DWORD: 28.8586
W_DWORD: 68.5768
WaitW_B: 28.6575
WaitW_W: 49.2707
WaitW_D: 55.2033
56.5105/42.9359 = 131.62%
68.5768/56.5105 = 121.35%

 

Analysis
~~~~

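1. The refresh rate is 60 Hz.
The results of the "write test synchronized with vertical retrace" cluster around 60 FPS on the Athlon XP machine, which indicates that the refresh rate in mode 12h is 60 Hz. Where the unsynchronized write rate is below 60 FPS (for example the byte-wise write test on the two Celeron machines), the synchronized rate falls to roughly 30 FPS, since each frame then spans two refresh periods.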

2. Reading is much slower than writing.
This may be because VGA hardware is designed for output, so it is optimized for write operations.
Because reads are so slow, VGA hardware features that need to read video memory first, such as data rotate and the logical-operation (bit merge) modes, may not improve performance and may even reduce it.

3. The gain from movsw/movsd is smaller than expected.
In the read tests, each step from movsb to movsw to movsd roughly doubles performance.
In the write tests, movsw/movsd still improve performance, but by less than a factor of two per step.
Even so, movsd remains the method of choice when speed matters.
Note that write mode 1 works only with movsb.

4. Why is the speed so slow?
Mode 12h is 640*480 with 16 colors (4 bits per pixel), so each frame occupies 640*480*4/8 = 153600 bytes.
If one byte is copied per clock at the 66 MHz AGP base clock, the theoretical frame rate is 66 MHz * 1 byte per clock / 153600 bytes = 429.6875 FPS.
The measured movsb result is about 100 FPS, only about 1/4 of the theoretical value. If the fourfold transfer rate of AGP 4x is taken into account, the gap is even larger.
This may be because the CPU cannot access the video memory of the primary surface while the video card is sending frame data to the monitor. Mode 12h has only one primary surface (standard VGA memory is too small to hold a second 640*480 page), and it cannot be moved off-screen, so double buffering cannot be used to speed things up.

Test code
~~~~~~~~

 

/*
File: vgaspeed.c
Name: Test the speed of VGA mode 12h
Author: zyl910
Blog: http://blog.csdn.net/zyl910/
Version: V1.0
Updated: 2006-11-14
*/
#include <stdio.h>
#include <conio.h>
#include <mem.h>
#include <dos.h>

typedef unsigned char BYTE;
typedef unsigned int  WORD;
typedef unsigned long DWORD;
typedef void far* LPVOID;

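#define SCR_W 640
#define SCR_H 480
#define SCR_PLANES 4
#define SCANSIZE_DIB ((SCR_W)/2)  /* bytes per scan line at packed 4 bpp (unused below) */
#define SCANSIZE_VGA ((SCR_W)/8)  /* bytes per scan line within one bit plane */
#define SEG_VIDEO 0xA000
/* Spin until bit 3 (vertical retrace) of Input Status Register 1 is set. */
#define WaitVR() while(!(inportb(0x3da)&0x08))
/* BIOS timer tick counter at 0040:006C, about 18.2 ticks per second. */
static volatile DWORD far* const pbiosclock = MK_FP(0x0040, 0x6C);
#define BIOSCLOCK_F ((double)18.2)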
void repmovsb(LPVOID lpD, LPVOID lpS, WORD cBytes)
{
 _asm{
  push ds
  push es
  mov cx, cBytes
  les di, lpD
  lds si, lpS
  rep movsb
  pop es
  pop ds;
 }
}
void repmovsw(LPVOID lpD, LPVOID lpS, WORD cWords)
{
 _asm{
  push ds
  push es
  mov cx, cWords
  les di, lpD
  lds si, lpS
  rep movsw
  pop es
  pop ds;
 }
}
void repmovsd(LPVOID lpD, LPVOID lpS, WORD cDWords)
{
 _asm{
  push ds
  push es
  mov cx, cDWords
  les di, lpD
  lds si, lpS
  db 0x66; rep movsw; /* operand-size prefix turns this into rep movsd (386 or later) */
  pop es
  pop ds
 }
}
int main(void)
{
 BYTE byVGA[SCR_PLANES][SCANSIZE_VGA];
 DWORD cntF;
 int iX, iY;
 BYTE iP;
 WORD pscan;
 BYTE far *pbyV;
 BYTE *pbyM;
 BYTE bymask;
 DWORD tmrold, tmrcur, tmrover;
 double fpsR_C, fpsR_BYTE, fpsR_WORD, fpsR_DWORD;
 double fpsW_C, fpsW_BYTE, fpsW_WORD, fpsW_DWORD;
 double fpsWaitW_BYTE, fpsWaitW_WORD, fpsWaitW_DWORD;
 /* init VGA 12h: 640*480*4bit */
 _asm{
  mov ax, 0x0012;
  int 0x10;
  cld;
 }
 printf("Testing...");
 /* R:C */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3CE; /* gc[4]:Read Map Select */
      mov al, 4;
      out dx, al;
      inc dx;
      mov al, iP;
      out dx, al;
     }
     pbyM = byVGA[iP];
     pbyV = MK_FP(SEG_VIDEO, pscan);
     for(iX=0; iX<SCANSIZE_VGA; iX++)
     {
      *pbyM++ = *pbyV++;
     }
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsR_C = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* W:C */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     pbyM = byVGA[iP];
     pbyV = MK_FP(SEG_VIDEO, pscan);
     for(iX=0; iX<SCANSIZE_VGA; iX++)
     {
      *pbyV++ = *pbyM++;
     }
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsW_C = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* R:Byte */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3CE; /* gc[4]:Read Map Select */
      mov al, 4;
      out dx, al;
      inc dx;
      mov al, iP;
      out dx, al;
     }
     repmovsb(byVGA[iP], MK_FP(SEG_VIDEO, pscan), SCANSIZE_VGA);
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsR_BYTE = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* W:BYTE */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsb(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsW_BYTE = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* R:Word */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3CE; /* gc[4]:Read Map Select */
      mov al, 4;
      out dx, al;
      inc dx;
      mov al, iP;
      out dx, al;
     }
     repmovsw(byVGA[iP], MK_FP(SEG_VIDEO, pscan), SCANSIZE_VGA/2);
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsR_WORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* W:WORD */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsw(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA/2);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsW_WORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* R:DWord */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3CE; /* gc[4]:Read Map Select */
      mov al, 4;
      out dx, al;
      inc dx;
      mov al, iP;
      out dx, al;
     }
     repmovsd(byVGA[iP], MK_FP(SEG_VIDEO, pscan), SCANSIZE_VGA/4);
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsR_DWORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* W:DWORD */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsd(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA/4);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsW_DWORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* WaitW:BYTE */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   WaitVR();
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsb(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsWaitW_BYTE = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* WaitW:WORD */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   WaitVR();
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsw(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA/2);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsWaitW_WORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* WaitW:DWORD */
 do{
  cntF = 0;
  tmrold = *pbiosclock;
  tmrover = tmrold + (DWORD)(BIOSCLOCK_F * 10); /* 10s */
  do{
   memset(byVGA[0], -(1&(((int)cntF)>>0)), SCANSIZE_VGA);
   memset(byVGA[1], -(1&(((int)cntF)>>1)), SCANSIZE_VGA);
   memset(byVGA[2], -(1&(((int)cntF)>>2)), SCANSIZE_VGA);
   memset(byVGA[3], -(1&(((int)cntF)>>3)), SCANSIZE_VGA);
   pscan = 0;
   WaitVR();
   for(iY=0; iY<SCR_H; iY++)
   {
    bymask = 1;
    for(iP=0; iP<SCR_PLANES; iP++)
    {
     _asm{
      mov dx, 0x3C4; /* sc[2]:Map Mask */
      mov al, 2;
      out dx, al;
      inc dx;
      mov al, bymask;
      out dx, al;
     }
     repmovsd(MK_FP(SEG_VIDEO, pscan), byVGA[iP], SCANSIZE_VGA/4);
     bymask <<= 1;
    }
    pscan += SCANSIZE_VGA;
   }
   cntF++;
   tmrcur = *pbiosclock;
  }while((tmrcur<tmrover)&&(tmrcur>=tmrold));
 }while(tmrcur < tmrold); /* rerun the test if the tick counter wrapped at midnight */
 fpsWaitW_DWORD = cntF / ((tmrcur-tmrold)/BIOSCLOCK_F);
 /* Exit VGA */
 _asm{
  mov ax, 0x0003;
  int 0x10;
 }
 /* out */
 printf("[FPS]\n");
 printf("R_C    :%16.4f\n", fpsR_C);
 printf("W_C    :%16.4f\n", fpsW_C);
 printf("R_BYTE :%16.4f\n", fpsR_BYTE);
 printf("W_BYTE :%16.4f\n", fpsW_BYTE);
 printf("R_WORD :%16.4f\n", fpsR_WORD);
 printf("W_WORD :%16.4f\n", fpsW_WORD);
 printf("R_DWORD:%16.4f\n", fpsR_DWORD);
 printf("W_DWORD:%16.4f\n", fpsW_DWORD);
 printf("WaitW_B:%16.4f\n", fpsWaitW_BYTE);
 printf("WaitW_W:%16.4f\n", fpsWaitW_WORD);
 printf("WaitW_D:%16.4f\n", fpsWaitW_DWORD);
 return 0;
}
