Start auto intelligent virtual instrument 2

Source: Internet
Author: User
Tags crc32
This is a repost of an article on how to boot Linux in one second. It is based on TI's DM6446 and can be applied to smart instruments.
All this for 1 second boot

Results

Let me give the results first

Sequence | Time in seconds (uncompressed kernel) | Time in seconds (compressed kernel)
U-Boot load and start | 0.08 | 0.08
Kernel load | 0.52 (0.24 CRC32 and 0.28 for copy) | 0.35 (0.16 CRC32 and 0.19 for copy)
Kernel start and uncompress | 0.0 | 0.66 (for unzip)
Kernel initialization | 0.37 | 0.38
Init | 0 (bypassed, init=/bin/ash) | 0 (bypassed, init=/bin/ash)
Switching run level and executing init scripts | 0 (bypassed, init=/bin/ash) | 0 (bypassed, init=/bin/ash)
Starting shell | 0 (init=/bin/ash) | 0 (init=/bin/ash)
Total | 0.97 | 1.47

Message log with uncompressed kernel (refer to Measuring boot time)

$ ./Tstamp.exe </dev/ttys0

    column1 is elapsed time since first message
    column2 is elapsed time since previous message
    column3 is the message
    0.000 0.000:
    0.000 0.000:
    0.000 0.000: U-Boot 1.2.0 (Jun 23 2008 - 14:53:30)
    0.000 0.000:
    0.000 0.000: I2C:   ready
    0.000 0.000: DRAM:  256 MB
    0.000 0.000: MY AMD Flash: 16 MB
    0.060 0.060: In:    serial
    0.060 0.000: Out:   serial
    0.060 0.000: Err:   serial
    0.070 0.010: RM Clock :- 297MHz DDR Clock :- 162MHz
    1.071 1.001: Hit any key to stop autoboot:   0
    1.351 0.280: # Booting image at 80007fc0 ...
    1.351 0.000:    Verifying Checksum ...
    1.502 0.151: OK
    1.502 0.000: OK
    1.502 0.000: ## Loading Ramdisk Image at 80900000 ...
    1.502 0.000:    Verifying Checksum ...
    1.592 0.090: OK
    1.602 0.010:
    1.602 0.000: Starting kernel ...
    1.602 0.000:
    1.972 0.370:bin/ash: can't access tty; job control turned off

Note: Remember to subtract 1 second bootdelay

Message log with compressed kernel (refer to Measuring boot time)

$ ./Tstamp.exe </dev/ttys0

    column1 is elapsed time since first message
    column2 is elapsed time since previous message
    column3 is the message
    0.000 0.000:
    0.000 0.000: U-Boot 1.2.0 (Jun 23 2008 - 14:53:30)
    0.000 0.000:
    0.000 0.000: I2C:   ready
    0.010 0.010: DRAM:  256 MB
    0.010 0.000: MY AMD Flash: 16 MB
    0.060 0.050: In:    serial
    0.060 0.000: Out:   serial
    0.070 0.010: Err:   serial
    0.070 0.000: ARM Clock :- 297MHz DDR Clock :- 162MHz
    1.071 1.001: Hit any key to stop autoboot:   0
    1.261 0.190: # Booting image at 80007fc0 ...
    1.261 0.000:    Verifying Checksum ...
    1.331 0.070: OK
    1.331 0.000: OK
    1.341 0.010: ## Loading Ramdisk Image at 80900000 ...
    1.341 0.000:    Verifying Checksum ...
    1.432 0.091: OK
    1.432 0.000:
    1.432 0.000: Starting kernel ...
    1.432 0.000:
    2.093 0.661: Uncompressing Linux.......................................................................... done, booting the kernel.
    2.473 0.380:/bin/ash: can't access tty; job control turned off

Note: Remember to subtract 1 second bootdelay

Hardware setup
  • Board: DM6446 DVEVM
  • RS232 port connected to PC
Software setup
  • Linux: Montavista Pro 5.0 installed on Linux box
  • LSP: REL_LSP_PSP_02_00_00_010
Linux configuration for boot time reduction
  • U-Boot, kernel and ramdisk (cramfs) in NOR flash.
  • Root filesystem is ramdisk (cramfs).
Optimize kernel size
  • Remove unused components from Kernel
  • Use loadable modules option to defer initialization of components to after-boot. Example: network initialization.
This gave a compressed kernel of 0x107650 bytes (~1 MB) and an uncompressed kernel of 0x238FA0 bytes (~2.2 MB). I have used 0x107650 and 0x238FA0 in this article; replace them with your own kernel sizes.
Optimize filesystem size
  • Rebuild rootfilesystem with minimal components
  • Use cramfs as rootfilesystem
  • Recipe to make cramfs from existing ext2 filesystem
Make ramdisk
    Host# mkdir <tempdir>
    Host# cd <tempdir>
    Host# cp /opt/montavista/pro/devkit/arm/v5t_le_uclibc/images/ramdisk.gz .
    Host# gzip -d ramdisk.gz
Loop mount
    Host# mkdir disk
    Host# mount -o loop -t ext2 ramdisk disk
Copy modules (created during kernel size optimization)
    Host# cp /opt/montavista/pro/devkit/lsp/ti-davinci/linux-2.6.18_pro500/drivers/net/davinci_emac_driver.ko disk/home
Make cramfs
    Host# mkcramfs -n ramdisk disk rootfs.cramfs
Make it U-Boot compatible (place U-Boot header)
    Host# mkimage -A arm -T ramdisk -n 'Ramdisk' -a 0x80900000 -e 0x80900000 -d rootfs.cramfs uCramfsdisk
This gave a 0x164040 byte (~1.4 MB) filesystem (cramfs). I have used 0x164040 in this article; replace it with your own filesystem size.
Burn U-Boot, kernel and filesystem to NOR flash
  • Burn U-Boot at 0x02000000
  • Burn kernel at 0x02050000
  • Burn filesystem at 0x02300000
  • See
    • Put cramfs image to flash
    • Burn any image to NOR flash
  • Get a working system for boot time reduction
  • Boot time at this stage is probably below 10 seconds
Boot parameters at this stage:
    setenv bootargs mem=256M console=ttyS0,115200n8 root=/dev/ram0 ro
    setenv bootcmd 'cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x2050000 0x80900000'
Bootarg Parameter Changes
  • Use the quiet bootargs parameter to avoid printing kernel messages.
  • The Linux kernel runs a test loop that takes 200 milliseconds to check how fast the CPU is running, and prints the resulting value during boot. Plug that value into bootargs (lpj=741376 here) to save 200 ms.
  • Switch off network initialization during boot. (Do it as part of your system or application initialization, as needed.)
  • Set a memory limit in the bootargs that is just enough to support your system. 16 MB is used here. Linux allocates and pre-initializes all the DDR memory in its heap; the smaller the heap, the quicker the pre-initialization.
  • Provide the shell to run as part of bootargs.
  • Boot time is not very impressive yet.
Boot parameters at this stage:
    setenv bootargs mem=16M console=ttyS0,115200n8 root=/dev/ram0 ro quiet lpj=741376 ip=off init=/bin/ash
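The lpj value in the bootargs above is the loops-per-jiffy number the kernel reports when it calibrates its delay loop. As a rough plausibility check for a value copied from a boot log, the sketch below converts lpj into the BogoMIPS figure that the kernel prints next to it. The tick rate of 100 Hz is an assumption (the article does not state it), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: kernel tick rate (HZ) of 100; not stated in the article. */
#define ASSUMED_HZ 100UL

/* Integer part of BogoMIPS: lpj / (500000 / HZ). */
unsigned long bogomips_int(unsigned long lpj)
{
    assert(lpj > 0);
    return lpj / (500000UL / ASSUMED_HZ);
}

/* Two-digit fractional part: (lpj / (5000 / HZ)) % 100. */
unsigned long bogomips_frac(unsigned long lpj)
{
    assert(lpj > 0);
    return (lpj / (5000UL / ASSUMED_HZ)) % 100UL;
}

/* Print the value in the same "NNN.NN" style as the kernel boot message. */
void print_bogomips(unsigned long lpj)
{
    printf("lpj=%lu -> %lu.%02lu BogoMIPS\n",
           lpj, bogomips_int(lpj), bogomips_frac(lpj));
}
```

Under that assumption, lpj=741376 corresponds to 148.27 BogoMIPS, about half of the 297 MHz ARM clock shown in the boot log. If your own value is far from that ratio, it is worth re-reading it from an unquieted boot before hard-coding it.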
Optimize U-Boot

NOR EMIFA CS2 settings

U-Boot is configured for the slowest possible NOR speed by default, so copying from NOR is very slow. The board uses an AM29LV256MH connected over a 16-bit bus. The NOR data sheet gives 120 ns as the access time (read setup time + read strobe time). Read strobe time has to be at least 40 ns. EMIFA runs at 100 MHz (1/6th of the 600 MHz PLL1), i.e. theoretically 12 EMIFA clocks are needed to fetch one short (16-bit bus).

Convention to calculate the EMIFA cycles needed:
    EMIFA cycles = ceil("calculated cycles as per data sheets") + 1
                 = 12 + 1
One additional clock (+1) accounts for crystal/oscillator accuracy.
The EMIFA CS2 setting can be any of the following (refer to the DM6446 EMIF user guide for the EMIFA CS2 register):
  • EMIFA CS2 0x3FFE058D: read setup of 1 clock (register value of 0) + read strobe of 12 clocks (register value of 11).
  • EMIFA CS2 0x3FFEC20D: read setup of 7 clocks (register value of 6) + read strobe of 5 clocks (register value of 4).
Read hold time is not needed, but the minimum is 1 clock on DM6446 (register value of 0).
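The cycle arithmetic and the two register values above can be cross-checked with a short host-side sketch. The bit positions of the read setup, read strobe and read hold fields are assumptions, inferred by decoding the two values quoted above; confirm them against the DM6446 EMIF user guide before relying on them. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions in the EMIFA CS2 register (inferred from the
 * two values quoted in the text, not copied from the user guide). */
#define R_SETUP_SHIFT    13u          /* 4-bit field */
#define R_STROBE_SHIFT    7u          /* 6-bit field */
#define R_HOLD_SHIFT      4u          /* 3-bit field */
#define READ_FIELDS_MASK  0x0001FFF0u /* bits 16..4 */

/* ceil(access_ns / clock_ns) plus one spare clock, as in the convention. */
unsigned emifa_cycles(unsigned access_ns, unsigned clock_ns)
{
    assert(clock_ns > 0);
    return (access_ns + clock_ns - 1) / clock_ns + 1;
}

/* Replace the read timing fields of 'base'. Arguments are clock counts;
 * the register stores (clocks - 1) in each field. */
uint32_t emifa_cs2_value(uint32_t base, unsigned setup_clks,
                         unsigned strobe_clks, unsigned hold_clks)
{
    assert(setup_clks >= 1 && setup_clks <= 16);
    assert(strobe_clks >= 1 && strobe_clks <= 64);
    assert(hold_clks >= 1 && hold_clks <= 8);

    uint32_t v = base & ~READ_FIELDS_MASK;
    v |= (uint32_t)(setup_clks - 1) << R_SETUP_SHIFT;
    v |= (uint32_t)(strobe_clks - 1) << R_STROBE_SHIFT;
    v |= (uint32_t)(hold_clks - 1) << R_HOLD_SHIFT;
    return v;
}
```

With these assumptions, emifa_cycles(120, 10) gives 13, and packing 7 setup clocks and 5 strobe clocks into 0x3FFE058D reproduces 0x3FFEC20D, which suggests the decoded layout is at least consistent with both values.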

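Changes go into the AEMIF NOR initialization part of board/DaVinci/lowlevel_init.s

    ACFG2:          .word 0x01E00010    @ address of the EMIFA CS2 configuration register
    ACFG2_VAL:      .word 0x3FFE058D    @ new timing value
    ..
    LDR R0, ACFG2
    LDR R1, ACFG2_VAL
    LDR R2, [R0]                        @ read the current setting
    AND R1, R2, R1                      @ AND the current setting with the new value
    STR R1, [R0]                        @ write it back
    ...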
Parameter Space

Reduce the U-Boot parameter space from 128 KB to 2 KB.

Changes go into include/configs/DaVinci.h

    #define CFG_ENV_SIZE            0x800

Remove/relocate I2C communication code

I2C communication in U-Boot is slow and can be removed or relocated out of the U-Boot init code. Instead of running as part of init, it can run as part of the appropriate command (when U-Boot is in interactive mode). On this board, for example, the MAC address is read over I2C in board/DaVinci.c. I moved it to pai_net.c.

    ..
    netboot_common(...)
    ..
    if(readset_ethaddr_first) {
       /* do I2C communication to get MAC address here */
       readset_ethaddr_first = 0;
    }
    ....
Make NOR to DDR copy faster

C optimization

The U-Boot memory copy code is not optimal. The CPU does a byte-by-byte copy.

The two functions where the copy happens:

  • memmove of lib_generic/string.c
  • do_mem_cp of common/cmd_mem.c

By writing an optimized C routine to copy data (when source and destination are aligned on a double boundary), I got 0.8 MB per 100 milliseconds, i.e. the 2.4 MB (kernel + filesystem) copy takes 300 milliseconds.

    if( (((uint)dst|(uint)src)&0x7) == 0 )
    { // both dst and src are aligned on double boundary
       double *dDbl;
       const double *sDbl;

       loop = len >> 3;
       dDbl = (double *)dst;
       sDbl = (const double*)src;

       for (i = 0; i < loop; i++)
          *dDbl++ = *sDbl++;

       d = (uchar*)dDbl;
       s = (const uchar*)sDbl;
       if (len & 4)    { *d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;}
       if (len & 2)    { *d++ = *s++; *d++ = *s++; }
       if (len & 1)      *d++ = *s++;
    }
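For experimenting on a host, the fragment above can be wrapped into a complete function. The sketch below is a variant, not the author's exact routine: it moves 8 bytes at a time through uint64_t rather than double, and falls back to a plain byte loop when either pointer is not 8-byte aligned. The name copy_fast is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst; the regions must not overlap.
 * Like the original fragment, the wide path accesses the buffers through
 * a wider type, so build with -fno-strict-aliasing if the buffers were
 * declared with a different type. */
void *copy_fast(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    if ((((uintptr_t)dst | (uintptr_t)src) & 0x7u) == 0) {
        /* Both pointers are 8-byte aligned: move 64 bits per iteration. */
        uint64_t *d64 = dst;
        const uint64_t *s64 = src;
        size_t words = len >> 3;

        for (size_t i = 0; i < words; i++)
            *d64++ = *s64++;

        d = (unsigned char *)d64;
        s = (const unsigned char *)s64;
        len &= 7u;              /* 0..7 tail bytes remain */
    }

    /* Tail bytes, or the whole buffer when unaligned. */
    while (len--)
        *d++ = *s++;

    return dst;
}
```

The byte-loop fallback keeps the function safe to call with any pointers, at the cost of giving no speed-up in the unaligned case; on the board, both the flash address and the DDR destination in the bootcmd are aligned, so the wide path is the one that matters.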
Optimization using EDMA

By moving to EDMA for the copy, I got 1.26 MB per 100 milliseconds, i.e. the 2.4 MB (kernel + filesystem) copy takes 190 milliseconds. Theoretical calculations (based on the AM29LV256MH's 120 ns access time per two bytes) give 144 milliseconds for a 2.4 MB NOR to DDR copy.
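The timing figures in this section follow from simple arithmetic, sketched below. The helper names are illustrative, and 1 MB is taken as 10^6 bytes, which is the assumption that reproduces the 144 ms figure.

```c
#include <assert.h>

/* Time in ms to read 'bytes' when each NOR access returns
 * bytes_per_access bytes and takes access_ns nanoseconds. */
double nor_copy_time_ms(double bytes, double access_ns, double bytes_per_access)
{
    assert(bytes_per_access > 0.0);
    return (bytes / bytes_per_access) * access_ns / 1e6;
}

/* Time in ms to move 'mbytes' at a measured rate given in MB per 100 ms. */
double measured_copy_time_ms(double mbytes, double mb_per_100ms)
{
    assert(mb_per_100ms > 0.0);
    return mbytes / mb_per_100ms * 100.0;
}
```

For example, nor_copy_time_ms(2.4e6, 120, 2) gives 144 ms, measured_copy_time_ms(2.4, 0.8) gives 300 ms for the C routine, and measured_copy_time_ms(2.4, 1.26) gives about 190 ms for EDMA, so the EDMA copy sits roughly a third above the theoretical NOR limit.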

Access kernel from NOR once

The following bootcmd has one issue: the kernel is accessed from NOR twice, the first time for the CRC32 check and the second time for kernel relocation.

setenv bootcmd 'cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x2050000 0x80900000'

Change bootcmd

setenv bootcmd 'cp.b 0x2050000 0x80700000 <your kernel size in hex>;cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x80700000 0x80900000'

Now NOR is accessed only once, for the copy. CRC32 and relocation happen on DDR.

Kernel relocation

U-Boot relocates the kernel to 0x80008000 using the memmove function. This step happens after the CRC32 check passes. Relocation can be avoided by making the first copy smartly. A uImage (kernel image with header) has a 0x40 byte header, so copying it to 0x80007FC0 puts the actual kernel at 0x80008000.

Change the bootcmd

setenv bootcmd 'cp.b 0x2050000 0x80007FC0 0x107650;cp.b 0x2300000 0x80900000 0x164040; bootm 0x80007FC0 0x80900000'
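The copy address in the bootcmd above is simply the kernel's run address minus the uImage header size. A small sketch of that arithmetic, with illustrative function names:

```c
#include <assert.h>
#include <stdint.h>

#define UIMAGE_HEADER_SIZE 0x40u   /* header size stated in the text */

/* Address to copy the uImage to so that its payload lands at run_addr. */
uint32_t uimage_copy_addr(uint32_t run_addr)
{
    assert(run_addr >= UIMAGE_HEADER_SIZE);
    return run_addr - UIMAGE_HEADER_SIZE;
}

/* Non-zero if an image copied to copy_addr already has its payload
 * sitting at run_addr, i.e. no relocation is needed. */
int payload_in_place(uint32_t copy_addr, uint32_t run_addr)
{
    return copy_addr + UIMAGE_HEADER_SIZE == run_addr;
}
```

uimage_copy_addr(0x80008000) returns 0x80007FC0, matching the bootcmd, while the earlier copy target 0x80700000 fails the payload_in_place check and therefore still needed a relocation.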

This is good, but U-Boot still calls memmove to copy the kernel onto itself (relocating from 0x80008000 to 0x80008000 now).

Add a check in the memmove code to "do nothing" if destination and source point to the same address.
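A minimal sketch of such a check, written as a standalone function rather than as a patch to U-Boot's lib_generic/string.c. The name memmove_checked is hypothetical; only the early return is the point, and the rest is an ordinary overlap-safe byte copy.

```c
#include <assert.h>
#include <stddef.h>

void *memmove_checked(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(dest != NULL && src != NULL);

    /* Source and destination are the same address: nothing to move. */
    if (d == s || count == 0)
        return dest;

    if (d < s) {
        /* Destination is below source: copy forwards. */
        while (count--)
            *d++ = *s++;
    } else {
        /* Destination is above source: copy backwards to handle overlap. */
        d += count;
        s += count;
        while (count--)
            *--d = *--s;
    }
    return dest;
}
```

With the kernel already placed at its run address, the call degenerates to the first branch and returns immediately instead of rewriting about 2 MB of DDR with its own contents.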
Optimize CRC32

Run it on DDR

In the steps above, the kernel is copied from NOR to DDR. Therefore, CRC32 works on DDR instead of NOR. The first step of CRC32 optimization is already done.

Wider input data access

The input buffer is accessed byte by byte in the CRC32 code. Change this to 4 bytes (as an integer) each time. It takes 500 milliseconds to do the CRC32 check of 2.4 MB (kernel + filesystem) at this stage.
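One way to read the input four bytes at a time is sketched below, next to a byte-wise reference for comparison. This is not the U-Boot source: the function names are illustrative, the table is generated at run time from the standard reflected CRC-32 polynomial 0xEDB88320, and the word path assumes a little-endian CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t crc_tab[256];
static int crc_tab_ready;

/* Build the 256-entry lookup table for the reflected CRC-32 polynomial. */
static void crc_tab_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_tab[n] = c;
    }
    crc_tab_ready = 1;
}

/* Reference: one input fetch and one table lookup per byte. */
uint32_t crc32_bytes(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (!crc_tab_ready)
        crc_tab_init();
    crc ^= 0xFFFFFFFFu;
    while (len--)
        crc = crc_tab[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

/* Same CRC, fetching the input as one 32-bit word per iteration.
 * Assumption: little-endian byte order. */
uint32_t crc32_words(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (!crc_tab_ready)
        crc_tab_init();
    crc ^= 0xFFFFFFFFu;
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, buf, 4);     /* single 32-bit fetch */
        crc ^= w;
        crc = crc_tab[crc & 0xFFu] ^ (crc >> 8);
        crc = crc_tab[crc & 0xFFu] ^ (crc >> 8);
        crc = crc_tab[crc & 0xFFu] ^ (crc >> 8);
        crc = crc_tab[crc & 0xFFu] ^ (crc >> 8);
        buf += 4;
        len -= 4;
    }
    while (len--)               /* 0..3 tail bytes */
        crc = crc_tab[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

The number of table lookups is unchanged; what drops is the number of loads from the input buffer, which is where the time goes when the data sits in external memory.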

Place CRC table onchip

Place the 1 KB CRC table (crc_table) on-chip. The DM6446 has 8 KB of on-chip memory at 0xa000.

    #ifdef CRC_TABLE_ONCHIP
       unsigned int * pTOnchip=(unsigned int*)0x0000A000;
       if(crc_table_onchip_first)
       {
          for(i=0;i<256;i++) // copy to onchip
            pTOnchip[i] = crc_table[i];
          crc_table_onchip_first = 0;
       }
       pT = pTOnchip;
    #endif

It takes ~ 200 milliseconds to do CRC32 check of 2.4 MB (kernel + filesystem) at this stage.

EDMA input data on-chip

The DM6446 has 8 KB of on-chip memory at 0xa000. In the step above, 1 KB is used for the CRC table. That leaves 7 KB. Use EDMA to fetch the input data in chunks of 7 KB and run CRC32 on the on-chip data.

    uInt curLen;
    uInt *pDOnchip=(unsigned int*)0x0000A400;
    crc = crc ^ 0xffffffffL;
    for(i=0;i<len;i+=curLen)
    {
      curLen = ( (len-i) < (7*1024) )? (len-i):(7*1024);
      your_memcpy_using_edma(pDOnchip,buf+i,curLen);
      crc = your_crc32_with_no_compliment(crc,pDOnchip,curLen); // ^0xffffffff is already done
    }
    return crc ^ 0xffffffffL;

your_crc32_with_no_compliment is the CRC32 function without the initial and final complement (^ 0xffffffff).
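The chunking logic can be tried on a host with memcpy standing in for the EDMA transfer and an ordinary array standing in for the on-chip RAM. Everything below is a sketch under those assumptions; crc32_raw is a plain bitwise CRC32 update without the initial and final complement, used here only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK (7u * 1024u)            /* staging buffer size from the text */

static unsigned char staging[CHUNK];  /* stand-in for the on-chip RAM */

/* CRC32 update without the initial and final complement. */
static uint32_t crc32_raw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (0xEDB88320u ^ (crc >> 1)) : (crc >> 1);
    }
    return crc;
}

/* Whole-buffer CRC32, complementing once at each end. */
uint32_t crc32_direct(const unsigned char *buf, size_t len)
{
    return crc32_raw(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* Same result, with the data staged through the 7 KB buffer chunk by chunk. */
uint32_t crc32_chunked(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;       /* initial complement, done once */

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; ) {
        size_t cur = (len - i < CHUNK) ? (len - i) : CHUNK;
        memcpy(staging, buf + i, cur);    /* an EDMA transfer on the board */
        crc = crc32_raw(crc, staging, cur);
        i += cur;
    }
    return crc ^ 0xFFFFFFFFu;         /* final complement, done once */
}
```

The key detail is that the complement is applied once before the first chunk and once after the last one; complementing per chunk would give a different checksum than the one stored in the image header.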

It takes 160 milliseconds to do the CRC32 check of 2.4 MB (compressed kernel + filesystem) and 240 milliseconds to do the CRC32 check of 3.6 MB (uncompressed kernel + filesystem).

Use uncompressed Kernel

Until this point a compressed kernel was used. Now, with copy and CRC32 optimized and the unzip of the kernel taking 0.66 seconds, it is a trade-off between kernel size (flash size) and boot time.

To use an uncompressed kernel (Image) with U-Boot, a header has to be placed on Image.

    mkimage -A arm -O linux -T kernel -C none -a 0x80008000 -e 0x80008000 -n 'Linux-2.6.18_pro500' -d Image uImage

mkimage here places a 0x40 byte header on Image to produce uImage.

Now burn the kernel at 0x02050000. See Burn any image to NOR flash.

My boot parameters
    bootdelay=1
    bootcmd=cp.b 0x2050000 0x80007FC0 0x238FA0;cp.b 0x2300000 0x80900000 0x164040; bootm 0x80007FC0 0x80900000
    bootargs=mem=16M console=ttyS0,115200n8 root=/dev/ram0 ro quiet lpj=741376 ip=off init=/bin/ash

All this gives 1 second boot time (0.97 to be precise)

Further work

Not sure I would be able to spend further time on this. If I end up spending time on this, I would do the following:

  • Kernel unzip optimization
    • Kernel unzip takes a lot of time. See results above.
    • Move kernel unzipping to U-Boot and optimize.
  • Get a more detailed breakdown of the "kernel initialization" step, which takes 0.38 seconds, and optimize it.
  • Try initramfs

See also

  1. Boot time optimization
  2. Measuring boot time
  3. http://elinux.org/Boot_Time
  4. Kernel and initramfs initialization in 0.5 sec sample
  5. Booting LINUX Network Camera on dm365 in 3.2 (2.5) seconds

There is another one-second boot scheme from abroad, which appears to be a paid service and has a demo, based on QT, FS Platform:

1 second Linux boot to QT!

By Andrew Murray on January 13, 2011, in Linux

At the end of last year, to demonstrate my company's swiftboot service, I put together a rather impressive demo. Using a Renesas MS7724 development board I was able to achieve a one second cold Linux boot to a QT application. Here's the demo...

Many people see a demo like this and assume there are 'smoke and mirrors' or that we've implemented a suspend to disk solution. This is genuinely a cold boot including uboot (2009-01), Linux kernel (2.6.31-rc7) and QT Embedded Open Source 4.6.2. We've not applied any specific intellectual property but instead spent time analyzing where boot delays are coming from and simply optimising them away. The majority of the modifications we make usually fall into the category of 'removing things that aren't required', 'optimising things that are required', or 'taking a new approach to solving problems', and are tailored very precisely to the needs of the 'product'.

If you're interested in exactly what modifications I made and a little more about the approach taken, you may be interested in these slides which I presented at ELC-E 2010. I'm also expecting a video of this presentation to appear on Free Electrons in the near future.

You may also remember my last demo based on an OMAP3530 EVM. [2011 embedded-bits.co.uk]
