Shell scripts for splitting a PDF into its individual objects and reassembling them

Source: Internet
Uploaded by: User


Split script

#!/bin/bash
# Usage: ./split.sh daniel.pdf
# Assumes an uncompressed xref table with a single subsection starting at
# object 0, no incremental updates, and a 15-byte header.
header_start=0
header_len=15
xref_start=$(strings -a -t d "$1" | grep -e "\bxref\b" | awk '{print $1}')
trailer_start=$(strings -a -t d "$1" | grep -e "\btrailer\b" | awk '{print $1}')
xref_len=$((trailer_start - xref_start))

# daniel.pdf -> tdis_daniel_header.bin, tdis_daniel_xref.bin, tdis_daniel_trailer.bin
header_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_header.bin/')
dd if="$1" of="$header_dump" bs=1 skip=$header_start count=$header_len
xref_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_xref.bin/')
dd if="$1" of="$xref_dump" bs=1 skip="$xref_start" count="$xref_len"
trailer_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_trailer.bin/')
dd if="$1" of="$trailer_dump" bs=1 skip="$trailer_start"

# "offset objnum" for each in-use xref entry (the first 3-field line is the
# free entry of object 0), sorted numerically by offset; the xref table's own
# offset is appended so that it closes the last object.
awk 'NF==3' "$xref_dump" | awk 'NR!=1{printf("%d %d\n", $1, NR-1);}' | sort -n > tdis_"$xref_dump"
echo "$xref_start 0" >> tdis_"$xref_dump"

# "offset length objnum" for each object
awk 'BEGIN{loffset=0;lobjnum=0;}{printf("%3d %3d %3d\n", loffset, $1-loffset, lobjnum);loffset=$1;lobjnum=$2;}' tdis_"$xref_dump" | awk 'NR!=1' > tdis_metrics_"$xref_dump"

mkdir -p objects
while read -r offset len objn
do
    # daniel.pdf, object 2 -> objects/tdis_daniel_obj_2.bin
    obj_name=$(echo "$1_$objn" | sed -re 's/^(.*)\.pdf/tdis_\1_obj/' | awk '{printf("objects/%s.bin", $0);}')
    dd if="$1" of="$obj_name" bs=1 skip="$offset" count="$len"
done < tdis_metrics_"$xref_dump"
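The core of the split script is turning the xref table into "offset length object-number" triples: sort the in-use entries by offset, then take each object's length as the distance to the next offset, with the xref table's own offset closing the last object. The sketch below isolates that step as a function, `xref_to_metrics`, a hypothetical name that is not part of the original scripts. It assumes, as the script does, a single xref subsection starting at object 0 in which every entry after the free head is in use.

```shell
#!/bin/bash
# xref_to_metrics XREF_FILE XREF_OFFSET   (hypothetical helper)
# Prints "offset length objnum" for every in-use object.
# Assumes a classic xref table with one subsection "0 N" whose first entry
# is the free head of object 0 and whose remaining entries are all in use.
xref_to_metrics() {
  local xref_file=$1 xref_offset=$2
  # 3-field lines are entries; entry k (k >= 1) describes object k.
  # The sentinel "xref_offset 0" marks where the last object ends.
  # Numeric sort matters: a plain sort would put "120" before "15".
  {
    awk 'NF == 3' "$xref_file" | awk 'NR != 1 { printf "%d %d\n", $1, NR - 1 }'
    echo "$xref_offset 0"
  } | sort -n -k1,1 |
    awk 'NR > 1 { print prev_off, $1 - prev_off, prev_num }
         { prev_off = $1; prev_num = $2 }'
}

# Demo on a made-up xref table whose objects are stored out of order.
demo=$(mktemp)
printf 'xref\n0 4\n0000000000 65535 f \n0000000015 00000 n \n0000000120 00000 n \n0000000074 00000 n \n' > "$demo"
xref_to_metrics "$demo" 200
rm -f "$demo"
```

On the demo table this prints `15 59 1`, `74 46 3` and `120 80 2`: object 3 sits between objects 1 and 2 in the file, so its length is measured up to object 2's offset, not up to the next object number.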

 

Combine script

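#!/bin/bash
# Usage (in the directory where the split script ran): ./combine.sh new.pdf
target=$1
xref_tmp="tas_generated_${1}_xref.bin"
rm -f "$xref_tmp"    # do not append to entries left over from a previous run

dd if="$(ls -1 | grep "header.bin")" of="$target" bs=1 count=15
obj_offset=15
obj_nums=0
# sort -V (GNU) puts obj_2 before obj_10, so object N becomes xref entry N
for file in $(ls -1 objects | sort -V)
do
    obj_len=$(wc "objects/$file" | awk '{print $3}')
    dd if="objects/$file" of="$target" bs=1 count="$obj_len" seek="$obj_offset"
    # every xref entry must be exactly 20 bytes, hence the space before \n
    printf "%010d %05d n \n" "$obj_offset" 0 >> "$xref_tmp"
    obj_offset=$((obj_offset + obj_len))
    obj_nums=$((obj_nums + 1))
done
echo "xref" >> "$target"
# the entry count also includes the free entry for object 0
printf "0 %d\n" $((obj_nums + 1)) >> "$target"
printf "0000000000 65535 f \n" >> "$target"
cat "$xref_tmp" >> "$target"
# copy "trailer" and its dictionary (assumed to fit on one line)
awk 'NR<=2' "$(ls -1 | grep "trailer.bin")" >> "$target"
echo "startxref" >> "$target"
echo "$obj_offset" >> "$target"
echo "%%EOF" >> "$target"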
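The xref section written by the combine script has a fixed layout: an `xref` line, a `0 N` subsection line whose N also counts object 0, and then one entry per object, each exactly 20 bytes long including a two-character end-of-line (here a space plus LF). A minimal sketch of only that part, as a hypothetical `write_xref` function that takes the object offsets in object-number order:

```shell
#!/bin/bash
# write_xref OUT OFFSET...   (hypothetical helper)
# Appends a classic xref section to OUT; the k-th OFFSET is the byte offset
# of object k. Object 0 is written as the head of the free list.
write_xref() {
  local out=$1 off
  shift
  {
    printf 'xref\n'
    # N counts every entry, including the free entry for object 0.
    printf '0 %d\n' $(( $# + 1 ))
    # 18 visible characters + " \n" = 20 bytes per entry.
    printf '0000000000 65535 f \n'
    for off in "$@"; do
      printf '%010d 00000 n \n' "$off"
    done
  } >> "$out"
}

# Demo: three objects at offsets 15, 74 and 120.
demo=$(mktemp)
write_xref "$demo" 15 74 120
cat "$demo"
rm -f "$demo"
```

As in the combine script, the trailer dictionary follows this section, then `startxref` with the byte offset at which the `xref` line begins (the file size just before `write_xref` is called), then `%%EOF`. Offsets must be passed without leading zeros, since bash's `printf` reads `074` as octal.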

 

With these two scripts, each object parsed out of the PDF can be modified on its own.
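Because each object now sits in its own file under `objects/`, ordinary text tools work on it. As one example of a per-object operation, and as a possible aid for the manual search described next, the hypothetical helper below lists the object files that contain a stream, together with the first `/Filter` name found in each. It is only a sketch: it recognizes a single filter written as `/Filter/Name` or `/Filter /Name`, and reports filter arrays as `none`.

```shell
#!/bin/bash
# list_streams DIR   (hypothetical helper)
# Prints "<file> <filter>" for each *.bin file in DIR containing a stream.
list_streams() {
  local dir=$1 f filter
  for f in "$dir"/*.bin; do
    [ -e "$f" ] || continue                  # no matches: glob stays literal
    grep -qa 'endstream' "$f" || continue    # skip objects without a stream
    # First "/Filter/Name" occurrence, reduced to "Name".
    filter=$(grep -ao '/Filter */[A-Za-z0-9]*' "$f" | head -n 1 | sed 's|/Filter */||')
    printf '%s %s\n' "$(basename "$f")" "${filter:-none}"
  done
}

# Example: inventory the directory written by the split script.
if [ -d objects ]; then list_streams objects; fi
```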

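Next, find by hand the object whose stream holds the graphics operators, and decompress that stream with the following script, passing the object number as its only argument: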

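#!/bin/bash
# Usage: ./inflate.sh 4    (4 = object number; output goes to deflated.bin)
target=$(ls -1 objects | grep -F "_obj_$1.bin")
# byte offsets of the "stream" and "endstream" keywords
kw=$(grep -abo stream "objects/$target" | head -n 1 | cut -d: -f1)
end=$(grep -abo endstream "objects/$target" | head -n 1 | cut -d: -f1)
# the data starts after "stream" and its LF (6 + 1 bytes) and stops before
# the LF in front of "endstream"; CRLF line ends would need +8 and -2 instead
xstart=$((kw + 7))
xend=$((end - 1))
dd if="objects/$target" of=flated.bin bs=1 skip="$xstart" count=$((xend - xstart))
zlib-flate -uncompress < flated.bin > deflated.bin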
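The fragile part of the decompression step is finding where the raw stream bytes begin and end. In the PDF format the data starts after the `stream` keyword and its end-of-line (CR LF or LF alone), and ends before the end-of-line that precedes `endstream`. The sketch below, built from hypothetical helper names, derives those bounds from the two keyword offsets and also copes with CRLF files. It assumes one stream per object file and a stream dictionary that does not itself contain the word `stream`.

```shell
#!/bin/bash
# byte_at FILE OFFSET: print the byte at OFFSET as two hex digits (e.g. 0a).
byte_at() {
  od -An -tx1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

# stream_bounds FILE   (hypothetical helper)
# Prints "start end" such that bytes [start, end) are the raw stream data.
stream_bounds() {
  local f=$1 kw end start
  kw=$(grep -abo 'stream' "$f" | head -n 1 | cut -d: -f1)
  end=$(grep -abo 'endstream' "$f" | head -n 1 | cut -d: -f1)
  # Skip "stream" (6 bytes) and its EOL: CR LF or LF alone.
  start=$((kw + 6))
  if [ "$(byte_at "$f" "$start")" = 0d ]; then start=$((start + 1)); fi
  start=$((start + 1))
  # Drop the EOL (LF or CR LF) that precedes "endstream".
  if [ "$(byte_at "$f" $((end - 1)))" = 0a ]; then end=$((end - 1)); fi
  if [ "$(byte_at "$f" $((end - 1)))" = 0d ]; then end=$((end - 1)); fi
  printf '%d %d\n' "$start" "$end"
}

# extract_stream FILE OUT   (hypothetical helper): copy the raw stream bytes.
extract_stream() {
  local bounds start end
  bounds=$(stream_bounds "$1")
  start=${bounds% *}
  end=${bounds#* }
  dd if="$1" of="$2" bs=1 skip="$start" count=$((end - start)) 2>/dev/null
}

# Example (hypothetical file name; zlib-flate ships with qpdf):
# extract_stream objects/tdis_daniel_obj_4.bin flated.bin
# zlib-flate -uncompress < flated.bin > deflated.bin
```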

 

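Edit deflated.bin as needed, then recompress it into its object file with the script below (again passing the object number). Running the combine script afterwards produces the modified PDF.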

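#!/bin/bash
# Usage: ./deflate.sh 4    (4 = object number)
# Assumes the stream dictionary needs only /Length and /Filter.
zlib-flate -compress < deflated.bin > tflate_data.bin
# /Length must be the byte size of the compressed data, so compress first
len=$(wc -c < tflate_data.bin | awk '{print $1}')
printf "%d 0 obj\n" "$1" > tflate_"$1".bin
printf "<</Length %d/Filter/FlateDecode>>stream\n" "$len" >> tflate_"$1".bin
cat tflate_data.bin >> tflate_"$1".bin
echo "" >> tflate_"$1".bin
echo "endstream" >> tflate_"$1".bin
echo "endobj" >> tflate_"$1".bin
# replace the old object file
target=$(ls -1 objects | grep -F "_obj_$1.bin")
mv tflate_"$1".bin "objects/$target"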
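In a stream object, /Length must equal the number of data bytes between the end-of-line after `stream` and the end-of-line before `endstream`, so the compressed data has to exist before the dictionary can be written. A minimal sketch with a hypothetical `make_stream_obj` helper, assuming, as the script above does, that the dictionary needs no keys besides /Length and /Filter:

```shell
#!/bin/bash
# make_stream_obj OBJNUM DATA OUT   (hypothetical helper)
# DATA must already be Flate-compressed (e.g. by zlib-flate -compress).
# Assumes the stream dictionary needs only /Length and /Filter.
make_stream_obj() {
  local num=$1 data=$2 out=$3 len
  # /Length = exact byte count of the data, measured after compression.
  len=$(wc -c < "$data" | tr -d ' ')
  {
    printf '%d 0 obj\n' "$num"
    printf '<</Length %d/Filter/FlateDecode>>stream\n' "$len"
    cat "$data"
    # This EOL separates the data from "endstream" and is not counted in /Length.
    printf '\nendstream\nendobj\n'
  } > "$out"
}

# Example (hypothetical file names):
# zlib-flate -compress < deflated.bin > tflate_data.bin
# make_stream_obj 4 tflate_data.bin objects/tdis_daniel_obj_4.bin
```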

 
