Shell scripts for disassembling and recombining the objects in a PDF file

Source: Internet
Author: User
Tags: uncompress

Disassembling script

#!/bin/bash
# Usage: <script> file.pdf
# Splits file.pdf into its header, xref table, trailer and one file per object.
header_start=0
header_len=15    # header length of the author's file; other PDFs may differ
# Byte offsets of the "xref" and "trailer" keywords
xref_start=$(strings -a -t d "$1" | grep -e "\bxref\b" | awk '{print $1}')
trailer_start=$(strings -a -t d "$1" | grep -e "\btrailer\b" | awk '{print $1}')
#echo $xref_start
#echo $trailer_start
xref_len=$(echo "$trailer_start - $xref_start" | bc)
#echo $xref_len
header_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_header.bin/g')
dd if="$1" of="$header_dump" bs=1 skip=$header_start count=$header_len
xref_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_xref.bin/g')
dd if="$1" of="$xref_dump" bs=1 skip=$xref_start count=$xref_len
trailer_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis_\1_trailer.bin/g')
dd if="$1" of="$trailer_dump" bs=1 skip=$trailer_start
# Build "offset objnum" pairs from the xref entries, sorted numerically by offset.
# The first three-field line is the free entry for object 0, so it is skipped.
#cat "$xref_dump" | awk 'NF==3' | awk 'NR!=1{printf("%d 0 obj is at offset: %d\n", NR-1, $1);}'
cat "$xref_dump" | awk 'NF==3' | awk 'NR!=1{printf("%d %d\n", $1, NR-1);}' | sort -n > tdis_"$xref_dump"
# Sentinel: the last object ends where the xref table starts
echo "$xref_start 0" >> tdis_"$xref_dump"
# Convert the pairs into "offset length objnum" triples
cat tdis_"$xref_dump" | awk 'BEGIN{loffset=0;lobjnum=0;}{printf("%3d %3d %3d\n", loffset, $1-loffset, lobjnum);loffset=$1;lobjnum=$2;}' | awk 'NR!=1' > tdis_metrics_"$xref_dump"
if [ ! -d objects ]
then
    mkdir objects
fi
cat tdis_metrics_"$xref_dump" | while read offset len objn
do
    #echo $offset, $len, $objn
    obj_name=$(echo "$1_$objn" | sed -re 's/^(.*)\.pdf/tdis_\1_obj/g' | awk '{printf("objects/%s.bin", $0);}')
    #echo $obj_name
    dd if="$1" of="$obj_name" bs=1 skip=$offset count=$len
done
# Earlier stream-extraction experiment, kept for reference (hard-coded to one file and object):
#grep -Ubo --binary-files=text stream tdis_daniel_obj_2.bin | sed -e 's/:/ /g' | awk 'NR==1{printf("%d ",$1+7);}NR==2{printf("%d ", $1-10);}' > tdis_stream.bin
#read xstart xend < tdis_stream.bin
#dd if=tdis_daniel_obj_2.bin of=flated.bin bs=1 skip=$xstart count=$[ $xend - $xstart ]
#cat flated.bin | zlib-flate -uncompress > deflated.bin

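The core of the disassembling step is turning the cross-reference table into one (offset, length, object number) triple per object. The sketch below isolates that calculation; the function name xref_to_extents and the sample offsets are made up for illustration, and it assumes a single classic xref section whose first entry is the free entry for object 0.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Reads xref-table text on stdin; $1 is the byte offset of the "xref" keyword.
# Prints "offset length objnum" for every object, in file order.
xref_to_extents() {
    local xref_start=$1
    {
        # Number the three-field entries from 0 and skip entry 0 (the free entry).
        awk 'NF == 3 { if (n > 0) printf("%d %d\n", $1, n); n++ }'
        # Sentinel: the last object ends where the xref table begins.
        echo "$xref_start 0"
    } | sort -n |
    awk 'NR > 1 { printf("%d %d %d\n", prev_off, $1 - prev_off, prev_num) }
         { prev_off = $1; prev_num = $2 }'
}

# Demo with invented offsets: object 3 sits between objects 1 and 2 in the file.
xref_to_extents 200 <<'EOF'
xref
0 4
0000000000 65535 f
0000000015 00000 n
0000000120 00000 n
0000000064 00000 n
EOF
```

Sorting numerically matters here: the offsets lose their zero padding once printed with %d, so a plain lexical sort would put 120 before 15.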

Combining script

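#!/bin/bash
# Usage: <script> output.pdf   (run in the directory that holds the tdis_* dumps and objects/)
target=$1
xref_tmp="tas_generated_${1}_xref.bin"
: > "$xref_tmp"    # start empty so that reruns do not accumulate stale entries
dd if="$(ls -1 | grep "header.bin")" of="$target" bs=1 count=15
obj_offset=15
obj_nums=0
# ls -v (GNU) sorts obj_10 after obj_2, so the entry order matches the object numbers
for file in $(ls -1v objects)
do
    #echo $file
    obj_len=$(wc -c < "objects/$file")
    dd if="objects/$file" of="$target" bs=1 count=$obj_len seek=$obj_offset
    # an xref entry is exactly 20 bytes; the space before \n pads the one-byte line ending
    printf "%010d %05d n \n" $obj_offset 0 >> "$xref_tmp"
    obj_offset=$(( obj_offset + obj_len ))
    obj_nums=$(( obj_nums + 1 ))
done
echo "xref" >> "$target"
# the entry count includes the free entry for object 0
printf "0 %d\n" $(( obj_nums + 1 )) >> "$target"
echo "0000000000 65535 f " >> "$target"
cat "$xref_tmp" >> "$target"
# keep the "trailer" keyword and the dictionary line that follows it
awk 'NR<=2' "$(ls -1 | grep "trailer.bin")" >> "$target"
echo "startxref" >> "$target"
echo $obj_offset >> "$target"
echo "%%EOF" >> "$target"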

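When the objects are written back one after another, the new cross-reference table follows directly from their lengths. A minimal sketch of that bookkeeping is shown here; xref_entry and build_xref are hypothetical names, and the sketch assumes the objects are laid out in object-number order with generation 0, as the combining script does.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original scripts.

# Print one 20-byte xref entry: 10-digit offset, 5-digit generation, type, space, LF.
# $1 = byte offset, $2 = generation number, $3 = n (in use) or f (free)
xref_entry() {
    printf '%010d %05d %s \n' "$1" "$2" "$3"
}

# Print a complete xref section.
# $1 = header length in bytes; remaining arguments = length of object 1, 2, ...
build_xref() {
    local offset=$1
    shift
    # The entry count includes the free entry for object 0.
    printf 'xref\n0 %d\n' $(( $# + 1 ))
    xref_entry 0 65535 f
    local len
    for len in "$@"; do
        xref_entry "$offset" 0 n
        offset=$(( offset + len ))
    done
}

# Demo: a 15-byte header followed by objects of 49, 56 and 80 bytes.
build_xref 15 49 56 80
```

In the demo the three objects land at offsets 15, 64 and 120, so the value written after startxref would be 200, the offset at which the xref keyword itself begins.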

This way, each object extracted from the PDF can be processed on its own.

Manually find the object that contains the graphics operator stream, then decompress that stream with the following script, passing the object number as the first argument.

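#!/bin/bash
# Usage: <script> objnum
target=$(ls -1 objects | grep -F "_obj_$1.bin")
# grep reports the byte offset of each "stream" match: first the keyword itself,
# then the one inside "endstream". The +7 and -10 adjustments are the author's
# values and depend on the line endings around the stream data; for data
# delimited by single LF characters the second adjustment would be -4.
grep -Ubo --binary-files=text stream "objects/$target" | sed -e 's/:/ /g' | awk 'NR==1{printf("%d ",$1+7);}NR==2{printf("%d ", $1-10);}' > tdeflate_stream.bin
read xstart xend < tdeflate_stream.bin
dd if="objects/$target" of=flated.bin bs=1 skip=$xstart count=$(( xend - xstart ))
zlib-flate -uncompress < flated.bin > deflated.bin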

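The offset arithmetic in that step is easier to check on a tiny, uncompressed example. The following sketch uses a hypothetical function, stream_bounds, and assumes GNU grep, a single LF after the stream keyword and a single LF before endstream; files that use CR LF would need different adjustments.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# $1 = file holding one PDF object; prints "start length" of its stream data.
stream_bounds() {
    local kw end start
    # Offset of the first "stream" match (the keyword after the dictionary).
    kw=$(LC_ALL=C grep -abo 'stream' "$1" | head -n 1 | cut -d: -f1)
    # Offset of the last "endstream" keyword.
    end=$(LC_ALL=C grep -abo 'endstream' "$1" | tail -n 1 | cut -d: -f1)
    start=$(( kw + 7 ))                      # skip "stream" (6 bytes) and one LF
    echo "$start $(( end - 1 - start ))"     # drop the LF before "endstream"
}

# Demo on a made-up object whose stream data is the 5 bytes "HELLO".
obj=$(mktemp)
printf '4 0 obj\n<</Length 5>>stream\nHELLO\nendstream\nendobj\n' > "$obj"
read -r start len <<< "$(stream_bounds "$obj")"
dd if="$obj" bs=1 skip="$start" count="$len" 2>/dev/null
echo
rm -f "$obj"
```

For the demo object the function reports a start of 28 and a length of 5, which matches the /Length value in its dictionary. Compressed data can contain the byte sequence "stream" by chance, so results on real files are worth checking against /Length in the same way.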

Edit the deflated.bin file, then recompress it with the following script, again passing the object number as the first argument.

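#!/bin/bash
# Usage: <script> objnum
# Compress first, so that the real stream length can be written into /Length
zlib-flate -compress < deflated.bin > tflate_stream.bin
stream_len=$(wc -c < tflate_stream.bin)
printf "%d 0 obj\n" "$1" > tflate_"$1".bin
printf "<</Length %d/Filter/FlateDecode>>stream\n" $stream_len >> tflate_"$1".bin
cat tflate_stream.bin >> tflate_"$1".bin
echo "" >> tflate_"$1".bin
echo "endstream" >> tflate_"$1".bin
echo "endobj" >> tflate_"$1".bin
# Replace the original object file with the rebuilt one
target=$(ls -1 objects | grep -F "_obj_$1.bin")
rm "objects/$target"
mv tflate_"$1".bin "objects/$target"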

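The /Length entry has to equal the number of bytes between the stream and endstream line endings, so it can only be written once the compressed size is known. A small sketch of that wrapping step follows; wrap_stream is a hypothetical name, and the demo payload is plain text standing in for real zlib-flate output.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# $1 = object number, $2 = file containing the already-compressed stream bytes.
# Prints a complete stream object whose /Length matches the payload size.
wrap_stream() {
    local len
    len=$(( $(wc -c < "$2") ))    # arithmetic expansion strips any padding from wc
    printf '%d 0 obj\n<</Length %d/Filter/FlateDecode>>stream\n' "$1" "$len"
    cat "$2"
    printf '\nendstream\nendobj\n'
}

# Demo: 5 placeholder bytes (not real deflate data) just to show the length bookkeeping.
payload=$(mktemp)
printf 'HELLO' > "$payload"
wrap_stream 4 "$payload"
rm -f "$payload"
```

Like the script it illustrates, this writes a dictionary with only /Length and /Filter, so any other entries the original object carried would have to be added back by hand.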
