Disassembling script
#!/bin/bash
# Split the PDF named by $1 into header, xref, trailer and one file per object.
header_start=0
header_len=15
# Byte offsets of the "xref" and "trailer" keywords
xref_start=$(strings -a -t d "$1" | grep -e "\bxref\b" | awk '{print $1}')
trailer_start=$(strings -a -t d "$1" | grep -e "\btrailer\b" | awk '{print $1}')
#echo $xref_start
#echo $trailer_start
xref_len=$(echo "$trailer_start - $xref_start" | bc)
#echo $xref_len
header_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis\_\1\_header\.bin/g')
dd if="$1" of=$header_dump bs=1 skip=$header_start count=$header_len
xref_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis\_\1\_xref\.bin/g')
dd if="$1" of=$xref_dump bs=1 skip=$xref_start count=$xref_len
trailer_dump=$(echo "$1" | sed -re 's/^(.*)\.pdf/tdis\_\1\_trailer\.bin/g')
dd if="$1" of=$trailer_dump bs=1 skip=$trailer_start
#cat "$xref_dump" | awk 'NF==3' | awk 'NR!=1{printf("%d 0 obj is at offset: %d\n", NR-1, $1);}'
# "offset objnum" pairs, sorted numerically by offset (the first entry, the free object 0, is skipped)
cat "$xref_dump" | awk 'NF==3' | awk 'NR!=1{printf("%d %d\n", $1, NR-1);}' | sort -n > tdis_"$xref_dump"
# The xref offset marks the end of the last object
echo "$xref_start 0" >> tdis_"$xref_dump"
# "offset length objnum" for every object: length = next offset - this offset
cat tdis_"$xref_dump" | awk 'BEGIN{loffset=0;lobjnum=0;}{printf("%3d %3d %3d\n", loffset, $1-loffset, lobjnum);loffset=$1;lobjnum=$2;}' | awk 'NR!=1' > tdis_metrics_"$xref_dump"
if [ ! -d objects ]
then
    mkdir objects
fi
cat tdis_metrics_"$xref_dump" | while read offset len objn
do
    #echo $offset, $len, $objn
    obj_name=$(echo "$1_$objn" | sed -re 's/^(.*)\.pdf/tdis\_\1\_obj/g' | awk '{printf("objects/%s.bin", $0);}')
    #echo $obj_name
    dd if="$1" of=$obj_name bs=1 skip=$offset count=$len
done
#grep -Ubo --binary-files=text stream tdis_daniel_obj_2.bin | sed -e 's/:/ /g' | awk 'NR==1{printf("%d ",$1+7);}NR==2{printf("%d ", $1-10);}' > tdis_stream.bin
#read xstart xend < tdis_stream.bin
#dd if=tdis_daniel_obj_2.bin of=flated.bin bs=1 skip=$xstart count=$(( xend - xstart ))
#cat flated.bin | zlib-flate -uncompress > deflated.bin
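The offset and length bookkeeping in the script above can be tried in isolation. The following is a minimal sketch, not the author's code: the function name xref_metrics and the sample table are made up for illustration, and it assumes a single xref subsection that starts at object 0, with entry 0 as the only free entry.

```shell
# Sketch: derive "offset length objnum" rows from a classic xref table.
# xref_metrics and the sample table are hypothetical, for illustration only.
xref_metrics() {
    # $1: file holding the xref section
    # $2: byte offset of the "xref" keyword (end of the last object)
    awk 'NF==3 { if (seen++) printf("%d %d\n", $1, seen-1) }' "$1" |
        sort -n |
        { cat; echo "$2 0"; } |
        awk 'NR>1 { printf("%d %d %d\n", prev, $1-prev, num) }
             { prev=$1; num=$2 }'
}

cat > sample_xref.txt <<'EOF'
xref
0 4
0000000000 65535 f
0000000015 00000 n
0000000120 00000 n
0000000074 00000 n
EOF

xref_metrics sample_xref.txt 200
# 15 59 1
# 74 46 3
# 120 80 2
```

Sorting by offset before taking differences matters because objects need not appear in the file in object-number order, as objects 2 and 3 show here.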
Combining script
#!/bin/bash
# Rebuild the PDF named by $1 from the header dump, objects/ and the trailer dump.
target=$1
xref_tmp="tas_generated_${1}_xref.bin"
rm -f "$xref_tmp"    # start from an empty entry list on every run
dd if=$(ls -1 | grep "header.bin") of=$target bs=1 count=15
obj_offset=15
obj_nums=0
# ls -v (GNU) sorts numbers naturally, so obj_10 follows obj_9 instead of obj_1
for file in $(ls -1v objects)
do
    #echo $file
    obj_len=$(wc objects/$file | awk '{print $3}')
    dd if=objects/$file of=$target bs=1 count=$obj_len seek=$obj_offset
    printf "%010d %05d n\n" $obj_offset 0 >> "$xref_tmp"
    obj_offset=$(( obj_offset + obj_len ))
    obj_nums=$(( obj_nums + 1 ))
done
echo "xref" >> $target
# The entry count includes the free entry for object 0
printf "0 %d\n" $(( obj_nums + 1 )) >> $target
echo "0000000000 65535 f" >> $target
cat "$xref_tmp" >> $target
awk 'NR<=2' $(ls -1 | grep "trailer.bin") >> $target
echo "startxref" >> $target
echo $obj_offset >> $target
echo "%%EOF" >> $target
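The cross-reference section written by the script above can also be generated on its own from a list of object lengths. The sketch below is an illustration under assumptions, not the author's code: emit_xref is a hypothetical helper, the objects are assumed to be laid out back to back in object-number order, and each entry is padded to the 20 bytes that the PDF specification asks for (a space before the single newline).

```shell
# Sketch: emit a classic xref section for objects stored back to back.
# emit_xref is a hypothetical helper name.
emit_xref() {
    # $1: offset of the first object; remaining args: byte length of each
    # object, in object-number order.
    offset=$1; shift
    # The count covers the free entry for object 0 plus one entry per object.
    printf 'xref\n0 %d\n' $(( $# + 1 ))
    printf '%010d %05d f \n' 0 65535
    for len in "$@"; do
        printf '%010d %05d n \n' "$offset" 0
        offset=$(( offset + len ))
    done
}

# Three objects of 59, 46 and 80 bytes following a 15-byte header.
emit_xref 15 59 46 80
```

With these sample lengths the three in-use entries point at offsets 15, 74 and 120, and the value to write after startxref would be 200, the offset just past the last object.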
This way, each parsed PDF object can be edited on its own.
Manually locate the object that contains the graphics-operator stream, then decompress that stream with the following script, passing the object number as the first argument:
target=$(ls -1 objects | grep "_obj_"$1".bin")
# Byte offsets of the two "stream" matches: the keyword itself and the one inside "endstream"
grep -Ubo --binary-files=text stream objects/$target | sed -e 's/:/ /g' | awk 'NR==1{printf("%d ",$1+7);}NR==2{printf("%d ", $1-10);}' > tdeflate_stream.bin
read xstart xend < tdeflate_stream.bin
dd if=objects/$target of=flated.bin bs=1 skip=$xstart count=$(( xend - xstart ))
cat flated.bin | zlib-flate -uncompress > deflated.bin
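The carving step can be checked on a tiny hand-made object before running it on a real one. The sketch below is only an illustration: stream_bounds is a hypothetical helper, the sample payload is plain text rather than Flate data, and the arithmetic assumes the payload is delimited by exactly "stream\n" and "\nendstream", which may differ from the offsets used in the script above. Decompression with zlib-flate is left out.

```shell
# Sketch: carve the raw stream payload out of a single object file.
# Assumes one stream delimited by "stream\n" and "\nendstream".
printf '2 0 obj\n<</Length 5>>stream\nHELLO\nendstream\nendobj\n' > sample_obj.bin

stream_bounds() {
    # Prints "start length" of the payload in file $1.
    # The first match is the "stream" keyword; the last is inside "endstream".
    grep -abo 'stream' "$1" | cut -d: -f1 |
        awk 'NR==1 { start = $1 + 7 }
             { last = $1 }
             END { printf("%d %d\n", start, last - 4 - start) }'
}

read start len <<EOF
$(stream_bounds sample_obj.bin)
EOF
dd if=sample_obj.bin of=payload.bin bs=1 skip="$start" count="$len" 2>/dev/null
cat payload.bin   # HELLO
```

Using the last match rather than the second one keeps the end offset correct even if the compressed bytes happen to contain the word "stream".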
Edit the deflated.bin file as needed, then recompress it and swap it back into the objects directory with the following script:
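# Compress first so that /Length can be set to the real byte count
cat deflated.bin | zlib-flate -compress > flated.bin
flated_len=$(wc -c < flated.bin)
printf "%d 0 obj\n" $1 > tflate_"$1".bin
printf "<</Length %d/Filter/FlateDecode>>stream\n" $flated_len >> tflate_"$1".bin
cat flated.bin >> tflate_"$1".bin
echo "" >> tflate_"$1".bin
echo "endstream" >> tflate_"$1".bin
echo "endobj" >> tflate_"$1".bin
# Replace the original object file with the rebuilt one
target=$(ls -1 objects | grep "_obj_"$1".bin")
rm objects/$target
mv tflate_"$1".bin objects/$target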
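A stream dictionary's /Length should equal the number of payload bytes between the stream and endstream keywords. The sketch below shows one way to keep the two in sync; wrap_stream is a hypothetical helper, and the demo payload is placeholder text, not real Flate data, so the resulting object is only useful for inspecting the framing.

```shell
# Sketch: wrap an already-compressed payload file as a PDF stream object.
# wrap_stream is a hypothetical helper name.
wrap_stream() {
    # $1: object number, $2: payload file, $3: output file
    len=$(( $(wc -c < "$2") ))   # arithmetic expansion strips any padding from wc
    printf '%d 0 obj\n<</Length %d/Filter/FlateDecode>>stream\n' "$1" "$len" > "$3"
    cat "$2" >> "$3"
    printf '\nendstream\nendobj\n' >> "$3"
}

# Placeholder bytes standing in for the output of zlib-flate -compress.
printf 'not-really-compressed' > demo_payload.bin
wrap_stream 7 demo_payload.bin demo_obj.bin
head -n 2 demo_obj.bin
# 7 0 obj
# <</Length 21/Filter/FlateDecode>>stream
```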
Shell scripts used to disassemble and recombine the objects in a PDF file