Writing PDF Files Step by Step
As a cross-platform file format, PDF is becoming more and more popular. In addition to the PDF library provided by Adobe, there are many third-party libraries that can create, modify, and convert PDF files. A PDF document mixes binary and text content. I studied the structure of PDF files for a recent project, but in the end the team lead decided to use a third-party library, so that work went unused. I would like to share some of what I learned here. There are already many introductions to the structure of PDF files, so it is not described again here; please refer to those.
PDF files can contain text, images, video, animation, 3D data, and more. I will use the simplest case, text, as the example, so let's start with "Hello, world!". The example is as follows:
Parsing a PDF file relies on the cross-reference table, which records the position of each object (obj) relative to the start of the file (hereinafter referred to as its "address"). We can write a function that returns the current position in the file.
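#include <cstdio>
#include <cstring>
#include <vector>
using namespace std;

long getobjlocation(FILE *pfile)
{
    // Seek by zero bytes from the current position, then report the
    // offset from the start of the file.
    fseek(pfile, 0, SEEK_CUR);
    return ftell(pfile);
}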
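The same bookkeeping can be sketched without a file handle: if the document is assembled in memory, the address of the next object is simply the current length of the buffer. The sketch below is only an illustration; the helper name `appendobject` and the `std::string` buffer are assumptions and do not appear in the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends one object's text to an in-memory buffer and
// records the byte offset at which that object starts.
void appendobject(std::string &buffer, std::vector<long> &offsets,
                  const std::string &objecttext)
{
    assert(!objecttext.empty());
    offsets.push_back(static_cast<long>(buffer.size())); // offset before writing
    buffer += objecttext;
}
```

Note that this variant records the offset just before an object is written, whereas the file-based functions in this article record the position just after the previous write; the resulting addresses are the same.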
The PDF format is revised as the ISO standard evolves, so the file begins with a header line that states the version it conforms to.
void writeheader(FILE *pfile, vector<long> &pdflocation)
{
    const char *header = "%PDF-1.7\n";
    fwrite(header, 1, strlen(header), pfile);
    // The position after the header is the address of the first object.
    pdflocation.push_back(getobjlocation(pfile));
}
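Since a PDF mixes binary and text, the specification also asks that files containing binary data follow the header with a comment line holding at least four bytes with values of 128 or greater, so that transfer tools treat the file as binary. A minimal sketch of that idea follows; the helper name `makeheader` and its parameters are assumptions made for illustration, and the four marker bytes are just one commonly seen choice.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the header for a version string such as "1.7".
std::string makeheader(const std::string &version, bool binarymarker)
{
    assert(!version.empty());
    std::string header = "%PDF-" + version + "\n";
    if (binarymarker)
    {
        // Comment line with four bytes >= 128, signalling binary content.
        header += "%\xE2\xE3\xCF\xD3\n";
    }
    return header;
}
```

If this marker line is written, it must be written before the first object's address is recorded, because it shifts every address that follows.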
A parser starts reading the document from its entry point, the catalog object.
void writecatalog(FILE *pfile, vector<long> &pdflocation)
{
    // Object 1 is the catalog; /Pages refers to the page tree (object 3).
    const char *catalog =
        "1 0 obj % entry point\n<<\n/Type /Catalog\n/Pages 3 0 R\n>>\nendobj\n";
    fwrite(catalog, 1, strlen(catalog), pfile);
    pdflocation.push_back(getobjlocation(pfile));
}
A PDF can have many pages, and each page can have its own size and content. This example has only one page.
void writepages(FILE *pfile, vector<long> &pdflocation)
{
    // Object 3 is the page tree: one 200 x 200 page, stored in object 4.
    const char *pages =
        "3 0 obj\n<<\n/Type /Pages\n/MediaBox [0 0 200 200]\n"
        "/Count 1\n/Kids [4 0 R]\n>>\nendobj\n";
    fwrite(pages, 1, strlen(pages), pfile);
    pdflocation.push_back(getobjlocation(pfile));
}
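For a document with more than one page, the /Count value and the /Kids array grow together. The sketch below shows one way this could look; the helper name `makepages` is hypothetical, and it assumes the page objects are numbered consecutively starting at `firstpage`, which is an assumption of this sketch rather than a requirement of the format.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a /Pages object whose kids are page objects
// numbered firstpage, firstpage + 1, ... (an assumption of this sketch).
std::string makepages(int pagesobj, int firstpage, int pagecount)
{
    assert(pagecount > 0);
    std::string kids;
    for (int i = 0; i < pagecount; i++)
    {
        if (i > 0) kids += " ";
        kids += std::to_string(firstpage + i) + " 0 R";
    }
    return std::to_string(pagesobj) + " 0 obj\n<<\n/Type /Pages\n"
           "/MediaBox [0 0 200 200]\n/Count " + std::to_string(pagecount) +
           "\n/Kids [" + kids + "]\n>>\nendobj\n";
}
```

Calling `makepages(3, 4, 1)` produces the same single-page object text that this article writes by hand.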
Create a page.
void writepage(FILE *pfile, vector<long> &pdflocation)
{
    // Object 4 is the page: its parent is the page tree (object 3), it uses
    // the font in object 5 as /F1, and its content is in object 6.
    const char *page =
        "4 0 obj\n<<\n/Type /Page\n/Parent 3 0 R\n/Resources <<\n/Font <<\n"
        "/F1 5 0 R\n>>\n>>\n/Contents 6 0 R\n>>\nendobj\n";
    fwrite(page, 1, strlen(page), pfile);
    pdflocation.push_back(getobjlocation(pfile));
}
Write text content.
void writecontents(FILE *pfile, vector<long> &pdflocation)
{
    // The stream data; /Length must equal its size in bytes.
    const char *stream = "BT\n70 50 TD\n/F1 12 Tf\n(Hello, world!) Tj\nET\n";
    char txtcontent[512];
    // "%%" writes a literal '%', which starts a PDF comment.
    sprintf_s(txtcontent, 512,
              "6 0 obj %% page content\n<<\n/Length %d\n>>\nstream\n%sendstream\nendobj\n",
              (int)strlen(stream), stream);
    fwrite(txtcontent, 1, strlen(txtcontent), pfile);
    pdflocation.push_back(getobjlocation(pfile));
}
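The text in this example is fixed, but arbitrary text needs two extra precautions: parentheses and backslashes must be escaped inside a PDF literal string, and /Length must be computed from the final stream data. The following is a minimal sketch under those assumptions; the helper names `escapepdftext` and `makecontents` are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <string>

// Escapes the characters that are special inside a PDF literal string.
std::string escapepdftext(const std::string &text)
{
    std::string out;
    for (char c : text)
    {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: builds a content-stream object that draws one line
// of text with font /F1 at 12 pt, at the same position as in this article.
std::string makecontents(int objnum, const std::string &text)
{
    assert(objnum > 0);
    std::string stream =
        "BT\n70 50 TD\n/F1 12 Tf\n(" + escapepdftext(text) + ") Tj\nET\n";
    return std::to_string(objnum) + " 0 obj\n<<\n/Length " +
           std::to_string(stream.size()) + "\n>>\nstream\n" + stream +
           "endstream\nendobj\n";
}
```

Deriving the length from the stream string itself avoids keeping two copies of the same text in step.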
Define the font.
void writefont(FILE *pfile, vector<long> &pdflocation)
{
    // Object 5 is the font: Times-Roman, one of the standard Type 1 fonts.
    const char *font =
        "5 0 obj\n<<\n/Type /Font\n/Subtype /Type1\n/BaseFont /Times-Roman\n>>\nendobj\n";
    fwrite(font, 1, strlen(font), pfile);
    pdflocation.push_back(getobjlocation(pfile));
}
Create the cross-reference table. By convention it starts with object 0, which does not exist and is marked as free (deleted). The table can be split into more than one subsection, depending on the situation.
void writeobjxref(FILE *pfile, long objref); // defined below

void writexref(FILE *pfile, const vector<long> &pdflocation)
{
    // pdflocation holds the addresses of objects 1, 3, 4, 5 and 6 in the
    // order they were written, followed by the address of this table.
    // There is no object 2, so the table is written as two subsections.
    const char *head = "xref\n0 2\n0000000000 65535 f \n"; // object 0 is free
    fwrite(head, 1, strlen(head), pfile);
    writeobjxref(pfile, pdflocation[0]); // object 1

    char subsection[64];
    sprintf_s(subsection, 64, "3 %d\n", (int)(pdflocation.size() - 2));
    fwrite(subsection, 1, strlen(subsection), pfile);
    for (size_t i = 1; i < pdflocation.size() - 1; i++)
    {
        writeobjxref(pfile, pdflocation[i]); // objects 3 to 6
    }
}
// Writes the cross-reference entry for a single object that is in use.
void writeobjxref(FILE *pfile, long objref)
{
    char temp[32];
    // 10-digit address, 5-digit generation number, "n", and a two-byte
    // end of line: every entry is exactly 20 bytes long.
    sprintf_s(temp, 32, "%010ld 00000 n \n", objref);
    fwrite(temp, 1, strlen(temp), pfile);
}
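Because every cross-reference entry has a fixed width of 20 bytes, the formatting is easy to check in isolation. The sketch below does the same formatting with the standard `std::snprintf` and returns a string; the helper name `makexrefentry` and the `generation` parameter are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats one in-use cross-reference entry: 10-digit offset, 5-digit
// generation number, the keyword n, and a two-byte end of line (20 bytes).
std::string makexrefentry(long offset, int generation)
{
    assert(offset >= 0 && generation >= 0 && generation <= 65535);
    char entry[32];
    std::snprintf(entry, sizeof(entry), "%010ld %05d n \n", offset, generation);
    return entry;
}
```

The space before the newline is what brings the entry to 20 bytes; a carriage return and line feed pair would serve the same purpose.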
The trailer at the end of the file is largely independent of the version. It specifies the number of objects, the entry point, and the address of the cross-reference table.
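void writetraile(FILE *pfile, const vector<long> &pdflocation)
{
    char temparrchar[256];
    // /Size is one more than the highest object number (6 in this example).
    // pdflocation holds five object addresses plus the address of the
    // cross-reference table, so that value is size() + 1.
    // The last element is the address of the cross-reference table.
    // "%%%%" writes the literal "%%" of the %%EOF marker.
    sprintf_s(temparrchar, 256,
              "trailer\n<</Size %d/Root 1 0 R>>\nstartxref\n%ld\n%%%%EOF\n",
              (int)(pdflocation.size() + 1),
              pdflocation[pdflocation.size() - 1]);
    fwrite(temparrchar, 1, strlen(temparrchar), pfile);
}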
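The three values in the trailer can also be passed in explicitly, which makes their meaning easier to see. This is a minimal sketch, not the article's method: the helper name `maketrailer` and its parameters are hypothetical, and it assumes /Size is derived from the highest object number in the file.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the trailer from the highest object number,
// the catalog's object number, and the address of the cross-reference table.
std::string maketrailer(int highestobj, int rootobj, long xrefoffset)
{
    assert(rootobj > 0 && highestobj >= rootobj && xrefoffset > 0);
    return "trailer\n<</Size " + std::to_string(highestobj + 1) +
           "/Root " + std::to_string(rootobj) + " 0 R>>\nstartxref\n" +
           std::to_string(xrefoffset) + "\n%%EOF\n";
}
```

Here the string is built by concatenation rather than by a printf-style format, so the end-of-file marker is written as a plain `%%EOF` with no extra escaping.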
Finally, call the functions above to create a complete PDF file. You can open the generated file in Notepad to see how a PDF file is organized.
void writepdf(const char *filename)
{
    FILE *pdffile = NULL;
    if (fopen_s(&pdffile, filename, "wb") != 0 || pdffile == NULL)
    {
        return; // the file could not be opened
    }
    vector<long> pdflocation; // stores the address of each object and of the cross-reference table
    writeheader(pdffile, pdflocation);
    writecatalog(pdffile, pdflocation);
    writepages(pdffile, pdflocation);
    writepage(pdffile, pdflocation);
    writefont(pdffile, pdflocation);
    writecontents(pdffile, pdflocation);
    writexref(pdffile, pdflocation);
    writetraile(pdffile, pdflocation);
    fclose(pdffile);
    pdffile = NULL;
}
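Besides inspecting the file by eye, the recorded addresses can be checked programmatically: each one should point at the text "N 0 obj" for the object it belongs to. The sketch below assumes the whole file has already been read into a `std::string`; the helper name `offsetpointsatobject` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the bytes at the given offset begin "objnum 0 obj".
bool offsetpointsatobject(const std::string &pdf, long offset, int objnum)
{
    assert(objnum > 0);
    if (offset < 0 || static_cast<std::size_t>(offset) >= pdf.size())
    {
        return false; // the address lies outside the file
    }
    const std::string expected = std::to_string(objnum) + " 0 obj";
    return pdf.compare(static_cast<std::size_t>(offset), expected.size(),
                       expected) == 0;
}
```

A check like this catches the most common mistake when writing the cross-reference table by hand, namely an address that is off by a few bytes or paired with the wrong object number.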
There are already many mature third-party libraries available that can meet most needs. My aim here is to help you understand the structure of PDF files, not to encourage reinventing the wheel. This was written in a hurry, so corrections are welcome if you spot any mistakes.
PS: The code in this article was written as C++/CLI managed code; with simple modifications it can be used in C#. The listings were laid out for typesetting, so check them before copying and running them.