PDF format details

Source: Internet
Author: User

PDF (Portable Document Format) is a very useful file format. Its biggest strengths are platform independence and rich content support (text, images, music, and video). This article looks at the physical structure of a PDF file.
A PDF file can be divided into the following parts:
1. Header:
The first line of the PDF file, in the following format:

%PDF-1.3

This indicates that the file conforms to PDF version 1.3 (the highest version at the time of writing was 1.5).
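
As a rough illustration of the header check, the following C sketch extracts the version from a header line. The function name parse_pdf_header is hypothetical, and the sketch assumes the first line of the file has already been read into a string.

```c
#include <stdio.h>
#include <string.h>

/* Parse a header line such as "%PDF-1.3".
 * Returns 1 and stores the two version numbers on success, 0 otherwise.
 * parse_pdf_header is a hypothetical helper name used for illustration. */
int parse_pdf_header(const char *line, int *major, int *minor)
{
    /* The header must start with the literal bytes "%PDF-". */
    if (strncmp(line, "%PDF-", 5) != 0)
        return 0;
    /* What follows is "<major>.<minor>", for example "1.3". */
    return sscanf(line + 5, "%d.%d", major, minor) == 2;
}
```

A caller would typically read the first line of the file with fgets and pass it to this function before looking at anything else.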

2. Body:
All objects used in the PDF file, including text, images, music, video, fonts, hyperlinks, and encryption information. The format is as follows:

2 0 obj
...
endobj

Here 2 is the object number and 0 is the generation number. The ellipsis stands for any legal PDF object (there are 8 types in total).

3. Cross Reference Table:
A table that references every object in the PDF file, in the following format:

xref
0 5
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n

xref marks the start of the cross-reference table. 0 5 means that the object numbers start at 0 and that there are 5 consecutive objects (0 through 4), each described by one line. The first 10 digits of each line give the byte offset of the object from the start of the file. The next 5 digits are the generation number, which only matters once an object has been deleted: it records the generation to use when the object number is reused. The final letter, n or f, indicates whether the object is in use (n means in use, f means free, that is, deleted or unused).
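
To make the entry layout concrete, here is a minimal C sketch that parses one line of the table. The XrefEntry type and the parse_xref_entry function are hypothetical names, and the sketch assumes each entry has already been isolated as a string.

```c
#include <stdio.h>

/* One cross-reference entry. XrefEntry and parse_xref_entry are
 * hypothetical names used only for this illustration. */
typedef struct {
    unsigned long offset;    /* the 10-digit field */
    unsigned int generation; /* the 5-digit field */
    int in_use;              /* 1 for 'n', 0 for 'f' */
} XrefEntry;

/* Parse a line such as "0000000074 00000 n".
 * Returns 1 on success, 0 if the line is not a valid entry. */
int parse_xref_entry(const char *line, XrefEntry *out)
{
    unsigned long offset;
    unsigned int generation;
    char kind;

    /* %lu and %u read decimal digits, so leading zeros are harmless. */
    if (sscanf(line, "%10lu %5u %c", &offset, &generation, &kind) != 3)
        return 0;
    if (kind != 'n' && kind != 'f')
        return 0;

    out->offset = offset;
    out->generation = generation;
    out->in_use = (kind == 'n');
    return 1;
}
```

For the third line of the table above, this yields offset 74, generation 0, and in_use set to 1.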

4. Trailer:
The entry point of the entire PDF file, in the following format:

trailer
<<
/Size 8
/Root 1 0 R
>>
startxref
553
%%EOF

/Size: the total number of objects used in this PDF.
/Root: the object number of the catalog object of the PDF file. This is the top-level object in the PDF file.
startxref: the number that follows is the byte offset at which the Cross Reference Table starts.
%%EOF: the end-of-file marker.
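
Because the trailer sits at the end of the file, a reader usually starts there. The sketch below searches a buffer holding the tail of a file for the last startxref keyword and returns the number after it. The function name find_startxref is hypothetical, and the sketch assumes the caller has already loaded the last part of the file into memory.

```c
#include <string.h>

/* Return the number that follows the last "startxref" keyword in buf,
 * or -1 if the keyword or the number is missing.
 * find_startxref is a hypothetical helper name. */
long find_startxref(const char *buf, size_t len)
{
    static const char key[] = "startxref";
    const size_t klen = sizeof key - 1;
    size_t i;

    if (len < klen)
        return -1;

    /* Scan backwards so that the last occurrence wins. */
    for (i = len - klen + 1; i-- > 0; ) {
        if (memcmp(buf + i, key, klen) == 0) {
            size_t j = i + klen;
            long value = 0;
            int digits = 0;

            /* Skip the line break (and any blanks) after the keyword. */
            while (j < len && (buf[j] == ' ' || buf[j] == '\t' ||
                               buf[j] == '\r' || buf[j] == '\n'))
                j++;
            /* Read the decimal offset. */
            while (j < len && buf[j] >= '0' && buf[j] <= '9') {
                value = value * 10 + (buf[j] - '0');
                j++;
                digits++;
            }
            return digits > 0 ? value : -1;
        }
    }
    return -1;
}
```

With the trailer shown above, the function returns 553, which is where a reader would seek to find the xref keyword.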

In practice a PDF file can be very complicated, but the parts above are always present; real files only differ in how much of each they contain. Next, let's go through the eight types of PDF objects.

1. Boolean
Represented by the keyword true or false. It can be an element of an array object or the value of a dictionary entry.

2. Numeric
Integer and real numbers are supported. Non-decimal radixes and exponential notation are not supported.

Example:
1) Integers: 123 4567 +111 -2
Range: -2^31 to 2^31 - 1
2) Real numbers: 12.3 0.8 +6.3 -4.01 -3. +.03
Range: roughly ±3.403 x 10^38 at the largest, down to ±1.175 x 10^-38 at the smallest non-zero magnitude
Note: If an integer exceeds its value range, it is converted to a real number. If a real number exceeds its value range, an error occurs.
3. String
A string consists of a series of bytes, each in the range 0 to 255. The total length of a string cannot exceed 65535 bytes. There are two ways to write a string:
1) A literal string enclosed in (). The escape character "\" can be used inside it.
Example:
(ABC) represents ABC
(A\\) represents A\
2) A hex string enclosed in <>. Every two hex digits represent one byte, and a missing final digit is assumed to be 0.
Example:
<AABB> represents two bytes: AA and BB.
<AAB> represents two bytes: AA and B0.
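
The padding rule for hex strings is easy to get wrong, so here is a small C sketch of a decoder. The names hex_value and decode_hex_string are hypothetical, and the sketch assumes the whole string, including the angle brackets, is available as a NUL-terminated C string.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
 * hex_value and decode_hex_string are hypothetical helper names. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string such as "<AAB>" into out (at most cap bytes).
 * Returns the number of bytes written, or -1 on malformed input
 * or if out is too small. */
long decode_hex_string(const char *src, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int high = -1; /* pending high digit, or -1 if none */

    if (*src++ != '<')
        return -1;
    for (; *src != '>'; src++) {
        int v;
        if (*src == '\0')
            return -1;              /* missing closing '>' */
        if (isspace((unsigned char)*src))
            continue;               /* whitespace between digits is ignored */
        v = hex_value((unsigned char)*src);
        if (v < 0)
            return -1;
        if (high < 0) {
            high = v;
        } else {
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)(high * 16 + v);
            high = -1;
        }
    }
    if (high >= 0) {                /* odd digit count: pad with 0 */
        if (n == cap)
            return -1;
        out[n++] = (unsigned char)(high * 16);
    }
    return (long)n;
}
```

Decoding <AAB> with this function produces the two bytes AA and B0, matching the example above.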
4. Name
A name consists of a leading / followed by a series of characters. The maximum length is 127. Unlike a string, a name is indivisible and unique. Indivisible means that a name object is atomic: in /Name, for example, the N is not an element of the name object. Unique means that two identical names always denote the same object. Starting from PDF 1.2, any character except ASCII 0 can be written as # followed by two hexadecimal digits.
Example:
/Name represents Name
/Name#20is represents "Name is"
/Name#200 represents "Name 0"
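
The # escape rule can be sketched in C as follows. The names name_hex_digit and decode_name are hypothetical, and the sketch assumes the name is passed as a NUL-terminated string that starts with the leading slash.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
 * name_hex_digit and decode_name are hypothetical helper names. */
static int name_hex_digit(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a name such as "/Name#20is" into out as a C string.
 * Returns the decoded length, or -1 on malformed input or if
 * out (cap bytes, including the terminator) is too small. */
long decode_name(const char *src, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0 || *src++ != '/')
        return -1;
    while (*src != '\0') {
        int c = (unsigned char)*src++;
        if (c == '#') {
            /* Exactly two hex digits must follow the '#'. */
            int hi = name_hex_digit((unsigned char)src[0]);
            int lo = hi < 0 ? -1 : name_hex_digit((unsigned char)src[1]);
            if (lo < 0)
                return -1;
            c = hi * 16 + lo;
            if (c == 0)
                return -1;          /* ASCII 0 is not allowed in a name */
            src += 2;
        }
        if (n + 1 >= cap)
            return -1;              /* keep room for the terminator */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (long)n;
}
```

Note how /Name#200 decodes to "Name 0": only the two digits right after the # belong to the escape, and the trailing 0 is an ordinary character.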
5. Array
A sequence of objects enclosed in []. The elements can be any PDF objects (including arrays). Although PDF only supports one-dimensional arrays, nesting arrays makes it possible to build arrays of any dimension (but a single array cannot have more than 8191 elements).
Example:
[549 3.14 false (Ralph) /somename]
6. Dictionary
A set of entries enclosed in "<<" and ">>". Each entry consists of a key and a value. The key must be a name object, and each key in a dictionary must be unique; the value can be any legal PDF object (including another dictionary).
Example:
<< /Integeritem 12
/Stringitem (a string)
/Subdictionary << /Item1 0.4
/Item2 true
/Lastitem (not !)
/Verylastitem (OK)
>>
>>
7. Stream
A series of bytes enclosed between the keywords stream and endstream. Its content is similar to a string, with two differences: a stream can be read in several passes and its parts used separately, whereas a string must be used as a whole; and a string has a length limit, whereas a stream does not. Large data is therefore generally represented as a stream.
Example: (omitted)
8. Null
Written as the keyword null, it represents an empty object. If the value of a dictionary key is null, the key can be treated as absent. A reference to a nonexistent object is equivalent to a reference to the null object.
Example: (omitted)

Something useful: why can't some PDF files be printed?
PDF has its own encryption mechanism, which can restrict printing.
Find the trailer. If the PDF file is encrypted, the trailer contains an /Encrypt entry. Its value is generally of the form n 0 R, meaning that the file's encryption information is stored in object n 0. Find that object; it contains a /P entry whose value is a 32-bit number. Bit 3 of that number indicates whether you have permission to print :)
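
The bit test can be sketched in C as follows. The function names are hypothetical, and the sketch assumes that bits are numbered from 1 starting at the least significant bit, so bit 3 corresponds to the mask 0x4; it also assumes the /P value has already been read into a 32-bit signed integer.

```c
#include <stdint.h>

/* Return the value (0 or 1) of a permission bit in the /P number.
 * Bits are numbered from 1 at the least significant end (an assumption
 * stated in the text above). Both function names are hypothetical. */
int pdf_permission_bit(int32_t p, int bit)
{
    return (int)(((uint32_t)p >> (bit - 1)) & 1u);
}

/* Bit 3 is the print permission. */
int pdf_can_print(int32_t p)
{
    return pdf_permission_bit(p, 3);
}
```

The /P value is usually written as a negative decimal number in the file, which is why the sketch converts it to an unsigned value before shifting.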

Because of its special file structure, PDF files are hard to generate by hand. The most common SDK is the one provided by Adobe, but it is inconvenient to use because it depends on the Adobe environment. Here is another approach: using PDFlib to generate a PDF file.

PDFlib also provides an SDK. Compared with the Adobe SDK it is very small, but its functionality is not weak at all, so developing with PDFlib is a sensible choice. You can get it from the following URL:
http://www.pdflib.com/products/pdflib/download/index.html. Without a serial number, the generated PDF files carry a watermark; the other features are the same as in the commercial version.
You can also get the PDFlib Lite version from the following URL:
http://www.pdflib.com/products/pdflib/download-source.html
PDFlib Lite license terms:
http://www.pdflib.com/purchase/license-lite.html
Compared with PDFlib, PDFlib Lite comes with full source code but is less feature-complete. For example, PDFlib Lite supports fewer fonts than PDFlib. However, if you are familiar with the PDF format itself, you can largely make up the difference with your own C programming skills. If you want to use it in commercial products, read the license agreement carefully!

After downloading the PDFlib package (about 6 MB), you will find pdflib.dll, pdflib.lib, pdflib.h, and pdflib.reg in the pdflib folder. We only need the first three files.
The following is a complete program that generates a PDF file:

#include <stdio.h>
#include <stdlib.h>

#include "pdflib.h"

int
main(void)
{
    PDF *p;
    int font;

    /* Create a new PDFlib object */
    if ((p = PDF_new()) == (PDF *) 0)
    {
        printf("Couldn't create PDFlib object (out of memory)!\n");
        return(2);
    }

    PDF_TRY(p) {
        /* Open the output file; -1 signals failure */
        if (PDF_begin_document(p, "hello.pdf", 0, "") == -1) {
            printf("Error: %s\n", PDF_get_errmsg(p));
            return(2);
        }

        /* This line is required to avoid problems on Japanese systems */
        PDF_set_parameter(p, "hypertextencoding", "host");

        PDF_set_info(p, "Creator", "hello.c");
        PDF_set_info(p, "Author", "Thomas Merz");
        PDF_set_info(p, "Title", "Hello, world (C)!");

        PDF_begin_page_ext(p, a4_width, a4_height, "");

        /* Change "host" encoding to "winansi" or whatever you need! */
        font = PDF_load_font(p, "Helvetica-Bold", 0, "host", "");

        PDF_setfont(p, font, 24);
        PDF_set_text_pos(p, 50, 700);
        PDF_show(p, "Hello, world!");
        PDF_continue_text(p, "(says C)");
        PDF_end_page_ext(p, "");

        PDF_end_document(p, "");
    }

    PDF_CATCH(p) {
        printf("PDFlib exception occurred in hello sample:\n");
        printf("[%d] %s: %s\n",
            PDF_get_errnum(p), PDF_get_apiname(p), PDF_get_errmsg(p));
        PDF_delete(p);
        return(2);
    }

    PDF_delete(p);

    return 0;
}
Now hello.pdf has been generated, and you can build on this example to create more complicated PDF files; the rest is up to your imagination. :) One thing to watch out for: the coordinate system of a PDF page differs from what we are used to. The origin of screen coordinates is in the upper-left corner, whereas the origin of a PDF page is in the lower-left corner.
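
If you prefer to think in screen-style coordinates, the conversion is a single subtraction. The sketch below uses a hypothetical helper name, pdf_y_from_top, and assumes both values are given in the same units (points, in the sample above).

```c
/* Convert a y coordinate measured down from the top edge of the page
 * into a PDF y coordinate measured up from the bottom edge.
 * pdf_y_from_top is a hypothetical helper name. */
double pdf_y_from_top(double page_height, double y_from_top)
{
    return page_height - y_from_top;
}
```

For example, on a page 842 units high, a point 142 units below the top edge has a PDF y coordinate of 700.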
