Epub

Word count: 1110 · 2021-08-03

EPUB is an e-book file format that uses the .epub file extension.

The EPUB format is implemented as an archive file consisting of XHTML files carrying the content, along with images and other supporting files.

  • EPUB 2.0 - 2007
    • Open Publication Structure (OPS)
    • Open Packaging Format (OPF)
    • Open Container Format (OCF)
  • EPUB 3.0 - 2011
    • MathML
  • EPUB 3.2 - 2019
    • WOFF 2

The three EPUB 2.0 specifications divide the work as follows:

  • OPS - defines the formatting of the content.
  • OPF - describes the structure of the .epub file in XML.
  • OCF - collects all files into a ZIP archive.

EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document, and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled into a ZIP archive, which serves as the packaging format.

OPS

An EPUB file uses XHTML 1.1 to construct the content of a book.

The mimetype for XHTML documents in EPUB is application/xhtml+xml.

Styling and layout are performed using a subset of CSS 2.0, referred to as OPS Style Sheets.

OPS Style Sheets also define a few custom properties: oeb-page-head, oeb-page-foot, and oeb-column-number.

Font embedding is accomplished with the @font-face rule, together with an entry for the font file in the OPF's manifest.

The mimetype for CSS documents in EPUB is text/css.
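
As a rough sketch of the font embedding described above, the head of a content document might declare the font in an embedded stylesheet. The family name "MyFont" and the path css/myfont.otf are illustrative assumptions; the font file would also need its own item in the OPF manifest.

```xml
<head>
  <title>Pride and Prejudice</title>
  <style type="text/css">
    /* "MyFont" is a hypothetical family name; the path is resolved
       relative to this content document. */
    @font-face {
      font-family: "MyFont";
      font-weight: normal;
      font-style: normal;
      src: url(css/myfont.otf);
    }
    /* Fall back to a generic serif face if the embedded font is unavailable. */
    body { font-family: "MyFont", serif; }
  </style>
</head>
```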

EPUB also requires that PNG, JPEG, GIF, and SVG images be supported using the mimetypes image/png, image/jpeg, image/gif, image/svg+xml.

Unicode support is required: content documents must be encoded in UTF-8 or UTF-16.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
    <title>Pride and Prejudice</title>
    <link rel="stylesheet" href="css/main.css" type="text/css" />
  </head>
  <body>
    ...
  </body>
</html>

OPF

The OPF specification’s purpose is to “…[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication.” This is accomplished by two XML files with the extensions .opf and .ncx.

.opf

The OPF file, traditionally named content.opf, houses the EPUB book’s metadata, file manifest, and linear reading order.

This file has a root element package and four child elements: metadata, manifest, spine, and guide. The package element must carry the unique-identifier attribute, whose value is the id of the identifier element in metadata that uniquely identifies the book. The .opf file's mimetype is application/oebps-package+xml.

metadata

The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required: title, language, and identifier.

manifest

The manifest element lists all the files contained in the package. Each file is represented by an item element with the attributes id, href, and media-type.

All XHTML content documents, stylesheets, images or other media, embedded fonts, and the NCX file should be listed here. Only the .opf file itself, container.xml, and the mimetype file should not be included.

spine

The spine element lists all the XHTML content documents in their linear reading order.

Any content document that can be reached through linking or the table of contents must also be listed.

The toc attribute of spine must contain the id of the NCX file listed in the manifest. Each itemref element’s idref is set to the id of its respective content document.
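
One way to list a document that is only reached through links, without placing it in the main reading flow, is the linear attribute that OPF 2.0 defines on itemref. The fragment below is a sketch that assumes the appendix is such auxiliary content; the ids match hypothetical manifest entries named chapter1, appendix, and ncx.

```xml
<spine toc="ncx">
  <!-- Part of the primary reading order (linear defaults to "yes"). -->
  <itemref idref="chapter1" />
  <!-- Assumed auxiliary: still listed, but marked as outside the main flow. -->
  <itemref idref="appendix" linear="no" />
</spine>
```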

guide

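The guide element is optional. It identifies fundamental structural parts of the book, such as the cover, the table of contents, or a list of illustrations.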

<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
    <dc:creator opf:file-as="Austen, Jane" opf:role="aut">Jane Austen</dc:creator>
  </metadata>

  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="appendix" href="appendix.xhtml" media-type="application/xhtml+xml"/>
    <item id="stylesheet" href="style.css" media-type="text/css"/>
    <item id="ch1-pic" href="ch1-pic.png" media-type="image/png"/>
    <item id="myfont" href="css/myfont.otf" media-type="application/x-font-opentype"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>

  <spine toc="ncx">
    <itemref idref="chapter1" />
    <itemref idref="appendix" />
  </spine>

  <guide>
    <reference type="loi" title="List Of Illustrations" href="appendix.xhtml#figures" />
  </guide>

</package>

.ncx

The NCX file (Navigation Control file for XML), traditionally named toc.ncx, contains the hierarchical table of contents for the EPUB file.

The specification for NCX is not part of the EPUB specification; it comes from the DAISY Digital Talking Book standard (ANSI/NISO Z39.86).

The NCX file has a mimetype of application/x-dtbncx+xml.
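
A minimal toc.ncx might look like the sketch below. It assumes a book with two content documents, chapter1.xhtml and appendix.xhtml, and an identifier of 123456789X; the navPoint ids and label texts are illustrative. The dtb:uid value is expected to match the identifier declared in the .opf file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <!-- Should match the book's unique identifier in the .opf file. -->
    <meta name="dtb:uid" content="123456789X"/>
    <!-- Depth of the navMap hierarchy; this table of contents is flat. -->
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>Pride and Prejudice</text>
  </docTitle>
  <navMap>
    <!-- Each navPoint is one table-of-contents entry; playOrder gives its sequence. -->
    <navPoint id="navpoint-1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
    <navPoint id="navpoint-2" playOrder="2">
      <navLabel><text>Appendix</text></navLabel>
      <content src="appendix.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
```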

OCF

An EPUB file is a group of files that conform to the OPS/OPF standards and are wrapped in a ZIP file.

The OCF specifies how to organize these files in the ZIP archive, and defines two additional items that must be included.

  • The mimetype file contains the string application/epub+zip. It must be the first entry in the archive and stored uncompressed.
  • A folder named META-INF, which contains the required file container.xml.
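
The container.xml file points the reading system at the package document. The sketch below assumes the .opf file is stored at OEBPS/content.opf; that folder name is a common convention rather than a requirement, and full-path is resolved from the root of the archive.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- full-path is an assumed location; adjust it to wherever the .opf file lives. -->
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```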