EPUB is an e-book file format that uses the .epub file extension.
The EPUB format is implemented as an archive file consisting of XHTML files carrying the content, along with images and other supporting files.
- EPUB 2.0 - 2007
  - Open Publication Structure (OPS)
  - Open Packaging Format (OPF)
  - Open Container Format (OCF)
- EPUB 3.0 - 2011
  - MathML
- EPUB 3.2 - 2019
  - WOFF 2
- OPS contains the formatting of its content.
- OPF describes the structure of the .epub file in XML.
- OCF collects all files as a ZIP archive.
EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document, and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a ZIP file as a packaging format.
OPS
An EPUB file uses XHTML 1.1 to construct the content of a book.
The mimetype for XHTML documents in EPUB is application/xhtml+xml.
Styling and layout are performed using a subset of CSS 2.0, referred to as OPS Style Sheets.
OPS Style Sheets also define custom properties: oeb-page-head, oeb-page-foot, and oeb-column-number.
Font embedding can be accomplished using the @font-face rule, together with listing the font file in the OPF's manifest.
The mimetype for CSS documents in EPUB is text/css.
EPUB also requires that PNG, JPEG, GIF, and SVG images be supported using the mimetypes image/png, image/jpeg, image/gif, image/svg+xml.
Unicode is required.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
    <title>Pride and Prejudice</title>
    <link rel="stylesheet" href="css/main.css" type="text/css" />
  </head>
  <body>
    ...
  </body>
</html>
OPF
The OPF specification’s purpose is to “…[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication.” This is accomplished by two XML files with the extensions .opf and .ncx.
.opf
The OPF file, traditionally named content.opf, houses the EPUB book’s metadata, file manifest, and linear reading order.
This file has a root element package and four child elements: metadata, manifest, spine, and guide. Furthermore, the package node must have the unique-identifier attribute. The .opf file’s mimetype is application/oebps-package+xml.
metadata
The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required: title, language, and identifier.
manifest
The manifest element lists all the files contained in the package. Each file is represented by an item element with the attributes id, href, and media-type.
All XHTML content documents, stylesheets, images or other media, embedded fonts, and the NCX file should be listed here. Only the .opf file itself, container.xml, and the mimetype file should not be included.
spine
The spine element lists all the XHTML content documents in their linear reading order.
Any content document that can be reached through linking or the table of contents must also be listed.
The toc attribute of spine must contain the id of the NCX file listed in the manifest. Each itemref element’s idref is set to the id of its respective content document.
guide
The guide element is optional. It identifies fundamental structural components of the book, such as a list of illustrations.
<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
    <dc:creator opf:file-as="Austen, Jane" opf:role="aut">Jane Austen</dc:creator>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="appendix" href="appendix.xhtml" media-type="application/xhtml+xml"/>
    <item id="stylesheet" href="style.css" media-type="text/css"/>
    <item id="ch1-pic" href="ch1-pic.png" media-type="image/png"/>
    <item id="myfont" href="css/myfont.otf" media-type="application/x-font-opentype"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1" />
    <itemref idref="appendix" />
  </spine>
  <guide>
    <reference type="loi" title="List Of Illustrations" href="appendix.xhtml#figures" />
  </guide>
</package>
.ncx
The NCX file (Navigation Control file for XML), traditionally named toc.ncx, contains the hierarchical table of contents for the EPUB file.
The specification for NCX is not a part of the EPUB specification.
The NCX file has a mimetype of application/x-dtbncx+xml.
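For illustration, a minimal toc.ncx for a book with the two content documents chapter1.xhtml and appendix.xhtml might look like the sketch below. The navigation labels and the dtb:* meta values are assumptions chosen for the example. The dtb:uid value is expected to repeat the book's dc:identifier from the .opf file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx version="2005-1" xml:lang="en" xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <head>
    <!-- Assumed to match the dc:identifier value in the .opf file -->
    <meta name="dtb:uid" content="123456789X"/>
    <!-- Depth of the navMap hierarchy; 1 because this sketch has no nested navPoints -->
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>Pride and Prejudice</text>
  </docTitle>
  <navMap>
    <!-- One navPoint per table-of-contents entry; labels are illustrative -->
    <navPoint id="chapter1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
    <navPoint id="appendix" playOrder="2">
      <navLabel><text>Appendix</text></navLabel>
      <content src="appendix.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
```

Nesting a navPoint inside another navPoint is how sub-entries are expressed, which is what makes the table of contents hierarchical.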
OCF
An EPUB file is a group of files that conform to the OPS/OPF standards and are wrapped in a ZIP file.
The OCF specifies how to organize these files in the ZIP, and defines two additional files that must be included.
- The mimetype file contains the string application/epub+zip.
- A folder named META-INF, which contains the required file container.xml.
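As a rough sketch, a minimal container.xml could look like the following. The OEBPS/ directory name is a common convention assumed here rather than a requirement; the full-path attribute only needs to point at the .opf file relative to the root of the archive.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- full-path is relative to the archive root; "OEBPS/" is an assumed folder name -->
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```

A reading system opens this file first to find the .opf file, and from there reaches the manifest, spine, and NCX.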