Extensible Markup Language
Documents
Each XML document has both a logical and a physical structure.
- physical
- entities
- logical
- declarations
- elements
- comments
- character references
- processing instructions
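As a rough illustration of the logical items listed above, the sketch below parses a small document with Python's standard xml.dom.minidom and reports which kinds of top-level nodes it finds. The sample document, the entity name "who", the processing-instruction target "render", and the function name describe are all made up for this example.

```python
from xml.dom import minidom, Node

# Hypothetical sample: a declaration (DOCTYPE with one internal entity),
# a comment, a processing instruction, and a root element whose content
# uses an entity reference and a character reference (&#33; is "!").
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY who "World">
]>
<!-- a comment -->
<?render mode="plain"?>
<note>Hello, &who;&#33;</note>
"""

LABELS = {
    Node.DOCUMENT_TYPE_NODE: "declaration",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.ELEMENT_NODE: "element",
}


def describe(xml_text):
    """Return (kinds of top-level nodes, text content of the root element)."""
    doc = minidom.parseString(xml_text)
    kinds = [LABELS.get(node.nodeType, "other") for node in doc.childNodes]
    root = doc.documentElement
    # Join all text children; references have already been expanded by the parser.
    text = "".join(
        child.data for child in root.childNodes
        if child.nodeType == Node.TEXT_NODE
    )
    return kinds, text


if __name__ == "__main__":
    kinds, text = describe(SAMPLE)
    print(kinds)
    print(text)
```

The physical structure (entities) shows up only indirectly here: the parser replaces the entity reference with its replacement text before the application sees it.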
Characters
Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.
- entity
- text
- markup
- character
- data
- character
- markup
- text
Unicode
Entities
Applications
- RSS
- SVG
[1] document ::= prolog element Misc*
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[4] NameStartChar ::= ":"
| [A-Z]
| "_"
| [a-z]
| [#xC0-#xD6]
| [#xD8-#xF6]
| [#xF8-#x2FF]
| [#x370-#x37D]
| [#x37F-#x1FFF]
| [#x200C-#x200D]
| [#x2070-#x218F]
| [#x2C00-#x2FEF]
| [#x3001-#xD7FF]
| [#xF900-#xFDCF]
| [#xFDF0-#xFFFD]
| [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
[6] Names ::= Name (#x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*
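Productions [4], [4a], [5] and [7] translate fairly directly into range checks. The following is a minimal sketch in Python, not a full XML parser; the names NAME_START_RANGES, NAME_EXTRA_RANGES, is_name and is_nmtoken are chosen for this example only.

```python
# Code point ranges copied from production [4] NameStartChar.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # [A-Z]
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # [a-z]
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional ranges allowed by production [4a] NameChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),        # "-"
    (0x2E, 0x2E),        # "."
    (0x30, 0x39),        # [0-9]
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_name(s):
    """Production [5]: Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


def is_nmtoken(s):
    """Production [7]: Nmtoken ::= (NameChar)+"""
    return bool(s) and all(is_name_char(c) for c in s)
```

For instance, is_name("svg:rect") holds, while is_name("1abc") does not because a digit is a NameChar but not a NameStartChar; is_nmtoken("1abc") holds.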
Unicode
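[#xC0-#xD6] [#xD8-#xF6] [#xF8-#x0FF]
- Latin-1 Supplement characters
#xD7
- × (multiplication sign, excluded)
#xF7
- ÷ (division sign, excluded)
[#x0100 - #x2FF]
- Latin Extended-A/B, IPA Extensions & Spacing Modifier Letters
[#x0300 - #x036F]
- Combining Diacritical Marks (NameChar only)
[#x370-#x37D]
[#x37F-#x3FF]
- Greek and Coptic
#x37E
- ; (Greek question mark, excluded)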
[#x400 - #x1FFF]
- Cyrillic … Arabic … Tibetan … Runic … Greek Extended
[#x2000 - #x206F]
- General Punctuation
#x200C
- zero width non-joiner
#x200D
- zero width joiner
[#x2070 - #x218F]
- Superscripts and Subscripts … Currency Symbols … Number Forms
[#x2190 - #x2BFF]
- Arrows
- Mathematical Operators
- …
- Block Elements
[2B00 - 2BFF]
- Miscellaneous Symbols and Arrows
[#x2C00 - #x2FEF]
[#x2C00 - #x2E7F]
- Glagolitic … Coptic … Tifinagh … Supplemental Punctuation
[#x2E80 - #x2EFF]
- CJK Radicals Supplement
[#x2F00 - #x2FDF]
- Kangxi Radicals
[#x2FE0 - #x2FEF]
- Unassigned
[#x2FF0 - #x2FFF]
- Ideographic Description Characters
[#x3001 - #xD7FF]
[#x3000 - #x303F]
- CJK Symbols and Punctuation
#x3000
- ideographic space (excluded)
- Hiragana
- Katakana
- Bopomofo
- …
[4E00 - 9FFF]
- CJK Unified Ideographs
[AC00 - D7AF]
- Hangul Syllables
[D7B0 - D7FF]
- Hangul Jamo Extended-B
[#xD800 - #xF8FF]
[D800 - DB7F]
- High Surrogates - leading code units of a UTF-16 pair
[DB80 - DBFF]
- High Private Use Surrogates
[DC00 - DFFF]
- Low Surrogates - trailing code units of a UTF-16 pair
[E000 - F8FF]
- Private Use Area - will not be assigned characters by the Unicode Consortium
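Surrogates exist only so that UTF-16 can represent code points above #xFFFF as a pair of 16-bit units, which is why they are excluded from the Char production. A small sketch of the standard pairing arithmetic, with function names invented for this example:

```python
def to_surrogates(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F746)
    print(hex(high), hex(low))  # 0xd83d 0xdf46
```

The same result can be cross-checked against Python's own encoder: chr(0x1F746).encode("utf-16-be") yields the bytes D8 3D DF 46.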
[#xF900 - #xFDCF] | [#xFDF0 - #xFFFD]
[F900 - FAFF]
- CJK Compatibility Ideographs
[FB00 - FB4F]
- Alphabetic Presentation Forms
[FB50 - FDFF]
- Arabic Presentation Forms-A
[FDD0 - FDEF]
- Noncharacters
- …
[FFF0 - FFFF]
- Specials
FFFD
- �
- Replacement Character
FFFE
FFFF
- Noncharacters
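The gaps in this range ([FDD0 - FDEF], FFFE, FFFF) are code points Unicode reserves permanently, and FFFD is what decoders commonly substitute for undecodable input. A hedged sketch of both ideas follows; the function names are illustrative, and the last-two-code-points-of-every-plane rule comes from the Unicode Standard rather than from these notes.

```python
def is_noncharacter(cp):
    """True for code points Unicode permanently reserves as noncharacters:
    U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_lossy(raw):
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable input."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(is_noncharacter(0xFFFE), is_noncharacter(0xFFFD))
    # b"caf\xe9" is Latin-1 for "café"; the lone 0xE9 is not valid UTF-8.
    print(decode_lossy(b"caf\xe9"))
```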
[#x10000-#xEFFFF]
[10000 - 1007F]
- Linear B Syllabary
[10080 - 100FF]
- Linear B Ideograms
[10100 - 1013F]
- Aegean Numbers
https://www.fileformat.info/info/unicode/char/1F746/index.htm
Everson Mono
https://fontlibrary.org/en/font/symbola
https://unifoundry.com/unifont/
https://github.com/unicode-org/last-resort-font