Extensible Markup Language
Documents
Each XML document has both a logical and a physical structure.
- physical
- entities
- logical
- declarations
- elements
- comments
- character references
- processing instructions
Characters
Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.
- entity
- text
- markup
- character
- data
- character
- markup
- text
Unicode
Entities
Applications
- RSS
- SVG
[1] document ::= prolog element Misc*
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
[6] Names ::= Name (#x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*
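Productions [4], [4a], [5] and [7] can be sketched as range tables plus a few predicates. This is a minimal illustration, not a conforming parser; the function and table names are hypothetical and chosen only for this example.

```python
# Sketch of productions [4], [4a], [5] and [7]; all names here are hypothetical.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # [A-Z]
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # [a-z]
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Characters that NameChar [4a] allows in addition to NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # [0-9]
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_name(s):
    """[5] Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


def is_nmtoken(s):
    """[7] Nmtoken ::= (NameChar)+"""
    return bool(s) and all(is_name_char(c) for c in s)


print(is_name("svg:rect"), is_name("1st"), is_nmtoken("1st"))
```

A name may not start with a digit, "-" or ".", but an Nmtoken may, which is why "1st" fails the first check and passes the second.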
Unicode
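[#xC0-#xD6] [#xD8-#xF6] [#xF8-#xFF]
- Latin-1 Supplement letters
- excluded: #xD7 (×), #xF7 (÷)
[#x100-#x2FF]
- Latin Extended-A/B, IPA Extensions, Spacing Modifier Letters
[#x300-#x36F] (not a NameStartChar; allowed by NameChar [4a])
- Combining Diacritical Marks
[#x370-#x37D] [#x37F-#x3FF]
- Greek and Coptic
- excluded: #x37E (Greek question mark, renders like ;)
[#x400-#x1FFF]
- Cyrillic … Arabic … Tibetan … Runic … Greek Extended
[#x2000-#x206F]
- General Punctuation
- included: only #x200C (zero width non-joiner) and #x200D (zero width joiner)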
[#x2070-#x218F]
- Superscripts and Subscripts … Currency Symbols … Number Forms
[#x2190-#x2BFF] (excluded)
- Arrows
- Mathematical Operators
- …
- Block Elements
- [#x2B00-#x2BFF] Miscellaneous Symbols and Arrows
[#x2C00-#x2FEF]
- [#x2C00-#x2E7F] Glagolitic … Supplemental Punctuation
- [#x2E80-#x2EFF] CJK Radicals Supplement
- [#x2F00-#x2FDF] Kangxi Radicals
- [#x2FE0-#x2FEF] unassigned
[#x2FF0-#x2FFF] (excluded)
- Ideographic Description Characters
[#x3001-#xD7FF]
- [#x3000-#x303F] CJK Symbols and Punctuation; #x3000 (Ideographic Space) is excluded
- Hiragana
- Katakana
- Bopomofo
- …
- [#x4E00-#x9FFF] CJK Unified Ideographs
- [#xAC00-#xD7AF] Hangul Syllables
- [#xD7B0-#xD7FF] Hangul Jamo Extended-B
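[#xD800-#xF8FF] (excluded)
- [#xD800-#xDB7F] High Surrogates: leading code units of a UTF-16 surrogate pair
- [#xDB80-#xDBFF] High Private Use Surrogates
- [#xDC00-#xDFFF] Low Surrogates: trailing code units of a UTF-16 surrogate pair
- [#xE000-#xF8FF] Private Use Area: will not be assigned characters by the Unicode Consortium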
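The surrogate ranges exist so that UTF-16 can encode code points above #xFFFF as a leading and a trailing 16-bit code unit. As a rough sketch of that arithmetic (the function name is hypothetical), using #x1F746 as the sample code point:

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above #xFFFF need a surrogate pair")
    offset = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits -> leading code unit
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> trailing code unit
    return high, low


high, low = to_surrogate_pair(0x1F746)
print(f"{high:04X} {low:04X}")            # D83D DF46

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
assert chr(0x1F746).encode("utf-16-be") == bytes([0xD8, 0x3D, 0xDF, 0x46])
```

Surrogate code points never appear on their own in well-formed text, which is consistent with production [2] Char skipping [#xD800-#xDFFF].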
[#xF900-#xFDCF] | [#xFDF0-#xFFFD]
- [#xF900-#xFAFF] CJK Compatibility Ideographs
- [#xFB00-#xFB4F] Alphabetic Presentation Forms
- [#xFB50-#xFDFF] Arabic Presentation Forms-A
- [#xFDD0-#xFDEF] noncharacters (excluded)
- …
- [#xFFF0-#xFFFF] Specials
- #xFFFD (�) Replacement Character
- #xFFFE, #xFFFF noncharacters (excluded)
[#x10000-#xEFFFF]
- [#x10000-#x1007F] Linear B Syllabary
- [#x10080-#x100FF] Linear B Ideograms
- [#x10100-#x1013F] Aegean Numbers
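To identify the individual code points that sit at the gaps between the NameStartChar ranges, the Unicode character names can be looked up with Python's standard `unicodedata` module. This is only a small lookup sketch; the helper name `describe` is hypothetical, and the names returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(cp):
    """Return '#xHEX NAME' for a code point, or a placeholder if it has no name."""
    name = unicodedata.name(chr(cp), "<no name>")
    return f"#x{cp:X} {name}"


# Code points that fall just outside, or are singled out by, the ranges above.
for cp in (0xD7, 0xF7, 0x37E, 0x200C, 0x200D, 0x3000, 0xFFFD, 0xFFFE):
    print(describe(cp))
```

Noncharacters such as #xFFFE have no character name, so the lookup falls back to the placeholder instead of raising an error.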
https://www.fileformat.info/info/unicode/char/1F746/index.htm
Everson Mono
https://fontlibrary.org/en/font/symbola
https://unifoundry.com/unifont/
https://github.com/unicode-org/last-resort-font