Extensible Markup Language
Documents
Each XML document has both a logical and a physical structure
- physical
    - entities
 
- logical
    - declarations
    - elements
    - comments
    - character references
    - processing instructions
 
Characters
Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.
- entity
    - text
        - markup
        - character data
        
Unicode
Entities
Applications
- RSS
- SVG
[1]   	document	   ::=   	prolog element Misc*
[2]   	Char	   ::=   	#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[3]   	S	   ::=   	(#x20 | #x9 | #xD | #xA)+
[4]   	NameStartChar	   ::=   	":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]   	NameChar	   ::=   	NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]   	Name	   ::=   	NameStartChar (NameChar)*
[6]   	Names	   ::=   	Name (#x20 Name)*
[7]   	Nmtoken	   ::=   	(NameChar)+
[8]   	Nmtokens	   ::=   	Nmtoken (#x20 Nmtoken)*
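Productions [4], [4a], [5] and [7] translate almost directly into regular-expression character classes. The following is a minimal sketch in Python, not a full XML parser; the names `NAME_START_CLASS`, `NAME_CHAR_CLASS`, `is_name` and `is_nmtoken` are illustrative and do not come from the specification.

```python
import re

# Production [4] NameStartChar, written as the inside of a regex character class.
NAME_START_CLASS = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)

# Production [4a] NameChar: everything in NameStartChar plus a few extras.
NAME_CHAR_CLASS = NAME_START_CLASS + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Production [5] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile(f"[{NAME_START_CLASS}][{NAME_CHAR_CLASS}]*")

# Production [7] Nmtoken ::= (NameChar)+
NMTOKEN_RE = re.compile(f"[{NAME_CHAR_CLASS}]+")


def is_name(text: str) -> bool:
    """Return True if the whole string matches production [5]."""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """Return True if the whole string matches production [7]."""
    return NMTOKEN_RE.fullmatch(text) is not None
```

For example, `is_name("xs:element")` holds, while `is_name("1abc")` does not because a digit is a NameChar but not a NameStartChar; `is_nmtoken("1abc")` holds for the same reason.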
Unicode
- [#xC0-#xD6] [#xD8-#xF6] [#xF8-#xFF]
    - Latin-1 Supplement characters
    - excluded: #xD7 (×), #xF7 (÷)
 
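- [#x100-#x2FF]
    - Latin Extended-A/B, IPA Extensions, Spacing Modifier Letters
- [#x300-#x36F]
    - Combining Diacritical Marks (not in NameStartChar; allowed by NameChar)
- [#x370-#x37D] [#x37F-#x3FF]
    - Greek and Coptic
    - excluded: #x37E (;, the Greek question mark)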
 
- [#x400-#x1FFF]
    - Cyrillic … Arabic … Tibetan … Runic … Greek Extended
- [#x2000-#x206F]
    - General Punctuation
    - only #x200C (zero width non-joiner) and #x200D (zero width joiner) are in NameStartChar
 
- [#x2070-#x218F]
    - Superscripts and Subscripts … Currency Symbols … Number Forms
- [#x2190-#x2BFF] (not in NameStartChar)
    - Arrows
    - Mathematical Operators
    - …
    - Block Elements
    - [#x2B00-#x2BFF]
        - Miscellaneous Symbols and Arrows
 
- [#x2C00-#x2FEF]
    - [#x2C00-#x2E7F]
        - Glagolitic … Supplemental Punctuation
    - [#x2E80-#x2EFF]
        - CJK Radicals Supplement
    - [#x2F00-#x2FDF]
        - Kangxi Radicals
    - [#x2FE0-#x2FEF]
        - unassigned
- [#x2FF0-#x2FFF] (not in NameStartChar)
    - Ideographic Description Characters
 
- [#x3001-#xD7FF]
    - [#x3000-#x303F]
        - CJK Symbols and Punctuation
        - excluded: #x3000 (ideographic space)
    - Hiragana
    - Katakana
    - Bopomofo
    - …
    - [#x4E00-#x9FFF]
        - CJK Unified Ideographs
    - [#xAC00-#xD7AF]
        - Hangul Syllables
    - [#xD7B0-#xD7FF]
        - Hangul Jamo Extended-B
 
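- [#xD800-#xF8FF] (not in NameStartChar)
    - [#xD800-#xDB7F]
        - High Surrogates: leading code units of UTF-16 surrogate pairs
    - [#xDB80-#xDBFF]
        - High Private Use Surrogates
    - [#xDC00-#xDFFF]
        - Low Surrogates: trailing code units
    - [#xE000-#xF8FF]
        - Private Use Area: will not be assigned characters by the Unicode Consortium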
 
- [#xF900-#xFDCF] [#xFDF0-#xFFFD]
    - [#xF900-#xFAFF]
        - CJK Compatibility Ideographs
    - [#xFB00-#xFB4F]
        - Alphabetic Presentation Forms
    - [#xFB50-#xFDFF]
        - Arabic Presentation Forms-A
        - excluded: [#xFDD0-#xFDEF] (noncharacters)
    - …
    - [#xFFF0-#xFFFF]
        - Specials
        - #xFFFD: � (replacement character)
        - excluded: #xFFFE, #xFFFF (noncharacters)
 
 
- [#x10000-#xEFFFF]
    - [#x10000-#x1007F]
        - Linear B Syllabary
    - [#x10080-#x100FF]
        - Linear B Ideograms
    - [#x10100-#x1013F]
        - Aegean Numbers
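
The code points singled out in the list above can be checked against the Unicode Character Database that ships with Python, and the surrogate ranges can be illustrated by splitting a supplementary code point into its UTF-16 pair. This is a rough sketch using only the standard `unicodedata` module; the helper names `describe` and `to_surrogates` are made up for illustration, and the names reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME', or '<no name>' if the code point has no character name."""
    name = unicodedata.name(chr(code_point), "<no name>")
    return f"U+{code_point:04X} {name}"


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the leading code unit
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the trailing code unit
    return high, low


if __name__ == "__main__":
    # Code points that the NameStartChar ranges step around or single out.
    for cp in (0xD7, 0xF7, 0x37E, 0x200C, 0x200D, 0x3000, 0xFFFD, 0xFFFE):
        print(describe(cp))

    high, low = to_surrogates(0x1F746)
    print(f"U+1F746 -> {high:04X} {low:04X}")
```

Surrogates are 16-bit code units, not bytes, and noncharacters such as U+FFFE have no name in the database, so `describe` falls back to the placeholder for them.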
 
https://www.fileformat.info/info/unicode/char/1F746/index.htm
Everson Mono
https://fontlibrary.org/en/font/symbola
https://unifoundry.com/unifont/
https://github.com/unicode-org/last-resort-font