General Punctuation

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
See also: Supplemental Punctuation (Unicode block)
Range: U+2000..U+206F (112 code points)
Plane: BMP
Scripts: Common (109 char.), Inherited (2 char.)
Symbol sets: Punctuation, Spaces, Format controls
Assigned: 111 code points
Unused: 1 reserved code point; 6 deprecated
Unicode version history:
1.0.0 (1991): 67 (+67)
1.1 (1993): 76 (+9)
3.0 (1999): 83 (+7)
3.2 (2002): 95 (+12)
4.0 (2003): 97 (+2)
4.1 (2005): 106 (+9)
5.1 (2008): 107 (+1)
6.3 (2013): 111 (+4)
Unicode documentation: Code chart ∣ Web page

General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.

Additional punctuation characters are in the Supplemental Punctuation block and scattered across dozens of other Unicode blocks.

Block

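General Punctuation
Official Unicode Consortium code chart (PDF)
Chart labels for the characters that have no visible glyph, by row (the remaining cells hold visible glyphs):
U+200x: NQSP (U+2000), MQSP (U+2001), ENSP (U+2002), EMSP (U+2003), 3/MSP (U+2004), 4/MSP (U+2005), 6/MSP (U+2006), FSP (U+2007), PSP (U+2008), THSP (U+2009), HSP (U+200A), ZWSP (U+200B), ZWNJ (U+200C), ZWJ (U+200D), LRM (U+200E), RLM (U+200F)
U+201x: NB (U+2011, non-breaking hyphen)
U+202x: LSEP (U+2028), PSEP (U+2029), LRE (U+202A), RLE (U+202B), PDF (U+202C), LRO (U+202D), RLO (U+202E), NNBSP (U+202F)
U+205x: MMSP (U+205F)
U+206x: WJ (U+2060), ƒ() (U+2061), × (U+2062), , (U+2063), + (U+2064), LRI (U+2066), RLI (U+2067), FSI (U+2068), PDI (U+2069), ISS (U+206A), ASS (U+206B), IAFS (U+206C), AAFS (U+206D), NADS (U+206E), NODS (U+206F); U+2065 is unassigned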
Notes
1. As of Unicode version 16.0
2. Grey area indicates non-assigned code point
3. Unicode code points U+206A - U+206F are deprecated as of Unicode version 3.0
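
The counts in the summary and notes above can be checked against the character database bundled with a programming language. The following is a minimal sketch in Python, assuming the interpreter's `unicodedata` module reflects Unicode 6.3 or later (true for current Python 3 releases). The function name `block_summary` and the constants are illustrative, not part of any standard API.

```python
import unicodedata

# Block range as given in the summary: U+2000..U+206F.
BLOCK_START, BLOCK_END = 0x2000, 0x206F

# Deprecated range as given in note 3: U+206A..U+206F.
DEPRECATED_START, DEPRECATED_END = 0x206A, 0x206F


def block_summary():
    """Return (total, unassigned, deprecated) for the General Punctuation block.

    `unassigned` and `deprecated` are lists of integer code points.
    """
    code_points = range(BLOCK_START, BLOCK_END + 1)
    # Python reports general category "Cn" for code points with no assigned character.
    unassigned = [cp for cp in code_points if unicodedata.category(chr(cp)) == "Cn"]
    # The deprecation status is taken from the note, not from unicodedata,
    # which does not expose the Deprecated property.
    deprecated = list(range(DEPRECATED_START, DEPRECATED_END + 1))
    return len(code_points), unassigned, deprecated


if __name__ == "__main__":
    total, unassigned, deprecated = block_summary()
    print("code points in block:", total)
    print("unassigned:", [f"U+{cp:04X}" for cp in unassigned])
    print("deprecated:", [f"U+{cp:04X}" for cp in deprecated])
```

With a sufficiently recent database this should agree with the summary: 112 code points in the range, one of them unassigned, and six deprecated.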

Several characters in this block are usually rendered without a directly visible glyph. The ten whitespace characters U+2002 through U+200B (fixed en or 1⁄2 em, em, 1⁄3 em, 1⁄4 em, 1⁄6 em, figure, and punctuation spaces; variable thin or 1⁄5 em and hair spaces; and the fixed zero-width space) and U+205F (math medium or 2⁄9 em space) differ in horizontal width. U+2000 and U+2001 (en quad and em quad) are effectively aliases of U+2002 and U+2003, respectively. Two more, U+202F (narrow no-break space) and U+2060 (the ill-termed word joiner), are variants of U+2009 or U+2004 and of U+200B, respectively, that prohibit line breaks. The three zero-width characters U+200B through U+200D (space, non-joiner, and joiner) differ in how they affect ligation and shaping of adjacent letters, such as contextual forms in Arabic. Eleven invisible characters control the directionality of text unless higher-level markup overrides them: U+200E and U+200F (left-to-right and right-to-left marks), U+202A through U+202E (embeddings, pop, and overrides), and U+2066 through U+2069 (isolates). U+2028 and U+2029 are explicit line and paragraph separators.
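
Because these characters normally leave no visible trace, it can help to make them explicit when inspecting text. The sketch below is one possible approach in Python, not an established tool: it replaces every character of this block whose general category is a space separator (Zs), line separator (Zl), paragraph separator (Zp), or format control (Cf) with a `<U+XXXX>` token. The function name `reveal_invisibles` and the token format are assumptions made for illustration.

```python
import unicodedata

# Block range U+2000..U+206F.
BLOCK_START, BLOCK_END = 0x2000, 0x206F

# General categories that normally have no visible glyph:
# Zs = space separator, Zl = line separator,
# Zp = paragraph separator, Cf = format control.
INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cf"}


def reveal_invisibles(text):
    """Replace invisible General Punctuation characters with <U+XXXX> tokens."""
    out = []
    for ch in text:
        cp = ord(ch)
        in_block = BLOCK_START <= cp <= BLOCK_END
        if in_block and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            out.append(f"<U+{cp:04X}>")
        else:
            # Visible punctuation and characters outside the block pass through.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "zero\u200bwidth and \u202eoverride\u202c"
    print(reveal_invisibles(sample))
```

Visible punctuation from the block, such as the em dash U+2014, is left untouched, as is the ordinary space U+0020, which lies outside the block.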

Variation selectors

Starting with Unicode 16 (2024), the block has variation sequences defined for East Asian punctuation positional variants of the curly quotation marks ‘...’ and “...”. They use U+FE00 VARIATION SELECTOR-1 (VS01) and U+FE01 VARIATION SELECTOR-2 (VS02):

Variation sequences for fullwidth quotation marks
U+ 2018 2019 201C 201D Description
base code point ‘ ’ “ ”
base + VS01 ‘︀ ’︀ “︀ ”︀ non-fullwidth form
base + VS02 ‘︁ ’︁ “︁ ”︁ justified fullwidth form
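
A variation sequence is simply the base character followed by the selector. The following Python sketch builds the eight sequences from the table. The helper name `quote_variants` and the dictionary keys are illustrative, and whether a renderer actually honours the sequences depends on font and platform support.

```python
# Variation selectors named in the text above.
VS01 = "\uFE00"  # VARIATION SELECTOR-1: non-fullwidth form
VS02 = "\uFE01"  # VARIATION SELECTOR-2: justified fullwidth form

# The four curly quotation marks that have these sequences defined.
QUOTES = ("\u2018", "\u2019", "\u201C", "\u201D")


def quote_variants(base):
    """Return both variation sequences for one of the four quotation marks."""
    if base not in QUOTES:
        raise ValueError(f"no quotation-mark variation sequence for U+{ord(base):04X}")
    return {
        "non-fullwidth": base + VS01,
        "fullwidth": base + VS02,
    }


if __name__ == "__main__":
    for quote in QUOTES:
        for form, sequence in quote_variants(quote).items():
            code_points = " ".join(f"U+{ord(c):04X}" for c in sequence)
            print(f"{form:14} {code_points}")
```

Each sequence is two code points long, so code that counts characters or truncates strings should treat the pair as one unit.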

The non-fullwidth forms are expected to be separated with a space on one side, the fullwidth forms are not:

The red registration corners mark the glyph metrics and show how the glyph aligns within the space allotted to the character. For variable-width display (left), an adjacent space is expected; for full-width CJK display (right), a space is not necessary.

In vertical text, the fullwidth forms should display somewhat differently, and even as regular CJK quotation marks 「...」 and 『...』 if the vertical orientation property is set to "Hans":

CJK behaviour of generic quotation marks in horizontal and vertical text when variation selector VS02 is appended. The 'horizontal' column at left is the 'VS2' column of the preceding table.

Emoji


The General Punctuation block contains two emoji: U+203C and U+2049.

The block has four standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the two emoji, both of which default to a text presentation.

Emoji variation sequences
U+ 203C 2049
base code point ‼ ⁉
base+VS15 (text) ‼︎ ⁉︎
base+VS16 (emoji) ‼️ ⁉️
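
The four sequences in the table can be produced by appending the selector to the base character. The following is a minimal Python sketch. The function name `with_presentation` and its `style` argument are hypothetical, and the visual result depends on the fonts and renderer in use.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

# The two emoji in the General Punctuation block.
EMOJI_BASES = ("\u203C", "\u2049")


def with_presentation(base, style):
    """Append VS15 ("text") or VS16 ("emoji") to one of the two emoji bases."""
    if base not in EMOJI_BASES:
        raise ValueError(f"U+{ord(base):04X} is not an emoji in this block")
    if style == "text":
        return base + VS15
    if style == "emoji":
        return base + VS16
    raise ValueError("style must be 'text' or 'emoji'")


if __name__ == "__main__":
    for base in EMOJI_BASES:
        for style in ("text", "emoji"):
            sequence = with_presentation(base, style)
            print(style, " ".join(f"U+{ord(c):04X}" for c in sequence))
```

Since both characters default to text presentation, the bare base character and the VS15 sequence should normally look the same; only the VS16 sequence asks for the emoji glyph.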

History

The following Unicode-related documents record the purpose and process of defining specific characters in the General Punctuation block:

Version Final code points Count UTC ID L2 ID WG2 ID Document
1.0.0 U+2000..202E, 2030..203E, 2040..2044 67 (to be determined)
L2/11-438 N4182 Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429)
L2/17-086 Burge, Jeremy; et al. (2017-03-27), Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component
L2/17-103 Moore, Lisa (2017-05-18), "E.1.7 Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component", UTC #151 Minutes
L2/23-212R Lunde, Ken (2023-10-14), Proposal to add standardized variation sequences for four quotation marks
L2/23-238R Anderson, Deborah; Kučera, Jan; Whistler, Ken; Pournader, Roozbeh; Constable, Peter (2023-11-01), "15 Symbols (Punctuation): Quotation Marks ", Recommendations to UTC #177 November 2023 on Script Proposals
L2/23-231 Constable, Peter (2023-12-08), "Consensus 177-C36", UTC #177 Minutes, Add ... eight standardized variation sequences, based on L2/23-212R
1.1 U+203F, 2045..2046 3 (to be determined)
U+206A..206F 6 (to be determined)
UTC/1992-xxx Freytag, Asmus (1992-05-12), "C. Bidi", Unconfirmed minutes for UTC Meeting #52, May 8, 1992 at Xerox
L2/01-275 Davis, Mark (2001-07-16), New Properties (ReservedForCf, Deprecated, Discouraged)
L2/01-301 Whistler, Ken (2001-08-01), "Alternate format controls inherited from 10646", Analysis of Character Deprecation in the Unicode Standard
L2/01-326 Davis, Mark (2001-08-15), New Properties: Reserved_Cf_Code_Point & Deprecated
L2/01-295R Moore, Lisa (2001-11-06), "Motion 88-M13", Minutes from the UTC/L2 meeting #88
3.0 U+202F, 2048..2049 3 L2/97-288 N1603 Umamaheswaran, V. S. (1997-10-24), "8.18", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997
L2/98-088 N1711 The Working Meeting on Mongolian Encoding Attended by Representatives of China and Mongolia, 1998-02-15
L2/98-104 N1734 Whistler, Ken (1998-03-20), Comments on the Mongolian Encoding Proposal, WG2 N1711
L2/98-252 (pdf, txt) N1833RM (pdf, doc) Moore, Richard (1998-05-04), Feedback on Ken Whistler's Comments on Mongolian Encoding: N 1734
L2/98-251 (pdf, html, txt) N1808 (pdf, doc) Reply to "Proposal WG2 N1734" Raised at the Seattle Meeting Regarding "Proposal WG 2 N1711", 1998-07-09
L2/98-281R (pdf, html) Aliprand, Joan (1998-07-31), "Mongolian (IV.A)", Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
N1862 Revision of N1711 - Mongolian, 1998-09-17
N1865 US Position - Mongolian (N1711, N1734 and N1808), 1998-09-18
N1918 Paterson, Bruce (1998-10-28), Text for Combined PDAM registration and consideration ballot - SC2 N 3208
L2/99-010 N1903 (pdf, html, doc) Umamaheswaran, V. S. (1998-12-30), "8.1.3", Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
L2/99-075.1 N1973 Irish Comments on SC 2 N 3208, 1999-01-19
L2/99-075 N1972 (pdf, html, doc) Summary of Voting on SC 2 N 3208, PDAM ballot on WD for ISO/IEC 10646-1/Amd. 29: Mongolian, 1999-02-12
N2020 Paterson, Bruce (1999-04-05), FPDAM 29 Text - Mongolian
L2/99-113 Text for FPDAM ballot of ISO/IEC 10646, Amd. 29 - Mongolian, 1999-04-06
L2/99-232 N2003 Umamaheswaran, V. S. (1999-08-03), "6.1.3 PDAM29 – Mongolian script", Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15
L2/99-304 N2126 Paterson, Bruce (1999-10-01), Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 29, AMENDMENT 29: Mongolian
L2/99-381 Final text for ISO/IEC 10646-1, FDAM 29 -- Mongolian, 1999-12-07
L2/00-010 N2103 Umamaheswaran, V. S. (2000-01-05), "6.4.4", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13—16
L2/07-209 Whistler, Ken (2007-07-05), UTR 14 and U+202F NARROW NO-BREAK SPACE
L2/11-438 N4182 Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429)
L2/15-187 Moore, Lisa (2015-08-11), "B.14.5", UTC #144 Minutes
L2/16-258 N4752R2 Eck, Greg (2016-09-19), Mongolian Base Forms, Positional Forms, & Variant Forms
L2/16-259 N4753 Eck, Greg; Rileke, Orlog Ou (2016-09-20), WG2 #65 Mongolian Discussion Points
L2/16-266 N4763 Anderson, Deborah; Whistler, Ken; McGowan, Rick; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu; Moore, Lisa (2016-09-26), "1. Mongolian", Comments on Mongolian, Small Khitan, and other WG2 #65 documents
L2/16-297 N4769 Anderson, Deborah (2016-10-27), Mongolian ad hoc report
U+204A 1 L2/98-214 N1747 Everson, Michael (1998-05-25), Contraction characters for the UCS
L2/98-281R (pdf, html) Aliprand, Joan (1998-07-31), "Characters from ISO 5426-2 (IV.C.5-6)", Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
L2/98-292R (pdf, html, Figure 1) "2.6", Comments on proposals to add characters from ISO standards developed by ISO/TC 46/SC 4, 1998-08-19
L2/98-292 N1840 "2.6", Comments on proposals to add characters from ISO standards developed by ISO/TC 46/SC 4, 1998-08-25
L2/98-301 N1847 Everson, Michael (1998-09-12), Responses to NCITS/L2 and Unicode Consortium comments on numerous proposals
L2/98-372 N1884R2 (pdf, doc) Whistler, Ken; et al. (1998-09-22), Additional Characters for the UCS
L2/98-329 N1920 Combined PDAM registration and consideration ballot on WD for ISO/IEC 10646-1/Amd. 30, AMENDMENT 30: Additional Latin and other characters, 1998-10-28
L2/99-010 N1903 (pdf, html, doc) Umamaheswaran, V. S. (1998-12-30), "8.1.5.1", Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
U+204B..204D 3 L2/98-215 N1748 Everson, Michael (1998-05-25), Additional signature mark characters for the UCS
L2/98-281R (pdf, html) Aliprand, Joan (1998-07-31), "Signature Marks (IV.C.7)", Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
L2/98-292R (pdf, html, Figure 1) "2.7", Comments on proposals to add characters from ISO standards developed by ISO/TC 46/SC 4, 1998-08-19
L2/98-292 N1840 "2.7", Comments on proposals to add characters from ISO standards developed by ISO/TC 46/SC 4, 1998-08-25
L2/98-301 N1847 Everson, Michael (1998-09-12), Responses to NCITS/L2 and Unicode Consortium comments on numerous proposals
L2/98-372 N1884R2 (pdf, doc) Whistler, Ken; et al. (1998-09-22), Additional Characters for the UCS
L2/98-329 N1920 Combined PDAM registration and consideration ballot on WD for ISO/IEC 10646-1/Amd. 30, AMENDMENT 30: Additional Latin and other characters, 1998-10-28
L2/99-010 N1903 (pdf, html, doc) Umamaheswaran, V. S. (1998-12-30), "8.1.5.1", Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
3.2 U+2047, 2051 2 L2/99-238 Consolidated document containing 6 Japanese proposals, 1999-07-15
N2092 Addition of forty eight characters, 1999-09-13
L2/99-365 Moore, Lisa (1999-11-23), Comments on JCS Proposals
L2/00-024 Shibano, Kohji (2000-01-31), JCS proposal revised
L2/99-260R Moore, Lisa (2000-02-07), "JCS Proposals", Minutes of the UTC/L2 meeting in Mission Viejo, October 26-28, 1999
L2/00-098, L2/00-098-page5 N2195 Rationale for non-Kanji characters proposed by JCS committee, 2000-03-15
L2/00-119 N2191R Whistler, Ken; Freytag, Asmus (2000-04-19), Encoding Additional Mathematical Symbols in Unicode
L2/00-234 N2203 (rtf, txt) Umamaheswaran, V. S. (2000-07-21), "8.18, 8.20", Minutes from the SC2/WG2 meeting in Beijing, 2000-03-21 -- 24
L2/00-115R2 Moore, Lisa (2000-08-08), "Motion 83-M11", Minutes Of UTC Meeting #83
L2/00-297 N2257 Sato, T. K. (2000-09-04), JIS X 0213 symbols part-1
L2/00-342 N2278 Sato, T. K.; Everson, Michael; Whistler, Ken; Freytag, Asmus (2000-09-20), Ad hoc Report on Japan feedback N2257 and N2258
L2/01-050 N2253 Umamaheswaran, V. S. (2001-01-21), "7.16 JIS X0213 Symbols", Minutes of the SC2/WG2 meeting in Athens, September 2000
U+204E..2050, 2057, 205F, 2061..2062 7 L2/00-005R2 Moore, Lisa (2000-02-14), "Motion 82-M11", Minutes of UTC #82 in San Jose
L2/00-119 N2191R Whistler, Ken; Freytag, Asmus (2000-04-19), Encoding Additional Mathematical Symbols in Unicode
L2/00-234 N2203 (rtf, txt) Umamaheswaran, V. S. (2000-07-21), "8.18", Minutes from the SC2/WG2 meeting in Beijing, 2000-03-21 -- 24
L2/00-115R2 Moore, Lisa (2000-08-08), "Motion 83-M11", Minutes Of UTC Meeting #83
U+2052, 2063 2 L2/01-142 N2336 Beeton, Barbara; Freytag, Asmus; Ion, Patrick (2001-04-02), Additional Mathematical Symbols
L2/01-156 N2356 Freytag, Asmus (2001-04-03), Additional Mathematical Characters (Draft 10)
L2/01-344 N2353 (pdf, doc) Umamaheswaran, V. S. (2001-09-09), "7.7 Mathematical Symbols", Minutes from SC2/WG2 meeting #40 -- Mountain View, April 2001
U+2060 1 L2/99-260R Moore, Lisa (2000-02-07), "Unicode in Markup Languages", Minutes of the UTC/L2 meeting in Mission Viejo, October 26-28, 1999
L2/00-005R2 Moore, Lisa (2000-02-14), "Zero Width Grapheme Break/Join", Minutes of UTC #82 in San Jose, Action Item for Arnold Winkler: As the zero width grapheme break/join proposal was withdrawn, re-open Action Item 81-12 (for Mark Davis to prepare a proposal for WG2 for the Zero Width Word Joiner.)
L2/00-258 N2235 Davis, Mark (2000-08-09), Proposal for addition of ZERO WIDTH WORD JOINER
L2/00-369 Whistler, Ken (2000-10-06), "e. (ZERO WIDTH) WORD JOINER", WG2 in Vouliagmeni (Athens)
L2/01-050 N2253 Umamaheswaran, V. S. (2001-01-21), "7.7 Proposal for addition of ZERO WIDTH WORDJOINER", Minutes of the SC2/WG2 meeting in Athens, September 2000
4.0 U+2053..2054 2 L2/02-141 N2419 Everson, Michael; et al. (2002-03-20), Uralic Phonetic Alphabet characters for the UCS
L2/02-192 Everson, Michael (2002-05-02), Everson's Reply on UPA
N2442 Everson, Michael; Kolehmainen, Erkki I.; Ruppel, Klaas; Trosterud, Trond (2002-05-21), Justification for placing the Uralic Phonetic Alphabet in the BMP
L2/02-291 Whistler, Ken (2002-05-31), WG2 report from Dublin
L2/02-292 Whistler, Ken (2002-06-03), Early look at WG2 consent docket
L2/02-166R2 Moore, Lisa (2002-08-09), "Scripts and New Characters - UPA", UTC #91 Minutes
L2/02-253 Moore, Lisa (2002-10-21), "Consensus 92-C2", UTC #92 Minutes
4.1 U+2055 1 L2/03-151R Constable, Peter; Lloyd-Williams, James; Lloyd-Williams, Sue; Chowdhury, Shamsul Islam; Ali, Asaddar; Sadique, Mohammed; Chowdhury, Matiar Rahman (2003-05-10), Revised Proposal for Encoding Syloti Nagri Script in the BMP
L2/03-136 Moore, Lisa (2003-08-18), "Scripts and New Characters - Syloti Nagri Script", UTC #95 Minutes
U+2056, 2058..2059 3 L2/03-282R N2610R Everson, Michael; Cleminson, Ralph (2003-09-04), Final proposal for encoding the Glagolitic script in the UCS
L2/03-324 N2642 Pantelia, Maria (2003-10-06), Proposal to encode additional Greek editorial and punctuation characters in the UCS
U+205A..205C 3 L2/03-157 Pantelia, Maria (2003-05-19), Additional Beta Code Characters not in Unicode (WIP)
L2/03-193R N2612-7 Pantelia, Maria (2003-06-11), Proposal to encode additional Punctuation Characters in the UCS
U+205D 1 L2/02-312R Pantelia, Maria (2002-11-07), Proposal to encode additional Greek editorial and punctuation characters in the UCS
L2/03-324 N2642 Pantelia, Maria (2003-10-06), Proposal to encode additional Greek editorial and punctuation characters in the UCS
U+205E 1 L2/03-354 N2655 Freytag, Asmus (2003-10-10), Proposal -- Symbols used in Dictionaries
L2/03-356R2 Moore, Lisa (2003-10-22), "Consensus 97-C15", UTC #97 Minutes
5.1 U+2064 1 L2/07-011R N3198R Freytag, Asmus; Beeton, Barbara; Ion, Patrick; Sargent, Murray; Carlisle, David; Pournader, Roozbeh (2007-01-15), 29 Additional Mathematical and Symbol Characters
L2/07-015 Moore, Lisa (2007-02-08), "Mathematical Characters and Symbols (C.4)", UTC #110 Minutes
L2/07-268 N3253 (pdf, doc) Umamaheswaran, V. S. (2007-07-26), "M50.16", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
6.3 U+2066..2069 4 L2/12-186R Lanin, Aharon; Davis, Mark; Pournader, Roozbeh (2012-07-24), A Proposal for Bidi Isolates in Unicode
L2/12-290 N4310 Lanin, Aharon; Davis, Mark; Pournader, Roozbeh (2012-07-31), Proposal for Four Characters for Bidi
L2/12-239 Moore, Lisa (2012-08-14), "Consensus 132-C12", UTC #132 Minutes
L2/13-040 Pournader, Roozbeh; Lanin, Aharon (2013-01-29), Fasttracking Arabic Letter Mark (ALM)
L2/13-125 N4447 Constable, Peter (2013-06-10), Unicode Liaison Report to WG2
  1. Proposed code points and character names may differ from final code points and names
  2. See also L2/10-458, L2/11-414, L2/11-415, and L2/11-429
  3. Refer to the history section of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents
  4. Refer to the history section of the Miscellaneous Mathematical Symbols-B block for additional math-related documents
