Misplaced Pages

Zero-width space

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
(Redirected from Zero width space) Special character in text processing

The zero-width space (), abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate where the word boundaries are, without actually displaying a visible space in the rendered text. This enables text-processing systems for scripts that do not use explicit spacing to recognize where word boundaries are for the purpose of handling line breaks appropriately.

The zero-width space is Unicode character U+200B, and is located in the Unicode General Punctuation block. In HTML, it can be represented by the character entity reference ​.

Purpose

The zero-width space marks a potential line break without hyphenation. Its semantics and HTML implementation are similar to the soft hyphen, but soft hyphens display a hyphen character at the point where the line is broken.

The zero-width space can be used to mark word breaks in languages without visible space between words, such as Thai, Myanmar, Khmer, and Japanese.

In justified text, the rendering engine may add inter-character spacing, also known as letter spacing, between letters separated by a zero-width space, unlike around fixed-width spaces.

Example

To show the effect of the zero-width space in text, the following words have been separated with zero-width spaces:

Lorem​Ipsum​Dolor​Sit​Amet​Consectetur​Adipiscing​Elit​Sed​Do​Eiusmod​Tempor​Incididunt​Ut​Labore​Et​Dolore​Magna​Aliqua​Ut​Enim​Ad​Minim​Veniam​Quis​Nostrud​Exercitation​Ullamco​Laboris​Nisi​Ut​Aliquip​Ex​Ea​Commodo​Consequat​Duis​Aute​Irure​Dolor​In​Reprehenderit​In​Voluptate​Velit​Esse​Cillum​Dolore​Eu​Fugiat​Nulla​Pariatur​Excepteur​Sint​Occaecat​Cupidatat​Non​Proident​Sunt​In​Culpa​Qui​Officia​Deserunt​Mollit​Anim​Id​Est​Laborum

By contrast, the following words have not been separated:

LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum

The first text is broken into lines but only at word boundaries, and resizing the browser window will re-break the text accordingly, while the second text is not broken at all.

Usage

HTML

In HTML pages, the HTML element <wbr> functions as a zero-width space. In Internet Explorer 6, the zero-width space was not supported in some fonts.

Prohibition in domain names

ICANN rules prohibit domain names from containing non-displayed characters, including the zero-width space, and most browsers prohibit their use within domain names because they can be used to create a homograph attack, where a malicious URL is visually indistinguishable from a legitimate one.

Encoding

The zero-width space character is encoded in Unicode as U+200B ZERO WIDTH SPACE.

In HTML, it can be referenced as &ZeroWidthSpace;, &#8203; or &#x200B;. Additionally, the character entities &NegativeThickSpace;, &NegativeMediumSpace;, &NegativeThinSpace;, and &NegativeVeryThinSpace; all also refer to the zero-width space, contrary to what their names suggest.

The TeX representation is \hskip0pt; the LaTeX representation is \hspace{0pt}; and the groff representation is \:.

See also

References

Citations

  1. ^ "23.2 Layout Controls". The Unicode® Standard Version 15.0 – Core Specification (PDF). The Unicode Consortium. September 2022. p. 918. ISBN 978-1-936213-32-0.
  2. Dunae, Alex. "Better Web Typography with Spaces and Hyphens". dunae.ca. Archived from the original on December 14, 2010. Retrieved December 3, 2009.
  3. "Network.IDN.blacklist_chars". mozillaZine. Retrieved 2018-02-07.
  4. "Unicode Character 'Zero Width Space'". FileFormat.Info. Retrieved 2018-02-07.
  5. "General Punctuation – Unicode" (PDF). Retrieved 2013-07-20.
  6. Entities/ZeroWidthSpace in MathML Version 2.0
  7. "The LaTeX Companion. Chapter 3: Basic Formatting Tools" (PDF). Retrieved 2019-07-16.
  8. "groff(7) – Linux manual page". Retrieved 2014-02-08.

Sources

Unicode
Unicode
Code points
Characters
Special purpose
Lists
Processing
Algorithms
Comparison of encodings
On pairs of
code points
Usage
Related standards
Related topics
Scripts and symbols in Unicode
Common and
inherited scripts
Modern scripts
Ancient and
historic scripts
Notational scripts
Symbols, emojis
Categories: