Misplaced Pages

:Special characters: Difference between revisions - Misplaced Pages

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Revision as of 11:34, 21 September 2001 editHannes Hirzel (talk | contribs)500 editsmNo edit summary  Latest revision as of 05:39, 30 April 2017 edit undoAnomieBOT (talk | contribs)Bots6,567,292 editsm Substing templates: {{Redr}}. See User:AnomieBOT/docs/TemplateSubster for info. 
#REDIRECT ]
Many characters not in the repertoire of standard ASCII will be useful, even necessary, for Wiki pages, especially the international pages. This page contains my recommendations for which characters are safe to use and how to use them. There are three ways to enter a non-ASCII character into a Wiki page:



* Enter the character directly from a foreign keyboard, or by cut and paste from a "character map" type application, or by some special means provided by the operating system or text editing application. The web server should then be configured to announce which 8-bit character set is being used.

* Use an HTML named character entity reference like <code>&amp;agrave;</code>. This is the most reliable method, and is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers.

* Use an HTML numeric character entity reference like <code>&amp;#161;</code>. This is not recommended, because many browsers incorrectly interpret these as references to the native character set. It is, however, the only way to enter ] values for which there is no named entity, such as the ] letters. Note that because the code points 128 to 159 are unused in both ISO-8859-1 and ], character references in that range such as <code>&amp;#131;</code> are illegal and ambiguous, though they are commonly used by many web sites.
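The two kinds of reference described in the list above can be sketched in Python. This is a minimal illustration, not part of the Wiki software: the helper names <code>named_reference</code> and <code>numeric_reference</code> are hypothetical, and the lookup relies on the HTML 4 entity table that ships with Python's standard <code>html.entities</code> module.

```python
import html
from html.entities import codepoint2name


def named_reference(ch):
    """Return the named entity reference for ch, or None if HTML 4 has no name for it."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


def numeric_reference(ch):
    """Return the decimal numeric character reference for ch."""
    return f"&#{ord(ch)};"


if __name__ == "__main__":
    a_grave = "\u00e0"          # a with grave accent
    inverted_excl = "\u00a1"    # inverted exclamation mark
    cyrillic_short_i = "\u0419"  # a letter with no named entity

    print(named_reference(a_grave))           # named form
    print(numeric_reference(inverted_excl))   # numeric form
    print(named_reference(cyrillic_short_i))  # None: only a numeric reference exists
    # Decoding a named reference gives back the original character.
    print(html.unescape("&agrave;") == a_grave)
```

A character for which <code>named_reference</code> returns nothing can only be written as a numeric reference, which is the situation the third bullet describes.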



Generally speaking, Western European languages such as Spanish, French, and German pose few problems. For specific details about other languages, see: ] (more will be added to this list as contributors in other languages appear).



<h2>ISO-8859-1 Characters</h2>



The following ] characters are safe for use in all Wiki pages. The table below lists the code for each character in hexadecimal and decimal, shows the character itself, shows the HTML entity name, and the common name of the character.



<pre>

Hex Dec Entity Name



00A0 0160 &amp;nbsp; no-break space

00A1 0161 &amp;iexcl; inverted exclamation

00A2 0162 &amp;cent; cent sign

00A3 0163 &amp;pound; pound sign

00A4 0164 &amp;curren; intl. currency sign

00A5 0165 &amp;yen; yen sign

00A7 0167 &amp;sect; section sign

00A8 0168 &amp;uml; diaeresis (umlaut)

00A9 0169 &amp;copy; copyright sign

00AA 0170 &amp;ordf; feminine ordinal

00AB 0171 &amp;laquo; left double-angle quote

00AC 0172 &amp;not; not sign

00AE 0174 &amp;reg; registered trademark sign

00AF 0175 &amp;macr; macron

00B0 0176 &amp;deg; degree sign

00B1 0177 &amp;plusmn; plus-minus sign

00B4 0180 &amp;acute; acute accent

00B5 0181 &amp;micro; micro sign

00B6 0182 &amp;para; pilcrow (paragraph) sign

00B7 0183 &amp;middot; middle dot (Georgian comma)

00B8 0184 &amp;cedil; cedilla

00BA 0186 &amp;ordm; masculine ordinal

00BB 0187 &amp;raquo; right double-angle quote

00BF 0191 &amp;iquest; inverted question

00C0 0192 &amp;Agrave; A grave

00C1 0193 &amp;Aacute; A acute

00C2 0194 &amp;Acirc; A circumflex

00C3 0195 &amp;Atilde; A tilde

00C4 0196 &amp;Auml; A diaeresis

00C5 0197 &amp;Aring; A ring

00C6 0198 &amp;AElig; AE ligature

00C7 0199 &amp;Ccedil; C cedilla

00C8 0200 &amp;Egrave; E grave

00C9 0201 &amp;Eacute; E acute

00CA 0202 &amp;Ecirc; E circumflex

00CB 0203 &amp;Euml; E diaeresis

00CC 0204 &amp;Igrave; I grave

00CD 0205 &amp;Iacute; I acute

00CE 0206 &amp;Icirc; I circumflex

00CF 0207 &amp;Iuml; I diaeresis

00D1 0209 &amp;Ntilde; N tilde

00D2 0210 &amp;Ograve; O grave

00D3 0211 &amp;Oacute; O acute

00D4 0212 &amp;Ocirc; O circumflex

00D5 0213 &amp;Otilde; O tilde

00D6 0214 &amp;Ouml; O diaeresis

00D8 0216 &amp;Oslash; O stroke

00D9 0217 &amp;Ugrave; U grave

00DA 0218 &amp;Uacute; U acute

00DB 0219 &amp;Ucirc; U circumflex

00DC 0220 &amp;Uuml; U diaeresis

00DF 0223 &amp;szlig; sharp s (ess-zed)

00E0 0224 &amp;agrave; a grave

00E1 0225 &amp;aacute; a acute

00E2 0226 &amp;acirc; a circumflex

00E3 0227 &amp;atilde; a tilde

00E4 0228 &amp;auml; a diaeresis

00E5 0229 &amp;aring; a ring

00E6 0230 &amp;aelig; ae ligature

00E7 0231 &amp;ccedil; c cedilla

00E8 0232 &amp;egrave; e grave

00E9 0233 &amp;eacute; e acute

00EA 0234 &amp;ecirc; e circumflex

00EB 0235 &amp;euml; e diaeresis

00EC 0236 &amp;igrave; i grave

00ED 0237 &amp;iacute; i acute

00EE 0238 &amp;icirc; i circumflex

00EF 0239 &amp;iuml; i diaeresis

00F1 0241 &amp;ntilde; n tilde

00F2 0242 &amp;ograve; o grave

00F3 0243 &amp;oacute; o acute

00F4 0244 &amp;ocirc; o circumflex

00F5 0245 &amp;otilde; o tilde

00F6 0246 &amp;ouml; o diaeresis

00F7 0247 &amp;divide; divide sign

00F8 0248 &amp;oslash; o stroke

00F9 0249 &amp;ugrave; u grave

00FA 0250 &amp;uacute; u acute

00FB 0251 &amp;ucirc; u circumflex

00FC 0252 &amp;uuml; u diaeresis

00FF 0255 &amp;yuml; y diaeresis

</pre>
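The relationship between the hexadecimal code, the four-digit decimal code, and the entity name in the table above can be reproduced mechanically. The sketch below is only an illustration: <code>table_row</code> is a hypothetical helper, and the name it prints is the formal Unicode character name from Python's <code>unicodedata</code> module, not the informal name used in the table.

```python
import unicodedata
from html.entities import codepoint2name


def table_row(code_point):
    """Format one row: hex code, zero-padded decimal code, entity, formal name."""
    entity = f"&{codepoint2name[code_point]};"
    name = unicodedata.name(chr(code_point)).lower()
    return f"{code_point:04X} {code_point:04d} {entity} {name}"


if __name__ == "__main__":
    for cp in (0x00A1, 0x00E0, 0x00FF):
        print(table_row(cp))
```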



These characters are a subset of the most common ] character set in use on the ], ] 8859-1. Misplaced Pages pages are identified by the server as containing ISO-8859-1 text. The characters above are a subset selected to improve compatibility with other machines.



For example, the Macintosh is in common use on the Internet, is not limited to any specific language, and its native character set (which is not ISO-8859-1) contains many of the common international characters. Many Macintosh browsers will correctly translate ISO text into the native character set, as long as the characters used are available. The table above is therefore the subset of ISO-8859-1 characters that are also available in the native Macintosh character set. The standard Windows code page 1252 is a superset of ISO-8859-1, so these characters will be readable as is on Windows machines.

The most common Latin character sets other than ISO-8859-1 are MS-DOS (pre-Windows) code page 437, Macintosh Roman, and other ISO sets such as ISO-8859-2. The number of pre-Windows MS-DOS machines with web browsers is small, and they are often dedicated-purpose machines that wouldn't be using Misplaced Pages anyway, so it is reasonably safe to sacrifice compatibility with them for the sake of needed foreign characters. Other ISO sets are generally intended to be read by browsers using those same sets in the same country, so those pages should use a language-specific set.
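The overlap between ISO-8859-1 and the Macintosh character set can be explored with Python's built-in codecs. This is a rough sketch under one assumption worth stating loudly: Python's <code>mac_roman</code> codec reflects one particular revision of the Macintosh table, so its result need not match the table on this page row for row. The function name <code>survives_mac_roman</code> is hypothetical.

```python
def survives_mac_roman(ch):
    """True if ch can be encoded in Python's mac_roman codec (an assumed stand-in
    for the native Macintosh character set discussed in the text)."""
    try:
        ch.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False


# The upper half of ISO-8859-1: code points 0xA0 through 0xFF.
latin1_upper = [chr(cp) for cp in range(0xA0, 0x100)]
shared = [ch for ch in latin1_upper if survives_mac_roman(ch)]
dropped = [ch for ch in latin1_upper if not survives_mac_roman(ch)]

# Code page 1252 keeps every one of these characters at its ISO-8859-1 position.
cp1252_matches = all(chr(cp).encode("cp1252") == bytes([cp]) for cp in range(0xA0, 0x100))

if __name__ == "__main__":
    print("shared with Mac Roman:", len(shared))
    print("not in Mac Roman:", " ".join(f"{ord(ch):04d}" for ch in dropped))
    print("cp1252 agrees with ISO-8859-1 above 0xA0:", cp1252_matches)
```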



These characters can be entered either as HTML named character entity references such as <b>&amp;agrave;</b>, directly from foreign keyboards, or with whatever facilities are available to the Wiki author for entering these characters. For example, Wiki authors using Windows machines can enter these by holding down the Alt key while typing the 4-digit decimal code of the character on the numeric pad of the keyboard. It is important that all 4 digits (including the leading 0) be typed; typing a 3-digit code will enter characters from the obsolete code page 437. Wiki authors using Macintosh machines should take care to either use special facilities to enter these in ISO-8859-1 format rather than with the native character set, or else use HTML named character entity references. Note that some Windows users may have trouble with versions of Microsoft Internet Explorer that use "Alt-Left-Arrow" and "Alt-Right-Arrow" for page movement. These will interfere with entering codes that contain the digits 4 and 6. Use HTML named character entity references in this case.
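The difference between a 4-digit and a 3-digit Alt code can be modelled with Python's <code>cp1252</code> and <code>cp437</code> codecs. This is a simplified model of the behaviour described above, not a description of how Windows implements it; <code>alt_code</code> is a hypothetical helper.

```python
def alt_code(digits):
    """Return the character an Alt+digits keypad sequence is assumed to produce.

    A leading zero selects Windows code page 1252; without it the value is
    looked up in the old MS-DOS code page 437.
    """
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("Alt codes in this sketch cover one byte only")
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([value]).decode(codec)


if __name__ == "__main__":
    print(alt_code("0224"))  # the ISO-8859-1 / code page 1252 character at 224
    print(alt_code("224"))   # the unrelated code page 437 character at 224
```

Dropping the leading zero silently changes which character is entered, which is why all four digits matter.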



The characters from the table above can be used directly as 8-bit characters in all Wiki pages, and are sufficient for all pages primarily in English, Spanish, French, German, and languages that require no more special characters than those (such as Catalan). Despite their general safety, at this time, it is not possible to use these characters in Wiki page titles in the English Misplaced Pages, though some of the ]s are configured to allow them.



<h3>Unsafe characters</h3>



Note especially what is missing here from the full ISO-8859-1 set: The broken bar (<code>0166=&amp;brvbar;</code>), soft hyphen (<code>0173=&amp;shy;</code>), superscript digits (<code>0178=&amp;sup2;, 0179=&amp;sup3;</code>), vulgar fractions (<code>0188=&amp;frac14;, 0189=&amp;frac12;, 0190=&amp;frac34;</code>), Old English eth and thorn (<code>0208=&amp;ETH;, 0240=&amp;eth;, 0222=&amp;THORN;, 0254=&amp;thorn;</code>), and multiply sign (<code>0215=&amp;times;</code>). These should be considered unsafe (and adequate substitutes are available for most of them).



Special care should be taken with characters that do exist in the native character set of popular machines but not in the above set. These are not safe, even though they may display correctly to you when you use them. Characters from Windows code page 1252 not in ISO-8859-1 include the euro sign (<code>&amp;euro;</code>), dagger and double dagger (<code>&amp;dagger;, &amp;Dagger;</code>), bullet (<code>&amp;bull;</code>), trade mark sign (<code>&amp;trade;</code>), typeset-style punctuation (see below), per mille sign (<code>&amp;permil;</code>), some Eastern European caron-accented letters, and the oe ligatures. Characters from the Macintosh Roman set not in ISO-8859-1 include dagger and double dagger, bullet, trade mark sign, a few math symbols such as infinity (<code>&amp;infin;</code>) and not equal (<code>&amp;ne;</code>), a few commonly-used Greek letters such as pi (<code>&amp;pi;</code>), ligatures like oe and fl, typeset-style punctuation, per mille sign, and lone accents such as the breve, ogonek, and caron.
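The code page 1252 characters that fall outside ISO-8859-1 all sit in the byte range 128 to 159, so they can be listed directly. The following sketch uses Python's <code>cp1252</code> codec and the HTML 4 entity table from <code>html.entities</code>; the function name <code>cp1252_extras</code> is hypothetical, and bytes that Python's table leaves undefined are simply skipped.

```python
from html.entities import codepoint2name


def cp1252_extras():
    """Map each byte 0x80-0x9F to (character, entity name or None) under cp1252."""
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # this byte has no character assigned in Python's cp1252 table
        extras[byte] = (ch, codepoint2name.get(ord(ch)))
    return extras


if __name__ == "__main__":
    for byte, (ch, name) in sorted(cp1252_extras().items()):
        reference = f"&{name};" if name else "(no named entity)"
        print(f"{byte:04d} U+{ord(ch):04X} {reference}")
```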



HTML 4.0 defines named character entities for some Latin characters not in ISO-8859-1 that are used by popular languages, such as OE ligature (<code>&amp;OElig;, &amp;oelig;</code>), uppercase Y with diaeresis (<code>&amp;Yuml;</code>), and some Eastern European accented characters like <code>&amp;scaron;</code>. These are also unsafe, though if they are entered as HTML named character entity references, they may display on some machines.



In short, don't assume that it is safe to use a special character just because it looks correct on your machine. Use the ones from the table above, and read and understand how to use others shown below.



<h2>Possibly usable non-ISO characters</h2>



Some characters not listed as safe above may still be usable when entered as named HTML character entity references, because web browsers will recognize them and render them correctly, perhaps by switching to alternate fonts as needed. All of these should be considered less safe to use than those above, but only in the sense that they may not display properly. Entered as HTML character entity references, they remain unambiguous and preserve data integrity.



For many of these, adequate substitutes and workarounds are available, and should be used when the value of making the text available to users of older computers and software exceeds the value of good presentation to those with newer software (in the judgment of the author or editor).



<h3>Typeset-style Punctuation</h3>



Absent from the ISO-8859-1 character set, but commonly used and present in both Macintosh Roman and Windows code page 1252 character sets, are proper English quotation marks and dashes. These can be entered as character entity references, and should appear correctly on most machines running recent software. Even on ISO-based machines such as Unix/X, browsers should be able to interpret these references and make appropriate substitutes using plain ASCII straight quotes and hyphens (Mozilla does this correctly, for example). These references were not present in older versions of HTML, so may not be recognized by older software. Since using these characters maintains data integrity even on those machines that may not display them correctly, it should be considered safe to use these unless proper display on old software is critical. German "low-9" quotation marks are a similar case, but are less commonly translated by browsing software, and so are not quite as safe. The table below shows these characters next to a capital letter "O" for better visibility:



<pre>

&lsquo;O &amp;lsquo; left single quote &mdash;O &amp;mdash; em dash

&rsquo;O &amp;rsquo; right single quote &ndash;O &amp;ndash; en dash

&ldquo;O &amp;ldquo; left double quote &sbquo;O &amp;sbquo; single low-9 quote

&rdquo;O &amp;rdquo; right double quote &bdquo;O &amp;bdquo; double low-9 quote

</pre>



Many web sites targeted for a Windows-using audience use code page 1252 references for these characters: for example, using <code>&amp;#151;</code> for the em dash. This is not a recommended practice. To ensure future data integrity and maximum compatibility, recode these as named references such as <code>&amp;mdash;</code>.
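Recoding such references can be automated. The sketch below is one possible approach, not an existing Wiki tool: <code>recode_cp1252_references</code> is a hypothetical function that treats numeric references in the range 128 to 159 as code page 1252 bytes and rewrites them as named references, falling back to the proper numeric reference when HTML 4 has no name for the character.

```python
import re
from html.entities import codepoint2name

_NUMERIC_REF = re.compile(r"&#(\d+);")


def recode_cp1252_references(text):
    """Rewrite code page 1252 style references (128-159) as unambiguous references."""
    def replace(match):
        value = int(match.group(1))
        if not 128 <= value <= 159:
            return match.group(0)  # outside the ambiguous range: leave untouched
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # byte undefined in cp1252: nothing sensible to do
        name = codepoint2name.get(ord(ch))
        return f"&{name};" if name else f"&#{ord(ch)};"

    return _NUMERIC_REF.sub(replace, text)


if __name__ == "__main__":
    print(recode_cp1252_references("wait&#151;what? &#147;quoted&#148;"))
```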



<h2>Greek letters and math symbols</h2>



Web standards for writing about mathematics are very recent (in fact MathML 2.0 was just released in February of 2001), so many browsers made before these standards were in place try to compensate by at least allowing characters commonly used in mathematics, including most of the ]. These are necessarily entered as character entity references. Browsers often render these by switching to a "Symbol" font or something similar.



Upper- and lowercase Greek letters simply use their full names for character entities. These should, of course, only be used for occasional Greek letters in primarily-Latin text. Actual Greek-language text should be written using a Greek character set to avoid bloated files and slow response. Here are a few samples:



<pre>

&alpha; &amp;alpha; &Gamma; &amp;Gamma;

&beta; &amp;beta; &Lambda; &amp;Lambda;

&gamma; &amp;gamma; &Sigma; &amp;Sigma;

&pi; &amp;pi; &Pi; &amp;Pi;

&sigma; &amp;sigma; &Omega; &amp;Omega;

&sigmaf; &amp;sigmaf; ("final" sigma, lowercase only)

</pre>



Other common math symbols:



<pre>

&ne; &amp;ne; &prime; &amp;prime;

&le; &amp;le; &Prime; &amp;Prime;

&ge; &amp;ge; &part; &amp;part;

&equiv; &amp;equiv; &int; &amp;int;

&asymp; &amp;asymp; &sum; &amp;sum;

&infin; &amp;infin; &prod; &amp;prod;

&radic; &amp;radic;

</pre>



Many of the symbols in the Windows "Symbol" font commonly used for rendering mathematics (such as the expandable bracket parts) are not present on most other machines, and not even present in Unicode 3.1 or as HTML named entities (though they are planned for Unicode 3.2). These are used by products such as ] to render equations. You should be aware that if you use these symbols, you are restricting your audience to Windows users (whether or not that's acceptable is a judgment you will have to make as an author).



<h2>Other common symbols</h2>



Some characters such as the bullet, euro currency sign, and trade mark sign are special cases. They are likely to be understood and rendered in some way by many browsers. Because they are important for international trade, many computers specifically add them to fonts at some non-standard location and render them when requested, or else render them in special ways that don't require them to be present in a font. See below for how your browser renders these:



<pre>

&bull; &amp;bull; bullet

&euro; &amp;euro; euro currency sign

&trade; &amp;trade; trade mark sign

</pre>



Other somewhat less commonly used symbols include these:



<pre>

&dagger; &amp;dagger; dagger &spades; &amp;spades; black spade suit

&Dagger; &amp;Dagger; double dagger &clubs; &amp;clubs; black club suit

&loz; &amp;loz; lozenge &hearts; &amp;hearts; black heart suit

&permil; &amp;permil; per mille sign &diams; &amp;diams; black diamond suit

&larr; &amp;larr; leftward arrow &lsaquo; &amp;lsaquo; single left-pointing angle quote

&uarr; &amp;uarr; upward arrow &rsaquo; &amp;rsaquo; single right-pointing angle quote

&rarr; &amp;rarr; rightward arrow

&darr; &amp;darr; downward arrow

</pre>



These should be considered unsafe to use except perhaps on pages intended for a specific audience likely to have very up-to-date software on popular machines.



<h2>Unicode</h2>



The ] character encoding '''UCS-4''' is the official character encoding of . Many browsers, though, are only capable of displaying a small subset of the full UCS-4 repertoire. For example, the codes <code>&amp;#1049; &amp;#1511; &amp;#1605;</code> display on your browser as '''&#1049;''', '''&#1511;''', and '''&#1605;''', which ideally look like the ] letter "Short I", the ] letter "Qof", and the ] letter "Meem", respectively. It is unlikely that your computer has all of those fonts and will display them all correctly, though it may display a subset of them. Because they are encoded according to the standard, though, they ''will'' display correctly on any system that is compliant and does have the characters available. Numeric character entity references are the only way to enter these characters into a Wiki page at present. Note that encoding them using decimal rather than hexadecimal (e.g. <code>&amp;#1049;</code> instead of <code>&amp;#x419;</code>) will increase the number of browsers on which they will work.
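Producing decimal references, and converting hexadecimal references to decimal ones, is straightforward to script. The following is a minimal sketch; both function names are hypothetical and the regular expression only handles well-formed references.

```python
import re


def decimal_references(text):
    """Replace every non-ASCII character with its decimal numeric reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def hex_to_decimal_references(markup):
    """Rewrite hexadecimal references such as &#x419; in decimal form."""
    return re.sub(
        r"&#[xX]([0-9A-Fa-f]+);",
        lambda m: f"&#{int(m.group(1), 16)};",
        markup,
    )


if __name__ == "__main__":
    # Cyrillic Short I, Hebrew Qof, Arabic Meem: the three examples from the text.
    print(decimal_references("\u0419 \u05e7 \u0645"))
    print(hex_to_decimal_references("&#x419;"))
```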



See also ] for character entity tables.

----

/Windows



]


{{redirect category shell|{{R move}}{{R to help}}}}
