Percent-encoding - Misplaced Pages

This is an old revision of this page, as edited by 223.24.176.8 (talk) at 09:39, 6 November 2017 (→Percent-encoding in a URI). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding, it is in fact used more generally within the main URI set, which includes both Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). It is also used to prepare data of the application/x-www-form-urlencoded media type, which is often used when submitting HTML form data in HTTP requests.

Percent-encoding in a URI

Types of URI characters

The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding). Reserved characters are those characters that sometimes have special meaning. For example, forward slash characters are used to separate different parts of a URL (or more generally, a URI). Unreserved characters have no such meanings. Using percent-encoding, reserved characters are represented using special character sequences. The sets of reserved and unreserved characters and the circumstances under which certain reserved characters have special meaning have changed slightly with each revision of specifications that govern URIs and URI schemes.
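The distinction between reserved and unreserved characters can be illustrated with a minimal Python sketch. It assumes the RFC 3986 character sets (reserved characters from section 2.2; letters, digits, `-`, `.`, `_` and `~` as unreserved) and Python 3.7 or later, whose standard `urllib.parse.quote` follows those sets. The helper name `percent_encode` and the sample strings are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote, unquote

# RFC 3986 section 2.2 reserved characters (assumed set for this sketch).
RESERVED = ":/?#[]@!$&'()*+,;="


def percent_encode(text: str) -> str:
    """Percent-encode every character except the RFC 3986 unreserved ones.

    Hypothetical helper: quote() treats "/" as safe by default, so safe=""
    is passed to make it encode reserved characters as well.
    """
    return quote(text, safe="")


# Reserved characters are replaced by "%" followed by two hex digits.
encoded = percent_encode("a/b?c=d")
print(encoded)

# Unreserved characters pass through unchanged.
print(percent_encode("abc-._~"))

# Decoding reverses the transformation.
print(unquote(encoded))
```

Whether a reserved character actually needs encoding depends on the URI component it appears in; encoding all of them, as this sketch does, is the conservative choice for a single path segment or query value.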

RFC 3986 section 2.2 *Reserved Characters* (January 2005)
`!`	`*`	`'`	`(`	`)`	`;`	`:`	`@`	`&`	`=`	`+`	`$`	`,`	`/`	`?`	`#`	`[`	`]`

The application/x-www-form-urlencoded type

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on an early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with + instead of %20. The media type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications. In addition, the CGI specification contains rules for how web servers decode data of this type and make it available to applications.

When HTML form data is sent in an HTTP GET request, it is included in the query component of the request URI using the same syntax described above. When sent in an HTTP POST request or via email, the data is placed in the body of the message, and application/x-www-form-urlencoded is included in the message's Content-Type header.

See also

Internationalized Resource Identifier
Punycode
Binary-to-text encoding for a comparison of various encoding algorithms
Shellcode

References

User-agent support for email-based HTML form submission, using a 'mailto' URL as the form action, was proposed in RFC 1867 section 5.6, during the HTML 3.2 era. Various web browsers implemented it by invoking a separate email program or using their own rudimentary SMTP capabilities. Although sometimes unreliable, it was briefly popular as a simple way to transmit form data without involving a web server or CGI scripts.

Berners-Lee, T. (June 1994). "RFC 1630". IETF Tools. IETF. Retrieved 29 June 2016.

External links

The following specifications all discuss and define reserved characters, unreserved characters, and percent-encoding, in some form or other:

RFC 3986 / STD 66 (plus errata), the current generic URI syntax specification.
RFC 2396 (obsolete, plus errata) and RFC 2732 (plus errata), which together comprised the previous version of the generic URI syntax specification.
RFC 1738 (mostly obsolete) and RFC 1808 (obsolete), which define URLs.
RFC 1630 (obsolete), the first generic URI syntax specification.
W3C Guidelines on Naming and Addressing: URIs, URLs, ...
W3C explanation of UTF-8 in URIs
W3C HTML form content types
