Commit 57cf978

Address comments and discussion from 2025-06-26 telecon
1 parent a00d01f commit 57cf978

1 file changed

+47
-28
lines changed

‎index.html

Lines changed: 47 additions & 28 deletions
@@ -2014,10 +2014,10 @@ <h3>Choosing character encodings</h3>
20142014
</aside>
20152015

20162016
<div class="req" id="char-use-utf8">
2017-
<p class="advisement">Specify UTF-8 for all document formats, protocols, or serialization forms unless you have a good reason not to.</p>
2017+
<p class="advisement">Use UTF-8 for all document formats, protocols, or serialization forms.</p>
20182018
</div>
20192019

2020-
<p>When specifying the serialization of text, whether it be in a file, format, or protocol, UTF-8 is the best choice for nearly all applications.</p>
2020+
<p>UTF-8 is the best choice for nearly all applications.</p>
20212021

20222022
<aside class="note">
20232023
<p>Web APIs and text processing are usually specified using strings rather than trying to grapple with the raw byte sequences in a specific [=character encoding form=]. As noted in [[[#char_string]]], these strings are typically represented using UTF-16 [=code units=] ({{DOMString}}) or, less commonly, as Unicode [=code points=] ({{USVString}}). Because the conversion between these forms and UTF-8 is algorithmic, lossless, and usually invisible to users, and since UTF-16 is a comparatively poor choice for serialization, UTF-8 is the preferred [=character encoding=] for storage and transmission.</p>
@@ -2029,33 +2029,8 @@ <h3>Choosing character encodings</h3>
20292029

20302030
<p>New protocols and formats, as well as existing formats deployed in new contexts, are required to use the UTF-8 character encoding. This policy applies to IETF and Web standards and is articulated in [[RFC2277]], [[RFC3629]], [[Encoding]], [[design-principles]], and many more. The only specifications that need <a>legacy character encodings</a> are those that work with older protocols or formats and even there UTF-8 is strongly recommended.</p>
20312031

2032-
<div class="req" id="char_identification">
2033-
<p class="advisement">Specifications that allow multiple [=character encoding forms=] MUST provide character encoding identification mechanisms such that the encoding of text can be reliably identified.</p>
2034-
<details class="links"><summary>explanations &amp; examples</summary>
2035-
<p><a href="https://www.w3.org/TR/charmod/#sec-Encodings">Choice and Identification of Character Encodings, C015</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
2036-
</details>
2037-
</div>
2038-
2039-
<div class="req" id="char_enc_rules">
2040-
<p class="advisement">When basing a protocol, format, or API on a protocol, format, or API that already has rules for choosing, applying, or labeling the character encoding, specifications SHOULD use the existing rules rather than change these rules.</p>
2041-
<details class="links"><summary>explanations &amp; examples</summary>
2042-
<p><a href="https://www.w3.org/TR/charmod/#sec-Encodings">Choice and Identification of Character Encodings, C017</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
2043-
</details>
2044-
</div>
2045-
2046-
<p class="issue">The above needs more work to incorporate the guidance to use UTF-8 when the protocol/format is used in a new context.</p>
2047-
2048-
<div class="req" id="char_charset">
2049-
<p class="advisement">Specifications SHOULD avoid using the terms 'character set' and 'charset' to refer to a character encoding, except when the latter is used to refer to the MIME charset parameter or its IANA-registered values. The terms [=character encoding=] or [=character encoding form=] are RECOMMENDED.</p>
2050-
<details class="links"><summary>explanations &amp; examples</summary>
2051-
<p><a href="https://www.w3.org/TR/charmod/#sec-EncodingIdent">Mandating a unique character encoding, C020</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
2052-
</details>
2053-
</div>
2054-
2055-
<p class="issue">Is the above MUSTard needed?</p>
2056-
20572032
<div class="req" id="char-use-encoding-std">
2058-
<p class="advisement">If a specification permits [=legacy character encodings=], it <del>SHOULD</del>MUST restrict the set of [=character encodings=] to those listed in the [[[Encoding]]] in the section "Names and Labels". Other encodings SHOULD NOT be used, except by private agreement.</p>
2033+
<p class="advisement">If, for historical reasons, a specification permits [=legacy character encodings=], it MUST restrict the set of [=character encodings=] to those listed in the [[[Encoding]]] in the section "Names and Labels". Other encodings SHOULD NOT be used, except by private agreement.</p>
20592034
<details class="links"><summary>explanations &amp; examples</summary>
20602035
<p><a href="https://www.w3.org/TR/charmod/#sec-EncodingIdent">Character encoding identification, C021</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
20612036
<p><a href="https://www.w3.org/TR/charmod/#sec-EncodingIdent">Character encoding identification, C022</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
@@ -2079,6 +2054,50 @@ <h3>Identifying character encodings</h3>
20792054
</ul>
20802055
</aside>
20812056

2057+
<div class="req" id="char_identification">
2058+
<p class="advisement">Specifications that allow multiple [=character encoding forms=] MUST provide a mechanism, such as a field or parameter, that clearly identifies the encoding of text.</p>
2059+
<details class="links"><summary>explanations &amp; examples</summary>
2060+
<p><a href="https://www.w3.org/TR/charmod/#sec-Encodings">Choice and Identification of Character Encodings, C015</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
2061+
</details>
2062+
</div>
2063+
2064+
<p>[=Character encodings=] cannot be reliably detected just from the byte values. If encodings other than UTF-8 are permitted, there has to be some mechanism for the [=consumer=] to determine what the encoding is.</p>
2065+
2066+
<aside class="example" title="Examples of character encoding mechanisms">
2067+
<p>Here are a few examples of ways that some common specifications indicate encoding:</p>
2068+
<table>
2069+
<tr>
2070+
<th>Format</th><th>Example</th><th>Note</th>
2071+
</tr>
2072+
<tr>
2073+
<td>XML</td>
2074+
<td><code class="xml" style="color:gray">&lt;?xml version="1.0" <strong style="color:blue">encoding="UTF-8"</strong> ?&gt;</code></td>
2075+
<td></td>
2076+
</tr>
2077+
<tr>
2078+
<td>HTML</td>
2079+
<td><code class="html" style="color:gray">&lt;html&gt;<br><strong style="color:blue">&lt;meta charset="UTF-8"&gt;</strong>...</code></td>
2080+
<td></td>
2081+
</tr>
2082+
<tr>
2083+
<td>MIME type=text/*</td>
2084+
<td><code style="color:gray">Content-Type: text/plain<strong style="color:blue">;charset=UTF-8</strong></code></td>
2085+
<td>New MIME types should not specify a <code>charset</code> parameter. They should always specify UTF-8 instead.</td>
2086+
</tr>
2087+
</table>
2088+
</aside>
2089+
2090+
2091+
<div class="req" id="char_enc_rules">
2092+
<p class="advisement">If a protocol, format, or API is based on a format that already has rules for choosing, applying, or labeling the character encoding, the specification MUST NOT define a separate mechanism for identifying the encoding.</p>
2093+
<details class="links"><summary>explanations &amp; examples</summary>
2094+
<p><a href="https://www.w3.org/TR/charmod/#sec-Encodings">Choice and Identification of Character Encodings, C017</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
2095+
</details>
2096+
</div>
2097+
2098+
<div class="req" id="char_enc_rules">
2099+
<p class="advisement">If a specification is based on a format that permits encodings other than UTF-8, the specification SHOULD restrict the encoding to UTF-8.</p>
2100+
</div>
20822101

20832102
<div class="req" id="char_heuristics">
20842103
<p class="advisement">Specifications MUST NOT propose the use of heuristics to determine the encoding of data.</p>
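
The new paragraph in this commit states that character encodings cannot be reliably detected from byte values alone. A minimal sketch of why, assuming an arbitrary sample string and an arbitrary pair of legacy encodings (the helper name `decodings` is hypothetical and not part of the specification):

```python
# Illustrative sketch: the same bytes decode without error under several
# encodings, so the bytes alone do not reveal which encoding was intended.
# The sample text and the candidate list are assumptions for this example.

def decodings(raw, encodings):
    """Try each encoding and return a mapping of name -> decoded text (or None)."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


data = "café".encode("utf-8")  # five bytes: 63 61 66 C3 A9
results = decodings(data, ["utf-8", "latin-1", "cp1252"])

for name, text in results.items():
    print(f"{name}: {text!r}")
```

All three decodes succeed, but only the UTF-8 result is the intended text; the other two silently produce mojibake. A consumer needs an explicit label (or a single mandated encoding) to choose correctly.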
