Commit 4fb6dbb

Address isomorphic string and ByteString guidance
Fixes #151. Adds a new subsection about byte-oriented formats. Adds guidance about ByteString and isomorphic strings. Moves the guidance about not defining legacy encodings to the encoding section.
1 parent deda87b commit 4fb6dbb

1 file changed

+25
-18
lines changed

‎index.html

Lines changed: 25 additions & 18 deletions
@@ -1671,33 +1671,38 @@ <h3>Choosing a definition of 'string'</h3>
16711671
<p class="advisement">Avoid mixing {{DOMString}} and {{USVString}} in a single document or protocol operation. Often you will choose a {{DOMString}} over a {{USVString}}, since the latter requires extra processing that does not benefit most document formats or protocols.</p>
16721672
</div>
16731673

1674+
1675+
<section id="char-string-byte-oriented">
1676+
<h4>Working with Byte-oriented Formats</h4>
1677+
1678+
<div class="xref"><span class="seealso">See also</span>
1679+
<p>[[[#char_choosing]]] for additional best practices related to [=character encodings=].</p>
1680+
<p><a href="https://www.w3.org/TR/string-meta/#protocol-strings">Strings that are part of a legacy protocol or format</a>, in <cite>Strings on the Web: Language and Direction Metadata</cite> [[STRING-META]]</p>
1681+
</div>
1682+
1683+
<p>Sometimes specifications need to deal with byte-oriented contexts. For example, the specification might be defining a binary format or working with a byte-oriented protocol.</p>
1684+
1685+
1686+
<div class="req" id="char-string-dom-usv-bytes">
1687+
<p class="advisement">Specify fields in protocols that are 'string-like' as {{DOMString}} or, rarely, {{USVString}}, unless there is a reason to interact with specific byte values or the UTF-8 [=character encoding=] cannot be assumed.</p>
1688+
</div>
1689+
1690+
<p>If the field in question is meant to be treated as a string, working with (Unicode) characters will be more reliable than trying to work with byte values. The data encoded into these fields will be deserialized from the wire format into your local in-memory string representation, such as the [[DOM]], JavaScript strings, or your platform's native Unicode string type. Later, it will need to be serialized into the wire format using some [=character encoding form=] (usually UTF-8).</p>
1691+
16741692
<div class="req" id="char_string_byte">
1675-
<p class="advisement">Specifications SHOULD NOT define a string as a {{ByteString}} or as a sequence of bytes ('byte string'). For binary data or sequences of bytes, use {{Uint8Array}} instead.</p>
1693+
<p class="advisement">Specify {{ByteString}} only when working with protocols (such as HTTP) or formats that don't distinguish between bytes and strings. If you need to represent a sequence of bytes, use {{Uint8Array}}.</p>
16761694
<details class="links"><summary>explanations &amp; examples</summary>
16771695
<p><a href="https://www.w3.org/TR/charmod/#sec-Strings">String concepts, C011</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite>.</p>
1678-
<p><a href="https://www.w3.org/TR/string-meta/#protocol-strings">Strings that are part of a legacy protocol or format</a>, in <cite>Strings on the Web: Language and Direction Metadata</cite> [[STRING-META]]</p>
16791696
<p><a href="https://www.w3.org/TR/design-principles/#idl-string-types">IDL String Types</a> in <cite>Web Platform Design Principles</cite> [[DESIGN-PRINCIPLES]]</p>
16801697
</details>
16811698
</div>
16821699

1683-
<div class="req" id="char_string_no_legacy">
1684-
<p class="advisement">Specifications SHOULD NOT add or define support for <a>legacy character encodings</a> unless there is a specific reason to do so.</p>
1685-
<details class="links"><summary>explanations &amp; examples</summary>
1686-
<p>See also <a href="#char_choosing"></a>.</p>
1687-
</details>
1688-
</div>
1689-
1690-
<p>The type {{ByteString}} defines strings as sequences of bytes (octets). Interpretation of byte strings thus requires the specification of a <a>character encoding form</a>. UTF-8 is the preferred encoding for wire and document formats [[ENCODING]], but there is generally no reason to specify strings in terms of the underlying byte values.</p>
1700+
<p>{{ByteString}} isn’t a general-purpose string type. The type {{ByteString}} defines strings as sequences of bytes (octets). Interpretation of byte strings thus requires the specification of a [=character encoding form=]. UTF-8 is the preferred encoding for wire and document formats on the Web [[ENCODING]] or the Internet in general [[RFC3629]]. If the field is encoded in UTF-8, there is rarely a reason to interact with it as a byte sequence.</p>
16911701

1692-
<aside class="note">
1693-
<p>Specifications for document formats or protocols often deal with the specific byte values used for various fields or values or with the <a>character encoding</a> used for serializing the data. It is therefore tempting to specify a text field ("string") as a {{ByteString}} which uses the <a>UTF-8</a> <a>character encoding form</a>.</p>
1694-
1695-
<p>It is preferable, however, to specify these fields as a {{DOMString}} (or, rarely, a {{USVString}}), since the data encoded into these fields must be serialized from and deserialized into in-memory string representations, such as the [[DOM]] or JavaScript strings or your platform's native Unicode string type.</p>
1696-
</aside>
1697-
1698-
<p>See <a href="#char_choosing"></a> for additional best practices.</p>
1702+
<p>If a specification needs to interact with or process specific byte values, such as when working with a binary format, and does not or cannot rely on the later UTF-8 serialization of a {{DOMString}} or {{USVString}}, it might be necessary to specify the use of an [=isomorphic string=] [[INFRA]] for processing. The specification will then use an [=isomorphic encode=] to serialize the string to bytes and an [=isomorphic decode=] when deserializing from the wire or storage format.</p>
16991703

17001704
</section>
1705+
</section>
17011706

17021707

17031708
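Reviewer note: the isomorphic string guidance added in this hunk can be illustrated with a minimal JavaScript sketch. It assumes an environment that provides `TextDecoder` (browsers, recent Node.js). The helper names `isomorphicDecode` and `isomorphicEncode` are hypothetical, written here to mirror the Infra algorithms of the same name; they are not platform APIs and are not part of this commit.

```javascript
// Hypothetical helper mirroring Infra's "isomorphic decode":
// each byte becomes the code point with the same numeric value.
function isomorphicDecode(bytes) {
  let result = "";
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Hypothetical helper mirroring Infra's "isomorphic encode":
// each code point in the range U+0000 to U+00FF becomes one byte.
function isomorphicEncode(input) {
  const bytes = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const codeUnit = input.charCodeAt(i);
    if (codeUnit > 0xff) {
      // Not an isomorphic string, so there is no single byte for this value.
      throw new RangeError("code point above U+00FF at index " + i);
    }
    bytes[i] = codeUnit;
  }
  return bytes;
}

// The UTF-8 bytes for "café": the final character takes two bytes.
const wire = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);

// Byte-oriented view: one code unit per byte, no character interpretation.
const iso = isomorphicDecode(wire); // length 5

// Character-oriented view: the bytes are interpreted as UTF-8 text.
const text = new TextDecoder("utf-8").decode(wire); // length 4
```

The isomorphic string preserves every byte value and round-trips through `isomorphicEncode`, but it is not readable text. That is why the guidance reserves it for cases where specific byte values matter and prefers {{DOMString}} or {{USVString}} for fields that are meant to be treated as strings.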

@@ -1900,7 +1905,9 @@ <h3>Choosing character encodings</h3>
19001905
</ul>
19011906
</aside>
19021907

1903-
1908+
<div class="req" id="char_string_no_legacy">
1909+
<p class="advisement">Specifications SHOULD NOT add or define support for <a>legacy character encodings</a> unless there is a specific reason to do so.</p>
1910+
</div>
19041911

19051912
<div class="req" id="char_identification">
19061913
<p class="advisement">Specifications MUST either specify a unique character encoding, or provide character encoding identification mechanisms such that the encoding of text can be reliably identified.</p>
