index.html
+23 -9 (23 additions & 9 deletions)
@@ -1684,25 +1684,39 @@ <h4>Working with Byte-oriented Formats</h4>
1684
1684
<li>Parsing HTTP headers</li>
1685
1685
<li>Describing the wire format of a protocol</li>
1686
1686
</ul></p>
1687
-
1688
-
1687
+
1689
1688
<div class="req" id="char-string-dom-usv-bytes">
1690
1689
<p class="advisement">Specify 'string-like' fields as {{DOMString}} or, rarely, {{USVString}}, unless there is some reason to interact with specific byte values or the UTF-8 [=character encoding=] cannot be assumed.</p>
1691
1690
</div>
1692
1691
1693
1692
<p>If the field in question is meant to be treated as a string, working with (Unicode) characters will be more reliable than trying to work with the byte values directly. The data encoded into these fields will be deserialized from the wire format into your local in-memory string representation, such as the [[DOM]], JavaScript strings, or your platform's native Unicode string type. Later it will need to be serialized into the wire format using some [=character encoding form=] (usually—and <em>preferably</em>— UTF-8).</p>
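As an illustration of the round trip described above, here is a minimal JavaScript sketch. It assumes the wire format is UTF-8 and uses the `TextDecoder` and `TextEncoder` interfaces from the Encoding Standard; the sample bytes and variable names are made up for the example.

```javascript
// Hypothetical wire data: the UTF-8 bytes for "café".
const wireBytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);

// Deserialize: bytes -> in-memory JavaScript string.
// `fatal: true` makes malformed UTF-8 throw instead of silently
// substituting U+FFFD.
const text = new TextDecoder("utf-8", { fatal: true }).decode(wireBytes);

// Process the value as characters rather than as raw byte values.
const upper = text.toUpperCase();

// Serialize: string -> UTF-8 bytes for the wire.
// TextEncoder always produces UTF-8.
const outBytes = new TextEncoder().encode(upper);
```

The string operation in the middle never has to know how many bytes each character occupies on the wire; that detail is confined to the decode and encode steps.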
1694
-
1695
-
<div class="req" id="char_string_byte">
1696
-
<p class="advisement">Specify {{ByteString}} only when working with protocols (such as HTTP) or formats that don't distinguish between bytes and strings. If you need to represent a sequence of bytes, use {{Uint8Array}}.</p>
1693
+
1694
+
<div class="req" id="char-string-uint8">
1695
+
<p class="advisement">Specify {{Uint8Array}} when working with byte sequences, such as for data that does not contain text, or for byte sequences representing text for which processing is never required (such as when copying buffers).</p>
1696
+
</div>
1697
+
1698
+
<p>Some protocols or formats do not distinguish between bytes and strings. For example, HTTP [[RFC9112]] says:</p>
1699
+
1700
+
<p class="localdef">A recipient MUST parse an HTTP message as a sequence of octets in an encoding that is a superset of US-ASCII.</p>
1701
+
1702
+
<div class="req" id="char_string_byte">
1703
+
<p class="advisement">Specify {{ByteString}} in the rare cases where the specification needs to work with strings which are encoded using bytes and for which the conversion to or from Unicode would be inappropriate.</p>
1697
1704
<details class="links"><summary>explanations & examples</summary>
1698
-
<p><a href="https://www.w3.org/TR/charmod/#sec-Strings">String concepts, C011</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite>.</p>
1699
1705
<p><a href="https://www.w3.org/TR/design-principles/#idl-string-types">IDL String Types</a> in <cite>Web Platform Design Principles</cite> [[DESIGN-PRINCIPLES]]</p>
1700
1706
</details>
1701
-
</div>
1707
+
</div>
1702
1708
1703
-
<p>{{ByteString}} isn’t a general-purpose string type. The type {{ByteString}} defines strings as sequences of bytes (octets). Interpretation of byte strings thus requires the specification of a [=character encoding form=]. UTF-8 is the preferred encoding for wire and document formats on the Web [[ENCODING]] or the Internet in general [[RFC3629]]. If the field is encoded in UTF-8, there is rarely a reason to interact with it as a byte sequence.</p>
1709
+
<p>{{ByteString}} isn’t a general-purpose string type. Frequently, processing of these values will be done by performing an [=isomorphic decode=] of the {{ByteString}} into an [=isomorphic string=] or by performing an [=isomorphic encode=] of such a string back into bytes [[INFRA]]. (It is also possible that the specification will work with the bytes directly.)</p>
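To make the isomorphic decode and encode steps concrete, here is a rough JavaScript sketch of the two algorithms as Infra describes them. The function names `isomorphicDecode` and `isomorphicEncode` and the sample header bytes are hypothetical; Infra defines the algorithms but does not expose them as a JavaScript API.

```javascript
// Isomorphic decode: each byte 0x00..0xFF becomes the code point
// with the same numeric value.
function isomorphicDecode(bytes) {
  let result = "";
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Isomorphic encode: each code point becomes the byte with the same
// numeric value. Only valid for isomorphic strings, that is, strings
// with no code point above U+00FF.
function isomorphicEncode(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const codeUnit = str.charCodeAt(i);
    if (codeUnit > 0xff) {
      throw new RangeError("not an isomorphic string: value above U+00FF at index " + i);
    }
    bytes[i] = codeUnit;
  }
  return bytes;
}

// Made-up header field bytes: ASCII "X-Id: " followed by the single byte 0xE9.
const headerBytes = new Uint8Array([0x58, 0x2d, 0x49, 0x64, 0x3a, 0x20, 0xe9]);
const headerString = isomorphicDecode(headerBytes);
const roundTripped = isomorphicEncode(headerString);
```

No character encoding is applied here: the byte 0xE9 becomes U+00E9 only because the numeric values match. The string carries byte values and does not claim that the bytes were text in any particular encoding.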
1710
+
1711
+
<p>{{ByteString}} should not be confused with the more general term [=byte string=].</p>
1712
+
1713
+
<p class="localdef">A <dfn class="lint-ignore">byte string</dfn> refers to a string type defined as a sequence of bytes in a specific [=character encoding form=].</p>
<p><a href="https://www.w3.org/TR/charmod/#sec-Strings" target="_blank">String concepts</a> in [[[CHARMOD]]]</p>
1717
+
</div>
1704
1718
1705
-
<p>If a specification needs to interact with or process specific byte values, such as when working with a binary format, and does not or cannot rely on the later UTF-8 serialization of a {{DOMString}} or {{USVString}}, it might be necessary to specify the use of an [=isomorphic string=] [[INFRA]] for processing. The specification will then use an [=isomorphic encode=] to serialize the string to bytes and an [=isomorphic decode=] when deserializing from the wire or storage format.</p>
1719
+
<p>Interpreting [=byte strings=] requires the specification of a [=character encoding form=]. UTF-8 is the preferred encoding for wire and document formats on the Web [[ENCODING]] or the Internet in general [[RFC3629]]. However, if the field is encoded in UTF-8, there is rarely a reason to interact with it as a byte sequence.</p>