Commit b17b095

Further work on JcK's suggestions
1 parent 30e2656 commit b17b095

1 file changed (+3, -3 lines)

index.html

Lines changed: 3 additions & 3 deletions
@@ -1678,15 +1678,15 @@ <h4>Characters stored in byte sequences</h4>
16781678
<p><a href="https://www.w3.org/TR/string-meta/#protocol-strings">Strings that are part of a legacy protocol or format</a>, in <cite>Strings on the Web: Language and Direction Metadata</cite> [[STRING-META]]</p>
16791679
</div>
16801680

1681-
<p>Prior to the widespread adoption of Unicode, the basic definition of a string was a sequence of bytes in some (usually national or language-specific) [=coded character set=]. The general term <strong><em>byte string</em></strong> was sometimes used for this definition of a string.</p>
1681+
<p>Prior to the widespread adoption of Unicode, it was common to define a string as a <strong><em>byte string</em></strong>: simply a sequence of byte values rather than a sequence of characters or [=code points=]. A familiar manifestation of byte strings is a <code lang="zxx" translate="no">char*</code> in the C programming language.</p>
16821682

1683-
<p>A familiar manifestation of byte strings is the <code>char*</code> type in the C programming language. Interpreting such byte strings requires the specification of a [=character encoding form=], because different [=character encodings=] use the same byte values for different purposes. Many [=legacy character encodings=] are stateful: processing such encodings often requires starting at the beginning of the byte buffer, so that character state is retained and the [=abstract character=] can be decoded, processed, or modified successfully.</p>
1683+
<p>Processing or interpreting a byte string depends on the [=character encoding form=]. Many [=legacy character encodings=] are stateful: processing such encodings often requires starting at the beginning of the byte buffer, so that character state is retained and the [=abstract character=] can be decoded, processed, or modified successfully. A given byte value in such an encoding might mean different things depending on the bytes adjacent to it. For example, the exact same byte value might stand alone to represent a character or, depending on the preceding bytes, be part of a multibyte sequence that represents a different character. The rules for determining how to interpret each byte or byte sequence are different for different [=legacy character encodings=].</p>
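Review note (not part of the commit): the context dependence described in the added paragraph can be illustrated with a small Python sketch. Shift_JIS is chosen here only as one example of a legacy multibyte encoding; it is not stateful in the escape-sequence sense, but it does show a byte value whose meaning depends on the preceding byte. The helper name `decode_from` is hypothetical, and the codec names are those shipped with Python's standard library.

```python
def decode_from(data: bytes, offset: int, encoding: str) -> str:
    """Decode `data` starting at `offset` (hypothetical helper for illustration)."""
    return data[offset:].decode(encoding)


# The byte 0x41 standing alone is LATIN CAPITAL LETTER A in Shift_JIS.
lone = bytes([0x41])
# The same byte value after the lead byte 0x83 is the second half of a
# single two-byte character (KATAKANA LETTER A, U+30A2).
pair = bytes([0x83, 0x41])

lone_text = lone.decode("shift_jis")
pair_text = pair.decode("shift_jis")

# The same two bytes read under a different encoding are two characters.
pair_as_cp1252 = pair.decode("cp1252")

# Decoding must start at a character boundary: entering the buffer one
# byte late silently changes what the remaining bytes mean.
buffer = pair + lone
from_start = decode_from(buffer, 0, "shift_jis")
from_middle = decode_from(buffer, 1, "shift_jis")

print(repr(lone_text), repr(pair_text), repr(pair_as_cp1252))
print(repr(from_start), repr(from_middle))
```

The last two values differ even though they come from the same buffer, which is why code handling such encodings usually has to scan from the start of the byte sequence.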
16841684

16851685
<div class="xref"><span class="seealso">See also</span>
16861686
<p><a href="https://www.w3.org/TR/charmod/#sec-Strings" target="_blank">String concepts</a> in [[[CHARMOD]]]</p>
16871687
</div>
16881688

1689-
<p>UTF-8 is the preferred encoding for wire and document formats on the Web [[ENCODING]] or the Internet in general [[RFC3629]]. When content is encoded in UTF-8, there is rarely a reason to interact with it as a byte sequence. Most Web APIs and interfaces are more concerned with the [=code point=] sequence, since that represents the characters in question, rather than the specific byte values.</p>
1689+
<p>UTF-8 is the preferred [=character encoding=] for wire and document formats on the Web [[ENCODING]] or the Internet in general [[RFC3629]]. When content is encoded in UTF-8, there is rarely a reason to interact with it as a byte sequence. Most Web APIs and interfaces are more concerned with the [=code point=] sequence, since that represents the characters in question, rather than the specific byte values.</p>
16901690

16911691
<p>Sometimes specifications do need to deal with the storage, interpretation, and manipulation of byte values. In particular, many document formats and protocols were defined around the use of 7-bit [[ASCII]] bytes, while allowing the inclusion or interchange of non-ASCII data values via the use of various character or data encoding schemes. Sometimes this is done by designating a [=character encoding form=], such as with the <code>charset</code> parameter of the <code>text</code> media types. Or it might be done by encoding byte values using some special syntax, an example of which would be [=percent encoding=].</p>
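Review note (not part of the commit): the two mechanisms named in this paragraph, a declared `charset` and percent encoding, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `decode_text_body` is a hypothetical helper, and its fallback to UTF-8 when no `charset` parameter is present is an assumption made for the sketch, not a rule taken from the text.

```python
from email.message import Message
from urllib.parse import quote, unquote, unquote_to_bytes


def decode_text_body(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Decode `body` using the charset parameter of a text media type.

    Hypothetical helper; the UTF-8 default is an assumption of this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Percent encoding carries byte values in ASCII-only syntax, so the
# result depends on which character encoding produced the bytes.
utf8_escaped = quote(e_acute)                        # UTF-8 bytes, escaped
latin1_escaped = quote(e_acute, encoding="latin-1")  # Latin-1 byte, escaped

# Unescaping yields bytes first; text only appears once an encoding is chosen.
raw_bytes = unquote_to_bytes(utf8_escaped)
latin1_text = unquote(latin1_escaped, encoding="latin-1")

# A charset parameter tells the receiver how to interpret the body bytes.
body_text = decode_text_body("text/plain; charset=ISO-8859-1", b"caf\xe9")

print(utf8_escaped, latin1_escaped, raw_bytes, latin1_text, body_text)
```

In both cases the bytes are meaningless as text until an encoding is supplied, either by the `charset` label or by agreement on what the escaped bytes represent.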
16921692
