<p>The relationship between text and its storage and processing in computers is complicated, involving many terms and concepts that may be new to you. Understanding this terminology is important to understanding the guidance in the internationalization guidelines and best practices. We've created this article to explain and illustrate what you need to know:</p>
@@ -1495,10 +1495,10 @@ <h3 id="char_def">Characters and character encoding basics</h3>
<tr><td colspan="8" style="text-align:left">Some of the [=visual text units=] require more than one character. This particular phrase consists of seven characters because these emoji characters use some of the more unusual features of Unicode. <span class="codepoint"><bdi lang="en">❤</bdi><code class="uname">U+2764 HEAVY BLACK HEART</code></span> is followed by <span class="codepoint"><img src="./images/FE0F.png" alt="U+FE0F"><code class="uname">U+FE0F VARIATION SELECTOR-16</code></span> in order to display as an emoji variant, while the flag of Switzerland is formed using a pair of emoji flag characters. (Note: on some browsers, the "flag" might display as the letters <kbd>CH</kbd> rather than as a flag. The pair is still treated by the browser as a single grapheme cluster!)</td></tr>
@@ -3098,7 +3098,7 @@ <h3>Truncating or limiting the length of strings</h3>
<p>Consider the words <em>indivisible</em> and <em>memorable</em>. The first has 11 letters, the second only 9. However, <em>indivisible</em> consumes fewer pixels in many proportional fonts, because it contains several narrow letters (‘I’ and ‘L’), whereas <em>memorable</em> contains the wide letter ‘M’:</p>
-<img src="images/indivisible-memorable.jpg">
+<img src="images/indivisible-memorable.jpg" alt="the words indivisible and memorable showing how one is longer than the other">
<p>Some scripts make the difference between visual text units and pixel size even more obvious. For example, here is the word for "Unicode" in Tamil:</p>
@@ -3125,7 +3125,7 @@ <h3>Truncating or limiting the length of strings</h3>
<p>Finally, notice that the Tamil word quite often takes more room with four [=visual text units=] than an English word with nine:</p>
-<img src="./images/memorable-unicode.png" alt="comparison of யூனிகோட் and memorable">
+<img src="images/memorable-unicode.png" alt="comparison of யூனிகோட் and memorable">
</aside>
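<p>The counts behind this comparison can be checked with a short script. The following is a minimal Python sketch and is not part of the original article. It spells the Tamil word out with escapes (assuming the vowel sign is the single code point U+0BCB), and it approximates the number of visual text units by skipping combining marks. That shortcut happens to work for this word, but it is an assumption made for illustration and not a substitute for proper grapheme cluster segmentation.</p>

```python
import unicodedata

# The Tamil word for "Unicode", written with escapes so the code points are explicit.
# Assumption: the vowel sign OO is stored as the single code point U+0BCB.
TAMIL_UNICODE = "\u0baf\u0bc2\u0ba9\u0bbf\u0b95\u0bcb\u0b9f\u0bcd"


def approx_visual_units(text: str) -> int:
    """Rough count: each character that is not a combining mark starts a unit.

    This is a hypothetical helper for illustration only; it does not
    implement real grapheme cluster segmentation.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


def describe(text: str) -> dict:
    """Report several different 'lengths' of the same string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "approx_visual_units": approx_visual_units(text),
    }


if __name__ == "__main__":
    for word in (TAMIL_UNICODE, "memorable"):
        print(word, describe(word))
```

<p>For the Tamil word, the code point, byte, and visual unit counts all differ, while for <em>memorable</em> they coincide. None of these numbers predicts the rendered width.</p>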
@@ -3180,7 +3180,7 @@ <h3>Truncating or limiting the length of strings</h3>
@@ -3391,7 +3391,7 @@ <h3>Truncating or limiting the length of strings</h3>
<p>Finally, don't forget that the limit will also interact with the truncation boundary chosen (as shown in [[[#example-code-unit-trunc-bad]]]): if the truncation is done naively at the 15th byte, the resulting string might contain only a partial character. For example, the Marathi text could experience this problem:</p>
-<p class="bigtext" lang="ma">मी का�...</p>
+<p class="bigtext" lang="mr">मी का�...</p>
</aside>
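<p>The failure described above can be reproduced with a short script. This is a minimal Python sketch under stated assumptions: the sample string is a hypothetical Marathi phrase chosen only because it begins with the same characters as the truncated example, and both function names are invented for illustration. The safer variant backs up to the nearest UTF-8 character boundary. It does not look for grapheme cluster boundaries, so it can still separate a base letter from a following combining mark.</p>

```python
# Hypothetical sample text (an assumption, not taken from the article).
# It starts with the same five characters as the truncated example.
SAMPLE = "\u092e\u0940 \u0915\u093e\u092f \u0915\u0930\u0942"


def truncate_naive(text: str, max_bytes: int) -> str:
    """Cut at a fixed byte offset; a split character decodes as U+FFFD."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="replace")


def truncate_utf8_safe(text: str, max_bytes: int) -> str:
    """Cut at or before max_bytes without splitting a UTF-8 sequence."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max(max_bytes, 0)
    # Continuation bytes have the bit pattern 10xxxxxx. Step back until
    # the cut lands on the first byte of a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


if __name__ == "__main__":
    print(truncate_naive(SAMPLE, 15) + "...")
    print(truncate_utf8_safe(SAMPLE, 15) + "...")
```

<p>With a 15-byte limit, the naive cut falls inside the sixth character of the sample, while the safer cut stops after 13 bytes and keeps only whole characters.</p>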
<aside class="example" id="family-example" title="Emoji sequences as an example of grapheme clusters">