Why EMUs?
by Rick Jelliffe
The reason for this may become clearer if I note that, using the Adobe "big point" of 72 points per inch (rather than the old 72.27), there are 12700 EMUs per point. Err, maybe not...
What about this then: 360000 and 914400 are both divisible by 2, 3, 4, 5, 6, and various multiples of these?
Still no idea? Well, representing numbers in computers is fraught with error whenever you need fractions, or multiplication or division by numbers that are not 2^n. That can even include multiplying by 0.5. Computer scientists spent a lot of their early time investigating techniques to overcome these problems, in a branch of mathematics (or is it engineering?) called numerical methods.
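To make this concrete, here is a minimal Python sketch of my own (not from the original discussion) showing how a number as innocuous as 0.1, which has no exact binary representation, drifts under repeated addition:

```python
# 0.1 cannot be represented exactly in binary floating point,
# so repeatedly adding it accumulates a tiny error.
total = 0.0
for _ in range(10000):
    total += 0.1

print(total)            # close to 1000.0, but not exactly equal to it
print(total == 1000.0)  # False
```

Each individual error is around 10^-17, yet after ten thousand additions the result is measurably wrong, which is exactly the accumulation problem described below.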
These errors are small by themselves, but when you have long sequences of calculations (a graphics object, for example, where each segment is positioned using the result of the last), the accumulated error can grow. In publishing, misalignment can have a serious effect when there is some kind of multi-colour printing: you can get registration errors.
One way to circumvent the problem is to move to integer (whole-number) arithmetic: you find some convenient small unit that can be multiplied up so that you don't need floating-point numbers. When you do divide, you throw away the remainder, because it is below the precision you are supporting; but because the data is frequently aligned to grid positions (1/2 inch, etc.) there will be no loss of precision from data capture (what the user sees) to the internal representation. Now, armed with this perspective, let's imagine a set of criteria for a typesetting system or vector graphics system:
* use a small unit to allow implementation in integer arithmetic
* this unit should allow exact whole divisions (no remainder) of the common measures of modern English-speaking countries' typesetting: the cm, the inch, and the point. So a half inch, 10.5 points, or a third of a cm are all exact (within the bounds of the system)
* the unit should be small enough to allow non-"English" measurements with, say, 0.01% precision (or do I mean inaccuracy?): the continental diderot or the Japanese Q system for example
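Working the numbers through: with 914400 EMUs per inch, 360000 per cm, and 12700 per big point (the values Ecma 376 actually uses), the divisions in the criteria above all come out exact. A small Python sketch of my own to check:

```python
EMU_PER_INCH = 914400   # 2^5 * 3^2 * 5^2 * 127
EMU_PER_CM = 360000
EMU_PER_POINT = 12700   # 914400 EMU / 72 "big points" per inch

# Half an inch, 10.5 points, and a third of a cm are all whole EMU counts:
assert EMU_PER_INCH % 2 == 0            # half inch  -> 457200 EMU exactly
assert (21 * EMU_PER_POINT) % 2 == 0    # 10.5 pt    -> 133350 EMU exactly
assert EMU_PER_CM % 3 == 0              # 1/3 cm     -> 120000 EMU exactly

# 914400 is divisible by 2, 3, 4, 5, 6 (and 8, 9, 10 for good measure):
for n in (2, 3, 4, 5, 6, 8, 9, 10):
    assert EMU_PER_INCH % n == 0
```

All the assertions pass: every one of those user-visible fractions lands on a whole number of EMUs, so integer arithmetic loses nothing at data capture.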
If you take these kinds of criteria and work through the numbers, you get something like EMUs. They are used by Ecma 376's DrawingML for "high precision coordinates" in certain places. The rest of the time, people can use locale-dependent measures.
So if the EMU is a reasonable technical approach, is it a reasonable measure to appear in a standard? To my mind, this falls into exactly the same bucket as SpreadsheetML's use of numeric indexes, though there are accuracy issues as well as performance issues. I think it comes down to the purpose of the standard: when the purpose is to allow high-quality typesetting and graphics and to reflect the triggering application, exact numbers such as EMUs may win. However, when the purpose is to allow data interchange and human readability and writability, then SI and locale-dependent measurements will win.
The EMU issue is also an interesting one from a standardization viewpoint. There is a kind of premise that supporting a standard (the obvious application-independent alternative here is SVG-in-ODF, but this applies to systems supporting Open XML too) involves only adding functionality or adjusting superficial details: names of elements and attributes, use of property elements rather than attributes, and so on. This is, I think, the view that underlies Tim Bray's comment (from memory) "how many ways do we need to say some text is bold or italic?" However, there are other changes that go to implementation: converting to and from SVG (as it is) presumably entails foregoing exact import and export of data in the "high precision coordinate" system. The difference would be minimal, a rare pixel here or there, I'd expect.
Like the data indexes, I don't particularly see why Open XML couldn't support the common notations as well as the optimized one: best of both worlds. But EMUs are a rational solution to a particular set of design criteria, it seems to me; and the name English Metric Units, which has caused alarm, seems less alarming when understood as just a descriptive name and not a reference to something external.
9 Comments
Josh Peters 2007-04-16 08:39:16 |
Why is it that so many XML dialects tend to reinvent the wheel? If SVG doesn't provide an accurate enough metric for a "high precision coordinate system" why not mix the elements of a namespace that does provide them into the output?
|
Rick Jelliffe 2007-04-16 09:56:03 |
Josh: One of the welcome things about Ecma 376 (whether it makes it to an ISO standard or not) is that it provides a really clear and detailed list of ideas and features for ODF to consider. In the short term, this exposes weaknesses in both Open XML (where there are features that people don't like) and ODF (where there are features that ODF doesn't handle well, or handles differently). But in the medium term, it is a different proposition. Now, I don't exactly think Open XML is Bill Gates' love letter to the anti-trust regulators, but it is certainly some kind of backflip if you compare it to Gates' comments of 10 years ago (which I see have recently been bandied about again, as if they were current).
|
orcmid 2007-04-16 16:42:40 |
Oh, did you mean "date indexes" in the last paragraph? I suppose one move toward convergence would be to add time-point (date-time) and time-interval data types to the types handled by formulas, as is the case in ODF (although those were evidently introduced without consideration of how spreadsheet formulas and any implicit conversions with numerics would work).
|
orcmid 2007-04-16 17:07:54 |
"However, when the purpose is to allow data interchange and human/read and writability"
|
Rick Jelliffe 2007-04-16 23:30:22 |
Orcmid: Since ISO 8601 dates are lexically distinguishable from numbers (having hyphens, and multiple ":" in date-times, for example), I think it would be trivial to serialize and parse dates in the data format, regardless of whether Excel supports date types itself. I am not suggesting that the underlying application should change; indeed, for Open XML that would be quite cart-before-the-horse.
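The lexical point can be sketched in a few lines of Python. These are my own deliberately simplified patterns, not a full ISO 8601 grammar, but they show why a date-time literal can never be mistaken for a plain number:

```python
import re

# Illustrative, simplified patterns -- not a complete ISO 8601 grammar.
ISO_DATETIME = re.compile(r'^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$')
NUMBER = re.compile(r'^-?\d+(\.\d+)?$')

print(bool(ISO_DATETIME.match('2007-04-16T23:30:22')))  # True
print(bool(NUMBER.match('2007-04-16T23:30:22')))        # False: hyphens and colons give it away
print(bool(NUMBER.match('12700')))                      # True
```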
|
orcmid 2007-04-18 11:06:01 |
@Rick [In what social protocol did that convention start? I like it, though.]
|
Rick Jelliffe 2007-04-21 03:07:42 |
Orcmid: I know that Patrick Durusau, the editor of ISO ODF and one of the good guys, has been going over the Open XML spec recently with a view to seeing what the substance or nature of the differences between Open XML and ODF are. (I'm going to blog about interoperability soon, because there is one important point, which regularly eludes mention, that people who want to adopt ODF or allow Open XML should be aware of: Patrick mentioned it to me recently, actually.)
|
Jimbo 2007-04-24 08:19:17 |
Another way to skin a cat:
|
Rick Jelliffe 2007-04-27 00:36:06 |
Jimbo: Yes, it is a decision each developer makes, not a moral issue, and if the grain is fine enough, user interface systems can apply heuristics to reconstruct the original units ("this is XXX scaled points = 3.49999cm, therefore that must have been 3.5cm"). The more you get away from type and into the world of colour and registration and calculated paths, the more benefit comes from exact numbers. So it would be odd if typesetting properties were specified in terms of EMUs or scaled points; that is not appropriate or relevant to end users. But it is more appropriate if a graphics program exposes its co-ordinate system in generated markup. (Whether it should also be able to generate and accept standard units as well is another issue.) |
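The snapping heuristic described in that last comment can be sketched as follows. This is my own illustrative code: the function name `snap_to_friendly_cm`, the 0.1 cm grid, and the tolerance are all assumptions for the sketch, not anything specified by Ecma 376.

```python
from fractions import Fraction

EMU_PER_CM = 360000

def snap_to_friendly_cm(emu, grid=Fraction(1, 10), tolerance_emu=5):
    """Guess the user-facing measurement behind an EMU count: if it lies
    within `tolerance_emu` of a whole multiple of `grid` centimetres,
    assume the user originally typed that round value."""
    cm = Fraction(emu, EMU_PER_CM)
    snapped = round(cm / grid) * grid
    if abs(snapped * EMU_PER_CM - emu) <= tolerance_emu:
        return float(snapped)
    return float(cm)

# An off-by-one EMU count still reads back as the 3.5 cm the user meant
# (3.5 cm is exactly 1260000 EMU):
print(snap_to_friendly_cm(1259999))  # 3.5
```

Using exact `Fraction` arithmetic for the comparison avoids reintroducing the very floating-point drift the heuristic is trying to paper over.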