
Support for shingled magnetic recording devices

By Jake Edge
March 26, 2014
2014 LSFMM Summit

One of the plenary sessions on the first day of the Linux Storage, Filesystem, and Memory Management (LSFMM) Summit concerned Linux support for shingled magnetic recording (SMR) devices. These next-generation hard disks have a number of interesting characteristics that will be challenging to fully support. Martin Furuhjelm led a discussion among a few drive vendor representatives and the assembled kernel developers about the latest developments in SMR-land.

There are three types of SMR drives: device managed, host aware, and host managed. Device-managed drives will essentially act just like regular disk drives, though the translation layer in the drive may cause unexpected performance degradation at times (much like flash devices today). Existing drivers don't need to change for device-managed disks. The discussion concentrated mostly on host-aware drives (where the host should try to follow the requirements for shingled regions) and host-managed devices (where the requirements must be followed).

SMR drives will be made up of multiple zones, some that are "normal" and allow random reads and writes throughout the zone, and some that can only be written sequentially. For the sequential zones, there is a write pointer maintained for each zone that corresponds to where the next write must go. Depending on the mode, writing elsewhere in the zone will either be an error (in host-managed devices) or will lead to some kind of remapping of the write (for host-aware devices). That remapping may lead to latency spikes due to garbage collection at some later time.
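The zone model described above can be sketched in a few lines. This is an illustrative model only, with invented names; it is not the ZBC draft interface or any real driver:

```python
# Illustrative model of SMR zone semantics. Class and method names are
# invented for this sketch, not taken from the T10 ZBC draft.

class Zone:
    def __init__(self, start, length, sequential, host_managed=True):
        self.start = start            # first LBA of the zone
        self.length = length          # zone size in blocks
        self.sequential = sequential  # True for sequential-write-only zones
        self.host_managed = host_managed
        self.write_pointer = start    # where the next write must go

    def write(self, lba, nblocks):
        """Model a write: sequential zones only accept writes at the pointer."""
        if self.sequential and lba != self.write_pointer:
            if self.host_managed:
                raise IOError("non-sequential write to host-managed zone")
            # Host-aware: the drive remaps the write internally, paying
            # for it later with garbage collection (latency spikes).
            return "remapped"
        self.write_pointer = lba + nblocks
        return "ok"

    def reset_write_pointer(self):
        """Model the new SCSI command: make the zone writable from the start."""
        self.write_pointer = self.start
```

A host-managed zone rejects the out-of-place write outright, while a host-aware zone accepts and remaps it, which is exactly the behavioral split described above.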

Two new SCSI commands have been added: one to query what zones exist on the drive and another to reset the write pointer to the beginning of a particular zone. To get the best performance, an SMR-aware driver will need to write only sequentially to the sequential zones (which will likely make up most of the disk), but failing to do so is a fatal error only on host-managed drives. For that reason, most of the kernel developers seemed to think the first SMR drives are likely to be host-aware, since those will work (though perhaps poorly at times) with today's software.

The T10 technical committee (for SCSI interface standards) is currently working on finishing the standards for SMR, so it is important that Linux developers make any concerns they have with the drafts known soon. Ted Ts'o noted that the drafts are available from the T10 site (Furuhjelm recommended looking for "ZBC"). In addition, more information on SMR and Linux can be found in a writeup from last year's LSFMM.

There were some questions about the zone reporting functionality, but much of that is still up in the air at this point. Currently, all zones are expected to be the same size, though there is a belief that will change before the draft is finalized. There has also been talk of adding a filtering capability on the query, so that only zones fitting a particular category (active, full, sequential-only, etc.) would be returned.
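If filtering does make it into the draft, a zone query might behave roughly like this sketch; the category names here are invented for illustration, not taken from the T10 draft:

```python
# Hypothetical filtered zone report. The categories ("active", "full",
# "sequential-only") follow the article's examples; the representation
# of a zone as a dict is invented for this sketch.

def report_zones(zones, category=None):
    """Return zone descriptors, optionally restricted to one category."""
    def matches(z):
        if category is None:
            return True
        if category == "sequential-only":
            return z["sequential"]
        if category == "full":
            return z["write_pointer"] >= z["start"] + z["length"]
        if category == "active":
            # Partially written: pointer strictly inside the zone.
            return z["start"] < z["write_pointer"] < z["start"] + z["length"]
        raise ValueError("unknown category: %s" % category)
    return [z for z in zones if matches(z)]
```

The appeal of filtering is that a host managing thousands of zones could ask only for, say, the zones that still have room, rather than parsing the whole report.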

The overall sense was that kernel developers are waiting for hardware before trying to determine how best to support SMR in Linux. No major complaints about the draft interface were heard, but until hardware hits, it will be difficult for anyone to determine where the problems lie.

[ Thanks to the Linux Foundation for travel support to attend LSFMM. ]



Support for shingled magnetic recording devices

Posted Mar 27, 2014 14:20 UTC (Thu) by fuhchee (subscriber, #40059) [Link]

(Can someone save us a bit of googling in explaining why this sort of storage device is worth inventing?)

Support for shingled magnetic recording devices

Posted Mar 27, 2014 14:30 UTC (Thu) by jake (editor, #205) [Link]

The short answer is storage density. There is a bit more in the article from last year's summit linked here: https://lwn.net/Articles/548116/ (scroll down a ways).

I should probably have said a bit more here, sorry!

jake

Support for shingled magnetic recording devices

Posted Mar 27, 2014 14:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

They have much higher storage density, up to 10TB on one platter (possibly up to 20TB with other tricks like helium-filled drives).

Support for shingled magnetic recording devices

Posted Mar 27, 2014 18:10 UTC (Thu) by k3ninho (subscriber, #50375) [Link]

The reason that some parts of the drive have to be written consecutively is that the magnetic domains are so close together, and the write head so large, that writing one domain has a domino effect on the domains around it.

I recall reading this at Anand Tech:
http://www.anandtech.com/show/7290/seagate-to-ship-5tb-hd...

K3n.

Support for shingled magnetic recording devices

Posted Mar 27, 2014 22:51 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

It's 20%. You can get 25% more areal density, but after accounting for the extra data you have to store to manage it, you get 20% more data on a square millimeter, or on a device, or on a square meter of floor space. That's in an apples-to-apples comparison: two disk drives that differ only in that one is shingled and the other is not.

That makes it hard to see how bending over backwards to make applications fit the requirements of a shingled disk (basically, sequential writing) is worth the cost.

If you could identify systems that exist today and happen to access existing disk drives in the required pattern, then maybe it would make sense to steer those systems toward shingled disks, but even then I suspect the administrative cost of splitting your storage applications into two camps would outweigh the 20% cost savings at the lower levels.

And if you could arrange to have a lot of sequential writing, unless you also have fast random reading, it would be hard to justify not using tape, for a 500% volume-per-dollar saving.

Support for shingled magnetic recording devices

Posted Mar 27, 2014 22:55 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

shingled drives have the same speed random reading as non-shingled drives

Support for shingled magnetic recording devices

Posted Mar 27, 2014 23:05 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

> shingled drives have the same speed random reading as non-shingled drives

Right, so if you can find an application where you have large sequential writing, but fast random reading, that might justify shingled disks. If instead, you have random writing, you'll want a non-shingled disk, and if instead you have large sequential reading, you'll want tape.

Support for shingled magnetic recording devices

Posted Mar 28, 2014 5:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Tapes are waaaay too impractical. Besides, their density is not that great - the best models are only just breaking the 2TB barrier per tape.

And you'd be amazed by the number of applications where you need fast access to immutable (or slowly changing) data. Even better, it's possible to use faster hard drives (or even SSDs) as a frontend for the slow shingled disks.

Support for shingled magnetic recording devices

Posted Mar 28, 2014 23:07 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

There are many applications for which tape storage is practical. We know this because people are doing it.

Many of the applications that could tolerate shingled disk could also tolerate tape.

Density per tape cartridge isn't really the point. Cost is the point, and tape is cheaper per terabyte than shingled disk. Part of the reason for that is the storage density per tape drive is far, far greater than for disk drives. Data rate per drive is much greater too.

I don't think I would be amazed at the number of applications that are appropriate for shingled disk, but I also know that there are a lot of applications that aren't, and there are significant storage management costs in using different kinds of disk drives for different kinds of data. I suspect one would need more than a 20% differential in per-terabyte cost to justify that.

> Even better, it's possible to use faster hard drives (or even SSDs) as a frontend for the slow shingled disks.

That's the bending over backwards I was talking about that I doubt is worth it for a 20% improvement. Have we ever seen people make such a disruptive transition for 20%? Would people have gone from floppy disks to CD-ROM for 20%? Or CD-ROM to DVD? Would people even have gzipped tar files for only 20%?

Support for shingled magnetic recording devices

Posted Mar 29, 2014 11:25 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, tape is actually NOT cheaper unless you do it in a REALLY big way with tape libraries and robots. Even then it's only marginally cheaper than HDs.

And shingled disks are not that bad, it's not a completely disruptive transition. Sure, it'll require some additional engineering for the front-end write-through caches but it's not a big deal compared to tape.

So think about it - would you build a tape library with expensive robots and lots of tape or would you just prefer to buy somewhat slower hard drives?

Support for shingled magnetic recording devices

Posted Mar 29, 2014 19:15 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

> Well, tape is actually NOT cheaper unless you do it in a REALLY big way with tape libraries and robots.

Yes, that's what I was talking about. When I compare the economics of storage technologies, I think of large scale storage. With tape, there are thousands of cartridges and plenty of robots.

> Even then it's only marginally cheaper than HDs.

The last figures I saw were 5x cheaper. That's total cost (not just e.g. purchase price of some box), and assuming a tape-friendly application. That seems perfectly believable to me, but if you know of a study showing otherwise, do tell.

> So think about it - would you build a tape library with expensive robots and lots of tape or would you just prefer to buy somewhat slower hard drives?

I'm not sure what you're comparing here. Shingled disks aren't somewhat slower. Used right, they're the same speed as regular drives; used wrong, they're unusably slow. Since tape applications also work on shingled drives, the question would be: would you use something with 2-minute access times, or just use slightly more expensive disk drives? But since I'm claiming shingled drives are 4x more expensive than tape, that question is moot.

By the way, some of my data is on tape. My company backs up its general purpose filesystem to tape. It takes me 4 minutes to recover a lost file - 2 minutes to go through the interactive dialog and 2 minutes for the robots and tape drives to do their thing. Shingled disk would cut that to 2 minutes total. I can't imagine my company switching unless there is virtually no difference in the storage cost.

Support for shingled magnetic recording devices

Posted Mar 29, 2014 20:32 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> The last figures I saw were 5x cheaper. That's total cost (not just e.g. purchase price of some box), and assuming a tape-friendly application. That seems perfectly believable to me, but if you know of a study showing otherwise, do tell.
No way. The last time we checked (in 2013) the price difference was about 2x. The major advantage of tape was reliability, not price - tape cartridges themselves contain no sensitive mechanical parts.

LTO6 tapes are about $40 per tape (2.5TB) when bought in bulk. Maybe $30 if you are really big. Hard drives are around $50 per 2TB in bulk.

And that's without considering the cost of streamers (multiple $$$$), tape robots ($$$$$) and the storage software solution (shockingly, there are no good OpenSource hierarchical storage managers).

Of course, HDDs need some kind of SAN, but they are cheap these days. AoE/iSCSI solution for 1000 drives capable of 100Gb throughput can be bought for just under $30k and doesn't need any fancy software.

Support for shingled magnetic recording devices

Posted Mar 30, 2014 10:22 UTC (Sun) by khim (guest, #9252) [Link]

> Shingled disks aren't somewhat slower. Used right, they're the same speed as regular drives; used wrong, they're unusably slow.

And when used with GFS or HDFS, which need random read access to their 64 megabyte chunks and roughly sequential write access to those chunks (GFS gives you the ability to append-write files; HDFS does not even offer that today), they are fast.

Shingled disks are only on the horizon today, but an API that is basically custom-tailored for their limitations is more than a decade old, and there are thousands of companies and millions of drives used for such kinds of applications. End of story.

Support for shingled magnetic recording devices

Posted Mar 30, 2014 16:34 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

> And when used with GFS or HDFS which need random read access to its 64 megabyte chunks and kinda streamlined write access to these chunks (GFS gives you the ability to append-write files, HDFS does not even offer that today) they are fast.

I think those are two examples of things that would require re-engineering to work with shingled disks, because neither writes in log-structured fashion today. The only reason I can think of that an existing disk drive application would write in log-structured fashion (fill the drive, or a large segment of it, from beginning to end) is to maximize write speed. But GFS and HDFS assume there is little writing. HDFS is specifically aimed at fast sequential reading of large files, which means it needs to keep files contiguous on disk, which is not possible with a log structured file system.

Support for shingled magnetic recording devices

Posted Mar 31, 2014 10:36 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

I think that you wildly underestimate how expensive random writes are today, and how much software works to avoid them

I've seen a number of packages with this pattern: the bulk of the data lives in large chunks, but new data is not written directly to those chunks. Instead, new data is written sequentially to an 'updates' file, and periodically some other job comes along, rewrites the large chunks to include the changes from the updates files, and then deletes the updates files.

Such systems would be perfect for shingled drives; they would just need their chunk sizes and alignments adjusted to match.
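The "updates file plus periodic rewrite" pattern described here is essentially a log plus compaction; a minimal sketch, with all names invented:

```python
# Minimal sketch of the "updates file + periodic rewrite" pattern.
# The structure and names are invented for illustration.

class ChunkStore:
    def __init__(self):
        self.chunks = {}    # key -> value: the large, bulk-rewritten data
        self.updates = []   # sequential log of pending (key, value) writes

    def write(self, key, value):
        # New data goes sequentially to the updates log, never into the
        # chunks in place -- exactly what a sequential-only zone wants.
        self.updates.append((key, value))

    def read(self, key):
        # The newest update wins; fall back to the compacted chunks.
        for k, v in reversed(self.updates):
            if k == key:
                return v
        return self.chunks.get(key)

    def compact(self):
        # The periodic job: fold the updates into the chunks and start a
        # fresh log (on SMR, write the new chunks to freshly reset zones).
        for k, v in self.updates:
            self.chunks[k] = v
        self.updates = []
```

All writes are appends until `compact()` runs, and the compaction itself is one big sequential rewrite, so both phases fit a sequential-only zone.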

Support for shingled magnetic recording devices

Posted Apr 1, 2014 11:10 UTC (Tue) by ricwheeler (subscriber, #4980) [Link]

Not sure where you got the 20% number from, but the better way to think of this is that each kind of disk technology hits a plateau at some point, after which further investment will not bring improvements in density.

Moving to SMR is effectively moving from the curve for our existing technology, which is about to plateau, onto a new curve. The delta between the current curve and the new one does start small, but over time will take us to a significant density improvement. One vendor's slide showed the eventual difference being closer to 3-4 times the density, but that was more of a hand wave, I would guess.

The drive vendors shied away from specific numbers, but they all agreed that SMR was a new foundation that other technologies will build on (not something that will be replaced in time).

Reading a crystal ball is hard, but it does seem like a promising technology to invest in :)

Support for shingled magnetic recording devices

Posted Apr 3, 2014 1:56 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

> Not sure where you got the 20% number from,

There was a paper from Seagate, which I believe is referenced earlier in this thread, that said the technology provided a 25% improvement in areal density. I read somewhere else that there's a 5% overhead for something else - metadata or guard bands or something - bringing the effective improvement down to 20%.
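For what it's worth, the arithmetic works out either way you read it; both figures come from the comments above, not vendor data:

```python
# Two readings of "25% more areal density minus ~5% overhead ~= 20%".
# Back-of-the-envelope only; the inputs are the thread's own numbers.

density_gain = 0.25   # raw areal-density improvement
overhead = 0.05       # metadata / guard-band overhead

additive = density_gain - overhead                        # 20 points
multiplicative = (1 + density_gain) * (1 - overhead) - 1  # ~18.75%
```

Either reading rounds to roughly the 20% net gain quoted above.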

If that's just a prototype figure and the technology eventually gets to 3-4 times areal density improvement, that's a different story.

Support for shingled magnetic recording devices

Posted Mar 28, 2014 17:52 UTC (Fri) by Creideiki (subscriber, #38747) [Link]

Wouldn't that application be "every log-structured file system ever"? New data and metadata is written to the end of the log, and GC-copied to another zone as necessary. Keep a pointer to the currently valid root block in the random-write zone.

Support for shingled magnetic recording devices

Posted Mar 28, 2014 23:27 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

I don't know what "GC-copied to another zone" is, but ordinary log structured filesystems are ideal applications for shingled disks. An application that works with a log structured filesystem on a traditional disk drive today could presumably use shingled disk instead for a 20% cost saving - not counting any costs associated with having special disk drives provisioned for that application.

Support for shingled magnetic recording devices

Posted Mar 29, 2014 7:05 UTC (Sat) by Creideiki (subscriber, #38747) [Link]

That's probably my LISP bias showing, assuming everyone is intimately familiar with garbage collection strategies. A copying garbage collector partitions storage into two zones, using only one at a time, and when that one gets full copies only the live data to the other one, leaving the garbage behind to get overwritten in the next cycle.
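Mapped onto a drive with two sequential zones, that copying-collector idea might look like this sketch (structure and names invented):

```python
# Copying GC over two zones: append into one zone; when it fills, copy
# only the live records into the other zone and swap roles. All names
# are invented for this sketch.

class TwoZoneStore:
    def __init__(self, zone_size):
        self.zone_size = zone_size
        self.active = []   # append-only (key, value) records, like a zone
        self.spare = []    # the other zone, empty between cycles

    def put(self, key, value):
        if len(self.active) >= self.zone_size:
            self._collect()
        self.active.append((key, value))

    def get(self, key):
        # Scan newest-first so the latest record for a key wins.
        for k, v in reversed(self.active):
            if k == key:
                return v
        return None

    def _collect(self):
        # Copy only the newest record per key (the live data) into the
        # spare zone; the old zone would then get its write pointer
        # reset and become the new spare.
        live = {}
        for k, v in self.active:
            live[k] = v
        self.spare = list(live.items())
        self.active, self.spare = self.spare, []
```

The stale records ("garbage") are simply left behind in the old zone, which a write-pointer reset makes reusable in one command.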

Support for shingled magnetic recording devices

Posted Mar 29, 2014 12:00 UTC (Sat) by james (subscriber, #1325) [Link]

Reading between the lines, I suspect that these drives are primarily aimed at customers like Facebook and Google, who (obviously) have a lot of historic user data that is basically never touched, but whose users expect to be able to retrieve it in a few seconds.

These are companies that design their own servers to save money: they'll certainly be interested in minimising the cost of storing that data. They also already identify this data: sending it to special disks is not a big cost for them.

This market is probably big enough on its own to justify the investment: Facebook and Google will certainly have been consulted on these drives, and may have committed to buying a certain quantity if they meet price, performance, and reliability criteria.

Then there are systems like CCTV and personal video recorders, which may well run Linux but won't need fast small random writes (and will love the extra capacity).

The four remaining questions are:

  • are there any other applications where support for SMR disks could be easily added?
  • would a SMR-based hybrid disk (SMR, with maybe 8 GB of flash in front of it for extra performance) perform better at a lower cost per gigabyte than a conventional disk? If so, conventional hard drives above a certain capacity may die out;
  • which unscrupulous boxshifters are going to quietly put an "8 TB Hard Drive!" in PCs without warning the customer?
  • is anyone else pronouncing it "smear"? It seems appropriate.

Support for shingled magnetic recording devices

Posted Mar 27, 2014 18:09 UTC (Thu) by wahern (subscriber, #37304) [Link]

Seagate has a nice description with illuminating diagrams:

http://www.seagate.com/tech-insights/breaking-areal-densi...

Basically, write heads are larger than read heads, but neither is going to get any smaller anytime soon. To get more data onto the platter, you can squeeze tracks together. The read head can still read each track individually, but the write head has to update multiple tracks at once, and this process can cascade.

That tracks are overlapped from the perspective of the write head is why it's called "shingled", like on a roof, and not because of a herpes zoster infection--which is the first thought I had :)

Support for shingled magnetic recording devices

Posted Apr 4, 2014 10:50 UTC (Fri) by Jonno (subscriber, #49613) [Link]

> http://www.seagate.com/tech-insights/breaking-areal-densi...
While the Seagate description of SMR drives in the linked article is good, their comparison to conventional drives is disingenuous at best. No one has manufactured what they call "conventional" drives for some time now; writes to adjacent tracks already overlap, though not to the extent that SMR writes do (see the diagram below for an illustration).

http://jon.severinsson.net/lwn/smr_track_layout.png

Support for shingled magnetic recording devices

Posted Jul 14, 2014 16:43 UTC (Mon) by ds... (guest, #97865) [Link]

It seems to me that it would be better to have a wider read head: if the drive needs to write n physical tracks at once, why not read n at once too?

Support for shingled magnetic recording devices

Posted Jul 14, 2014 18:03 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

It doesn't write n tracks in one pass; it overlaps the tracks the way a row of shingles overlaps the row before it. This means you can't just write wherever you want: to rewrite a track you also have to rewrite the track that partially overlaps it, then the track that partially overlaps that one, and so on.

It still takes n rotations of the media to write n tracks.
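That rewrite cascade is easy to model: within a band of overlapped tracks, updating track i means rewriting tracks i through the end of the band, one revolution each. A toy model, with the band layout invented:

```python
# Toy model of a shingled band: each track partially overlaps the next,
# so rewriting track i also forces a rewrite of tracks i+1 .. band end.

def tracks_to_rewrite(band_size, track):
    """Which tracks must be rewritten to update one track in a band."""
    assert 0 <= track < band_size
    return list(range(track, band_size))

def rotations_needed(band_size, track):
    """One revolution of the media per rewritten track."""
    return len(tracks_to_rewrite(band_size, track))
```

This is also why the worst case is an update to the first track of a band (the whole band must be rewritten), and why appending at the write pointer costs nothing extra.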

Support for shingled magnetic recording devices

Posted Apr 3, 2014 19:02 UTC (Thu) by redden0t8 (guest, #72783) [Link]

What size of bands/zones are we looking at? 1 MB? 100MB?


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds