It's hard to imagine sorting a terabyte of data in one minute...but that's what computer scientists at the Jacobs School did...and for their efforts, they earned a world record at the Sort Benchmark competition. (Check out the full Sort Benchmark / UC San Diego press release.) But before you go, consider that data sorting is a big deal for a variety of reasons. Facebook ads and Amazon product suggestions are generated thanks to heavy-duty data sorting techniques. Companies across the world are turning to data sorting to sift through the mountains of potentially relevant data piling up...data analytics in action.
The lead computer science graduate student on the project, Alex Rasmussen (pictured below), explained to me during our photo shoot in a Calit2 server room that data sorting is a good way to flex a whole bunch of computing, networking, and systems muscles. He put it this way:
“Sorting is also an interesting proxy for a whole bunch of other data processing problems. Generally, sorting is a great way to measure how fast you can read a lot of data off a set of disks, do some basic processing on it, shuffle it around a network and write it to another set of disks,” explained Rasmussen. “Sorting puts a lot of stress on the entire input/output subsystem, from the hard drives and the networking hardware to the operating system and application software.”
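To make that read/process/shuffle/write pipeline a bit more concrete, here's a minimal single-machine sketch of an external sort in Python. To be clear, this is not the team's record-setting code (their system ran across a whole cluster); the chunk size, file handling, and two-phase structure are illustrative assumptions, but the phases map onto what Rasmussen describes.

```python
import heapq
import os
import tempfile

# Records we assume fit in memory at once (an arbitrary budget for this sketch).
CHUNK_RECORDS = 100_000

def external_sort(in_path: str, out_path: str) -> None:
    run_files = []

    # Phase 1: read the input off disk in memory-sized chunks, do the basic
    # processing (sort each chunk), and spill each one back out as a sorted "run".
    with open(in_path) as f:
        while True:
            chunk = [line for _, line in zip(range(CHUNK_RECORDS), f)]
            if not chunk:
                break
            chunk.sort()
            run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            run.writelines(chunk)
            run.close()
            run_files.append(run.name)

    # Phase 2: merge the sorted runs into one output file. In a cluster-scale
    # sort, this is where records get shuffled across the network to whichever
    # node owns their key range before the final write.
    streams = [open(name) for name in run_files]
    with open(out_path, "w") as out:
        out.writelines(heapq.merge(*streams))
    for s in streams:
        s.close()
    for name in run_files:
        os.remove(name)
```

Even in this toy version you can see why sorting stresses the whole I/O path: every byte gets read from disk, held in memory, written out, read back in, and written again, so the disks, memory, and (in the distributed case) the network all become potential bottlenecks.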
For anyone following along, this is the follow-up to the 10,318 seconds post.