A little more than two years ago, Facebook had a revolutionary idea: Inspired by the model of open-source software, it would bring together a community – the Open Compute Project (OCP) – focused on the creation of open-source hardware specifications for energy-efficient and economical data centres.
Now OCP plans to expand its efforts by developing an alternative to the black-box switches that connect data centres to the outside world.
“Open source has clearly had a huge impact on the pace of innovation in software, and it’s starting to have an impact on hardware as well,” Frank Frankovsky, vice president of Hardware Design and Supply Chain at Facebook and chairman/president of the Open Compute Project Foundation, said during his keynote address at Interop on Wednesday.
“We are working together, in the open, to design and build smarter, more scalable, more efficient data centre technologies – but we’re still connecting them to the outside world using black-box switches that haven’t been designed for deployment at scale and don’t allow consumers to modify or replace the software that runs on them,” he said.
“With that in mind,” Frankovsky said, “we are today announcing a new project within OCP that will focus on developing a specification and a reference box for an open, OS-agnostic top-of-rack switch.”
In 2009, Facebook set a daunting challenge for a small team of its engineers: Figure out how to scale the company’s massive computing infrastructure in the most efficient and economical way possible.
“Working out of an electronics lab in the basement of our Palo Alto, California, headquarters, the team designed our first data centre from the ground up. A few months later, we started building it in Prineville, Oregon,” Jonathan Heiliger, vice president of Technical Operations at Facebook, wrote in 2011.
“The project,” Heiliger wrote, “which started out with three people, resulted in us building our own custom-designed servers, power supplies, server racks and battery back-up systems. Because we started with a clean slate, we had total control over every part of the system, from the software to the servers to the data centre.”
The result was Facebook’s Prineville data centre, which uses 38 percent less energy to do the same work as Facebook’s existing facilities, while costing 24 percent less. In April 2011, Facebook decided that it needed to share what it had done.
It founded OCP and published the specifications and mechanical designs for the hardware used in its data centre, including motherboards, power supply, server chassis, server rack and battery cabinets. It also shared its data centre electrical and mechanical construction specifications.
“It would be really unfortunate if the technology that’s available to us was the limiting factor of the richness of the experience that we can deliver to users,” Frankovsky says.
“It’s our hope that an open, disaggregated switch will enable a faster pace of innovation in the development of networking hardware; help software-defined networking continue to evolve and flourish; and ultimately provide consumers of these technologies with the freedom they need to build infrastructures that are flexible, scalable and efficient across the entire stack,” Frankovsky says.
“This is a new kind of undertaking for OCP – starting a project with just an idea and a clean sheet of paper, instead of building on an existing design that’s been contributed to the foundation – and we are excited to see how the project group delivers on our collective vision,” Frankovsky says.
Najam Ahmad, who runs the network engineering team at Facebook, will lead the project, and a number of organisations have already announced their plans to participate, including Big Switch Networks, Broadcom, Cumulus Networks, Facebook, Intel, Netronome, OpenDaylight, the Open Networking Foundation and VMware. They will start work next week at the OCP Engineering Summit at MIT.
“We are going to end up spending time on the specifications of what we’re trying to solve,” Ahmad explains. “We already have a couple of proposals from the project team, but we don’t have a whole lot of detail at this point.”
But Frankovsky notes that the project needs to address some simple form factor issues that OCP has seen in the data centre.
He points out that when black-box switches are put into a rack built on OCP’s Open Rack standard, they’re the only components that need their own special shelf.
“Some of the cluster switches that are designed, I don’t know if the engineers that designed them thought anyone would use more than one, because they vent side-to-side,” Frankovsky says, noting that Facebook has had to design special chimneys to prevent them from venting hot air into each other.
“At the last Open Compute Summit, we talked about the importance of disaggregation, of separating the components of these technologies from each other so we can build systems that truly fit the workloads they run and update those components independently of each other, on an as-needed basis,” Frankovsky says.
“But the promise of disaggregation – a promise that’s been made since the days of the mainframe – can truly be delivered on only if we work together, in the open, to establish common standards that everyone can adopt and build upon, from the bottom of the hardware stack to the top,” Frankovsky says. “And with the addition of this new project, the OCP community now has an opportunity to do exactly that.”