For the week of May 18th, 2012:
- What’s the Technology Behind Facebook?
“IOUG Podcast 18-MAY-2012: Facebook Technology and Oracle”
Subscribe to this Podcast (RSS) or via iTunes
With this week’s largest-in-history IPO leading the news, IOUG takes a look at what kind of infrastructure you design for a company with user-load statistics like these: 570 billion page views per month; more than 3 billion user photos uploaded every month, with 1.2 million photos served per second; and more than 25 billion pieces of content (status updates, comments, etc.) shared every month across a data cloud of more than 60,000 servers.
Well, even these days, it is still considered an open-source LAMP technology stack, referring to Linux, Apache, MySQL, and PHP. And since Oracle’s acquisition of MySQL (via Sun), Facebook does purchase support and maintenance from the Big Red O.
Facebook still uses PHP, with an in-house-built compiler called HipHop, running on an in-house-optimized Linux, and continues to deploy MySQL, but primarily as persistent key-value storage. Data joins and logic are processed in the middleware tier just beyond the caching components.
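That key-value usage pattern is easy to sketch. In this illustration, sqlite3 stands in for MySQL, and the table, keys, and values are purely hypothetical; the point is that the database only ever does primary-key lookups, while the "join" happens in application code:

```python
import sqlite3

# sqlite3 stands in for MySQL here; schema and key names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# Writes are simple primary-key inserts -- no relational modeling needed.
for k, v in [("user:42", "Alice"), ("user:7", "Bob"),
             ("user:99", "Carol"), ("friends:42", "7,99")]:
    db.execute("INSERT INTO kv VALUES (?, ?)", (k, v))

def get(k):
    # Every read is a single primary-key lookup.
    row = db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
    return row[0] if row else None

# The "join" -- resolving a friend list to names -- runs in the middleware
# tier as a series of key lookups, not as a SQL JOIN in the database.
friend_ids = get("friends:42").split(",")
names = [get(f"user:{fid}") for fid in friend_ids]
```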
Memcached is their distributed memory caching system deployed between the web servers and MySQL servers, similar to well-known hardware accelerator appliances, but without the additional hardware block management intelligence found in the box-based solutions.
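The pattern that layer enables is the classic cache-aside flow. A minimal sketch, with a plain dict standing in for the memcached client and a hypothetical `get_user_from_db` standing in for the MySQL lookup:

```python
# A dict stands in for memcached; DB and get_user_from_db are hypothetical
# stand-ins for the MySQL tier. db_hits counts trips to the database.
cache = {}
DB = {"user:1": {"name": "Alice"}}
db_hits = 0

def get_user_from_db(key):
    global db_hits
    db_hits += 1
    return DB.get(key)

def cached_get(key):
    # Cache-aside: check the cache first, fall back to the database on a
    # miss, then populate the cache so the next read is served from memory.
    if key in cache:
        return cache[key]
    value = get_user_from_db(key)
    if value is not None:
        cache[key] = value
    return value

first = cached_get("user:1")   # miss: goes to the database, fills the cache
second = cached_get("user:1")  # hit: served from memory, no database trip
```

With a web tier fronting tens of thousands of database servers, the whole point is that the second call, and every call after it, never touches MySQL at all.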
Scribe is their logging system, which has a multitude of internal uses, from auditing to statistical analysis.
Image and object storage is managed under Haystack, an object store with an NFS/XFS-based back-end that employs a novel in-house-built metadata indexing subsystem. It generates “needles,” pointers that define the start and end points of the storage blocks for reads and writes.
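The needle idea can be sketched as an append-only store file plus an in-memory index; everything below is illustrative, not Haystack’s actual on-disk format:

```python
import io

# Minimal sketch of needle-style indexing: photos are appended to one large
# store file, and an in-memory index maps each photo id to the (offset,
# length) "needle" bounding its bytes. Names and data are illustrative.
store = io.BytesIO()      # stands in for the NFS/XFS-backed store file
needles = {}              # photo_id -> (offset, length)

def write_photo(photo_id, data):
    offset = store.seek(0, io.SEEK_END)   # append-only: seek to end, write
    store.write(data)
    needles[photo_id] = (offset, len(data))

def read_photo(photo_id):
    # One index lookup, one seek, one read -- no per-file filesystem
    # metadata to traverse on the hot path.
    offset, length = needles[photo_id]
    store.seek(offset)
    return store.read(length)

write_photo("p1", b"JPEG-bytes-1")
write_photo("p2", b"JPEG-bytes-2")
```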
Hadoop is used for Big Data analysis and is accessed via another in-house product called Hive, which enables users to execute SQL queries against Hadoop.
The Thrift software framework provides a universal service bus for Facebook’s cross-language service development since many of its applications and modules were written in different languages.
Cassandra is the storage subsystem under all that data, providing the required clustering and redundancy for the massive number of application systems.
And Facebook’s multi-site Content Delivery Network of servers is where all those static objects, like Facebook logos, are served, assuring geographically proximal delivery of static content without adding overhead to the transactional systems.
In 2010, at an OpenWorld conference keynote, Larry Ellison made an interesting statement to the effect that the entire Facebook universe could be run off of a 2-rack Exalogic system. The rest of the IT universe started poking holes in that generous statement, while also admitting that it was a keynote address, and as such renowned for puffery and unsubstantiated claims.
But as a DBA myself, I was wondering what it would actually take to run Facebook if it were being approached as a new client lead by Oracle.
Based upon the 2011 datacenter migration of Hadoop alone, which involved 30 petabytes of storage, that would be roughly 6,000 Exadata storage units. With that same year’s estimate of 60,000 servers, applying the typical ratio of 4 app servers to 1 database node, and figuring 1 Exalogic rack replacing every 30 standalone nodes, gives us about 500 Exalogic T3 units to serve up that part of the stack. But that’s based upon last year’s numbers, so for this year, multiply everything by a factor of 2, since Facebook now grows at a rate of around 10 million users per month, counting all of the external social address imports from other systems. All of this may change drastically if Facebook pursues its much-rumored “pay to play” plan of charging users for various levels of access and use, transitioning Facebook into another Software as a Service vendor. And that, in turn, reinforces why Oracle might be pursuing further expansion of its WebCenter Social product line.
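The storage side of that estimate reduces to simple arithmetic. Note the per-unit capacity below is an assumption back-derived from the 30 PB to ~6,000-unit figure above, not an Oracle specification:

```python
# Back-of-envelope sizing. The ~5 TB of usable capacity per Exadata storage
# unit is an assumption inferred from the article's own figures (30 PB ->
# ~6,000 units), not a published spec.
hadoop_storage_tb = 30 * 1000          # 30 petabytes, expressed in terabytes
tb_per_storage_unit = 5                # assumed usable TB per storage unit
exadata_units = hadoop_storage_tb // tb_per_storage_unit

growth_factor = 2                      # this year's doubling, per the text
this_year_units = exadata_units * growth_factor
```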
Oh, and in case you were wondering about human resources: it takes Facebook approximately 1 IT engineer for every 1.2 million users to manage this beast.
You can learn more about the individual components, and follow the many, many click-throughs that power the Facebook application system, at developers.facebook.com/opensource.