I’m at Write the Docs today in Portland and will be posting notes from sessions throughout the day. These are all posted right after a talk finishes so they’re rough around the edges.
Ashleigh started at Google in 2004 as a data center hardware technician. In 2010 she got involved with a team of tech writers working on API documentation. The story she told was of how Google’s CMS came to be.
Google now has so many developer products it fills a periodic table. Literally. They made one.
Scaling problems can show up so gradually, you barely notice them until you’re already in big trouble. This happened for Google with their CMS. What worked in 2005 was horribly broken by 2010.
In 2005 Google had just hired Chris DiBona as the head of Open Source at Google. He started by focusing on getting Google to contribute more to Open Source projects. They created code.google.com as a place for them to share code. When they launched this it was an introductory place to put some code. They started with documentation around their 10 APIs at the time. The site was built using EZT, or EaZy Templating, a simple markup language you can use to define build objects in your documentation.
Google’s code site was optimized for small files, about 256K, and cached things in memory. This design grew out of the hardware scaling issues Google was dealing with at the time. It was a time when a gigabyte of storage was still a lot.
In 2006 Google launched Project Hosting. In the days before GitHub this meant that they had a place to host and share open code projects.
By 2010 the builds for code.google.com started running into serious issues. New docs weren’t going live and they were hitting consistent errors. Files were taking almost 45 minutes to build. This meant that a tech writer working on a document had to give themselves a 45-minute lead time. A new project document set to launch at 2pm had to be filed at 1pm. Any typo or issue in the doc submitted meant another 45-minute delay. All of that was compounded by the fact that each build would fail with a typo in any new doc. One doc with an issue caused problems with new docs across all services.
There were other failures, too. Outside of writer mistakes they hit issues with disk I/O. This caused them to push the build cron jobs back to once every 2 hours. The fun part of that was that pulling any technical documentation down from the web also took 2 hours. Picture how awesome that is when you accidentally publish something. This 2-hour turnaround time just didn’t work for how Google wanted to publish technical content.
They faced a choice between a band-aid fix and pushing the reset button on their CMS. They decided to develop a CMS that was actually meant for developer documentation. A team of people worked on this new site and the new CMS. The product of this was developers.google.com.
Google’s new developer site was built differently. Gone were the days of having to do everything manually. Since Google now had App Engine they were able to leverage this as the platform from which they could build docs. They used Django nonrel so that they could work with the Django framework on top of the non-relational database structure of App Engine.
By moving the CMS away from EZT they avoided relying upon a site-wide build. Now they could build only what the writer asked for, when the writer asked for it. Syntax errors now came back in 60 seconds, not 60 minutes. And your syntax errors don’t affect the whole system, just you. One downside to no site-wide builds is that when changes happen outside the document tree (for example, with pricing), Google has to manually rebuild the document to reflect the new pricing structure.
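To make the difference concrete, here’s a toy sketch of a site-wide build next to an on-demand, per-document build. This is not Google’s CMS or EZT. The `render` function, the `DocSyntaxError` type, the doc paths, and the "unbalanced brackets" rule are all made up for illustration.

```python
from typing import Dict, Optional, Tuple


class DocSyntaxError(Exception):
    """Hypothetical error raised when a document fails to render."""


def render(source: str) -> str:
    # Toy renderer: an unbalanced "[" stands in for any syntax error.
    if source.count("[") != source.count("]"):
        raise DocSyntaxError("unbalanced brackets")
    return "<p>" + source + "</p>"


def site_wide_build(docs: Dict[str, str]) -> Dict[str, str]:
    # Old model: render every document in one pass.
    # A single bad document raises and nothing gets published.
    return {path: render(source) for path, source in docs.items()}


def build_one(docs: Dict[str, str], path: str) -> Tuple[Optional[str], Optional[str]]:
    # New model: render only the requested document.
    # An error is handed back to the requester and touches nothing else.
    try:
        return render(docs[path]), None
    except DocSyntaxError as exc:
        return None, str(exc)


docs = {
    "maps/intro": "See the [reference]",
    "drive/intro": "See the [reference",  # typo: missing closing bracket
}

try:
    site_wide_build(docs)
    site_wide_ok = True
except DocSyntaxError:
    site_wide_ok = False  # one typo blocks every document

good_html, good_error = build_one(docs, "maps/intro")
bad_html, bad_error = build_one(docs, "drive/intro")
```

In this sketch the site-wide build fails because of the one broken doc, while `build_one` still publishes `maps/intro` and reports the error only to whoever asked for `drive/intro`.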
In late 2011 they started the process of migrating over to the new site. With 80,000 documents that’s a slow process. The problem was that it split their code documentation across 2 sites. It was a short-term issue that would eventually be fixed. The goal was to complete the move by May 2012 and all went smoothly.