June 12, 2010

BBC News, pt I

For the inaugural WhitherApps project, we’re going to look at the (excellent) BBC News iPhone/iPad app and see if it could have been built as a web app, rather than in native code.

To kick off with, we’re going to try and emulate the iPad version, in both orientation modes. What we’re basically aiming for is this:

[Screenshot: The BBC News iPad app in landscape mode]

And this:

[Screenshot: The BBC News iPad app in portrait mode]

Let’s see how we get on. I’ve decided to break the process down into 4 steps:

  • Figure out how the app pulls content from the BBC
  • Make a wireframe web page that behaves like the app
  • Stitch the content into the wireframe
  • Decorate the wireframe so it looks just like the real thing

This gives me a chance to assess the feasibility of the undertaking before I get too swept up with pushing pixels around.

Behind the scenes

Obviously the app does not ship with the news in it, and my assumption is that the app is little more than a client to render a feed of content and images located somewhere on the web. Architecturally, it’s nothing more than a customized browser-like client… but we do need to figure out where the data comes from and how we might be able to use it in a real browser-based app.

I’m using my iPad on a home WiFi network. I take a quick look at the network settings on the device and I see that I can set up a proxy for web access and so on. This seems like the easiest way to get in to see the traffic coming to and from the app: I run up a Squid proxy on my Mac (which is conveniently on the same local WiFi network) and set the iPad’s proxy to be my Mac’s local IP address.

As a proxy, Squid does lots of clever things. For our purposes though, we really just want to see the requests the device is making, so I tail the access.log file. This shows me the HTTP requests from the device as it makes them:

GET http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1 - application/json
GET http://www.bbc.co.uk/moira/feed/news_world/front_page - application/atom+xml
GET http://bbc.112.2o7.net/b/ss/bbcwnewsiphone/0/OIP-2.0/s82818894? - text/html
GET http://cdnedge.bbc.co.uk/nol/ifs_news/hi/front_page/ticker.json - text/javascript
GET http://www.bbc.co.uk/moira/feed/news_world/americas - application/atom+xml
GET http://static.bbc.co.uk/moira/img/ipad/thumbnail/48058000/jpg/_48058752_009512185-2.jpg - image/jpeg
...

This looks very promising. Firstly, the app seems to be using plain HTTP (no proprietary protocols here), so this will work well if we come to use some sort of AJAX technique ourselves. Secondly, the URLs are self-explanatory, so it’s easy to see how things are working. There’s some sort of initialising JSON at the start, then an Atom feed of the front page news (and shortly afterward the ‘Americas’ page), and then a whole bunch of thumbnails that are used for the navigation icons at the top (or side) of the app.
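If you want a quick summary of what the app is fetching rather than reading the log by eye, something like the following sketch would do. It assumes lines in the trimmed 'METHOD URL - content-type' shape shown above; a real Squid access.log carries more fields per line, so the splitting would need adjusting. The function name summarise_requests is my own invention.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarise_requests(lines):
    """Count requests per (host, content type).

    Assumes each line looks like 'GET <url> - <content-type>', i.e. the
    trimmed form shown in this post, not Squid's full native log format.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip anything that is not a 4-field 'METHOD URL - TYPE' line.
        if len(parts) != 4 or parts[2] != "-":
            continue
        _method, url, _dash, content_type = parts
        counts[(urlsplit(url).hostname, content_type)] += 1
    return counts
```

Grouping by host and content type makes it easy to separate the feeds from the analytics pings and the image fetches.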

(Incidentally, if you do do this sort of thing, prepare to be intrigued by the amount of background HTTP that the iPad is sending to Apple!)

Playing it back

My first impulse of course is to fire up a web browser (or, in this case, wget on the command line) and see what the payload of some of these responses is. So I try the initial JSON file:

~ > wget http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
--2010-06-13 13:13:50--  http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
Resolving www.live.bbc.co.uk (www.live.bbc.co.uk)... 212.58.246.160
Connecting to www.live.bbc.co.uk (www.live.bbc.co.uk)|212.58.246.160|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2010-06-13 13:13:53 ERROR 403: Forbidden.

And this doesn’t seem so good. How does the iPad app get the content over HTTP (via my Mac) when my Mac itself can’t?

Well, like all self-respecting mobile technologists, I have something of a fetish for HTTP user-agents. I wonder how the iPad app identifies itself when it makes requests to the BBC server? I go back to Squid, alter the configuration slightly so that the HTTP headers are logged, and refresh the iPad app. A whole load of new HTTP goes past, but this time I can see that the app’s requests include:

User-Agent: BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)

Call it a hunch, but I wonder how wget on my Mac will get on if I spoof the user-agent header?

~ > wget --user-agent="BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)" http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
--2010-06-13 13:18:20--  http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
Resolving www.live.bbc.co.uk (www.live.bbc.co.uk)... 212.58.246.160
Connecting to www.live.bbc.co.uk (www.live.bbc.co.uk)|212.58.246.160|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6199 (6.1K) [application/json]

Bingo! That’s all it takes to be not ‘forbidden’ and get our initial 6 KB’s worth of JSON. It looks like this:

{
 "name": "WWW iPad Application Bootstrap",
 "version": "1.0.4",
 "published": "2010-06-07 11:09:39 Etc/GMT",
 "ticker_url": "http://cdnedge.bbc.co.uk/nol/ifs_news/hi/front_page/ticker.json",
 "live_feed_uri_template": "http://www.bbc.co.uk/worldservice/meta/mobile/iphone/%7Bbandwidth%7D",
 "ugc_sms_number": "+447725100100",
 "ugc_email": "talkingpoint@bbc.co.uk",
 "feedback_email": "iphone-feedback@bbc.co.uk",
 "faq_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/faq/en",
 "feedback_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/feedback/en",
 "conditions_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/tandc/en",
 "privacy_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/privacy/en",
 "copyright": "BBC © 2010",
 "feeds": [
 {
  "type": "group",
  "title": "More",
  "feeds": [
  {
   "title": "Top Stories",
   "feed_url": "http://www.bbc.co.uk/moira/feed/news_world/front_page",
   "default": true,
   "movable": 0
  },
  {
   "title": "Americas",
   "feed_url": "http://www.bbc.co.uk/moira/feed/news_world/americas",
   "default": true
  },
...
  {
   "title": "Audio & Video",
   "feed_url": "http://www.bbc.co.uk/moira/feed/avod/iphone/news/en/v1"
  },
  {
   "type": "group",
   "title": "News in Other Languages",
   "feeds": [
   {
    "title": "Mundo",
    "feed_url": "http://www.bbc.co.uk/worldservice/syndication/mobileiq/iphone/mundo/homepage/full.xml",
    "logo_url": "bbcimage://logomundo/wsdownload.bbc.co.uk/worldservice/images/branding/languages/iphone/mundo_125x19.png"
   },
...
   {
    "title": "Urdu",
    "feed_url": "http://www.bbc.co.uk/worldservice/syndication/mobileiq/iphone/urdu/homepage/full.atom",
    "logo_url": "bbcimage://logourdu/wsdownload.bbc.co.uk/worldservice/images/branding/languages/iphone/urdu_117x28.png"
   }

   ]
  }
  ]
 }
 ]
}

This looks good. We can see that the first thing the app is doing is being told where all the critical feeds are stored, and how they should be structured in the navigational menu. We can see how to get the ticker data for the top of the page, and we can even get some ideas about how the ‘News in other languages’ will be fetched.
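To make that concrete, here is a minimal Python sketch of the same two steps: request the bootstrap with the app's user-agent, then walk the nested "feeds" groups to get a flat list of sections. The function names (build_request, fetch_bootstrap, flatten_feeds) are mine, and the traversal only assumes the keys visible in the JSON above ("type", "title", "feeds", "feed_url"); the real file may contain node types I haven't seen.

```python
import json
import urllib.request

# User-agent string observed in the app's own requests.
APP_USER_AGENT = "BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)"
BOOTSTRAP_URL = "http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1"


def build_request(url, user_agent=APP_USER_AGENT):
    """Build a GET request that identifies itself as the BBC News app."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_bootstrap(url=BOOTSTRAP_URL):
    """Download and decode the bootstrap JSON (performs a network call)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return json.load(response)


def flatten_feeds(node, path=()):
    """Yield (group_path, title, feed_url) for every leaf feed.

    Nodes with "type": "group" are treated as containers and recursed into;
    anything else with a "feed_url" is treated as a leaf.
    """
    for feed in node.get("feeds", []):
        if feed.get("type") == "group":
            yield from flatten_feeds(feed, path + (feed["title"],))
        elif "feed_url" in feed:
            yield path, feed["title"], feed["feed_url"]
```

Calling list(flatten_feeds(fetch_bootstrap())) should then give one tuple per section, with the group path telling you where it sits in the navigation.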

Let’s start by looking at the main feed for what must be the front page:

http://www.bbc.co.uk/moira/feed/news_world/front_page

Again, this requires the user-agent to be spoofed, and the response is as follows:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/' xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <title>BBC News | News Front Page | World Edition</title>
  <updated>2010-06-13T20:29:08+00:00</updated>
  <id>urn:news-bbc-co-uk:section:bbc_news:front_page:world_edition</id>
  <author>
    <name>BBC</name>
  </author>
  <entry>
    <title>Thousands flee Kyrgyzstan unrest</title>
    <summary>Escalating ethnic violence in Kyrgyzstan that has killed nearly 100 people prompts tens of thousands to flee to Uzbekistan.</summary>
    <category label='World/Asia Pacific' term='World/Asia Pacific' />
    <updated>2010-06-13T17:51:40+00:00</updated>
    <id>urn:news-bbc-co-uk:story:8737578</id>
    <link rel='alternate' href='http://news.bbc.co.uk/1/hi/world/asia_pacific/10304165.stm' type='text/html' title='Thousands flee Kyrgyzstan unrest' />
    <media:thumbnail url='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/thumbnail/48063000/jpg/_48063440_48063241.jpg' />
    <content type='xhtml'>
      <div xmlns='http://www.w3.org/1999/xhtml' class='body'>
        <div class='fullwidth_img'>
          <a href='bbcvideo://urn%3Anews-bbc-co-uk%3Astory%3A8737578/www.bbc.co.uk/moira/avod/%7bdevice%7d/av/urn-news.bbc.co.uk-story-8737578/urn-news.bbc.co.uk-media-48063655/news/world/604000/604036/%7bbandwidth%7d'>
            <img alt='Soldiers in central Osh' src='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/styfull/48063000/jpg/_48063696_jex_721239_de27-1.jpg' class='fullwidth_512x288' />
          </a>
        </div>
        <p>Escalating ethnic violence in Kyrgyzstan has prompted tens of thousands of ethnic Uzbeks to flee the country.</p>
...
        <div class='inline_img'>
          <img alt='Map of Kyrgyzstan' src='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/styhalf/48063000/gif/_48063789_kyrgyz_osh_jalal_0610.gif' class='inline_226x170' />
        </div>
...
      </div>
    </content>
  </entry>
...
</feed>

Excellent. This looks like a very straightforward Atom feed, containing thumbnails for the navigation and fairly simple HTML formatting. The style of BBC articles is to have small, bite-sized paragraphs with small inline images and occasional videos. We’ll probably have to sort out all the styling ourselves, but the markup looks clean and workable.
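Pulling the headline data out of a feed like this takes very little code. The sketch below uses Python's standard ElementTree and the two namespaces declared in the feed above; the function name parse_entries and the shape of the returned dictionaries are my own choices, and it deliberately ignores the XHTML body of each entry.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on the <feed> element above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}


def parse_entries(atom_xml):
    """Return a list of dicts, one per <entry>, with the headline fields."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link[@rel='alternate']", NS)
        thumb = entry.find("media:thumbnail", NS)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "summary": entry.findtext("atom:summary", default="", namespaces=NS),
            # Missing elements are reported as None rather than raising.
            "link": link.get("href") if link is not None else None,
            "thumbnail": thumb.get("url") if thumb is not None else None,
        })
    return entries
```

The thumbnail value comes back exactly as it appears in the feed, so it still carries the bbcimage:// scheme at this point.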

The bbcimage://urn URLs for the images and thumbnails look a bit strange, but we’ve already seen the HTTP traffic when they’re fetched, so we can figure out how they’ll need to be rewritten. At first glance it looks like:

bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/thumbnail/48063000/jpg/_48063440_48063241.jpg

Will become:

http://static.bbc.co.uk/moira/img/ipad/thumbnail/48063000/jpg/_48063440_48063241.jpg

…which is a fairly simple transformation.
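In Python, that first-glance rule might be sketched as below. It is only my guess at the scheme, based on the single before-and-after pair above: drop the bbcimage:// prefix and the leading story-URN segment, decode the percent-escapes, and fill in the {device} placeholder. The function name rewrite_bbcimage and the "ipad" default are my assumptions.

```python
from urllib.parse import unquote

BBCIMAGE_PREFIX = "bbcimage://"


def rewrite_bbcimage(url, device="ipad"):
    """Turn a bbcimage:// URL into a plain http:// one.

    Guessed rule: the first path segment (the percent-encoded story URN)
    is discarded, and '{device}' in the remainder is substituted.
    URLs with any other scheme are returned unchanged.
    """
    if not url.startswith(BBCIMAGE_PREFIX):
        return url
    rest = url[len(BBCIMAGE_PREFIX):]
    # Everything before the first '/' is the story URN; keep what follows.
    _story_urn, _slash, remainder = rest.partition("/")
    return "http://" + unquote(remainder).replace("{device}", device)
```

Whether other device values are valid in place of "ipad" is something the logs so far don't tell me.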

What next?

OK, so we’ve figured out how to bootstrap our app, get the structure of the navigation and news categories, and then receive some HTML-like content from the BBC feeds. So far so good: it looks quite simple, and certainly something that a self-respecting web app is going to be able to do.

The user-agent spoofing may not be possible from an AJAX-like context in a web browser, so that might require me to build a small proxy on a server somewhere, which means my web app will ultimately have to be more than a static HTML file. But then I suspected I would need to do that anyway, to get around the browser’s same-origin restrictions on cross-domain requests, so that’s no big deal. (If the BBC were hosting the web app themselves, both reasons would be moot.)
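For a sense of how small such a proxy could be, here is a rough sketch using only Python's standard library. Everything about its interface is hypothetical: the ?url= query parameter, the port, and the allowlist of hosts (which I've simply taken from the hostnames seen in the Squid log). It is a starting point under those assumptions, not something to deploy as-is.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# User-agent string observed in the app's own requests.
APP_USER_AGENT = "BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)"

# Hypothetical allowlist: the content hosts that appeared in the proxy log.
ALLOWED_HOSTS = {
    "www.bbc.co.uk",
    "www.live.bbc.co.uk",
    "cdnedge.bbc.co.uk",
    "static.bbc.co.uk",
}


def upstream_request(target_url):
    """Return a Request carrying the app's user-agent, or None if refused."""
    parts = urlsplit(target_url)
    if parts.scheme != "http" or parts.hostname not in ALLOWED_HOSTS:
        return None
    return urllib.request.Request(target_url, headers={"User-Agent": APP_USER_AGENT})


class FeedProxy(BaseHTTPRequestHandler):
    """Serves GET /?url=<target> by fetching <target> on the client's behalf."""

    def do_GET(self):
        query = parse_qs(urlsplit(self.path).query)
        request = upstream_request(query.get("url", [""])[0])
        if request is None:
            self.send_error(403, "Host not allowed")
            return
        try:
            with urllib.request.urlopen(request) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get(
                    "Content-Type", "application/octet-stream"
                )
        except urllib.error.URLError:
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), FeedProxy).serve_forever()
```

Serving the web app's own HTML from the same server would then put the page and the feeds on one origin, which covers both problems at once.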

So, in the next installment, I’m going to take a look at the structure and user interface of the app and see how easy it is to reproduce its overall look and feel. Since I’ve seen the data in the background, I now suspect that much of the app uses Safari browser components anyway… so I’m quietly confident!

Stay tuned.
