6.7. XML, JSON, and API based Web Programming

A growing trend among web-based services is to provide an application programming interface (API). This allows programmers to write applications that work with the service by issuing simple HTTP requests. The data returned from a request is usually an XML document or is formatted as JSON data. This is much preferred to scraping and parsing HTML because changes to the layout of a web page do not break the program that you write.
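As a rough sketch of what "issuing a simple HTTP request" to such an API can look like, the code below builds a request URL and decodes a JSON reply. The endpoint `api.example.com`, the parameter names, and the helper functions `build_request_url` and `decode_response` are all hypothetical and are not taken from any real service. The reply is a made-up canned string so that the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/zipcode"


def build_request_url(base_url, params):
    """Return base_url with params appended as a URL query string."""
    return base_url + "?" + urlencode(params)


def decode_response(body):
    """Convert the JSON text returned by a service into Python data."""
    return json.loads(body)


url = build_request_url(BASE_URL, {"city": "Anytown", "state": "KS"})

# A real program would fetch the URL, for example with
# urllib.request.urlopen(url).read(). A made-up canned reply stands in
# here so the sketch runs without a network connection.
canned_body = '{"city": "Anytown", "state": "KS", "zip": ["00000"]}'
reply = decode_response(canned_body)
print(url)
print(reply["zip"])
```

A real service documents its own URL layout and parameter names, so only the overall shape (build a URL, fetch it, decode the reply) carries over.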

A couple of simple examples of this are the US Postal Service (also pyusps) and Twitter. Note that the page referenced here for the Postal Service is just for the zip code database. They also have APIs for other services, such as tracking packages.

6.7.1. XML

XML documents are similar to HTML documents in format. They both use beginning and ending tags, which may contain any number of other nested tags.

<TAG1> Tag Data </TAG1>

The difference between XML and HTML is that XML tags describe the data and may be given arbitrary names that are appropriate to the application, whereas HTML tags define page layout and must conform to a standard, such as HTML5 or XHTML.

Like HTML, XML documents are usually parsed with a module that reads the document into memory and builds a tree-based data structure, which may then be accessed through an API. The most common API is the Document Object Model (DOM). Here is a nice XML tutorial.
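To illustrate the DOM approach, here is a minimal sketch using the `xml.dom.minidom` module from the Python standard library. The sample document, its tag names, and the function `first_names` are invented for this example.

```python
from xml.dom import minidom

# Invented sample document; the tag names are arbitrary.
XML_TEXT = """<family>
  <person><first>Robert</first><last>Crawley</last></person>
  <person><first>Cora</first><last>Crawley</last></person>
</family>"""


def first_names(xml_text):
    """Return the text inside every <first> tag, in document order."""
    dom = minidom.parseString(xml_text)  # build the in-memory tree
    names = []
    for node in dom.getElementsByTagName("first"):
        # The text between the tags is a child Text node of the element.
        names.append(node.firstChild.data)
    return names


print(first_names(XML_TEXT))
```

Note that the text of an element is itself a node in the tree, which is why the code reads `node.firstChild.data` and not an attribute of the element.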

6.7.2. JSON

JSON is a newer web standard that appears to be a functional replacement for XML. JSON data retrieved from a web server is a string that can be easily imported into a data structure. Where the data needs to be built into a list (see Built-in List Operations), it looks much like how lists are declared in Python. It begins and ends with a pair of square brackets and the elements are separated by commas. Likewise, dictionary data is represented with curly brackets, commas between elements, and a colon between each key and its value.

For example, here is a JSON string that represents a list of short dictionaries.

[ {"first name": "Robert", "last name": "Crawley"},
  {"first name": "Cora", "last name": "Crawley"},
  {"first name": "Thomas", "last name": "Branson"} ]
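A string like this can be imported with the standard `json` module, as the short sketch below shows. The variable names are arbitrary. Note that JSON itself requires double quotes around strings, so `json.loads` rejects text that uses Python-style single quotes.

```python
import json

# JSON text as it might arrive from a web server (double-quoted strings).
JSON_TEXT = """[ {"first name": "Robert", "last name": "Crawley"},
  {"first name": "Cora", "last name": "Crawley"},
  {"first name": "Thomas", "last name": "Branson"} ]"""

# json.loads turns the text into a Python list of dictionaries.
people = json.loads(JSON_TEXT)

for person in people:
    print(person["first name"], person["last name"])

# json.dumps goes the other way, from Python data back to JSON text.
round_trip = json.loads(json.dumps(people))
```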

JSON is growing in popularity as a replacement for XML for a couple of reasons.

  1. The simple encoding of the data makes it easy to pull the data into a data structure containing both lists and dictionaries. Parsers need more logic to distinguish which XML data should be a list and which should be a dictionary.
  2. As the acronym (JavaScript Object Notation) implies, JSON is well supported in JavaScript, which is the client-side scripting language for web browsers.
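The first point can be sketched by loading the same records from both formats. The sample data and the function names `people_from_xml` and `people_from_json` are made up for this illustration, and `xml.etree.ElementTree` is used here simply as one convenient standard-library parser. With XML, the program has to state which elements form the list and which form the dictionaries, while the JSON text already carries that distinction.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up records in both formats.
XML_PEOPLE = (
    "<people>"
    "<person><first>Cora</first><last>Crawley</last></person>"
    "<person><first>Thomas</first><last>Branson</last></person>"
    "</people>"
)
JSON_PEOPLE = (
    '[{"first": "Cora", "last": "Crawley"},'
    ' {"first": "Thomas", "last": "Branson"}]'
)


def people_from_xml(text):
    """Build a list of dictionaries from the XML text."""
    root = ET.fromstring(text)
    # We decide by hand: <people> is a list, each <person> is a dictionary.
    return [
        {child.tag: child.text for child in person}
        for person in root.findall("person")
    ]


def people_from_json(text):
    """Build a list of dictionaries from the JSON text."""
    # Brackets and braces in the text already mark lists and dictionaries.
    return json.loads(text)


print(people_from_xml(XML_PEOPLE) == people_from_json(JSON_PEOPLE))
```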

Here is a nice JSON tutorial.