6.8. Programming Assignment 4 - Web Based APIs
You are to develop a simple Python program that prompts the user for the name of a movie or TV show and then displays a list of the actors in it. Get the data by making a request through IMDbPY, following its published API. Although the API parses the data and allows easy access to it, it can also return the data as XML. XML is important enough that a little practice working with it is worthwhile, so request the data as XML and pass it to a parser such as BeautifulSoup, xml.dom.minidom or xml.etree.ElementTree. If you use BeautifulSoup, see also Parsing XML with BeautifulSoup.
Previous versions of IMDbPY could be installed with easy_install. However, because of dependencies on external tools, the installation may fail. For the simple usage that we need, it is enough to copy the imdb directory extracted from the downloaded package into the site-packages directory under the Lib directory of your Python installation.
From the documentation, here are some simple things that you can do with IMDbPY.
from imdb import IMDb

ia = IMDb()
try:
    the_matrix = ia.get_movie('0133093')
except Exception as err:
    # Without the movie object nothing below can run, so report and stop.
    raise SystemExit('Could not retrieve the movie: %s' % err)

# now, the_matrix.data is a large dictionary containing lots of information.
# You can print some of this information using the dictionary keys.
# Note that some alias names are provided, so you don't have to directly
# use the_matrix.data.
print(the_matrix['director'])
for name in the_matrix['cast']:
    print(name)
You can also convert the data to XML, which might be useful for making a standalone file. BeautifulSoup requires lxml to parse XML, and the lxml module is difficult to install on Windows, so xml.dom.minidom may be the easiest parser to use. See the xml.dom.minidom documentation.
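As a rough illustration of the standalone-file idea, the sketch below writes an XML string to a file and reads it back with xml.dom.minidom. It is written for Python 3 and does not call IMDbPY at all: SAMPLE_XML is a made-up, simplified stand-in for the XML that IMDbPY returns, and save_xml and load_names are hypothetical helper names chosen for this example.

```python
import os
import tempfile
import xml.dom.minidom

# Hypothetical sample standing in for XML returned by IMDbPY;
# the real output contains more elements and attributes.
SAMPLE_XML = (
    "<cast>"
    "<person><name>Keanu Reeves</name></person>"
    "<person><name>Carrie-Anne Moss</name></person>"
    "</cast>"
)


def save_xml(xml_text, path):
    """Write the XML text to a standalone UTF-8 file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)


def load_names(path):
    """Parse the file with minidom and return the text of each <name> element."""
    dom = xml.dom.minidom.parse(path)
    names = []
    for node in dom.getElementsByTagName("name"):
        # Join the text nodes that sit directly under <name>.
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        names.append(text)
    return names


# Save to a temporary directory, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "cast.xml")
save_xml(SAMPLE_XML, path)
names_from_file = load_names(path)
print(names_from_file)
```

Once the data is in a file, the parsing code can be developed and tested without making a new request each time.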
Here is some starter code:
from imdb import IMDb
import xml.dom.minidom

ia = IMDb()
the_matrix = ia.get_movie('0133093')
# Ask for the cast as an XML string and parse it into a DOM tree.
folks = the_matrix.getAsXML('cast')
dom = xml.dom.minidom.parseString(folks)
people = dom.getElementsByTagName('person')
for peep in people:
    for node in peep.childNodes:
        # Each person element has a name child that holds the text.
        if node.nodeName == u'name':
            for n in node.childNodes:
                if n.nodeType == n.TEXT_NODE:
                    print(n.data)
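For comparison, here is a minimal sketch of the same name extraction using xml.etree.ElementTree, one of the other parsers mentioned above. It is written for Python 3 and runs on a made-up sample string rather than on live data: SAMPLE_XML only assumes the person and name elements that the starter code looks for, and cast_names is a hypothetical helper name. The real XML from getAsXML('cast') will contain more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; the real XML from IMDbPY has more detail.
SAMPLE_XML = """<cast>
  <person><name>Keanu Reeves</name></person>
  <person><name>Laurence Fishburne</name></person>
  <person><name>Carrie-Anne Moss</name></person>
</cast>"""


def cast_names(xml_text):
    """Return the text of the <name> child of every <person> element."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() walks the whole tree, so <person> is found at any depth.
    for person in root.iter("person"):
        name = person.findtext("name")  # None if there is no <name> child
        if name:
            names.append(name.strip())
    return names


cast = cast_names(SAMPLE_XML)
for actor in cast:
    print(actor)
```

ElementTree exposes element text directly through findtext, so there is no need to loop over child nodes and check for text nodes as with minidom.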