6. Topic 3 - Programming for the Web
Here we study programming tools for use with the HTTP protocol and associated web page specifications such as HTML, XHTML, and XML. The textbook chapters covered in this topic are:
Chapter 6 HTTP Protocol
Chapter 6 provides an overview of HTTP and shows some of the useful Python modules that simplify the job of connecting to a web server to request and retrieve desired information.
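As a minimal sketch of the kind of client code Chapter 6 builds toward, the function below uses Python's standard urllib.request module (not necessarily the exact modules the textbook uses) to send an HTTP GET request and return the status code, content type, and decoded body. The function name fetch_page and the User-Agent string are hypothetical choices for illustration.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10.0):
    """Send an HTTP GET for url and return (status, content_type, body_text)."""
    # Some servers reject Python's default User-Agent; this value is a
    # hypothetical client name used only for illustration.
    req = Request(url, headers={"User-Agent": "course-example/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        status = resp.status                              # e.g. 200
        content_type = resp.headers.get_content_type()    # e.g. "text/html"
        # Decode using the charset the server declared, defaulting to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
    return status, content_type, body
```

For example, fetch_page("http://www.example.com/") returns a tuple whose first element is 200 on success. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses and urllib.error.URLError when the server cannot be reached, so a real client should catch those.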
Chapter 7 Parsing HTML Data
Chapter 7 discusses how an application program can make use of data retrieved from a web server. The data is likely to be in HTML or the closely related XHTML format. It is quite possible to parse the HTML to pick out specific pieces of desired information. Unfortunately, HTML defines how the data is to appear on the screen rather than what the data means, so parsing HTML is harder than we would like it to be. The programming project for this topic involves parsing HTML, but it is fairly manageable. You should find the programming project interesting.
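The sketch below, which assumes only Python's standard html.parser module (the course example in section 6.4 uses html5lib instead), illustrates the problem: to pull a value out of an HTML table, the program has to rely on the table's layout, such as which column happens to hold the temperature, because the markup says nothing about what the values mean. The class TableTextParser, the function table_rows, and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True: "&deg;" becomes a degree sign
        self.rows = []              # list of rows; each row is a list of cell strings
        self._in_cell = False
        self._cell_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            if not self.rows:       # tolerate a cell that appears outside any <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell_text).strip())
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested formatting tags (e.g. <b>) still belongs to the cell.
        if self._in_cell:
            self._cell_text.append(data)


def table_rows(html_text):
    """Return the non-empty rows of every table in html_text."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    return [row for row in parser.rows if row]


SAMPLE = ("<table><tr><th>City</th><th>High</th></tr>"
          "<tr><td>Salina</td><td><b>88</b>&deg;F</td></tr></table>")
```

Calling table_rows(SAMPLE) gives [['City', 'High'], ['Salina', '88°F']]; the program only "knows" that 88 is the high temperature because it assumes the second column holds it. Because this parser tracks only explicit closing tags, it would miss cells in pages that omit the optional </td> or </tr> tags, which is legal HTML; more forgiving parsers such as html5lib handle those cases.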
Chapter 8 XML and XML-RPC
Chapter 8 discusses XML, which is similar to HTML but describes the meaning of the data rather than how the data is formatted on the screen, so parsing XML is easier than parsing HTML. XML is the data format of choice for transaction processing of various kinds. Furthermore, XML has been used in a clever way to invoke procedures on remote computers (XML-RPC). XML-RPC introduces another interesting topic: distributed computing. Due to time constraints, and because distributed computing is really off-topic from web programming, we give XML-RPC fairly minimal coverage.
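For contrast, here is a minimal sketch using Python's standard xml.etree.ElementTree and xmlrpc.client modules. The sample forecast document, its element names, and the helper functions daily_highs, encode_call, and decode_call are hypothetical; the point is that an XML element such as high names what the data is, so the program can look it up by name, and that an XML-RPC call is itself just an XML document.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Hypothetical XML document: element names describe what the data means.
FORECAST_XML = """
<forecast city="Salina">
  <day name="Monday"><high units="F">88</high><low units="F">65</low></day>
  <day name="Tuesday"><high units="F">91</high><low units="F">67</low></day>
</forecast>
"""


def daily_highs(xml_text):
    """Map each day's name to its high temperature, found by element name."""
    root = ET.fromstring(xml_text.strip())
    return {day.get("name"): int(day.findtext("high")) for day in root.iter("day")}


def encode_call(method, *args):
    """Marshal a remote procedure call into the XML an XML-RPC client would send."""
    return xmlrpc.client.dumps(args, methodname=method)


def decode_call(xml_text):
    """Unmarshal an XML-RPC call back into (method_name, params_tuple)."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params
```

Here encode_call("add", 2, 3) produces the methodCall document an XML-RPC client would send in an HTTP POST; on the server side, xmlrpc.server.SimpleXMLRPCServer performs the decoding and dispatches to a registered Python function.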
In this class, we will not cover server-side web applications; that material is covered in the Web Programming class at K-State Salina, although Chapters 18 and 19 of the textbook do relate to server-side web applications. Also, since the book was published, some interesting developments have made Python more widely used for server applications.
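- One is the much-needed Web Server Gateway Interface (WSGI), which provides a much better model for the interface between web servers and application code. CGI is a clean model, but it has performance and scalability issues: the web server forks a new process for every request, and that process communicates with the server through environment variables and pipes (standard input and output), which are I/O constrained. To get around the performance issues, though not necessarily the scalability issues, web servers use various other approaches, such as embedding a language interpreter directly in the server. The latter techniques tend to limit the programming choices for web applications (PHP, ASP, ...) and make server programs, such as Apache, larger and more complex to configure. WSGI, by contrast, is a standard Python calling convention: the server calls an application object once per request, passing it a dictionary of request variables and a start_response callback. Because the application stays loaded between requests, there is no per-request process startup, and in many deployments the application runs in its own long-lived process, separate from the front-end web server, which forwards requests to it over a network or socket connection.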
- WSGI also makes it easy to place middleware between the server and the application, which can provide a flexible framework for application software. This is the approach taken by Google App Engine. Google App Engine is significant because it offers a simple, Python-based way to develop an application, and because it is hosted by Google, which of course has significant resources, so the applications you develop can easily be found and used by others.
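The interface itself is small. As a sketch assuming only the conventions of PEP 3333, a WSGI application is a callable that receives the request variables (environ) and a start_response callback, and middleware is simply another WSGI callable that wraps an application. The names hello_app and HeaderMiddleware and the X-Example header are hypothetical.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application: the server calls this once per request."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # the response body is an iterable of byte strings


class HeaderMiddleware:
    """Hypothetical middleware: wraps any WSGI app and adds one response header."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        # Intercept start_response so the extra header is added to the app's headers.
        def custom_start_response(status, headers, exc_info=None):
            return start_response(status, headers + [self.header], exc_info)

        return self.app(environ, custom_start_response)


# The wrapped object is itself a WSGI application.
app = HeaderMiddleware(hello_app, "X-Example", "middleware")
```

To try it locally, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() serves app on port 8000. Because the middleware depends only on the WSGI calling convention, the same wrapper works around any WSGI application.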
Well, enough about server-side applications; let's see how we can do some web-based programming on the client side.
- Retrieving Web Pages with HTTP (Chapter 6) Lecture
- The Chapter 6 PowerPoint slides (ch6_Web_client.ppt) that go with the Chapter 6 lecture. However, you may do just as well, or better, by following the information below instead of the slides.
- Parsing HTML Documents (Chapter 7) Lecture
- The Chapter 7 PowerPoint slides (ch7_Parsing_HTML.ppt) that go with the Chapter 7 lecture. However, you may do just as well, or better, by following the information below instead of the slides.
- 6.1. HTTP Protocol
- 6.2. Retrieving Web Pages with HTTP
- 6.3. Parsing HTML Data
- 6.4. Example of Parsing HTML Web Pages with Html5lib
- 6.5. Documentation of the wxWeather Program
- 6.6. Programming Assignment 3 – Parsing HTML Web Pages
- 6.7. XML, JSON, and API based Web Programming
- 6.8. Programming Assignment 4 - Web Based APIs
- 6.9. XML and XML-RPC