CGI Programming Is Simple!

 

That’s a bold claim, isn’t it?

I bet you are no stranger to computer programming. Why else would you be reading this? You know that nothing about computers is simple!

And that’s precisely why I can make such a bold claim.

Why? Because, as long as you know how to write any kind of computer program, you already know everything you need to know about writing CGI programs. Yes, everything!

   What is CGI?

 

There seems to be a lot of confusion even among experienced programmers about it. Myths abound. I am sure you have heard at least some of them.

Which is why I would like to tell you first what CGI is not.

  • It is not a programming language. That means, for example:
    • You do not have to learn Perl
    • You can use the languages you already know
    • You can use any language as long as it
      • can read input
      • can write output
      And what computer language can’t?
    • For that matter, you do not need to use a language.
  • It is not a programming style. You can use your own.
  • It is not cryptic. Perl is cryptic, all right, but see above: You don’t need to use Perl.
  • It is not for Unix gurus only. In fact, you don’t have to be any kind of guru. All you need is to know how to program. And you already know that!

NOTE: Please don’t misconstrue me. I have nothing against Perl. But from browsing the web you may get the impression you must learn it. All I’m saying is that you don’t have to. But if you want to, be my guest.

ANOTHER NOTE: If you don’t know anything about programming, you need to learn that first. But you can still continue reading.

   All right, already!
   What is it?

 

Quite simply, CGI stands for Common Gateway Interface. That’s a fancy term for something we all know as Application Programming Interface. So, CGI is the API for the web server.

The web server, of course, is the software that sends web pages to web browsers. Technically, web browsers should be called web clients, and people who use them should be called web browsers. But no need to get technical here.

   What does a server do?

 

Essentially, it waits. Unless the site is very busy, of course.

What does it wait for? For a client, I mean browser, to ask for a file. The file can be an HTML document, or a graphic, or just about any kind of file.

Once the server receives the request, it does three things:

  • It sends a line of plain text which explains what kind of file is being sent, e.g. HTML, or GIF, or whatever else.
  • It sends out a blank line.
  • It sends out the contents of the file.
In that order.

   How many files does it send?

 

One.

Now you may be shaking your head. After all, a typical web page consists of an HTML document and some graphics, each of them residing in a different file.

That, of course, is true. Nevertheless, in response to a single request, a web server sends out only one file. The browser must make a new request for each and every file it needs to get. And since all servers and most browsers are perfectly capable of multitasking, they can have several requests in progress at the same time. But they do need a separate request for every file.

   Does it have to be a file?

 

Not necessarily. All that is transferred is data.

Remember: The server and the client (the browser) usually run on different computers. They may run under different operating systems, even with different microprocessors. The browser really asks for a “resource” and does not know, or care, where the server gets the data from.

Nevertheless, a typical server is programmed to get its data from a file. It simply reads the data from the file and sends it to the client during the last of the three steps I talked about before.

As a result of this process, the server only sends static data. That is to say, the server does not dynamically modify the data.

   But I want to send dynamic data!

 

And you can! Quite easily at that.

This is precisely what CGI was designed for. You simply write a program that produces data dynamically. Your data then goes to the browser instead of the contents of a file. That way your CGI program effectively extends the functionality of the server, just as, for example, a DLL extends the functionality of Windows.

Except, CGI is much simpler than anything you would write for Windows.

   But, how do I talk to the browser?

 

You don’t. The server handles that for you. In fact, the beauty of it is that you do not even have to talk to the server. All you do is write to standard output. So, for example, in C you could use printf().

The only thing you have to take care of is performing all three steps I talked about before. Since the server does not know what kind of data you are outputting, you need to write that information to standard output yourself.
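Here is a minimal sketch of those three steps in C. The function name write_plain_response and the CGI_STANDALONE macro are my own inventions for illustration, not part of CGI. The function takes the output stream as a parameter only so that it can be tried out without a web server; a real CGI program would pass stdout.

```c
#include <stdio.h>

/* Writes the three parts of a CGI response to `out`:
   the content-type line, a blank line, then the data itself.
   Returns 0 on success, -1 if any write failed. */
int write_plain_response(FILE *out, const char *body)
{
    if (fputs("Content-type: text/plain\n", out) == EOF) return -1; /* step 1 */
    if (fputs("\n", out) == EOF) return -1;                          /* step 2 */
    if (fputs(body, out) == EOF) return -1;                          /* step 3 */
    return fflush(out) == EOF ? -1 : 0;
}

#ifdef CGI_STANDALONE   /* hypothetical macro: define it to build a complete program */
int main(void)
{
    return write_plain_response(stdout, "Hello from a CGI program!\n") == 0 ? 0 : 1;
}
#endif
```

Compiled with CGI_STANDALONE defined and placed wherever your server expects CGI programs, this should be enough to send a line of plain text to the browser.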

Do you remember I said you could even do it without the use of a programming language?

Let’s suppose, for example, that your server is running under MS DOS. Well, none of them do, but there are Windows servers, and Windows can handle MS DOS commands.

Now, let us say you would like to send the listing of your current directory to the web (not a good idea, but it shows just how simple it is). Well, MS DOS has the dir command that sends the directory listing to standard output.

   What about the first two steps?

 

True, dir will not explain it is sending plain text before sending it. But never fear! You can write a batch file like this:



   @echo off
   echo Content-type: text/plain
   echo.
   dir


The first line of this batch file keeps the commands themselves from being echoed into the output. The second tells the browser to expect plain text. The third sends a blank line. The fourth lists the contents of the current directory.

A disclaimer is in order here: Since my web site is on a Unix server, I could not test this. I know you can use Unix shells for similar purposes. I do not know whether Windows servers let you use batch files for CGI. But since more people understand MS DOS batch files than Unix shells, I chose this example.

   How do I get input?

 

First off, let me emphasize that the web is not interactive. That is, your CGI program cannot ask the user for input, process it, send out some output, ask for more input, etc.

But this is one of the reasons why CGI programming is so simple. The program receives user input at most once, right at the start, and sends output once. Nevertheless, both the input and output can be of any size your program can handle.

That said, your program can receive user input in one of two ways depending on what method the browser uses to send it to the server.

   Where does the browser
   find user input?

 

The browser receives user input using HTML forms. A form can instruct the browser to send the data using one of two methods: GET and POST.

The GET method sends it to you as part of the URL. The POST method sends it to your program through standard input (stdin). POST seems to have several major advantages over using the URL:

  • You can send more data (URL has a size limit).
  • The data is not logged along the way. Sending a password, for example, as part of the URL leaves a trail in the various systems your data is travelling through!
  • Data does not appear in the browser Location bar. Again, showing a password there may not be appreciated by the user if someone is watching over his shoulder.

   How do I know which method is used?

 

Before the web server loads your CGI program, it sets several environment variables which you can study to know how much input data you are getting and where it is coming from (i.e. URL or stdin).

One of these environment variables is REQUEST_METHOD. Its value can be POST, GET, or occasionally HEAD.

When the method is POST, CONTENT_LENGTH tells you how many bytes of data you should read from stdin, and CONTENT_TYPE tells you whether this data is coming from a form or from some other source.
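To make this concrete, here is one way it might look in C. This is only a sketch: read_post_body, MAX_POST_BYTES and CGI_STANDALONE are names I made up for the example, and the 64 KB cap is an arbitrary safety limit, not something CGI prescribes. The function takes the input stream and the length string as parameters so it can be tried without a server; the main below passes stdin and the value of CONTENT_LENGTH.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_POST_BYTES 65536L  /* arbitrary cap chosen for this sketch */

/* Reads exactly the number of bytes announced in `content_length` from `in`.
   Returns a malloc'ed, NUL-terminated buffer, or NULL on any problem. */
char *read_post_body(FILE *in, const char *content_length)
{
    char *end, *buf;
    long len;

    if (content_length == NULL || *content_length == '\0') return NULL;
    len = strtol(content_length, &end, 10);
    if (*end != '\0' || len < 0 || len > MAX_POST_BYTES) return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL) return NULL;
    if (fread(buf, 1, (size_t)len, in) != (size_t)len) {
        free(buf);
        return NULL;   /* fewer bytes arrived than promised */
    }
    buf[len] = '\0';
    return buf;
}

#ifdef CGI_STANDALONE   /* hypothetical macro: define it to build a complete program */
int main(void)
{
    const char *method = getenv("REQUEST_METHOD");
    char *body = NULL;

    /* Only POST delivers its data on stdin. */
    if (method != NULL && strcmp(method, "POST") == 0)
        body = read_post_body(stdin, getenv("CONTENT_LENGTH"));

    printf("Content-type: text/plain\n\n");
    printf("Method: %s\n", method ? method : "(not set)");
    printf("Raw input: %s\n", body ? body : "(none)");
    free(body);
    return 0;
}
#endif
```

Note that the program reads only as many bytes as CONTENT_LENGTH announces, rather than reading until end of file.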

Once you have read the data, you can process it, and send your output to stdout. Of course, you will probably want to write it as HTML data, with all of its formatting. But CGI programs can produce any kind of output, for example a GIF file, or anything else.

This is why, in the first two steps, you need to tell the browser just what kind of data you are sending it. For HTML you do it by sending the string Content-type: text/html followed by two line feeds before doing anything else. So, in C, you could code something like printf("Content-type: text/html\n\n");

   Let’s see an example

 

Armed with this knowledge, I wrote a simple C program which outputs its command line using argc and argv, then checks the environment variables I mentioned above, and if there is data at stdin, it reads it. It then sends all this information out in plain HTML. I have called this program c and placed it in my cgi-bin directory.

I also created a simple HTML form. The code for the form is here:


    <b>Pick your favorite color</b><br>
    <form method="POST" action="http://www.whizkidtech.net/cgi-bin/c">
    <input type="RADIO" name="color" value="red"> Red<br>
    <input type="RADIO" name="color" value="green"> Green<br>
    <input type="RADIO" checked name="color" value="blue"> Blue<br>
    <input type="RADIO" name="color" value="cyan"> Cyan<br>
    <input type="RADIO" name="color" value="magenta"> Magenta<br>
    <input type="RADIO" name="color" value="yellow"> Yellow<br>
    <br><b>On the scale 1 - 3, how favorite is it?</b><br><br>
    <select name="scale" size=1>
    <option>1
    <option selected>2
    <option>3
    </select>
    <br>
    <input type="HIDDEN" name="favorite color" size="32">
    <input type="Submit" value="I'm learning" name="Attentive student">
    <input type="Submit" value="Give me a break!" name="Overachiever">
    <input type="Reset" name="Reset">
    </form>

 

Please note the existence of a hidden input with no assigned value, just to test what it sends to the program. You can play with this form and see what it sends to my program below.

A thing to note is that the browser converts any spaces into pluses, and most other non-alphanumeric characters into %xx, where xx is the hexadecimal ASCII value of the character. Fortunately, that is fairly simple to fix, and c will also show you the fixed input.
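The fix might look something like the following in C. This is a sketch of one possible approach, not the code of my c program; the names url_decode and hex_value are made up for the example. It decodes the string in place, which works because the decoded text is never longer than the encoded text.

```c
#include <ctype.h>
#include <stdio.h>

/* Converts one hexadecimal digit to its value (caller has checked isxdigit). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes form data in place: '+' becomes a space and %xx becomes the byte
   with hexadecimal value xx. Malformed % sequences are copied unchanged. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                             && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                          + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

For example, calling url_decode on a buffer holding I%27m+learning leaves I'm learning in it.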

After that, there is one thing left: You need to parse the input. By that I mean, you need to break it apart into pairs of key and value. Each pair is separated by an ampersand (&). Please note I said separated, not terminated. There is no ampersand after the last pair.

Within each pair, the key part is on the left of an equal sign (=), while the value is on the right. Pretty much like an assignment in C and many other programming languages. To illustrate this, c parses the data and shows you the pairs.
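The splitting can be sketched in C roughly as follows. The names struct pair, split_pairs and CGI_STANDALONE are hypothetical, chosen just for this example. One design choice is worth pointing out: the sketch splits the raw, still-encoded data, on the assumption that any decoding of + and %xx is applied to each key and value afterwards. That way an encoded & or = inside a value cannot be mistaken for a separator.

```c
#include <stdio.h>
#include <string.h>

struct pair {
    char *key;
    char *value;
};

/* Splits "k1=v1&k2=v2" in place, overwriting each '&' and the first '='
   of each pair with '\0'. Stores up to `max` pairs and returns the count.
   A pair without '=' gets an empty value. */
size_t split_pairs(char *data, struct pair *pairs, size_t max)
{
    size_t n = 0;
    char *p = data;

    while (p != NULL && *p != '\0' && n < max) {
        char *amp = strchr(p, '&');
        char *eq;

        if (amp != NULL) *amp = '\0';        /* cut off this pair */
        eq = strchr(p, '=');
        pairs[n].key = p;
        if (eq != NULL) {
            *eq = '\0';
            pairs[n].value = eq + 1;
        } else {
            pairs[n].value = p + strlen(p);  /* points at the terminating NUL */
        }
        n++;
        p = (amp != NULL) ? amp + 1 : NULL;  /* no '&' means this was the last pair */
    }
    return n;
}

#ifdef CGI_STANDALONE   /* hypothetical macro: define it to build a complete program */
int main(void)
{
    char sample[] = "color=blue&scale=2";
    struct pair pairs[16];
    size_t i, n = split_pairs(sample, pairs, 16);

    for (i = 0; i < n; i++)
        printf("%s -> %s\n", pairs[i].key, pairs[i].value);
    return 0;
}
#endif
```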

You will note that the favorite color key seems to have no value. But it does. It just happens to be spaces. Instruct your browser to show you the page source and take a look at the HTML code c produces to see what I am talking about.

Here is the form. Play with it as much as you want. Just click BACK after viewing the results to return here.



 

[Interactive form: the form shown above, submitted with the POST method]


   What about the other method?

 

You need to try the same form with one modification: Instead of POST, use GET.

Note that this time the program will show no input from stdin. Unfortunately, it also gets no additional data in its argv array. Yet, if you take a look at the URL (which your browser should show you), you will see all the data placed there.

The trick is in realizing that, despite appearances, the URL has nothing to do with the command line of the program.

How, then, do you get to the data from within your program? You read it from the environment variable QUERY_STRING.
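In C, reading an environment variable is a job for getenv(). Here is a small sketch; the function name get_request_data and the CGI_STANDALONE macro are my own inventions for illustration. The function takes the two values as parameters so it can be tried without a server, and the main below feeds it the real environment variables.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the raw form data of a GET request, or NULL for any other method.
   `method` and `query` are the values of REQUEST_METHOD and QUERY_STRING
   (either may be NULL when the variable is not set). */
const char *get_request_data(const char *method, const char *query)
{
    if (method == NULL || strcmp(method, "GET") != 0) return NULL;
    return query != NULL ? query : "";
}

#ifdef CGI_STANDALONE   /* hypothetical macro: define it to build a complete program */
int main(void)
{
    const char *data = get_request_data(getenv("REQUEST_METHOD"),
                                        getenv("QUERY_STRING"));

    printf("Content-type: text/plain\n\n");
    printf("GET data: %s\n", data ? data : "(not a GET request)");
    return 0;
}
#endif
```

The string you get this way is encoded exactly like the data that arrives on stdin with POST, so it needs the same fixing and parsing.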



 

[Interactive form: the same form, submitted with the GET method]




 

Feel free to modify the form in any way you want (just copy its source code above), and try it again. But do me a favor: Do it from your own computer. Please do not place it on a web page unless you write your own program to test it with. Please understand that I have a bandwidth limit with my host and it might cost me extra money if you let the whole world use my CGI program from my server.

By the way, if you clicked RELOAD while you were in the c program, your browser probably reacted differently when the data was sent by POST from when it was sent by GET. If you did not do that, go back and try it!