Working with the Proxy
Gathering Real Time Data from the Web

by D. Hyatt, I. Mirkin, and D. Donado

The Problem with Filters

The problem with accessing real time data from the web at TJHSST is that all outgoing web traffic from our network is filtered. Any HTTP request to a remote site must first go to a separate machine called a proxy server, whose IP address is 151.188.17.247, and must use port 8002 rather than the standard port 80 that most web servers use. The proxy server compares each HTTP request against a list of banned locations. If the site is allowed, it forwards the request to the real site on the Internet; otherwise, it discards the request. Once the proper manual proxy settings have been entered in a web browser such as Netscape, all of this happens transparently. It does, however, create special problems for dynamic websites that gather data from the web: requests made by those programs must also be routed through the proxy, since they too must be filtered.
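The filter's real rules are not described here. The following Java sketch (Java 9 or later) is only a toy model of the forward-or-discard decision. It assumes a hypothetical blocklist of domain names (`badsite.com` is made up) and treats a host as banned if the host itself or any parent domain appears on the list.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Toy model of a filtering proxy's allow/deny check (hypothetical rules). */
final class ProxyFilter {

    /**
     * Returns true if the request may be forwarded. A host is blocked if it,
     * or any parent domain of it, is in the blocked set.
     */
    static boolean isAllowed(String url, Set<String> blocked) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // no usable host name: discard the request
        }
        host = host.toLowerCase(Locale.ROOT);
        // For "ads.badsite.com" check "ads.badsite.com", then "badsite.com", then "com".
        while (true) {
            if (blocked.contains(host)) {
                return false;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                return true;
            }
            host = host.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        Set<String> blocked = Set.of("badsite.com"); // hypothetical blocklist
        System.out.println(isAllowed("http://www.somesite.edu/somedir/somepage.html", blocked)); // true
        System.out.println(isAllowed("http://ads.badsite.com/", blocked)); // false
    }
}
```

A real filter would also log requests and match on more than the host name; this sketch only shows the shape of the decision.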

The following examples show how to access a remote site through the school's proxy server using PHP, Perl, or Java. Special thanks go to student sysadmin Ilia Mirkin, who figured this out.

Remote Access with PHP

The following segment of code opens a socket connection to TJHSST's filter, affectionately known as bigbrother.tjhsst.edu, on the required port 8002, which now handles HTTP requests. It then uses that connection to echo a hypothetical web page called somepage.html, located in a subdirectory called somedir at a theoretical site called www.somesite.edu. Naturally, users will have to change these lines to access pages and data from real sites!

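  <?php

  // Connect to the proxy (not to www.somesite.edu) on port 8002.
  $fp = fsockopen("bigbrother.tjhsst.edu", 8002, $errno, $errstr, 30);
  if (!$fp) die("Cannot reach proxy: $errstr ($errno)");

  // A request sent to a proxy names the full URL of the remote page.
  fputs($fp, "GET http://www.somesite.edu/somedir/somepage.html HTTP/1.0\r\n");
  fputs($fp, "Host: www.somesite.edu\r\n\r\n");

  // Echo the reply (HTTP headers included) until the proxy closes the connection.
  while (!feof($fp)) {
      echo fgets($fp, 128);
  }
  fclose($fp);

  ?>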
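For comparison, here is a Java sketch of the same raw-socket exchange. When a client talks to an HTTP proxy, the request line normally carries the absolute URL (`GET http://host/path HTTP/1.0`) so the proxy knows where to forward it; the Host header is sent as well. The class name `RawProxyGet` and the helper `buildRequest` are illustrative only, and the proxy host and port are the ones given above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class RawProxyGet {

    /** Builds an HTTP/1.0 GET request in the absolute-URL form used with proxies. */
    static String buildRequest(String host, String path) {
        return "GET http://" + host + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        // Connect to the proxy, not to the remote site.
        try (Socket socket = new Socket("bigbrother.tjhsst.edu", 8002)) {
            OutputStream out = socket.getOutputStream();
            String request = buildRequest("www.somesite.edu", "/somedir/somepage.html");
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // With HTTP/1.0 the proxy closes the connection after the response,
            // so reading until end of stream gets headers and body.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Like the PHP version, this prints the response headers along with the page.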

Remote Access with Perl

Perl has a very useful module for Internet access called LWP (libwww-perl). It can contact the proxy directly, and the host and full URL of the request are given together in a single string.

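  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::UserAgent;

  # Send every http:// request made by this agent through the school proxy.
  my $ua = LWP::UserAgent->new;
  $ua->proxy(http => 'http://bigbrother.tjhsst.edu:8002/');

  # The full URL, host included, goes in one string.
  my $response = $ua->get('http://www.somesite.edu/somedir/somepage.html');
  die "Request failed: ", $response->status_line, "\n" unless $response->is_success;

  print "Content-type: text/html\n\n";
  print $response->content;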
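LWP's `proxy` setting applies to one user agent rather than the whole program. A rough Java counterpart, sketched below, passes a `java.net.Proxy` to `URL.openConnection`, so only that one connection goes through the filter, and the full URL is again a single string. The class and method names are illustrative, and Java 9 or later is assumed for `InputStream.transferTo`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

final class PerRequestProxy {

    /** Describes the school filter as an HTTP proxy (host and port from the text). */
    static Proxy schoolProxy() {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress("bigbrother.tjhsst.edu", 8002));
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://www.somesite.edu/somedir/somepage.html").toURL();

        // Only this connection uses the proxy; the rest of the JVM is unaffected.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(schoolProxy());
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Request failed: HTTP " + status);
            }
            // Copy the page body (no headers) to standard output.
            try (InputStream in = conn.getInputStream()) {
                in.transferTo(System.out);
            }
            System.out.flush();
        } finally {
            conn.disconnect();
        }
    }
}
```

Scoping the proxy to one connection is handy when a program also talks to local machines that should not go through the filter.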

Remote Access with Java

Java programs, such as servlets, must also route their HTTP requests through the proxy before they can retrieve remote data. Danny Donato (TJ02) offers the following code to handle the proxy settings in Java.

  // http.proxyHost and http.proxyPort are the documented names of Java's
  // HTTP proxy properties; set them before opening any connections.
  System.setProperty( "http.proxyHost", "bigbrother.tjhsst.edu" );
  System.setProperty( "http.proxyPort", "8002" );
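These property settings affect every HTTP connection made through `java.net.URL` in the JVM. The sketch below shows where they fit in a small program. It assumes the documented `http.proxyHost` and `http.proxyPort` property names and Java 9 or later; the class and method names are illustrative, and decoding the page as UTF-8 is an assumption about the remote site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class ProxyPropertiesDemo {

    // Proxy host and port taken from the text above.
    static final String PROXY_HOST = "bigbrother.tjhsst.edu";
    static final String PROXY_PORT = "8002";

    /** Points HTTP connections opened through java.net.URL at the proxy. */
    static void configureProxy() {
        System.setProperty("http.proxyHost", PROXY_HOST);
        System.setProperty("http.proxyPort", PROXY_PORT);
    }

    public static void main(String[] args) throws IOException {
        configureProxy();

        // No proxy is mentioned here: the system properties route the request.
        URL url = URI.create("http://www.somesite.edu/somedir/somepage.html").toURL();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Because the settings are global, a servlet container would pick them up for every web application it runs, which is convenient here but worth knowing.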

If you have created dynamic webpages at TJ that no longer work, try modifying your programs to route their HTTP requests through our proxy.

Donald W. Hyatt
dhyatt@tjhsst.edu

June 11, 2001