URLs and URIs, Proxies and Passwords
Resolving Relative URIs
The URI class has three methods for converting
back and forth between relative and absolute URIs.
public URI resolve(URI uri)
This method compares the
uri argument to this URI and
uses it to construct a new URI object that wraps
an absolute URI. For example, consider these three lines of code:
URI absolute = new URI("http://www.example.com/");
URI relative = new URI("images/logo.png");
URI resolved = absolute.resolve(relative);
After they've executed, resolved
contains the absolute URI
http://www.example.com/images/logo.png.
If the invoking URI does not contain an absolute
URI itself, the resolve( ) method resolves as much
of the URI as it can and returns a new relative URI object as a
result. For example, take these three statements:
URI top = new URI("javafaq/books/");
URI relative = new URI("jnp3/examples/07/index.html");
URI resolved = top.resolve(relative);
After they've executed, resolved
now contains the relative URI
javafaq/books/jnp3/examples/07/index.html with
no scheme or authority.
public URI resolve(String uri)
This is a convenience method that
simply converts the string argument to a URI and then resolves it
against the invoking URI, returning a new URI object as the result.
That is, it's equivalent to
resolve(URI.create(uri)). Using
this method, the previous two samples can be rewritten as:
URI absolute = new URI("http://www.example.com/");
URI resolved = absolute.resolve("images/logo.png");
URI top = new URI("javafaq/books/");
resolved = top.resolve("jnp3/examples/07/index.html");
public URI relativize(URI uri)
It's also possible
to reverse this procedure; that is, to go from an absolute URI to a
relative one. The relativize( ) method creates a
new URI object from the uri
argument that is relative to the invoking URI. The
argument is not changed. For example:
URI absolute = new URI("http://www.example.com/images/logo.png");
URI top = new URI("http://www.example.com/");
URI relative = top.relativize(absolute);
The URI object relative now
contains the relative URI images/logo.png.
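The following is a minimal sketch of how relativize( ) and resolve( ) undo each other, and of what happens when the invoking URI is not a prefix of the argument. The class name RelativizeDemo and the helper relativeTo( ) are illustrative names, not part of java.net, and the example.org URI is made up for the demonstration.

```java
import java.net.URI;

class RelativizeDemo {

    // Hypothetical helper: returns target expressed relative to base, or
    // target itself when it cannot be relativized against base.
    static URI relativeTo(URI base, URI target) {
        return base.relativize(target);
    }

    public static void main(String[] args) {
        URI top = URI.create("http://www.example.com/");
        URI absolute = URI.create("http://www.example.com/images/logo.png");

        URI relative = relativeTo(top, absolute);
        System.out.println(relative);               // images/logo.png

        // Resolving the relative URI against the same base restores the original.
        System.out.println(top.resolve(relative));  // http://www.example.com/images/logo.png

        // A URI with a different authority cannot be made relative to top,
        // so the argument comes back unchanged.
        URI other = URI.create("http://www.example.org/images/logo.png");
        System.out.println(relativeTo(top, other)); // http://www.example.org/images/logo.png
    }
}
```

The sketch uses URI.create( ) rather than the constructor only to avoid handling the checked URISyntaxException for strings that are known to be well formed.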
Utility Methods
The URI
class has the usual batch of utility methods: equals( ), hashCode( ), toString( ),
and compareTo( ).
public boolean equals(Object o)
URIs are tested for equality pretty much as you'd expect. It's not a direct string comparison. Equal URIs must either both be hierarchical or both be opaque. The scheme and authority parts are compared without considering case. That is, http and HTTP are the same scheme, and www.example.com is the same authority as www.EXAMPLE.com. The rest of the URI is case-sensitive, except for hexadecimal digits used to escape illegal characters. Escapes are not decoded before comparing. http://www.example.com/A and http://www.example.com/%41 are unequal URIs.
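These rules can be checked with a short sketch. The class name UriEqualityDemo and the helper same( ) are illustrative, not part of the URI API.

```java
import java.net.URI;

class UriEqualityDemo {

    // Hypothetical helper: parses both strings and compares the resulting URIs.
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme and host are compared without regard to case.
        System.out.println(same("http://www.example.com/", "HTTP://www.EXAMPLE.com/"));        // true

        // The path is case-sensitive.
        System.out.println(same("http://www.example.com/a", "http://www.example.com/A"));      // false

        // Hexadecimal digits inside an escape are compared without regard to case.
        System.out.println(same("http://www.example.com/%7e", "http://www.example.com/%7E"));  // true

        // Escapes are not decoded before comparing.
        System.out.println(same("http://www.example.com/A", "http://www.example.com/%41"));    // false
    }
}
```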
public int hashCode( )
The hashCode( ) method
is a usual hashCode( ) method, nothing special.
Equal URIs do have the same hash code and unequal URIs are fairly
unlikely to share the same hash code.
public int compareTo(Object o)
URIs can be ordered. The ordering is based on string comparison of the individual parts, in this sequence:
- If the schemes are different, the schemes are compared, without considering case.
- Otherwise, if the schemes are the same, a hierarchical URI is considered to be less than an opaque URI with the same scheme.
- If both URIs are opaque URIs, they're ordered according to their scheme-specific parts.
- If both the scheme and the opaque scheme-specific parts are equal, the URIs are compared by their fragments.
- If both URIs are hierarchical, they're ordered according to their authority components, which are themselves ordered according to user info, host, and port, in that order.
- If the schemes and the authorities are equal, the path is used to distinguish them.
- If the paths are also equal, the query strings are compared.
- If the query strings are equal, the fragments are compared.
URIs are not comparable to any type except themselves. Comparing a
URI to anything except another
URI causes a
ClassCastException.
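As a rough illustration of this ordering, the sketch below sorts a handful of made-up URIs. It assumes Java 5 or later, where URI implements Comparable<URI> and can therefore be passed directly to Collections.sort( ). The class name UriOrderingDemo and the helper sorted( ) are illustrative.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UriOrderingDemo {

    // Hypothetical helper: parses the strings and returns the URIs in their natural order.
    static List<URI> sorted(String... uris) {
        List<URI> result = new ArrayList<URI>();
        for (String s : uris) {
            result.add(URI.create(s));
        }
        Collections.sort(result); // uses URI.compareTo()
        return result;
    }

    public static void main(String[] args) {
        List<URI> ordered = sorted(
            "mailto:user@example.com",
            "http://www.example.com/b",
            "http://www.example.com/a?x=1",
            "http://www.example.com/a",
            "ftp://ftp.example.com/");

        // Schemes are compared first (ftp, then http, then mailto); within the
        // http URIs the path decides, and a missing query sorts before a query.
        for (URI u : ordered) {
            System.out.println(u);
        }
        // ftp://ftp.example.com/
        // http://www.example.com/a
        // http://www.example.com/a?x=1
        // http://www.example.com/b
        // mailto:user@example.com
    }
}
```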
public String toString( )
The toString( ) method
returns an unencoded string form of the
URI. That is, characters like é and \
are not percent-escaped unless they were percent-escaped in the
strings used to construct this URI. Therefore, the
result of calling this method is not guaranteed to be a syntactically
correct URI. This form is sometimes useful for display to human
beings, but not for retrieval.
public String toASCIIString( )
The toASCIIString( ) method returns an
encoded string form of the
URI. Characters like é and \ are always
percent-escaped whether or not they were originally escaped. This is
the string form of the URI you should use most of the time. Even if
the form returned by toString( ) is more legible
for humans, they may still copy and paste it into areas that are not
expecting an illegal URI. toASCIIString( ) always
returns a syntactically correct URI.
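The difference between the two string forms is easiest to see with a non-ASCII character. This is a small sketch using a made-up path on www.example.com; the class name UriStringFormsDemo and its two helpers are illustrative. The é is written as the escape \u00e9 so that the source file's encoding does not matter.

```java
import java.net.URI;

class UriStringFormsDemo {

    // Hypothetical helper: the form suitable for showing to a person.
    static String display(URI uri) {
        return uri.toString();
    }

    // Hypothetical helper: the form suitable for sending over the network.
    static String wire(URI uri) {
        return uri.toASCIIString();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.example.com/caf\u00e9");

        // toString() leaves the é as it was given to the constructor.
        System.out.println(display(uri));

        // toASCIIString() percent-escapes the UTF-8 bytes of the é.
        System.out.println(wire(uri)); // http://www.example.com/caf%C3%A9
    }
}
```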
Proxies
Many systems access the Web and sometimes other non-HTTP parts of the Internet through proxy servers. A proxy server receives a request for a remote server from a local client. The proxy server makes the request to the remote server and forwards the result back to the local client. Sometimes this is done for security reasons, such as to prevent remote hosts from learning private details about the local network configuration. Other times it's done to prevent users from accessing forbidden sites by filtering outgoing requests and limiting which sites can be viewed. For instance, an elementary school might want to block access to http://www.playboy.com. And still other times it's done purely for performance, to allow multiple users to retrieve the same popular documents from a local cache rather than making repeated downloads from the remote server.
Java programs based on the URL class can work
through most common proxy servers and protocols. Indeed, this is one
reason you might want to choose to use the URL
class rather than rolling your own HTTP or other client on top of raw
sockets.
System Properties
For basic operations, all you have
to do is set a few system properties to point to the addresses of
your local proxy servers. If you are using a pure HTTP proxy, set
http.proxyHost to the domain name or the IP
address of your proxy server and http.proxyPort to
the port of the proxy server (the default is 80). There are several
ways to do this, including calling System.setProperty() from within your Java code or using the -D options when
launching the program. This example sets the proxy server to
192.168.254.254 and the port to 9000:
% java -Dhttp.proxyHost=192.168.254.254 -Dhttp.proxyPort=9000 com.domain.Program
If you want to exclude a host from being proxied and connect directly
instead, set the http.nonProxyHosts system
property to its hostname or IP address. To exclude multiple hosts,
separate their names by vertical bars. For example, this code
fragment proxies everything except
java.oreilly.com and
xml.oreilly.com:
System.setProperty("http.proxyHost", "192.168.254.254");
System.setProperty("http.proxyPort", "9000");
System.setProperty("http.nonProxyHosts", "java.oreilly.com|xml.oreilly.com");
You can also use an asterisk as a wildcard to indicate that all the hosts within a particular domain or subdomain should not be proxied. For example, to proxy everything except hosts in the oreilly.com domain:
% java -Dhttp.proxyHost=192.168.254.254 -Dhttp.nonProxyHosts="*.oreilly.com" com.domain.Program
If you are using an FTP proxy server, set the
ftp.proxyHost, ftp.proxyPort,
and ftp.nonProxyHosts properties in the same way.
Java does not support any other application layer proxies, but if
you're using a transport layer SOCKS proxy for all
TCP connections, you can identify it with the
socksProxyHost and
socksProxyPort system properties. Java does not
provide an option for nonproxying with SOCKS. It's
an all-or-nothing decision.
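The FTP and SOCKS properties can be set from code in the same way as the HTTP ones. The following is only a sketch: the class and helper names are illustrative, and the hostnames and ports (ftpproxy.example.com, socks.example.com, 2121, 1080) are placeholders rather than real servers.

```java
class ProxyPropertiesDemo {

    // Hypothetical helper: routes FTP URLs through an FTP proxy, except for
    // the hosts matched by nonProxyHosts (names separated by vertical bars).
    static void useFtpProxy(String host, int port, String nonProxyHosts) {
        System.setProperty("ftp.proxyHost", host);
        System.setProperty("ftp.proxyPort", Integer.toString(port));
        System.setProperty("ftp.nonProxyHosts", nonProxyHosts);
    }

    // Hypothetical helper: routes TCP connections through a SOCKS proxy.
    static void useSocksProxy(String host, int port) {
        System.setProperty("socksProxyHost", host);
        System.setProperty("socksProxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        useFtpProxy("ftpproxy.example.com", 2121, "ftp.example.com|*.internal.example.com");
        useSocksProxy("socks.example.com", 1080);

        System.out.println(System.getProperty("ftp.proxyHost"));   // ftpproxy.example.com
        System.out.println(System.getProperty("socksProxyPort"));  // 1080
    }
}
```

System properties are strings, so the port numbers are converted before being stored.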
The Proxy Class
Java 1.5 allows more fine-grained
control of proxy servers from within a Java program. Specifically,
this allows you to choose different proxy servers for different
remote hosts. The proxies themselves are represented by instances of
the java.net.Proxy class. There are still only
three kinds of proxies, HTTP, SOCKS, and direct
connections (no proxy at all), represented by three constants in the
Proxy.Type enum:
- Proxy.Type.DIRECT
- Proxy.Type.HTTP
- Proxy.Type.SOCKS
Besides its type, the other important piece of information about a
proxy is its address and port, given as a
SocketAddress object. For example, this code
fragment creates a Proxy object representing an
HTTP proxy server on port 80 of
proxy.example.com:
SocketAddress address = new InetSocketAddress("proxy.example.com", 80);
Proxy proxy = new Proxy(Proxy.Type.HTTP, address);
Although there are only three kinds of proxy objects, there can be many proxies of the same type for different proxy servers on different hosts.
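A brief sketch of working with Proxy objects follows. The class name ProxyObjectsDemo, the helper httpProxy( ), and the second host proxy2.example.com are illustrative. The sketch uses InetSocketAddress.createUnresolved( ) instead of the constructor so that building the object does not trigger a DNS lookup for a placeholder hostname.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.SocketAddress;

class ProxyObjectsDemo {

    // Hypothetical helper: builds an HTTP proxy object for the given host and port.
    static Proxy httpProxy(String host, int port) {
        SocketAddress address = InetSocketAddress.createUnresolved(host, port);
        return new Proxy(Proxy.Type.HTTP, address);
    }

    public static void main(String[] args) {
        Proxy first = httpProxy("proxy.example.com", 80);
        Proxy second = httpProxy("proxy2.example.com", 8080);

        // Same type, but different servers, so the objects are not equal.
        System.out.println(first.type());         // HTTP
        System.out.println(first.equals(second)); // false

        // The predefined direct "proxy" has type DIRECT and no address.
        System.out.println(Proxy.NO_PROXY.type());    // DIRECT
        System.out.println(Proxy.NO_PROXY.address()); // null
    }
}
```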
The ProxySelector Class
Each running Java 1.5 virtual machine
has a single java.net.ProxySelector object it uses
to locate the proxy server for different connections. The default
ProxySelector merely inspects the various system
properties and the URL's protocol to decide how to
connect to different hosts. However, you can install your own
subclass of ProxySelector in place of the default
selector and use it to choose different proxies based on protocol,
host, path, time of day, or other criteria.
The key to this class is the abstract select( )
method:
public abstract List<Proxy> select(URI uri)
Java passes this method a URI object (not a
URL object) representing the host to which a
connection is needed. For a connection made with the URL class, this
object typically has the form
http://www.example.com/ or
ftp://ftp.example.com/pub/files/, or some such.
For a pure TCP connection made with the Socket class, this URI will
have the form socket://host:port; for instance,
socket://www.example.com:80. The
ProxySelector object then chooses the right
proxies for this type of object and returns them in a
List<Proxy>.
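To see what the installed selector returns for each of these URI forms, you can call select( ) yourself. This sketch queries whatever selector ProxySelector.getDefault( ) returns; the class name SelectDemo and the helper proxiesFor( ) are illustrative. The output depends on the proxy-related system properties in effect, so none is shown.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class SelectDemo {

    // Hypothetical helper: asks the current default selector which proxies
    // it would try, in order, for the given URI.
    static List<Proxy> proxiesFor(String uri) {
        return ProxySelector.getDefault().select(URI.create(uri));
    }

    public static void main(String[] args) {
        String[] targets = {
            "http://www.example.com/",          // a URL connection over HTTP
            "ftp://ftp.example.com/pub/files/", // a URL connection over FTP
            "socket://www.example.com:80"       // a plain TCP Socket connection
        };
        for (String target : targets) {
            System.out.println(target + " -> " + proxiesFor(target));
        }
    }
}
```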
The second abstract method in this class you must implement is
connectFailed( ):
public abstract void connectFailed(URI uri, SocketAddress address, IOException ex)
This is a callback method used to warn a program that the proxy
server isn't actually making the connection. Example 7-11 demonstrates with a
ProxySelector that attempts to use the proxy
server at proxy.example.com for all HTTP
connections unless the proxy server has previously failed to resolve
a connection to a particular URL. In that case, it suggests a direct
connection instead.
import java.net.*;
import java.util.*;
import java.io.*;

public class LocalProxySelector extends ProxySelector {

  // URIs for which the proxy server has already failed to connect
  private final List<URI> failed = new ArrayList<URI>( );

  @Override
  public List<Proxy> select(URI uri) {
    List<Proxy> result = new ArrayList<Proxy>( );
    // Connect directly if the proxy has failed for this URI before,
    // or if this is not an HTTP connection
    if (failed.contains(uri)
        || !"http".equalsIgnoreCase(uri.getScheme( ))) {
      result.add(Proxy.NO_PROXY);
    }
    else {
      SocketAddress proxyAddress
          = new InetSocketAddress("proxy.example.com", 8000);
      Proxy proxy = new Proxy(Proxy.Type.HTTP, proxyAddress);
      result.add(proxy);
    }
    return result;
  }

  @Override
  public void connectFailed(URI uri, SocketAddress address, IOException ex) {
    // Remember the failure so select( ) suggests a direct connection next time
    failed.add(uri);
  }
}
As I already said, each running virtual machine has exactly one
ProxySelector. To change the
ProxySelector, pass the new selector to the static
ProxySelector.setDefault( ) method, like so:
ProxySelector selector = new LocalProxySelector( );
ProxySelector.setDefault(selector);
From this point forward, all connections opened by that virtual
machine will ask the ProxySelector for the right
proxy to use. You normally shouldn't use this in
code running in a shared environment. For instance, you
wouldn't change the ProxySelector
in a servlet because that would change the
ProxySelector for all servlets running in the same
container.
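If you do need to swap the selector temporarily, for example in a test, one possible approach is to remember the previous selector and put it back when you're done. The sketch below is only an illustration: ScopedSelectorDemo, FixedProxySelector, and withSelector( ) are made-up names, and the proxy address is a placeholder. The change is still visible to every thread in the virtual machine while the task runs, so this does not make the technique safe in a shared container.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

class ScopedSelectorDemo {

    // Hypothetical selector that suggests one fixed HTTP proxy for every URI.
    static class FixedProxySelector extends ProxySelector {
        private final Proxy proxy;

        FixedProxySelector(String host, int port) {
            this.proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
        }

        @Override
        public List<Proxy> select(URI uri) {
            if (uri == null) {
                throw new IllegalArgumentException("uri must not be null");
            }
            return Collections.singletonList(proxy);
        }

        @Override
        public void connectFailed(URI uri, SocketAddress address, IOException ex) {
            // This sketch does not track failures.
        }
    }

    // Installs selector, runs task, and restores the previous selector
    // even if the task throws.
    static void withSelector(ProxySelector selector, Runnable task) {
        ProxySelector previous = ProxySelector.getDefault();
        ProxySelector.setDefault(selector);
        try {
            task.run();
        } finally {
            ProxySelector.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        ProxySelector custom = new FixedProxySelector("proxy.example.com", 8000);
        withSelector(custom, () ->
            System.out.println(ProxySelector.getDefault()
                .select(URI.create("http://www.example.com/"))));
    }
}
```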