How to use the new Apache Http Client to make a HEAD request



If you’ve updated your Apache HTTP Client code from version 4.2.x to the newest library (at the time of this writing, version 4.3.5 for httpclient and version 4.3.2 for httpcore), you’ll notice that some classes, like org.apache.http.impl.client.DefaultHttpClient or org.apache.http.params.HttpParams, have been deprecated. I’ve been there, so in this post I’ll show how to get rid of the warnings by using the new classes.

1. Use case from Podcastpedia.org

The use case I will use for demonstration is simple: I have a batch job that checks whether new episodes are available for podcasts. To avoid fetching and parsing the feed when there are no new episodes, I first verify whether the ETag or Last-Modified headers of the feed resource have changed since the last call. This works only if the feed publisher supports these headers, which I highly recommend, as it saves bandwidth and processing power for consumers.

So how does it work? Initially, when a new podcast is added to the Podcastpedia.org directory, I check whether the headers are present for the feed resource and, if so, I store them in the database. To do that, I execute an HTTP HEAD request against the URL of the feed with the help of Apache Http Client. According to the HTTP/1.1 specification (RFC 2616), the meta-information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
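The check described above boils down to comparing the header values stored in the database with the ones returned by a fresh HEAD request. Below is a minimal sketch of that comparison using only the JDK. The names FeedChangeDetector and hasChanged are hypothetical and not taken from the Podcastpedia code base, and treating a feed that sends neither header as "changed" is an assumption made here to stay on the safe side.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Podcastpedia): decides whether a feed must be fetched again. */
final class FeedChangeDetector {

    private FeedChangeDetector() { }

    /**
     * Returns true when the feed should be downloaded and parsed again.
     * Any argument may be null when the publisher does not send that header.
     */
    static boolean hasChanged(String storedEtag, String storedLastModified,
                              String currentEtag, String currentLastModified) {
        // Assumption: without any validator we cannot tell, so we re-fetch to be safe.
        if (currentEtag == null && currentLastModified == null) {
            return true;
        }
        // The feed counts as changed as soon as one of the two values differs from the stored one.
        return !Objects.equals(storedEtag, currentEtag)
                || !Objects.equals(storedLastModified, currentLastModified);
    }

    public static void main(String[] args) {
        // Same ETag as stored: no need to fetch the feed again.
        System.out.println(hasChanged("\"v1\"", null, "\"v1\"", null)); // prints false
        // The ETag differs from the stored one: fetch and parse the feed.
        System.out.println(hasChanged("\"v1\"", null, "\"v2\"", null)); // prints true
    }
}
```

In the batch job, the "current" values would come from the HEAD response and the "stored" values from the database; the comparison itself needs no HTTP library.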

In the following sections I will show what the code actually looks like in Java, before and after the upgrade to version 4.3.x of the Apache Http Client.

2. Migration to the 4.3.x version

2.1. Software dependencies

To build my project, which by the way is now available on GitHub – Podcastpedia-batch, I am using Maven, so I have listed below the dependencies required for the Apache Http Client:

2.1.1. Before

<!-- Apache Http client -->
<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpclient</artifactId>
	<version>4.2.5</version>
</dependency>
<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpcore</artifactId>
	<version>4.2.4</version>
</dependency>

2.1.2. After

<!-- Apache Http client -->
<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpclient</artifactId>
	<version>4.3.5</version>
</dependency>
<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpcore</artifactId>
	<version>4.3.2</version>
</dependency>

2.2. HEAD request with Apache Http Client

2.2.1. Before (v4.2.x)

private void setHeaderFieldAttributes(Podcast podcast) throws ClientProtocolException, IOException, DateParseException{

	HttpHead headMethod = null;
	headMethod = new HttpHead(podcast.getUrl());

	org.apache.http.client.HttpClient httpClient = new DefaultHttpClient(poolingClientConnectionManager);

	HttpParams params = httpClient.getParams();
	org.apache.http.params.HttpConnectionParams.setConnectionTimeout(params, 10000);
	org.apache.http.params.HttpConnectionParams.setSoTimeout(params, 10000);
	HttpResponse httpResponse = httpClient.execute(headMethod);
	int statusCode = httpResponse.getStatusLine().getStatusCode();

	if (statusCode != HttpStatus.SC_OK) {
		LOG.error("The introduced URL is not valid " + podcast.getUrl()  + " : " + statusCode);
	}

	//set the new etag if existent
	org.apache.http.Header eTagHeader = httpResponse.getLastHeader("etag");
	if(eTagHeader != null){
		podcast.setEtagHeaderField(eTagHeader.getValue());
	}

	//set the new "last modified" header field if existent
	org.apache.http.Header lastModifiedHeader= httpResponse.getLastHeader("last-modified");
	if(lastModifiedHeader != null) {
		podcast.setLastModifiedHeaderField(DateUtil.parseDate(lastModifiedHeader.getValue()));
		podcast.setLastModifiedHeaderFieldStr(lastModifiedHeader.getValue());
	}

	// Release the connection.
	headMethod.releaseConnection();
}

If you are using a smart IDE, it will tell you that DefaultHttpClient, HttpParams and HttpConnectionParams are deprecated. If you look at their Javadoc, you’ll find a suggested replacement, namely to use HttpClientBuilder and the configuration classes provided by org.apache.http.config and org.apache.http.client.config instead.

So, as you’ll see in the coming section, that’s exactly what I did.

2.2.2. After (v4.3.x)

private void setHeaderFieldAttributes(Podcast podcast) throws ClientProtocolException, IOException, DateParseException{

	HttpHead headMethod = null;
	headMethod = new HttpHead(podcast.getUrl());

	RequestConfig requestConfig = RequestConfig.custom()
			.setSocketTimeout(TIMEOUT * 1000)
			.setConnectTimeout(TIMEOUT * 1000)
			.build();

	CloseableHttpClient httpClient = HttpClientBuilder
								.create()
								.setDefaultRequestConfig(requestConfig)
								.setConnectionManager(poolingHttpClientConnectionManager)
								.build();

	HttpResponse httpResponse = httpClient.execute(headMethod);
	int statusCode = httpResponse.getStatusLine().getStatusCode();

	if (statusCode != HttpStatus.SC_OK) {
		LOG.error("The introduced URL is not valid " + podcast.getUrl()  + " : " + statusCode);
	}

	//set the new etag if existent
	Header eTagHeader = httpResponse.getLastHeader("etag");
	if(eTagHeader != null){
		podcast.setEtagHeaderField(eTagHeader.getValue());
	}

	//set the new "last modified" header field if existent
	Header lastModifiedHeader= httpResponse.getLastHeader("last-modified");
	if(lastModifiedHeader != null) {
		podcast.setLastModifiedHeaderField(DateUtil.parseDate(lastModifiedHeader.getValue()));
		podcast.setLastModifiedHeaderFieldStr(lastModifiedHeader.getValue());
	}

	// Release the connection.
	headMethod.releaseConnection();
}

Notice

  • how the HttpClientBuilder has been used to build a CloseableHttpClient [lines 11-15 of the listing above], which is a base implementation of HttpClient that also implements Closeable
  • the HttpParams from the previous version have been replaced by org.apache.http.client.config.RequestConfig [lines 6-9], where I can set the socket and connection timeouts. This configuration is later used (line 13) when building the HttpClient
  • The rest of the code is quite simple:

    • the HEAD request is executed (line 17)
    • if present, the ETag and Last-Modified headers are persisted
    • at the end, the internal state of the request is reset, making it reusable: headMethod.releaseConnection()
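The Last-Modified value persisted above is an HTTP date such as "Sun, 06 Nov 1994 08:49:37 GMT". The listing delegates the parsing to DateUtil.parseDate. To illustrate what that step involves, here is a sketch that uses only java.time from the JDK (Java 8 or later) instead of the Apache classes. HttpDateParser and parseOrNull are hypothetical names, and the sketch handles only the preferred RFC 1123 date format, not the obsolete formats some servers may still send.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper (not part of Podcastpedia): parses a Last-Modified header value. */
final class HttpDateParser {

    private HttpDateParser() { }

    /** Returns the parsed instant, or null if the header is missing or not in RFC 1123 format. */
    static Instant parseOrNull(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        try {
            // RFC_1123_DATE_TIME understands dates like "Sun, 06 Nov 1994 08:49:37 GMT".
            return ZonedDateTime
                    .parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
        } catch (DateTimeParseException e) {
            // Assumption: an unparsable date is treated like a missing header.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("Sun, 06 Nov 1994 08:49:37 GMT")); // prints 1994-11-06T08:49:37Z
    }
}
```

Returning null instead of throwing keeps the calling code free of a checked exception; whether that fits the batch job better than failing fast is a design choice.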

2.2.3. Make the HTTP call from behind a proxy

If you are behind a proxy, you can easily configure the HTTP call by setting an org.apache.http.HttpHost proxy host on the RequestConfig:

HttpHost proxy = new HttpHost("xx.xx.xx.xx", 8080, "http");
RequestConfig requestConfig = RequestConfig.custom()
		.setSocketTimeout(TIMEOUT * 1000)
		.setConnectTimeout(TIMEOUT * 1000)
		.setProxy(proxy)
		.build();

Resources

Source Code – GitHub

  • podcastpedia-batch – the job for adding new podcasts from a file to the podcast directory; it uses the code presented in this post to persist the eTag and lastModified headers and is still a work in progress. Please open a pull request if you have any improvement proposals.
    Codepedia.org was founded by Adrian Matei (ama [AT] codepedia dot org), a computer science engineer, husband, father, curious and passionate about science, computers, software, education, economics, social equity, philosophy.

    Adrian Matei (aka adixchen)
