Wednesday, October 17, 2012

How to escape HTML Special characters in JSP and Java

Escaping HTML special characters in JSP or Java is a common task for Java programmers. There are many ways to escape HTML meta characters in Java, some of which we have already seen in the last article, escaping XML meta characters in Java. For those who are not familiar with HTML special characters, there are five: <, >, &, ' and ". If you want to print them literally, as shown here, then you need to escape them, so < becomes &lt;, > becomes &gt; and so on. Of course you can write your own custom tag or method for converting HTML special characters to the entity format which browsers understand, but you don't need to, because there are easier, standard ways to escape HTML special characters in JSP and Java. In this JSP and Java tutorial we will learn about HTML special characters and explore some techniques to escape them in JSP pages and Java code. By the way, this is also a popular JSP interview question, mostly asked of programmers with around 2 years of experience.

List of special HTML characters that need escaping
Here is a list of special HTML characters which need to be escaped in order to be displayed literally in the browser. The good thing is that only five characters require escaping.

<  - &lt;
>  - &gt;
&  - &amp;
'  - &#039;
"  - &#034;

How to escape special HTML Characters in JSP

In JSP, if you are using EL or a JSP expression to display a String, you must have faced issues related to HTML special characters. Suppose you are printing ${info}; if info contains special HTML characters like < or >, they will not be displayed literally. Instead, the browser will interpret them as the opening and closing of a tag. Here is a common example which shows the issue caused by HTML special characters. Suppose in display.jsp we have the following JSP code:

<body>
<%
request.setAttribute("specialCharString", "<i> is called italic tag");
%>

HTML: ${specialCharString}
</body>

Output:
HTML: is called italic tag

It didn't print <i>; instead it made the text "is called italic tag" italic, because the browser interpreted the "<" angle bracket as the start of a tag. If you want to display the angle bracket as it is, you need to escape it: instead of "<" you need to use &lt;. So if you change "specialCharString" to "&lt;i&gt; is called italic tag", which is called escaping HTML special characters, it will display the text "<i> is called italic tag" as it is. Now, instead of doing this manually, there are two ways to escape HTML characters in JSP:

1) by using <c:out> tag
2) by using EL function fn:escapeXml(string)

The <c:out> tag has an attribute called "escapeXml"; if it is true, the tag escapes all HTML special characters in the "value" attribute. So, if you use <c:out value="${specialCharString}" escapeXml="true"/>, it will display the exact text, and HTML special characters like "<" will be displayed as an angle bracket. Here is the modified code example of displaying HTML special characters using the JSTL core <c:out> tag:

<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<body>
<%
request.setAttribute("specialCharString", "<i> is called italic tag");
%>

HTML: <c:out value="${specialCharString}" escapeXml="true"/>
</body>

Output:
HTML: <i> is called italic tag

Also, escapeXml is true by default, so <c:out/> is equivalent to <c:out escapeXml="true"/>.

Another way to escape XML or HTML special characters in JSP is by using the EL (Expression Language) function fn:escapeXml(string). In order to use this function you need to import the JSTL functions library by using the @taglib directive. Here is an example of using the EL function to display special HTML characters:

<%@taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
HTML: ${fn:escapeXml("<i> is called italic tag")}

Output:
HTML: <i> is called italic tag

The good part of both approaches is that they are part of JSTL (<c:out> comes from the core tag library and fn:escapeXml from the functions tag library), so you don't need to add any more dependencies for this functionality.

How to escape HTML Special Characters in Java
Even in core Java, if you are working with HTML or XML documents, you need to escape these HTML special characters in order to display them as they are. There are lots of open source libraries available which allow you to handle HTML special characters.
Here are some of the options:

1) StringEscapeUtils from Apache's commons lang library.
2) HtmlUtils from Spring
3) Own custom method using String replace

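Here is a complete code example of using both Apache Commons StringEscapeUtils and Spring framework's HtmlUtils for escaping HTML special characters. Note that the import below is from Commons Lang 2.x; in Commons Lang 3 the class lives in org.apache.commons.lang3 and the method is named escapeHtml4.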

import org.apache.commons.lang.StringEscapeUtils;
import org.springframework.web.util.HtmlUtils;

/**
 * Java program to escape HTML special characters in a String.
 * This program converts HTML meta characters to their escaped form.
 */

public class HtmlEscapeExample {

    public static void main(String[] args) {
        String input = "This String contains HTML Special characters requires encoding e.g. < and >";
        System.out.println("Input: " + input);
        // Spring's HtmlUtils replaces special characters with HTML entities
        System.out.println("Conversion using Spring HtmlUtils: " + HtmlUtils.htmlEscape(input));
        // Commons Lang 2.x StringEscapeUtils does the same with escapeHtml()
        System.out.println("Conversion using Apache commons StringEscapeUtils: " + StringEscapeUtils.escapeHtml(input));
    }
}

Output:
Input: This String contains HTML Special characters requires encoding e.g. < and >
Conversion using Spring HtmlUtils: This String contains HTML Special characters requires encoding e.g. &lt; and &gt;
Conversion using Apache commons StringEscapeUtils: This String contains HTML Special characters requires encoding e.g. &lt; and &gt;
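
The third option from the list above, a custom method built on String.replace, could look roughly like the sketch below. The class name CustomHtmlEscape and the method escapeHtml are hypothetical names chosen for illustration, not part of any library, and the method only handles the five characters listed earlier in this article. The one detail that matters is the order: & has to be replaced first, otherwise the & inside entities produced by the other replacements would be escaped a second time.

```java
public class CustomHtmlEscape {

    // Hypothetical helper for illustration only; covers the five
    // special characters discussed in this article and nothing else.
    static String escapeHtml(String input) {
        if (input == null) {
            return null;
        }
        // '&' goes first so that entities created by the later
        // replacements are not escaped again.
        return input.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&#034;")
                    .replace("'", "&#039;");
    }

    public static void main(String[] args) {
        String input = "<i> is called \"italic\" tag & it's common";
        // prints: &lt;i&gt; is called &#034;italic&#034; tag &amp; it&#039;s common
        System.out.println(escapeHtml(input));
    }
}
```

For real projects a library method such as the two shown above is usually the safer choice, since it is already tested against more inputs than a hand-written helper.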


That's all on how to escape HTML special characters in JSP and Java code. We have seen the JSTL <c:out> tag for escaping HTML in JSP and Spring's HtmlUtils for escaping HTML in Java; these are my preferred ways. On a side note, I would also recommend using the <c:out> tag for displaying Strings in JSP, because it helps prevent cross-site scripting (XSS): by escaping the HTML special characters entered by a user, it displays dangerous JavaScript code as plain text instead of letting the browser run it.
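
To make the security point concrete, here is a small plain-Java sketch of what escaping does to a script entered by a user. CommentRenderer, renderComment and escapeHtml are hypothetical names used only for this illustration; in a JSP page the <c:out> tag would do the escaping for you.

```java
public class CommentRenderer {

    // Hypothetical single-pass escaper for the five characters
    // listed in this article (an assumption, not a library method).
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&#039;"); break;
                case '"':  out.append("&#034;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps user-supplied text in a paragraph, escaping it first so the
    // browser shows the text instead of interpreting it as markup.
    static String renderComment(String userInput) {
        return "<p>" + escapeHtml(userInput) + "</p>";
    }

    public static void main(String[] args) {
        String malicious = "<script>alert('xss')</script>";
        // prints: <p>&lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt;</p>
        System.out.println(renderComment(malicious));
    }
}
```

Because the angle brackets arrive in the browser as &lt; and &gt;, the script tag is shown as text and never executed.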


5 comments:

  1. The programmer must explicitly invoke <c:out> or fn:escapeXml. If the programmer forgets to do so and the data being rendered was supplied by the user, then the application is vulnerable to cross-site scripting. Here is a way to escape EL values by default:
    http://pukkaone.github.com/2011/01/03/jsp-cross-site-scripting-elresolver.html

  2. Great article it helped me a lot. Thank you!!!

  3. In a Java application it works fine, but in a web application it does not. I'm using Spring and the Eclipse IDE. I accept values in all languages, convert them with htmlEscape and store them. Now I want to convert those values back to the exact original value in Java. I used StringEscapeUtils.unescapeHtml, but the output looks like this: ?????. Can anyone help me?
