Thursday Nov 12, 2009

Java Garbage Collection, Monitoring and Tuning

Yesterday I gave a talk at the Jacksonville JUG about Java garbage collection, monitoring and tuning, which included a demo of Finding Memory Leaks Using the NetBeans Profiler and a demo of the VisualGC plugin for VisualVM.


You can view or download the presentation here

Java garbage collection, monitoring and tuning


Thursday Oct 15, 2009

The Top 10 Web Application security vulnerabilities

Yesterday I gave a talk at the Jacksonville JUG about the Top 10 most critical web application security vulnerabilities identified by the Open Web Application Security Project (OWASP).

You can view or download the presentation here

Top 10 Web Security Vulnerabilities

References and More Information:

You can use OWASP's WebGoat to learn more about the OWASP Top Ten security vulnerabilities. WebGoat is an example web application, which has lessons showing "what not to do" code, how to exploit the code, and corrected code for each vulnerability.

You can use the OWASP Enterprise Security API Toolkit to protect against the OWASP Top Ten security vulnerabilities.

The ESAPI Swingset is a web application which demonstrates the many uses of the Enterprise Security API.

Thursday Oct 08, 2009

OWASP Top 10 number 3: Malicious File Execution

Number 3 in the Top 10 most critical web application security vulnerabilities identified by the Open Web Application Security Project (OWASP) is Malicious File Execution, which occurs when an attacker's files are executed or processed by the web server. This can happen when an input filename is compromised or an uploaded file is improperly trusted.


  • file is accepted from the user without validating content
  • filename is accepted from the user
In the example below, a file name is accepted from the user and appended to the server's filesystem path.
// get the absolute file path on the server's filesystem
String dir = servlet.getServletContext().getRealPath("/ebanking");
// get the input file name (untrusted user input)
String file = request.getParameter("file");
// create a new File instance from the pathname string
File f = new File((dir + "\\" + file).replaceAll("\\\\", "/"));

If the filename is manipulated to ../../web.xml, it might allow access to web server properties.

Malicious File Execution can result in:

  • files loaded from another server and executed within the context of the web server
  • modifying paths to gain access to directories on the web server
  • malicious scripts put into a directory with inadequate access controls

Protecting against Malicious File Execution

  • The Java EE Security Manager should be properly configured to not allow access to files outside the web root.
  • Do not allow user input to influence the path name for server resources.
    • Inspect code containing a file open, include, create, delete...
  • Firewall rules should prevent new outbound connections to external web sites or internally back to any other server. Alternatively, isolate the web server in a private subnet.
  • Upload files to a destination outside of the web application directory.
    • Enable virus scan on the destination directory.
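
The rule about not letting user input influence path names can be sketched as a containment check. This is a minimal sketch, not an ESAPI feature: the class name SafePathResolver and the base directory used in the usage note are hypothetical, and it assumes the application serves files from one fixed directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves a user-supplied file name against a fixed
// base directory and rejects any result that escapes that directory.
final class SafePathResolver {
    private final Path baseDir;

    SafePathResolver(String baseDir) {
        // Normalize once so later comparisons use a clean absolute path.
        this.baseDir = Paths.get(baseDir).toAbsolutePath().normalize();
    }

    Path resolve(String userSuppliedName) {
        // normalize() collapses "." and ".." segments before the containment check.
        Path candidate = baseDir.resolve(userSuppliedName).normalize();
        // Path.startsWith compares whole path elements, not raw string prefixes.
        if (!candidate.startsWith(baseDir)) {
            throw new IllegalArgumentException(
                    "Path escapes base directory: " + userSuppliedName);
        }
        return candidate;
    }
}
```

With a resolver created for a directory such as /srv/app/ebanking, resolve("statement.pdf") returns a path inside that directory, while resolve("../../web.xml") throws IllegalArgumentException. The check is purely lexical and does not follow symbolic links, so it complements the other measures in the list rather than replacing them.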

Java specific Protecting against Malicious File Execution

Use the OWASP ESAPI  HTTPUtilities interface:

  • The ESAPI HTTPUtilities interface is a collection of methods that provide additional security related to HTTP requests, responses, sessions, cookies, headers, and logging.

    The HTTPUtilities getSafeFileUploads method uses the Apache Commons FileUploader to parse the multipart HTTP request and extract any files therein:
    public interface HTTPUtilities {
        public void getSafeFileUploads(... tempDir, ...)
                            throws ValidationException;
    }


Friday Oct 02, 2009

Top 10 web security vulnerabilities number 2: Injection Flaws

OWASP Top 10 number 2: Injection Flaws

Number 2 in the Top 10 most critical web application security vulnerabilities identified by the Open Web Application Security Project (OWASP) is Injection Flaws. Injection happens whenever an attacker's data is able to modify a query or command sent to a database, LDAP server, operating system or other interpreter. Types of injection include SQL, LDAP, XPath, XSLT, HTML, XML and OS command injection. SQL injection and Cross-Site Scripting account for more than 80% of the vulnerabilities being discovered against Web applications (SANS Top Cyber Security Risks).

SQL Injection Example

Use of string concatenation to build a query: SQL injection can happen when dynamic database queries are concatenated with user-supplied input, for example with the following query:
 "select * from MYTABLE where name='" + parameter + "'"
If the user supplies name' OR 'a'='a as the parameter, it results in the following:
 select * from MYTABLE where name='name' OR 'a'='a';
The OR 'a'='a' causes the where clause to always be true, which is the equivalent of the following:
 select * from MYTABLE;
If the user supplies name' OR 'a'='a'; delete from MYTABLE; -- as the parameter, it results in the following:
 select * from MYTABLE where name='name' OR 'a'='a'; delete from MYTABLE; --'
Again the where clause is always true, so this is the equivalent of the following:
 select * from MYTABLE; delete from MYTABLE;
Some database servers allow multiple SQL statements separated by semicolons to be executed at once.
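
The effect of concatenation can be reproduced without a database. The sketch below only builds the query text; the class name ConcatenationDemo is hypothetical, and it assumes the query wraps the parameter in single quotes.

```java
// Demonstration only: shows how attacker-controlled input changes the text
// of a query that is built by string concatenation. No database is involved.
final class ConcatenationDemo {

    // Builds the query the unsafe way, by pasting the parameter into the SQL text.
    static String unsafeQuery(String parameter) {
        return "select * from MYTABLE where name='" + parameter + "'";
    }

    public static void main(String[] args) {
        // An ordinary value stays inside the quoted literal.
        System.out.println(unsafeQuery("alice"));
        // The attacker's quote closes the literal and adds a condition of its own.
        System.out.println(unsafeQuery("name' OR 'a'='a"));
    }
}
```

The second call yields select * from MYTABLE where name='name' OR 'a'='a', so the input has become part of the SQL rather than a value. Parameter binding avoids this because the value never becomes part of the statement text.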

SQL Injection can be used to:
  • create, read, update, or delete database data

Protecting against SQL Injection

  • Don't concatenate user input data to a query or command!
    • Use query parameter binding with typed parameters. This ensures the input data can only be interpreted as the value for the intended parameter, so the attacker cannot change the intent of a query.
  • Validate all input data to the application using a white list (what is allowed) for type, format, length and range, and reject it if invalid (see previous blog entry).
  • Don't provide too much information in error messages to the user (such as SQL exception details or table names).

Java specific Protecting against SQL Injection

Don't concatenate user input data to a query or command:

  • Don't do this with JDBC:
    String empId = req.getParameter("empId"); // input parameter
    String query = "SELECT * FROM Employee WHERE "
                 + "id = '" + empId + "'";
  • Don't do this with JPA:
    q = entityManager.createQuery("select e from Employee e WHERE "
    		+ "e.id = '" + empId + "'");

Use Query Parameter binding with typed parameters

  • With JDBC you should use a PreparedStatement and set values by calling one of the setXXX methods on the PreparedStatement object. For example:
    String selectStatement = "SELECT * FROM Employee WHERE id = ? ";
    PreparedStatement pStmt = con.prepareStatement(selectStatement);
    pStmt.setString(1, empId);
    This sets the first question mark placeholder to the value of the input parameter empId in the SQL command. Any dangerous characters, such as semicolons or quotes, are handled by the JDBC driver as part of the value rather than as SQL.

  • With JPA or Hibernate you should use Named Parameters. Named parameters are parameters in a query that are prefixed with a colon (:). Named parameters in a query are bound to an argument by the javax.persistence.Query.setParameter(String name, Object value) method. For example:
    q = entityManager.createQuery("select e from Employee e WHERE "
                 + "e.id = :id");
    q.setParameter("id", empId);
    This binds empId to the id parameter in the query; again, any dangerous characters are treated as part of the value.

  • With JPA 2.0 or Hibernate you can use the Criteria API. The JPA 2.0 Criteria API provides a typesafe, object-based query API based on a metamodel of the entity classes, rather than a string-based query API. This allows you to develop queries that a Java compiler can verify for correctness at compile time. Below is an example using the Criteria API for the same query as before:

    CriteriaBuilder cb = em.getCriteriaBuilder();
    CriteriaQuery<Employee> q = cb.createQuery(Employee.class);
    Root<Employee> e = q.from(Employee.class);
    ParameterExpression<String> id = cb.parameter(String.class);

    TypedQuery<Employee> query = em.createQuery(
        q.select(e).where(cb.equal(e.get(Employee_.id), id)));
    query.setParameter(id, empId);


Tuesday Sep 29, 2009

The Top 10 Web Application security vulnerabilities starting with XSS

This and the next series of blog entries will highlight the Top 10 most critical web application security vulnerabilities identified by the Open Web Application Security Project (OWASP).

You can use OWASP's WebGoat to learn more about the OWASP Top Ten security vulnerabilities. WebGoat is an example web application, which has lessons showing "what not to do" code, how to exploit the code, and corrected code for each vulnerability.

You can use the OWASP Enterprise Security API Toolkit to protect against the OWASP Top Ten security vulnerabilities.

The ESAPI Swingset is a web application which demonstrates the many uses of the Enterprise Security API.

OWASP Top 10 number 1: XSS = Cross Site Scripting

Cross Site Scripting (XSS) is one of the most common security problems in today's web applications. According to the SANS Top Cyber Security Risks, 60% of the total attack attempts observed on the Internet are against web applications, and SQL injection and Cross-Site Scripting account for more than 80% of the vulnerabilities being discovered. You are at risk of an XSS attack any time you put content that could contain scripts from an untrusted source into your web pages.
There are 3 types of cross site scripting:
  • Reflected XSS: an HTML page reflects user input data, e.g. from HTTP query parameters or an HTML form, back to the browser without properly sanitizing the response. Below is an example of this in a servlet:
     out.println("You searched for: " + request.getParameter("query"));
  • Stored XSS: an attacker's input script is stored on the server (e.g. in a database) and later displayed in the web server's HTML pages without proper HTML filtering. Examples of this are blogs or forums where users can input data that will be displayed to others. Below is an example of this in a servlet, where data is retrieved from the database and returned in the HTML page without any validation:
    out.println("<tr><td>" + guest.name + "<td>" + guest.comment);
  • DOM XSS: JavaScript uses input data or data from the server to write dynamic HTML (DOM) elements, again without HTML sanitizing, escaping or filtering.

XSS can be used to:
  • deface web pages
  • hijack user sessions
  • conduct phishing attacks
  • execute malicious code in the context of the user's session
  • spread malware

Protecting against XSS

To protect against XSS all the parameters in the application should be validated and/or encoded before being output in HTML pages.
  • Always validate on the server side for data integrity and security:
    • Validate all input data to the application for type, format, length, range, and context before storing or displaying.
    • Use white-listing (what is allowed), reject if invalid, instead of filtering out black-list (what is not allowed).
  • Output encoding:
    • Explicitly set character encoding for all web pages (ISO-8859-1 or UTF-8):
      <%@ page contentType="text/html;charset=ISO-8859-1" language="java" %>
    • All user-supplied data should be HTML or XML entity encoded before rendering.

Java specific Protecting against XSS

Validating Input with Java

  • You can use Java regular expressions to validate input. This example from WebGoat allows whitespace, a-zA-Z_0-9, and the characters - and ,
    String regex = "[\\s\\w-,]*";
    Pattern pattern = Pattern.compile(regex);
    validate(stringToValidate, pattern);
  • Use Framework (Struts, JSF, Spring...) validators. With Java EE 6 you can use the Bean Validation Framework to centrally define validation constraints on model objects, and with JSF 2.0 you can extend model validation to the UI. For example, here is a JSF 2.0 input field:
    <h:inputText id="creditCard" value="#{booking.creditCardNumber}"/>
    Here is the JSF 2.0 booking Managed Bean using the Bean Validation Framework:
    public class Booking {
        private String creditCardNumber;

        @NotNull(message = "Credit card number is required")
        @Size(min = 16, max = 16,
              message = "Credit card number must be 16 digits long")
        @Pattern(regexp = "^\\d*$",
                 message = "Credit card number must be numeric")
        public String getCreditCardNumber() {
            return creditCardNumber;
        }
    }
    In addition there are new JSF 2.0 Validators:
    • <f:validateBean> is a validator that delegates the validation of the local value to the Bean Validation API.
    • <f:validateRequired> provides required field validation.
    • <f:validateRegexp> provides regular expression-based validation

  • Use the OWASP Enterprise Security API Java Toolkit's Validator interface:
    ESAPI.validator().getValidInput(String context,String input,String type,int maxLength,
       boolean allowNull,ValidationErrorList errorList)
    ESAPI.validator().getValidInput() returns canonicalized and validated input as a String. Invalid input will generate a descriptive ValidationErrorList, and input that is clearly an attack will generate a descriptive IntrusionException.
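
A white-list check along these lines can be sketched with plain java.util.regex. This is an illustrative sketch rather than WebGoat or ESAPI code: the class name WhitelistValidator and the 100-character limit are assumptions, and the hyphen is placed last in the character class so that it is unambiguously a literal.

```java
import java.util.regex.Pattern;

// Hypothetical white-list validator: accepts only input made up of
// whitespace, word characters (a-zA-Z_0-9), commas and hyphens.
final class WhitelistValidator {
    private static final Pattern ALLOWED = Pattern.compile("[\\s\\w,-]*");
    private static final int MAX_LENGTH = 100; // assumed limit for this sketch

    static boolean isValid(String input) {
        // Reject null and over-long input before applying the pattern.
        return input != null
                && input.length() <= MAX_LENGTH
                && ALLOWED.matcher(input).matches();
    }
}
```

For example, isValid("red shoes, size-9") returns true, while isValid("<script>alert(1)</script>") returns false because the angle brackets and parentheses are not on the white list.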

Output Encoding with Java

  • You can use Struts output mechanisms such as <bean:write… >, or use the default JSTL escapeXML="true" attribute in <c:out … > 
  • JSF output components filter output and escape dangerous characters as XHTML entities.
    <h:outputText value="#{}"/>

  • You can use the OWASP Enterprise Security API Java Toolkit's ESAPI Encoder.encodeForHTML() method to encode data for use in HTML content. The encodeForHTML() method uses a "whitelist" HTML entity encoding algorithm to ensure that encoded data can not be interpreted as script. This call should be used to wrap any user input being rendered in HTML element content. For example:
    <p>Hello, <%=ESAPI.encoder().encodeForHTML(name)%></p>
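
To illustrate what entity encoding does, here is a minimal sketch that escapes the five characters that matter most in HTML element content. It is not ESAPI's algorithm (ESAPI encodes a much wider range of characters), and the class name HtmlEscaper is hypothetical.

```java
// Minimal HTML entity encoder for element content. Illustration only.
final class HtmlEscaper {

    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);        // all other characters pass through
            }
        }
        return out.toString();
    }
}
```

With this encoder, a comment containing a script tag is rendered as visible text instead of being executed by the browser. In a real application a maintained library such as ESAPI is preferable, since the correct encoding also depends on where the data is placed (attribute, JavaScript, URL and so on).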


Thursday Sep 17, 2009

Some Concurrency Tips

Here is a review of some concurrency tips from Joshua Bloch, Brian Goetz and others.

Prefer immutable objects/data

Immutable objects do not change after construction. Immutable objects are simpler, safer, require no locks, and are thread safe. To make an object immutable, don't provide setters/mutator methods, make fields private final, and prevent subclassing. If immutability is not an option, limit mutable state: less mutable state means less coordination. Declare fields final wherever practical; final fields are simpler than mutable fields.
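
As a minimal sketch of these rules, the hypothetical Money class below is final, has only private final fields, and returns a new object instead of changing its own state:

```java
// Hypothetical immutable value class: final class, private final fields,
// no setters. "Modification" returns a new instance.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    String currency() { return currency; }

    long cents() { return cents; }

    // Leaves this object untouched and returns a new Money with the sum.
    Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }
}
```

Because no method can change a Money after construction, instances can be shared between threads without any locking.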

When threads share mutable data, each thread that reads or writes must coordinate access to the data. Failing to synchronize shared mutable data can lead to atomicity failures, race conditions, inconsistent state, and other forms of non-determinism. These erratic problems are among the most difficult to debug.

Limit concurrent interactions to well defined points, limit shared data, consider copying instead of sharing.

Threading risks for Web applications

A Servlet's doGet, doPost or service method can be called for multiple clients at the same time. A servlet instance is multi-threaded, so its instance and static variables are shared and, if mutable, access to them must be coordinated. Servlets are typically long-lived objects with a high thread load, and performance suffers if you over-synchronize. Try to either share immutable (final) data or not share at all; request arguments and local variables are safer.

Hold Locks for as short a time as possible

Do as little work as possible inside synchronized regions. Move code that doesn't require the lock out of the synchronized block, especially if it is time-consuming.

The Lock interface provides more extensive locking operations than a synchronized block; one advantage is the ability to not block if a lock is not available. You should obtain the lock, read or write shared data only as necessary, and unlock within a finally clause to ensure that the lock is released. Below is an example using a ReentrantReadWriteLock:
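
A minimal sketch of this locking discipline, assuming a hypothetical PriceTable class that is read far more often than it is written:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical shared table guarded by a read/write lock.
final class PriceTable {
    private final Map<String, Integer> prices = new HashMap<>();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock readLock = rwLock.readLock();
    private final Lock writeLock = rwLock.writeLock();

    // Many readers may hold the read lock at the same time.
    Integer get(String item) {
        readLock.lock();
        try {
            return prices.get(item);
        } finally {
            readLock.unlock(); // always released, even if get() throws
        }
    }

    // The write lock is exclusive: it waits for readers and other writers.
    void put(String item, int price) {
        writeLock.lock();
        try {
            prices.put(item, price);
        } finally {
            writeLock.unlock();
        }
    }

    // Non-blocking variant: gives up immediately if the lock is not available.
    boolean tryPut(String item, int price) {
        if (!writeLock.tryLock()) {
            return false;
        }
        try {
            prices.put(item, price);
            return true;
        } finally {
            writeLock.unlock();
        }
    }
}
```

Only the map access happens while a lock is held; any expensive work, such as computing the price, belongs outside the locked region.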

A way to reduce the time that a lock is held is lock splitting or lock striping, which uses different locks for state variables instead of a single lock. This reduces the lock granularity, allowing greater scalability but you must take locks in a disciplined order or risk deadlock.

Prefer executors and tasks to threads

Instead of working directly with threads, use the Java Concurrency Utilities Executor Framework. The Executor service decouples task submission from execution policy. Think in terms of runnable tasks and let an executor service execute them for you.

Executors can be created either directly or by using the factory methods in the Executors class:

Here is an example that uses the Executor, Executors and ExecutorService classes:

The example is a web service class that handles multiple incoming connections simultaneously with a fixed pool of threads. A fixed thread pool is initialized with the newFixedThreadPool method of the Executors class, which returns an ExecutorService object. Incoming connections are handled by calling execute on the ExecutorService pool object, passing it a Runnable object. The Runnable object's run method processes the connection. When the run method completes, the thread is automatically returned to the thread pool. If a connection comes in while all threads are in use, the task is queued by the executor and runs as soon as a thread is freed.
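
A minimal sketch of the same idea, with sockets left out so that it runs on its own: the hypothetical RequestProcessor class hands each request (here just a String) to a fixed thread pool and counts the requests it has handled.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical request processor backed by a fixed thread pool.
final class RequestProcessor {
    private final ExecutorService pool;
    private final AtomicInteger handled = new AtomicInteger();

    RequestProcessor(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    // Submits the request as a task; the caller does not manage threads.
    void submit(String request) {
        pool.execute(() -> handle(request));
    }

    // Stands in for real work such as processing a connection.
    private void handle(String request) {
        handled.incrementAndGet();
    }

    // Stops accepting new tasks, waits for submitted ones, returns the count.
    int shutdownAndCount() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }
}
```

Submitting more requests than there are threads is fine: a pool created by newFixedThreadPool keeps the extra tasks in its queue until a worker thread is free.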

Prefer Concurrency utilities to wait and notify

Whenever you are about to use wait and notify check and see if there is a class in java.util.concurrent that does what you need.  The concurrent collections provide high-performance concurrent implementations of standard collection interfaces such as List, Queue, and Map.

BlockingQueues are concurrent queues extended with blocking methods, which wait (or block) until an element becomes available for retrieving or space becomes available for storing.

Producer Consumer Pattern

Blocking queues are useful for the Producer Consumer Pattern, where producer threads enqueue work items and consumer threads dequeue and process work items. Below is an example of the consumer side of the pattern: a logger used by multiple threads. The Logger constructor takes a BlockingQueue as an input argument. In the run method messages are retrieved from the queue and logged. When the queue is empty the logging thread will block until a message becomes available for retrieving.

Below is an example of a Producer that uses the logger. A new ArrayBlockingQueue is instantiated for passing to the logger constructor. In the run method messages are put into the queue for logging. If the queue is full, the put will block until the logger has removed messages.
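
A minimal sketch of both sides of the pattern. The class name QueueLogger, the STOP marker used to end the consumer, and the in-memory list that stands in for a log file are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Consumer: takes messages from a BlockingQueue and "logs" them.
final class QueueLogger implements Runnable {
    static final String STOP = "__STOP__"; // hypothetical end-of-input marker

    private final BlockingQueue<String> queue;
    private final List<String> written = new ArrayList<>(); // stands in for a log file

    QueueLogger(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String message = queue.take(); // blocks while the queue is empty
                if (STOP.equals(message)) {
                    return;
                }
                written.add(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> written() {
        return written;
    }

    // Producer side: creates the queue, starts the logger and enqueues messages.
    static List<String> demo(int messages) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        QueueLogger logger = new QueueLogger(queue);
        Thread consumer = new Thread(logger, "logger");
        consumer.start();
        try {
            for (int i = 0; i < messages; i++) {
                queue.put("message " + i); // blocks while the queue is full
            }
            queue.put(STOP);
            consumer.join(); // the logged list is read only after the consumer ends
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return logger.written();
    }
}
```

The queue capacity of 2 is deliberately small so that the producer blocks on put when it gets ahead of the logger. No wait or notify call appears anywhere; the queue does all the coordination.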


Synchronizers are objects that help with coordinating access between threads. The most used synchronizers are CountDownLatch and Semaphore.  The use of synchronizers can eliminate most uses of wait or notify.

Below is an example of the use of a semaphore to control access to a pool of resources. Multiple threads can request the use of a resource and return it when they have finished with it.

In the constructor we create a new semaphore with the same size as the pool of resources we're creating.

In the getResource() method the semaphore acquire method is called to try to acquire a permit to use a resource. If there are resources available this will return and a resource will be returned from the pool. If all the resources are in use the call to acquire will block until another thread calls release on the semaphore. When a thread finishes with a resource the resource is returned to the pool and the release method is called. Both acquire and release can be considered atomic operations.
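
A minimal sketch of such a pool. The class name ResourcePool, the non-blocking tryGetResource method and the use of a ConcurrentLinkedQueue to hold the idle resources are assumptions of this sketch.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Hypothetical resource pool: the semaphore holds one permit per resource.
final class ResourcePool<T> {
    private final Semaphore permits;
    private final Queue<T> resources = new ConcurrentLinkedQueue<>();

    ResourcePool(Collection<T> initial) {
        resources.addAll(initial);
        // Same number of permits as resources in the pool.
        permits = new Semaphore(initial.size(), true);
    }

    // Blocks until a permit, and therefore a resource, is available.
    T getResource() throws InterruptedException {
        permits.acquire();
        return resources.poll();
    }

    // Non-blocking variant: returns null if every resource is in use.
    T tryGetResource() {
        return permits.tryAcquire() ? resources.poll() : null;
    }

    // The resource goes back into the queue before the permit is released,
    // so a thread holding a permit always finds a resource.
    void returnResource(T resource) {
        resources.add(resource);
        permits.release();
    }
}
```

A caller would typically wrap its work in try/finally so that returnResource is called even when the work fails.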

Multithreaded Lazy Initialization is tricky

When threads share a lazily initialized field, access to the field must be synchronized, or nondeterministic bugs can result.

Prefer Normal initialization

Don't use lazy initialization unless an object or field is costly to initialize and not used often. Normally normal initialization is best ;) Below is a thread safe example of eager initialization for a singleton; the private final instance field and the private constructor make it immutable.
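
A minimal sketch of eager initialization, using a hypothetical Registry class:

```java
// Eagerly initialized singleton: the instance is created once, during class
// initialization, which the JVM performs in a thread-safe way.
final class Registry {
    private static final Registry INSTANCE = new Registry();

    private Registry() {
        // private constructor: no other class can create instances
    }

    static Registry getInstance() {
        return INSTANCE;
    }
}
```

No synchronization is needed in getInstance because the final static field is fully assigned before any thread can call the method.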

If you need to use lazy initialization for performance on a static field, use the initialize-on-demand holder pattern. This pattern takes advantage of the guarantee that a class will not be initialized until it is used:
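
A minimal sketch of the holder pattern, using a hypothetical ExpensiveService class:

```java
// Initialize-on-demand holder: the nested Holder class is not initialized
// until getInstance() first touches Holder.INSTANCE.
final class ExpensiveService {

    private ExpensiveService() {
        // imagine costly setup here
    }

    private static class Holder {
        static final ExpensiveService INSTANCE = new ExpensiveService();
    }

    static ExpensiveService getInstance() {
        return Holder.INSTANCE;
    }
}
```

Loading ExpensiveService itself does not construct the instance. Construction happens on the first call to getInstance, and class initialization guarantees that it happens exactly once even when several threads call the method at the same time.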

References and More Information:

Effective Java, Second Edition by Joshua Bloch
Java Concurrency in Practice by Brian Goetz
Robust and Scalable Concurrent Programming: Lessons from the Trenches
Concurrency: Past and Present

Friday Aug 28, 2009

JPA Performance, Don't Ignore the Database


Database Schema

Good database schema design is important for performance. One of the most basic optimizations is to design your tables to take as little space on the disk as possible; this makes disk reads faster and uses less memory for query processing.

Data Types

You should use the smallest data types possible, especially for indexed fields. The smaller your data types, the more indexes (and data) can fit into a block of memory, and the faster your queries will be.


Database normalization eliminates redundant data, which usually makes updates faster since there is less data to change. However, a normalized schema requires joins for queries, which makes queries slower; denormalization speeds retrieval. More normalized schemas are better for applications involving many transactions, and less normalized schemas are better for reporting types of applications. You should normalize your schema first, then denormalize later. Applications often need to mix the approaches, for example by using a partially normalized schema and duplicating, or caching, selected columns from one table in another table. With JPA O/R mapping you can use the @Embedded annotation for denormalized columns to specify a persistent field whose @Embeddable type is stored as an intrinsic part of the owning entity and shares the identity of the entity.

Database Normalization and Mapping Inheritance Hierarchies

The Class Inheritance hierarchy shown below will be used as an example of JPA O/R mapping.

In the Single Table per Class Hierarchy mapping shown below, all classes in the hierarchy are mapped to a single table in the database. This table has a discriminator column (mapped by @DiscriminatorColumn), which identifies the subclass. Advantages: this is fast for querying, since no joins are required. Disadvantages: space is wasted since the fields of every class in the hierarchy are in every row, and a deep inheritance hierarchy will result in wide tables with many columns, some of them empty.

In the Joined Subclass mapping shown below, the root of the class hierarchy is represented by a single table, and each subclass has a separate table that only contains those fields specific to that subclass. This is normalized (eliminates redundant data), which is better for storage and updates. However, queries require joins, which makes them slower, especially for deep hierarchies, polymorphic queries and relationships.

In the Table per Class mapping (in JPA 2.0, optional in JPA 1.0), every concrete class is mapped to a table in the database and all the inherited state is repeated in that table. This is not normalized; inherited data is repeated, which wastes space. Queries for entities of the same type are fast, however polymorphic queries cause unions, which are slower.

Know what SQL is executed

You need to understand the SQL queries your application makes and evaluate their performance. It's a good idea to enable SQL logging, then go through a use case scenario to check the executed SQL. Logging is not part of the JPA specification. With EclipseLink you can enable logging of SQL by setting the following property in the persistence.xml file:

    <property name="eclipselink.logging.level" value="FINE"/>

With Hibernate you set the following property in the persistence.xml file:

    <property name="hibernate.show_sql" value="true" />

Basically you want to make your queries access less data. Is your application retrieving more data than it needs? Are queries accessing too many rows or columns? Is the database query analyzing more rows than it needs? Watch out for the following:
  • queries which execute too often to retrieve needed data
  • retrieving more data than needed
  • queries which are too slow
    • you can use EXPLAIN to see where you should add indexes

With MySQL you can use the slow query log to see which queries are executing slowly, or you can use the MySQL query analyzer to see slow queries, query execution counts, and results of EXPLAIN statements.

Understanding EXPLAIN

For slow queries, you can precede a SELECT statement with the keyword EXPLAIN  to get information about the query execution plan, which explains how it would process the SELECT,  including information about how tables are joined and in which order. This helps find missing indexes early in the development process.

You should index columns that are frequently used in WHERE and GROUP BY clauses, and columns frequently used in joins, but be aware that indexes can slow down inserts and updates.

Lazy Loading and JPA

With JPA, one-to-many and many-to-many relationships lazy load by default, meaning they will be loaded when the entity in the relationship is first accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects, where n is the number of "many" objects.

You can change the relationship to be loaded eagerly by setting the fetch attribute on the relationship annotation, for example @OneToMany(fetch=FetchType.EAGER).

However, you should be careful with eager loading, which could cause SELECT statements that fetch too much data. It can cause a Cartesian product if you eagerly load entities with several related collections.

If you want to override the LAZY fetch type for specific use cases, you can use a fetch join. For example, a query such as SELECT e FROM Employee e JOIN FETCH e.addresses would eagerly load the employee addresses (the addresses relationship name is assumed here).

In general you should lazily load relationships, test your use case scenarios, check the SQL log, and use @NamedQueries with JOIN FETCH to eagerly load when needed.


The main goal of partitioning is to reduce the amount of data read for particular SQL operations so that the overall response time is reduced.

Vertical Partitioning  splits tables with many columns into multiple tables with fewer columns, so that only certain columns are included in a particular dataset, with each partition including all rows.

Horizontal Partitioning segments table rows so that distinct groups of physical row-based datasets are formed. All columns defined to a table are found in each set of partitions. An example of horizontal partitioning might be a table that contains historical data being partitioned by date.

Vertical Partitioning

In the example of vertical partitioning below, a table that contains a number of very wide text or BLOB columns that aren't referenced often is split into two tables, with the most referenced columns in one table and the seldom-referenced text or BLOB columns in another.

By removing the large data columns from the table, you get a faster query response time for the more frequently accessed Customer data. Wide tables can slow down queries, so you should always ensure that all columns defined to a table are actually needed.

The example below shows the JPA mapping for the tables above. The Customer data table with the more frequently accessed and smaller data types  is mapped to the Customer Entity, the CustomerInfo table with the less frequently accessed and larger data types is mapped to the CustomerInfo Entity with a lazily loaded one to one relationship to the Customer.

Horizontal Partitioning

The major forms of horizontal partitioning are by Range, Hash, Hash Key, List, and Composite.

Horizontal partitioning can make queries faster because the query optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. Horizontal Partitioning works best for large database Applications that contain a lot of query activity that targets specific ranges of database tables.

Hibernate Shards

Partitioning data horizontally into "shards" is used by Google, LinkedIn, and others to give extreme scalability for very large amounts of data. eBay "shards" data horizontally along its primary access path.

Hibernate Shards is a framework that is designed to encapsulate support for horizontal partitioning into the Hibernate Core.


JPA Level 2 caching avoids database access for already loaded entities. This makes reading frequently accessed, unmodified entities faster; however, it can give bad scalability for frequently or concurrently updated entities.

You should configure L2 caching for entities that are:
  • read often
  • modified infrequently
  • Not critical if stale
You should also configure L2 (vendor specific) caching for maxElements, time to expire, refresh...

References and More Information:

JPA Best Practices presentation
MySQL for Developers Article
MySQL for developers presentation
MySQL for developers screencast
Keeping a Relational Perspective for Optimizing Java Persistence
Java Persistence with Hibernate
Pro EJB 3: Java Persistence API
Java Persistence API 2.0: What's New ?
High Performance MySQL book
Pro MySQL, Chapter 6: Benchmarking and Profiling
EJB 3 in Action
sharding the hibernate way
JPA Caching
Best Practices for Large-Scale Web Sites: Lessons from eBay



