Welcome back to Mastering Maven. We'll continue our journey of feature discovery by learning about dependency resolution. For this we need to catch up a bit on the history of build tools in the Java space.
In the beginning there were a few command line tools distributed with the Java SDK (henceforth known as the JDK) such as the Java compiler (javac), the Java interpreter (java), the Java archiver (jar), and some more. JAR files quickly emerged as the preferred way to package and distribute class files. The JAR format allowed developers to expand the capabilities of their programs by reusing classes found in external JAR files, otherwise known as dependencies. Managing a large codebase with just javac, java, and jar is an impossible task, thus a new tool was needed when the first generation of Open Source projects, namely Apache Tomcat, came to the Java space. That tool was Apache Ant.
Think of Apache Ant as a portable, platform independent version of Make, or at least that was its goal back in the day. Apache Ant lets developers create path and classpath elements that are composed of JAR files; in turn, these path and classpath elements are fed into Ant tasks (or commands if you want a simplistic view) so that they can perform their work accordingly. The problem was (besides a boatload of XML) that the JAR files had to reside physically in the same system where the project was to be assembled, thus many developers chose to check them into source control (CVS being the popular choice back then). As projects grew in size, the frustration of managing JAR file dependencies grew as well; some developers were so tired of (or so lazy about) updating JAR file names whenever a dependency upgrade was needed that they simply decided to rename all JAR files by removing version information; in this way their paths, classpaths, launch scripts, and other artifacts would remain constant. Brilliant.
The pains of dependency resolution, plus having to deal with custom builds every single time (after all, Apache Ant did not provide any project conventions as we understand them today), prompted a group of Open Source enthusiasts to create Apache Maven. You'll be happy to know that the group learned from the mistakes of its predecessor ... kidding ... Apache Maven v1 did bring conventions and dependency resolution, but it was an even bigger mess of spaghetti XML than Apache Ant could ever be. Thus Apache Maven v2 was born and this time we got a much better build tool. Apache Maven 3.6.3 is the latest release at the time of writing this post.
To summarize, handling dependencies has been a pain since the early days. Multiple attempts have been made by the Open Source Java community to solve it, with Apache Maven and its POM format emerging as the leader and de facto standard. Alright, enough history and theory for today, let's continue with practical stuff.
Recall the sample class from the first article. It's a simple class with a main method, nothing more, nothing less. Let's enhance it by leveraging a very popular Java library: Apache Commons Lang. This particular library provides utilities for basic Java types; you'd be surprised to find out how many versions of a StringUtils class are found out there. If you have the budget to spare (technical debt budget that is) do yourself a favor and skip writing yet another StringUtils class and include commons-lang as a dependency.
Our current project structure remains unchanged. We have the project's build instructions (pom.xml) and a single Java class (Sample.java).
The source code now looks a bit different, as we have included an import statement for the commons-lang StringUtils class and invoke its capitalize method. The choice of method is not important in the grand scheme of things but it'll suffice. If you attempt to build the project right now you'll get a failure, as we have not declared how the project can reach the dependency we need! For that we have to make sure that the dependency is reachable from a repository.
Apache Maven relies on the concept of repositories to resolve dependencies. Repositories come in two flavors: remote and local. Out of the box, Apache Maven supports one remote repository (Maven Central) and one local repository (Maven Local). Maven Central can be browsed and searched for dependencies. Here's, for example, what searching for commons-lang3 looks like.
Notice the copy icon at the top right. If you click on it you'll get the required XML snippet that must be added to your pom.xml file. http://mvnrepository.com is another choice for browsing and searching dependencies found in Maven Central, as well as in additional remote repositories. Here's what the search for commons-lang3 looks like with this service.
Alright, we now know what piece of XML needs to be added to our pom.xml file; however, it must be added in a particular way. You see, dependencies belong together in a single block, the <dependencies> block. Adding them outside of this block will likely result in an error, as the POM XML format is validated with an XML schema. Armed with this knowledge the pom.xml should look like this:
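As a minimal sketch of that placement, the fragment below shows only the part of the POM that changes. The groupId and artifactId are the published coordinates of Apache Commons Lang 3; the version 3.9 is an assumption picked for illustration, so use whichever version the search results suggest. Everything else in your pom.xml stays as it was.

```xml
<project>
  <!-- Existing elements (modelVersion, groupId, artifactId, version, ...) stay as they are. -->

  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <!-- Illustrative version only; pick the one you found when searching. -->
      <version>3.9</version>
    </dependency>
  </dependencies>
</project>
```

Any further dependency would go inside the same <dependencies> block as another <dependency> element.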
OK, we're ready to get going. Invoke the mvn verify command in the console. Barring any network issues or firewall misconfigurations you should see an output similar to the next one.
Notice that Apache Maven recognizes that a dependency is needed and downloads it, then proceeds to compile the project. Good. Let's invoke mvn verify once again.
This time Apache Maven correctly identifies that it doesn't have to download the commons-lang3 dependency because it did it in the last session. Alright, let's try to be sneaky and delete all compiled and computed results by invoking mvn clean verify.
Well, Apache Maven did it again. It figured out that the dependency is already available, did not incur a network hit, and happily compiled the project. What's going on here? Recall that Apache Maven has two repositories available by default: one remote (Maven Central), which was used to locate and download the commons-lang3 dependency in the first compilation session; and one local (Maven Local), which is used as a local cache for all downloaded artifacts. This is the place where commons-lang3 has been downloaded to. You can verify this by inspecting the default location for Maven Local, which happens to be ~/.m2/repository.
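To make that location more concrete, here is a small Java sketch that builds the path where a JAR usually lands inside Maven Local, following the standard repository layout: the group id split into directories, then the artifact id, then the version. The class name LocalRepositoryLayout, the helper jarPath, and the version 3.9 are assumptions made for illustration; they are not part of Maven itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: computes where an artifact's JAR is expected inside a Maven repository.
final class LocalRepositoryLayout {

    // Builds <root>/<groupId as directories>/<artifactId>/<version>/<artifactId>-<version>.jar
    static Path jarPath(Path repositoryRoot, String groupId, String artifactId, String version) {
        Path dir = repositoryRoot;
        // Each dot-separated segment of the group id becomes a directory.
        for (String segment : groupId.split("\\.")) {
            dir = dir.resolve(segment);
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Default location of Maven Local: ~/.m2/repository
        Path mavenLocal = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // The version is an illustrative assumption.
        Path jar = jarPath(mavenLocal, "org.apache.commons", "commons-lang3", "3.9");
        System.out.println(jar);
    }
}
```

If you list the printed directory after a successful build, you should find the JAR there alongside a few bookkeeping files.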
The behavior we just observed will occur for every single dependency you define in a pom.xml file. As a matter of fact, Apache Maven follows a series of rules to determine how to resolve dependencies. It's a bit more elaborate but can be simplified to the following:
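As a rough illustration of that lookup order, the toy model below checks the local repository first, falls back to a remote repository, caches whatever it downloads, and fails when the artifact cannot be found anywhere. It is a sketch under simplifying assumptions, not Maven's actual resolver: the class ResolutionSketch and its members are hypothetical, repositories are plain maps, and things like snapshots, update policies, and transitive dependencies are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the lookup order described in the text; not Maven's real implementation.
final class ResolutionSketch {

    private final Map<String, String> localRepository = new HashMap<>();
    private final Map<String, String> remoteRepository;
    // Records what happened on each call so the behavior can be inspected.
    final List<String> log = new ArrayList<>();

    ResolutionSketch(Map<String, String> remoteRepository) {
        this.remoteRepository = remoteRepository;
    }

    String resolve(String coordinates) {
        // 1. Look in the local repository first.
        String cached = localRepository.get(coordinates);
        if (cached != null) {
            log.add("local hit: " + coordinates);
            return cached;
        }
        // 2. Otherwise ask the remote repository.
        String downloaded = remoteRepository.get(coordinates);
        if (downloaded == null) {
            // 3. Not found anywhere: resolution fails.
            log.add("unresolved: " + coordinates);
            throw new IllegalStateException("Could not resolve " + coordinates);
        }
        // 4. Found remotely: store a copy locally for the next build.
        localRepository.put(coordinates, downloaded);
        log.add("downloaded: " + coordinates);
        return downloaded;
    }

    public static void main(String[] args) {
        Map<String, String> central = new HashMap<>();
        // Illustrative coordinates and file name.
        central.put("org.apache.commons:commons-lang3:3.9", "commons-lang3-3.9.jar");

        ResolutionSketch resolver = new ResolutionSketch(central);
        resolver.resolve("org.apache.commons:commons-lang3:3.9"); // first build: goes remote
        resolver.resolve("org.apache.commons:commons-lang3:3.9"); // later builds: served locally
        resolver.log.forEach(System.out::println);
    }
}
```

The two calls in main mirror the two mvn verify runs from earlier: only the first one needs the remote repository.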
Well, that was easy, except that I just mentioned "remote repositories". It's possible to configure additional repositories but that is a topic for a later blog post. Coming back to Maven Local, as we just saw it's simply a set of directories and files that follow a conventional structure; there are no access rights and no permissions, as it's assumed that the user invoking the mvn command has full access. For remote repositories the picture is a different one. These are complex applications that can apply access rights and permissions, enforce deploy and download rules, and more. You'll find commercial and Open Source options out there. If you're working in a large organization chances are you have encountered a custom remote repository in one way or another.
By no means have we covered all there is to know about dependency handling and resolution with Apache Maven. We'll come back to this topic once we have covered other concepts that we need first.
One more thing, now that we're speaking about dependencies and Maven Central: for a long time developers have asked Oracle to publish the Oracle JDBC drivers to a public repository. This wish was finally fulfilled in September 2019. You can find more details in this announcement.
In closing, dependencies enable code reuse by sharing classes packaged in JAR files. The history of dependency resolution has been painful (and we have yet to find the optimal solution) but the POM format is what most of us agree to use for the time being. Apache Maven makes it dead simple to define dependencies, which can be resolved from repositories, of which there are two kinds: local and remote. We also learned a second command: the clean command, used to delete the computed results of a build.