PHP vs Java

Introduction

First, a disclaimer: I favor Java, and as I start writing this I already know the winner of this comparison. However, I intend this piece to be a fair comparison, to help understand how these two languages compare.

Java is sometimes judged unfairly because complex and not always necessary frameworks (J2EE, Spring, etc.) get lumped in with it. Without implying that these frameworks don’t have their place, I intend to compare the bare platforms. Even though I’ll mention some libraries, the base platforms being compared are plain PHP and Java with JSP.

Also left out of the comparison are non-technical issues (hosting availability and price, developer availability and price, etc.).

However, this is not meant to be a theoretical comparison between two abstract concepts. It aims to be down to earth and to help decide whether to use one or the other in real work.

PHP vs Java
Subject PHP Java
Integration with HTML
PHP: There are several ways to mark up the “scriptlets”, but the most common is <?php ... ?>, which has no shorthand for printing a value. Enabling the asp_tags configuration option gives JSP-like <% and <%= tags, but this is not recommended because it is considered nonstandard in the PHP world.
Java: JSP copied the <% style from ASP. It added an easy way to create custom tags (taglibs), which get replaced by ad hoc Java code.
Speed
PHP: PHP is a purely interpreted language. Its completely dynamic nature makes it very difficult to compile. In the PHP world, compiling just means caching the “opcodes”, which are merely a parsed version of PHP’s source code (e.g. “$a = $b + 1;” is translated into one opcode for the addition and another for the assignment).
Java: Sun has invested millions in HotSpot, a state-of-the-art virtual machine that dynamically profiles the code being executed. When the VM detects a spot worth compiling, it goes ahead and compiles it right into machine code. The VM’s name, HotSpot, comes from that.
Encoding support
PHP: There’s no support for encodings in PHP. Everything is just bytes, and it’s up to you to decide which encoding you are working in. There are some libraries to handle encodings, but they are not well integrated. Full Unicode support is promised for the almighty 6.0 release, although it remains to be seen whether encoding support can be retrofitted without disrupting the PHP community.
Java: Java has been Unicode-based (UTF-16) from the start. Every char is a UTF-16 code unit, and even naïve developers create Unicode-ready applications without thinking about it. When you call str.length(), str.substring(), etc. you are already handling Unicode text rather than raw bytes.
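To make the Java side concrete, here is a minimal sketch of what “Unicode from the start” looks like in practice. The class and method names (UnicodeDemo, codeUnits, codePoints, utf8Bytes) are made up for this example; only the String and StandardCharsets calls belong to the standard library.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeDemo {
    // String.length() counts UTF-16 code units, not bytes.
    static int codeUnits(String s) {
        return s.length();
    }

    // Code points differ from code units only for characters outside the Basic Multilingual Plane.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // An encoding is named explicitly only at the moment text becomes bytes.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "mañana", written with an escape so the source file's own encoding doesn't matter.
        String word = "ma\u00F1ana";
        System.out.println(codeUnits(word)); // 6
        System.out.println(utf8Bytes(word)); // 7, because ñ takes two bytes in UTF-8
        System.out.println(codePoints(word)); // 6
    }
}
```

The string operations never ask which encoding is in use; that question only comes up in getBytes(), where the text leaves the program.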
Data access
PHP: PHP lacks a standard data access API. It could be said that PDO aims to be that API, but it’s a recent addition and is not yet the dominant one. PHP’s APIs have traditionally been database specific (e.g. there is a “mysqli” API just for MySQL), so you need to call a different set of functions if you want to switch databases. The mysqli API is not well designed: prepared statements are handled completely differently from non-prepared ones, and the way you have to bind “result parameters” is counterintuitive. Connection pooling is not widely used.
Java: Java has a well-designed set of classes for database access: JDBC. This API is database agnostic, so you can switch databases without changing a single line of code (although you might still need to adjust your SQL). There are dozens of connection pooling implementations.
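As a rough illustration of the JDBC point, the sketch below runs a prepared statement using nothing but java.sql interfaces. The table and column names (users, name, age), the UserQueries class, and its method are hypothetical, and the JDBC URL is taken from the command line because it depends on whichever driver you have on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class UserQueries {
    // Hypothetical table and columns, used only for illustration.
    static final String SQL = "SELECT name FROM users WHERE age >= ?";

    // Nothing here names a database product: only the Connection knows what is behind it.
    static List<String> namesOfUsersAtLeast(Connection conn, int minAge) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setInt(1, minAge); // parameters are bound by position, starting at 1
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: java UserQueries <jdbc-url>");
            return;
        }
        // The URL is the only database-specific piece in this sketch.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println(namesOfUsersAtLeast(conn, 18));
        }
    }
}
```

Switching databases means passing a different URL and putting a different driver on the classpath; the query method itself stays as it is.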
Language consistency
PHP: PHP has the following “features”:
  • Everything is magically cast: integers to strings, strings to integers, strings to booleans, etc.
  • Arrays are assigned and passed by value by default, so assigning an array to a variable gives you a copy, and modifications made through the copy never reach the original.

Java: Java has its consistency issues too, mainly that arrays and primitive types are different from the objects one can create. However, this is not an issue a developer would normally face when creating an application.

Java is also a stricter language, although whether that is an advantage is subjective. Personally, I like not being able to use a string as a boolean.
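A small sketch of that strictness follows. StrictTyping and its two helper methods are invented for the example; the commented-out lines are there to show what the compiler rejects.

```java
final class StrictTyping {
    // A string never becomes a number implicitly; the conversion is spelled out and fails loudly.
    static int parseQuantity(String text) {
        return Integer.parseInt(text.trim());
    }

    // A string is not a condition: the test applied to it has to be written down.
    static boolean isPresent(String text) {
        return text != null && !text.isEmpty();
    }

    public static void main(String[] args) {
        String input = " 42 ";
        // if (input) { }        // does not compile: String cannot be converted to boolean
        // int n = input + 1;    // does not compile: String cannot be converted to int
        if (isPresent(input)) {
            System.out.println(parseQuantity(input) + 1); // 43
        }
    }
}
```

Passing something like "abc" to parseQuantity throws a NumberFormatException instead of quietly producing a number.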

Collections
PHP: PHP has nice arrays. They are both numerically indexed and associative (string indexed). They include an implicit pointer, and the language itself has nice idioms for iterating over them. The downside is that keys can only be numbers or strings; you can’t create a map that uses a custom object as the key. Besides, there’s no support for collection types with other performance characteristics (e.g. trees, linked lists, etc.).
Java: Java shines with its extensive collection support. It’s completely object-oriented, yet simple to use and to the point: Maps, Lists, Sets... Once you get used to it, it is a very productive API. Any object can be used as a key and can be placed in efficient hash-backed sets. It’s also easy to extend the API and to create new kinds of maps, lists, etc.
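Here is a minimal sketch of a custom object used as a map key and as a set element. The Point class and the labels are made up for the example; the one real requirement is that the key class defines equals() and hashCode() consistently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class CollectionsDemo {
    // Hypothetical key type: two points are the same key when their coordinates match.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    static Map<Point, String> labels() {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(0, 0), "origin");
        labels.put(new Point(1, 0), "unit x");
        return labels;
    }

    public static void main(String[] args) {
        // A different but equal Point instance finds the same entry.
        System.out.println(labels().get(new Point(0, 0))); // origin

        Set<Point> visited = new HashSet<>();
        visited.add(new Point(2, 3));
        visited.add(new Point(2, 3));
        System.out.println(visited.size()); // 1
    }
}
```

Swapping HashMap for TreeMap or HashSet for LinkedHashSet changes the performance and ordering characteristics without touching the calling code, apart from TreeMap needing an ordering for Point.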
Execution model
PHP: PHP starts, executes the page’s code, and ends. Its model comes from the old CGI days. You can’t keep anything in memory beyond the request; you have to use external utilities (such as APC, or even memcached) to achieve that manually.

Java: Each request is like a simple method invocation in a long-lived multi-threaded application. You are not forced to share anything with other threads, so this does not impose additional complexity. However, you can store useful data in application-wide variables (e.g. run a query once and keep the resulting map/array in memory).

Frameworks and libraries can also take advantage of this, and so be more powerful than they could be if they ran in PHP (e.g. Hibernate).
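The “query once, keep it in memory” idea can be sketched in plain Java, with a thread pool standing in for the request threads of a servlet container. CountryCache and its loader function are assumptions made for this example; the loader represents whatever expensive query you would otherwise repeat on every request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CountryCache {
    // Lives as long as the application does, and is shared by every request thread.
    private final Map<String, String> namesByCode = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger();

    CountryCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per key, even under concurrent access.
    String nameFor(String code) {
        return namesByCode.computeIfAbsent(code, c -> {
            loads.incrementAndGet();
            return loader.apply(c);
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for an expensive database lookup.
        CountryCache cache = new CountryCache(code -> "Country " + code);

        // One hundred simulated requests handled by four threads.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> cache.nameFor("AR"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(cache.loadCount()); // 1
    }
}
```

Threads that don’t touch the cache share nothing, which is the point made above: sharing is opt-in.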

Chaos
PHP: PHP allows for chaos. Included files get their variables from whatever file included them, and you can call normal methods as if they were static. When you call a method statically, it magically has access to the caller's $this variable. As there are no "friend" or "package" visibilities, almost everything ends up being "public". All this means PHP does not scale well as teams get larger and more heterogeneous. Being a dynamically typed language compounds these issues.
Java: In Java everything is a class, period. And classes have certain, expected restrictions that prevent some of the ugly things that can be done in PHP. There are no global variables. If you want something, you must receive it as a parameter or read it from a static variable (precisely specifying the class that holds the value).
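A short sketch of those restrictions, with invented names (ReportDemo, Config, Invoice, formatTotal): the only way to reach a shared value is through the class that holds it, and an instance method cannot be called as if it were static.

```java
import java.util.Locale;

final class ReportDemo {
    // The closest thing to a global: a static field that must be named through its class.
    static final class Config {
        static final String CURRENCY = "ARS";

        private Config() {
        }
    }

    static final class Invoice {
        final long cents;

        Invoice(long cents) {
            this.cents = cents;
        }

        // Instance method: it needs an Invoice to run on.
        String describe() {
            return formatTotal(cents);
        }
    }

    // Everything this method uses arrives as a parameter or is named through its class.
    static String formatTotal(long cents) {
        return String.format(Locale.ROOT, "%s %d.%02d", Config.CURRENCY, cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        // Invoice.describe();   // does not compile: non-static method cannot be referenced from a static context
        System.out.println(new Invoice(12345).describe()); // ARS 123.45
    }
}
```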
Refactoring
PHP: Changing PHP code is very difficult, as the exact semantics are not clear from a "static view" of the code. Semantics might depend on the file that is currently "including" us, on global variables that can contain any type, etc. No freely available tool allows refactoring code automatically (e.g. changing a method or variable name and having the new name applied automatically throughout the code).
Java: As the language is simpler, the semantics are clear from the start. If you redesign a class, you break the parts of the code that used the old API, and to complete the "refactoring" you just have to chase the resulting compilation errors. IDEs are able to know exactly what your code is doing, so they can offer intelligent refactoring options (see Eclipse's refactor actions).

Was I fair? Am I missing something?


By Nicolás Lichtmaier. Questions, comments, suggestions and corrections will be well received.
From Buenos Aires, Argentina.