What's That Noise?! [Ian Kallen's Weblog]


Wednesday April 14, 2004

PHP versus J2EE

I don't like language wars; different languages and frameworks usually have a mix of strengths and weaknesses. However, I'm presently looking at some fixes (mostly resiliency issues) that have to be applied to a web tier built on PHP, and I'm thinking about what's over the horizon for future scaling requirements.

I first used PHP in 1996 (it was called PHP/FI at the time) -- it offered a very innovative alternative to Perl- and CGI-generated content by making the execution flow out of logic embedded in the UI. I liked it. The language was easy; very Perl-ish. But over the years, other frameworks have emerged that, despite the many improvements PHP has enjoyed, easily eclipse PHP.

PHP's primary strength is its enablement of rapid development of database-backed web applications. For prototyping and providing "functional mockups", that's all well and good. But what I'm seeing is that as soon as you want to scale the application along some axis (runtime traffic, i18n, collaborative development, multiple presentation formats), it's difficult to justify sticking with PHP. From what I reckon, a J2EE web tier (servlet container, JSP with JSTL taglibs, an MVC framework such as Struts, etc.) offers a lot of high-level infrastructure and, when contrasted with PHP, is a clear winner:

PHP compared with J2EE along each axis (PHP first, then J2EE):
Runtime scaling
  PHP:
  • There are opcode caching extensions, but opcode caching is not a core component of the PHP runtime environment.
  • PHP is not thread-safe. You pretty much have to run PHP on a pre-forked Apache server. That's not too bad, but it means that if you have a large Apache process pool, each process has to have its own connection to a database. If you have multiple persistent connections and a few hundred Apache processes, you're taxing the databases (because of the large number of open connections) and taxing the OS (because of the large number of open file descriptors and the memory consumption). And every forked process is an island of process space; sharing data between them is a PITA (you have to manage shared memory) and sharing resources is impossible (e.g. forget about pooling your persistent database connections).
  J2EE:
  • Servlet container hosted runtime engines (such as JSP or Velocity) usually have code caching out-of-the-box. Higher level caches such as OSCache promise to reduce the computational burden on the web tier even further.
  • They easily take advantage of resources shared between threads, such as database connection pools, external service interfaces (e.g. remote and/or local EJB factories; the home interfaces) and the container itself.
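The thread-sharing point can be sketched in plain Java. This is only an illustration under assumptions, not how any particular servlet container or pooling library works: `SharedPool`, `SharedPoolDemo` and the string "connections" are hypothetical stand-ins for real database connections. The idea it shows is that many request threads in one process can be served by a handful of pooled resources.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a connection pool: a fixed set of resources
// shared by every thread in one JVM.
final class SharedPool<T> {
    private final BlockingQueue<T> idle;

    SharedPool(Collection<T> resources) {
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Blocks until a resource is idle, i.e. until another thread returns one.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    void giveBack(T resource) {
        idle.add(resource);
    }
}

final class SharedPoolDemo {
    // Runs `requests` simulated requests on `threads` worker threads and
    // returns the names of the pooled "connections" that served them.
    static Set<String> run(int threads, int requests, int poolSize) {
        List<String> connections = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            connections.add("conn-" + i);
        }
        SharedPool<String> pool = new SharedPool<>(connections);
        Set<String> used = ConcurrentHashMap.newKeySet();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < requests; i++) {
                workers.submit(() -> {
                    String conn = pool.borrow();
                    try {
                        used.add(conn); // a real request would run its query here
                    } finally {
                        pool.giveBack(conn); // always return the resource
                    }
                });
            }
            workers.shutdown();
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return used;
    }
}
```

Calling `SharedPoolDemo.run(50, 1000, 5)` serves a thousand simulated requests on fifty threads while never holding more than five "connections" open, which is the contrast with one connection per pre-forked process.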
Content format scaling
  PHP: If you want to co-brand a PHP site, plan on having conditional logic scattered and tangled all over the UI code. This quickly degrades into a maintenance nightmare.
  J2EE: A J2EE MVC framework such as Jakarta Struts (with Tiles) provides a centralized mechanism to declaratively and programmatically control how content is assembled for presentation.
Locale scaling
  PHP: PHP allows you to extend the runtime with the GNU gettext framework. IMO, this is a difficult-to-use system; you have to manage PO files that have a peculiar file format.
  J2EE: The standard Java library has ResourceBundle support built in with a simple file format (properties files) and, again, web tier infrastructure such as the servlet container, JSTL and Struts provides easy-to-use tools to access ResourceBundles.
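For a sense of how little ceremony the properties format needs, here is a minimal sketch. The keys, the translations and the `GreetingBundles` class are assumptions made up for illustration. To keep the example self-contained the properties text is held in strings and the locale fallback is done by hand; a real application would normally put `Messages.properties` and `Messages_fr.properties` on the classpath and let `ResourceBundle.getBundle("Messages", locale)` do the lookup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class GreetingBundles {
    // Hypothetical file contents, keyed by language code ("" is the default bundle).
    private static final Map<String, String> FILES = new HashMap<>();
    static {
        FILES.put("", "greeting=Hello\nfarewell=Goodbye\n");
        FILES.put("fr", "greeting=Bonjour\nfarewell=Au revoir\n");
    }

    // Picks the bundle for the locale's language, falling back to the default.
    static ResourceBundle forLocale(Locale locale) {
        String text = FILES.getOrDefault(locale.getLanguage(), FILES.get(""));
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String greeting(Locale locale) {
        return forLocale(locale).getString("greeting");
    }
}
```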
Functional scaling
  PHP:
  • Extending PHP often requires recompiling the PHP runtime to statically link in different extensions. This results in closely binding your runtime requirements and your infrastructure compilation requirements.
  • Maintaining persistence flexibility is difficult with PHP; while the PEAR abstraction layer provides a baseline of RDBMS independence, the language is still bloated with database-specific function calls for every backend you use. Furthermore, as outlined above, changing backends requires platform-dependent code, perhaps recompiling PHP itself, and then having client libraries installed where the runtime is executing.
  J2EE:
  • Most Java libraries are available as "Pure Java", i.e. JNI support (deploying OS-native libraries) isn't an issue. You never have to recompile Java to extend it.
  • JDBC provides a standard interface to different database backends. While this won't help with things like SQL implementation differences, higher level tools like Hibernate's object-relational framework potentially will. Furthermore, most database drivers are "Pure Java", i.e. there's no need to recompile anything and no need to, say, make sure LD_LIBRARY_PATH is set properly to enable your database drivers.
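A minimal sketch of what the standard interface means in practice, using only `java.sql` from the standard library. The `users` table and the `UserCount` class are hypothetical, and actually running a query assumes a matching driver jar on the classpath. The point is that the backend is selected by the URL string, not by the code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class UserCount {
    // Counts rows in a hypothetical `users` table. Nothing here names a
    // database vendor; DriverManager picks the driver from the URL prefix.
    static int countUsers(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement("SELECT COUNT(*) FROM users");
             ResultSet rows = stmt.executeQuery()) {
            rows.next();
            return rows.getInt(1);
        }
    }
}
```

If no driver on the classpath accepts the URL, `DriverManager.getConnection` throws an `SQLException` rather than requiring anything to be recompiled; adding a backend is a matter of dropping in its driver jar and changing the URL.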
Collaborative scaling
  PHP: The distinction between UI code and business logic is fuzzy and requires lots of developer discipline to keep it cleanly separated; there's little support in the framework itself. This pretty much guarantees closely mingling formatting and display code with lots of logic. Yuck.
  J2EE: While it's certainly possible to write horrible applications with J2EE web infrastructure (e.g. the fact that you can embed Java code, er, scriptlets, directly into a JSP is a terrible fact of life), the wealth of framework support (JSTL, Struts, etc.) makes it easy to follow practices that keep the separation of concerns clean.
A clean separation of concerns is important. Collaborative web application development usually has different roles (the Java programmer providing business APIs and beans, the UI layout coder developing the HTML structures, the UI logic programmer writing tag libraries and/or integrating them with the HTML structures) -- even if you're one person coding all of these, having the different work elements clearly defined and cleanly separated will make your job easier. When you're working in a collaborative environment, the separation of concerns is essential.

There are other little things in the pros and cons. PHP can support URL-rewriting session tracking without doing anything special in the markup code -- nice that it can do that unobtrusively but icky on other counts.

I'd rather use a taglib in a servlet container to keep track of the URL rewriting needed, if any.

All of these points of contrast lie atop the basic structural differences between the PHP and Java languages: Java has Object Oriented Programming (OOP) as a core part of its design; PHP has OOP as an odd afterthought. Java has real exception handling; PHP awkwardly provides function calls to register error handlers and a function to trigger an error. OO is a core element of code modularization, reuse and extension.
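To make the exception handling point concrete, here is a small sketch. The `OrderLookup` class, its data and the exception type are hypothetical, invented for illustration. The failure is part of the method signature, and the caller handles it at the call site instead of through a globally registered handler.

```java
import java.util.HashMap;
import java.util.Map;

final class OrderLookup {
    // Hypothetical checked exception: callers must catch it or declare it.
    static final class OrderNotFoundException extends Exception {
        OrderNotFoundException(int id) {
            super("no order with id " + id);
        }
    }

    private final Map<Integer, String> orders = new HashMap<>();

    OrderLookup() {
        orders.put(1, "2 x widget"); // made-up sample data
    }

    // The compiler forces every caller to deal with the failure case.
    String find(int id) throws OrderNotFoundException {
        String order = orders.get(id);
        if (order == null) {
            throw new OrderNotFoundException(id);
        }
        return order;
    }

    // Handles the failure locally, right where the call is made.
    String describe(int id) {
        try {
            return "order " + id + ": " + find(id);
        } catch (OrderNotFoundException e) {
            return "error: " + e.getMessage();
        }
    }
}
```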

Additionally, while I have my misgivings about EJBs and their misuse, EJBs are an established framework for separating the business and persistence tiers from that of the UI. Service oriented architecture (SOA) patterns are also well established amongst the J2EE development community -- this also better enables collaborative development and a clean separation of concerns, ergo, long term maintainability.

A final point of comparison and contrast is how closely bound PHP is to the web server in a typical architecture. It's nice to have the ability to scale the HTTP interface independently from the application layer. With PHP, the only option there is to run another webserver: a reverse proxy to offload all of the HTTP servicing. With a servlet container, there are various options to connect the HTTP interface to the Java engine via a connector. For example, with Apache and Tomcat you can use mod_jk to connect them and, just to sweeten the deal a bit, mod_jk provides a bit of scaffolding to support load balancing.

It's not my intention to sweepingly indict PHP. For the simple stuff, I like it. And I know it's extremely popular ("50 Million Elvis Fans Can't Be Wrong!"). But when you get beyond needing "Server Side Includes On Steroids" and have to deploy a scaled-up, database-backed application, PHP's weaknesses come to the foreground. Yeah, yeah... I know that Yahoo! uses PHP and I bet there's a lot to learn from their experiences with it -- I expect PHP creator Rasmus Lerdorf's employment with Yahoo! to drive a lot of innovation in PHP's future. But at this point in time, for the web application requirements I'm looking at, I gotta give it to J2EE.


( Apr 14 2004, 12:12:22 PM PDT )

Comments:

Bullshit!

"There are opcode caching extensions but it's not a core component of the PHP runtime environment." So what?

"Higher level caches such as OSCache promise..." http://www.danga.com/memcached/ http://pecl.php.net/package/memcache http://jeremy.zawodny.com/blog/archives/001864.html

"PHP is not thread-safe. You pretty much have to run PHP on a pre-forked Apache server." PHP is thread safe, but not all PHP modules are (yet). You don't need to run PHP in pre-forked mode.

"If you want to co-brand a PHP site, plan on having conditional logic scattered and tangled all over the UI code." What are you talking about? (e.g. http://mojavi.org, and many others if you prefer MVC).

"PHP allow you to extend the runtime with the GNU gettext framework." But is not limited to it. I can write you resource bundle implementation with a few lines of php code if you want. It already has ini file support.

"Extending PHP often requires recompiling the PHP runtime to statically link in different extensions." Again bullshit.

"the language is still bloated with database function calls with every backend you use." Using Oracle, MySQL etc. direct APIs is always faster than via some standardized JDBC on top of direct db calls. You can still abstract db's, and for PHP there is many great solutions. Pear DB is just one. (e.g. http://propel.tigris.org/ and http://creole.tigris.org/).

"there's little support in the framework itself" There is no need for PHP to introduce yet another "standard" MVC implementation as there is already several frameworks in php scene that do make people to follow practices that keep the separation of concerns clean.

"I'd rather use a taglib in a servlet container to keep track of the URL rewriting needed" Taglibs are already said to be major performance bottleneck. And talking about PHPs URL rewriting I think the overhead in you taglibs is much bigger and it's not applied to whole page that is send to browser. What if you use filters? You need to make sure that they also do the right thing.

"PHP awkwardly provides function calls to register error handlers and function to trigger an error." PHP5 will have "try..catch" error handling. PHP errors are really really easy to debug compared to Java's cryptic NPE's and other shit that gets thrown almost in every place.

"OO is a core element of code modularization, reuse and extension." Bullshit again. I have written framework for PHP with functional programming approach. I have implemented MVC, SiteMesh style (but more flexible) decoration support, db abstraction, command-pattern, pluggable modules, some sort of portlets etc. So, no need for OO.

"It's nice to have the ability to scale the HTTP interface independently from the application layer." What the hell are you talking about? With PHP you simply put loadbalancer in front of web-farm. That's it. No need for connectors. And if you want to write some shitty connector, then go ahead and write one.

Posted by Aapo Laakkonen on April 19, 2004 at 10:26 PM PDT #

That post referred to on Jeremy Zawodny's blog says nothing about caching layers or thread safety in application logic, so what are _you_ talking about? Every time I hear a shrill ninny like you snivelling when deficiencies are highlighted in your favorite tools, it just makes me shake my head sadly. You're so off base, you don't know where the base is. Taglibs aren't a performance bottleneck, and URL rewriting is not applied to "the whole page" in JSP (whatever that's supposed to mean, anyway). If you have no appreciation for OO or separating the web interface from the application tier, that's fine. Data driven applications should be able to take advantage of resource pools such as database connections, but if you're not thread safe, you have to run in a preforking model. There's nothing cryptic about NPEs; they indicate that you're accessing members of an object you haven't instantiated (i.e. you screwed up). Anyway, don't scream and cry that my assertions are bullshit when the thread safety and exception handling are "coming in the future" -- I'm talking about right here and now. Which is a more mature and scalable enabling technology -- if you really think it's PHP, you haven't had to build a scaled up data driven application with it, so go back to your little hole and shut your mouth.

Posted by Ian Kallen on April 21, 2004 at 10:21 PM PDT #
