The application is only running on the developer system


Today I'm going to write about a situation every sysadmin has encountered. The sysadmin gets a new version of some piece of software and is supposed to install it on a server. After some hours of trying, he calls the developer and tells him that he cannot get the application to start. The first answer all of us get: "But it is running on my PC." Let the discussion start. ;)

In my opinion it is mostly a problem of proper communication. I have also seen (more than once) development environments (mostly Windows) that differ from the production/test/... environments (be it Linux, AIX, HP-UX, ...). This can also be a reason for problems, but that is enough material for another post. So let's come back to communication.

As communication is always two-sided, so is this problem. There can be different sources:
  • The admin has changed a default value of a configuration. 
  • The developer has changed some classes and now needs other permissions or files. 
  • The admin has installed an update on the test system the application is installed on.
  • The developer uses a new library which is not installed on the server. 
  • ...

There can be many more reasons why software fails, but I think you get the idea. Most of these supposed problems can easily be solved by proper communication. When the admin updates a system (be it security patches, OS service packs, ...), he should write a short e-mail to the users of the system and explain in a few words what was done and what might be affected by the update.

When a developer changes something in the code, he should keep a changelog. But as a developer, do me a favour and do not mail the sysadmin the complete changelog. When it is too long, or contains too many terms related to the business logic or to how you changed some algorithm to gain performance, he won't read it. The changes might be interesting for the sysadmin too, but mostly he will not have the time to read it all and pick out the parts relevant to him. I know developers do not have unlimited time either, but for them it is much easier to find the parts affecting the sysadmin, because they (hopefully ;)) understand the complete changelog.

In a perfect world we would have a change management process that includes both development and system administration. As this will not always be present, just take the short track: write an e-mail, use the phone or hold a short(!!!) meeting whenever anyone knows about changes that could affect the release. Some people will now start rolling their eyes and ask themselves who is not doing so already. It's sad but true: there are a lot of such people out there.

This way your releases will run a lot smoother and each side gains more understanding for the other, which will positively affect other parts of your daily work.

Create Java heapdumps with the help of core dumps


For some time I had the problem that taking Java heap dumps with jmap took too long. When one of my Tomcats crashed with an OutOfMemoryError, I had no time to take a heap dump, because it took some hours and the server had to be back online.

Now I have found a solution to my problem. The initial idea came from this post. It had a solution for Solaris, but with some googling and trial and error I found a solution for Linux too.

  1. create a core dump of your java process with gdb
    gdb --pid=[java pid]
    gcore [file name]
  2. restart the tomcat or do whatever you like with the java process
  3. attach jmap to the core dump and create a Java heap dump
    jmap -heap:format=b [java binary] [core dump file]
  4. analyze your Java heap dump with your preferred tool
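Steps 1 and 3 above can be sketched as two small shell functions. This is only a sketch under assumptions: the function names `core_dump_cmd` and `heap_dump_cmd` and the example paths are made up for illustration, and the functions only print the commands so you can review them before running anything against a production JVM. The jmap option is the one used in step 3; on Java 6 and later the corresponding option is `-dump:format=b,file=[heap dump file]`.

```shell
#!/bin/sh
# Sketch only: helper names and paths are hypothetical.

# Print a non-interactive gdb call that attaches to a running process,
# writes a core file with gcore and then exits (which detaches again).
core_dump_cmd() {
    pid=$1
    core_file=$2
    printf 'gdb --batch --pid=%s -ex "gcore %s"\n' "$pid" "$core_file"
}

# Print the jmap call that converts the core file into a binary heap dump.
# The first argument must be the binary that produced the core file.
heap_dump_cmd() {
    java_binary=$1
    core_file=$2
    printf 'jmap -heap:format=b %s %s\n' "$java_binary" "$core_file"
}

# Example: show both commands for PID 1234.
core_dump_cmd 1234 /tmp/tomcat.core
heap_dump_cmd /opt/tomcat/bin/jsvc /tmp/tomcat.core
```

The point of splitting it this way is that only the first command has to run while the service is down. The conversion with jmap can happen later, after the restart.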
You may get the following error in step three:

    Error attaching to core file: Can't attach to the core file

In my case the error appeared because I used the wrong Java binary in the jmap call. When you are not sure about your Java binary, open the core dump with gdb:

    gdb --core=[core dump file]

You will get output similar to this:

    GNU gdb 6.6
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i586-suse-linux"...
    (no debugging symbols found)
    Using host libthread_db library "/lib/".
    warning: core file may not match specified executable file.
    (no debugging symbols found)
    Failed to read a valid object file image from memory.
    Core was generated by `/opt/tomcat/bin/jsvc'.
    #0  0xffffe410 in _start ()

What you are looking for is in this line:

    Core was generated by `/opt/tomcat/bin/jsvc'.

Call jmap with this binary and you will get a heap dump.
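If you do this more often, the lookup can be scripted. The following is a minimal sketch, assuming gdb prints the "Core was generated by" line in the format shown above. The function name `core_binary` is made up for this example.

```shell
#!/bin/sh
# Sketch only: core_binary is a hypothetical helper name.

# Read gdb output on stdin and print the path of the binary that produced
# the core file. Everything after the first space or closing quote is
# dropped, so command line arguments of the process are not included.
core_binary() {
    sed -n "s/.*Core was generated by \`\([^ ']*\).*/\1/p"
}

# Example with the line from the gdb output:
printf '%s\n' "Core was generated by \`/opt/tomcat/bin/jsvc'." | core_binary
```

With a real core file the input would come from gdb itself, for example `gdb --batch --core=[core dump file] 2>&1 | core_binary`.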


If you want to use JMX over SSL, you have to configure a few things on both sides. First, create a self-signed certificate (details here) and import it into a keystore (details here).

Let’s assume you want to use JMX over SSL with your Tomcat on the server and JConsole on the client. Add these parameters to your Tomcat script: [your jmx port] [your password] [full path to keystore file]
To configure JConsole to use SSL, add these parameters to the call:
    jconsole [full path to keystore file] [your password]
Make sure that the trustStore file is the same as the keyStore file for Tomcat, or that trustStore and keyStore contain the same certificates with the same alias.

Should you experience any problems using SSL, this parameter might help you: (for jconsole) (for tomcat)
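As a rough sketch of what such a setup can look like, the following shell functions assemble the options from the standard JVM system properties for JMX and SSL (`com.sun.management.jmxremote.*`, `javax.net.ssl.*`). Treat it as an assumption, not as the exact parameter list of the original setup: the function names, the port and the paths are made up for illustration. For troubleshooting, the JVM also understands `-Djavax.net.debug=ssl`, which has to be passed as `-J-Djavax.net.debug=ssl` to jconsole.

```shell
#!/bin/sh
# Sketch only: function names and example values are hypothetical.

# JVM options for the Tomcat side, e.g. to append to CATALINA_OPTS.
# Arguments: JMX port, keystore file, keystore password.
tomcat_jmx_opts() {
    printf '%s %s %s %s\n' \
        "-Dcom.sun.management.jmxremote.port=$1" \
        "-Dcom.sun.management.jmxremote.ssl=true" \
        "-Djavax.net.ssl.keyStore=$2" \
        "-Djavax.net.ssl.keyStorePassword=$3"
}

# JConsole call for the client side. jconsole hands options prefixed
# with -J to its own JVM. Arguments: truststore file, truststore password.
jconsole_cmd() {
    printf 'jconsole %s %s\n' \
        "-J-Djavax.net.ssl.trustStore=$1" \
        "-J-Djavax.net.ssl.trustStorePassword=$2"
}

# Example: print both for review.
tomcat_jmx_opts 9004 /etc/tomcat/jmx.jks secret
jconsole_cmd /etc/tomcat/jmx.jks secret
```

Keep in mind that a password given on the command line is visible in the process list, so a dedicated keystore with its own password is a good idea.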
The SSL setup also works with the check_jmx Nagios plugin. Just add the keystore file as trustStore to your call:
    java -cp jmxquery.jar [full path to keystore file] [your password] org.nagios.JMXQuery -U service:jmx:rmi:///jndi/rmi://:/jmxrmi -O "java.lang:type=MemoryPool,name=Perm Gen" -A Usage -K used -I Usage



There are different check_jmx versions (ME, NE1, NE2 and CG) on NagiosExchange, MonitorExchange and Google Code, but it seems none of them is still maintained. I tried to reach one author but got no reply. So I decided to put my modifications on the net. I also merged some other changes into this new release of check_jmx.

To make sure other people can continue development should I not be reachable, I uploaded the source to gitorious. There is a new repository for Nagios plugins there, which is maintained by some community members.

check_jmx is a Nagios plugin to monitor your JVM, e.g. your Tomcat or JBoss installation. You can get data about your heap, GC, and so on. It is also possible to query MBeans which are part of your application. check_jmx also returns performance data.

For this release I merged the original check_jmx release with additions that support longs instead of integers for the warning and critical values. I also added authentication for connections to the JMX server.


devops cooperation at flickr


And now the matching presentation to the podcast at redmonk I mentioned earlier. There are some very good statements in this presentation. The tools section matters less for the tools it mentions than for the statements that come with them:
  • single click build
  • single click deployment
  • monitoring for app and systems
  • understanding for all metrics independent of app or system
  • dark launches
But as the last slide says, it is not easy. Try to start with one point or in one project and try to establish it. Once it is working, take on the next point or the next project.

The slides about culture are very good. One I especially want to mention is slide 57, "Don't just say 'No'". I don't know what was said during the presentation, but my understanding of this sentence is as follows:
You can say 'No', but then explain why and offer alternatives. When you don't have alternatives, just say so and try to find alternatives together.
The slides about finger-pointing don't need any comment. Just take a look at them and you know everything.

But there are also slides I do not fully agree with. I don't think dev should have full access to all systems. They definitely must have access to a test environment which is almost the same as production, but they do not necessarily need access to production. They should have access to logs, but not the rights to restart services or change anything. In my opinion this would be the same as ops changing some code. This can work in small organisations where most people have more than one role, but not in bigger organisations.

Here is the presentation:
Also take a look at John's blog.
