Monday, June 24, 2013

Taking the first hurdles

Introduction


After the first forays into OpenCV and its Java/Groovy bindings, my enthusiasm cooled considerably when I discovered that the GPU-enabled OpenCV classes do not have Java bindings yet. This is a bit of a bummer, as Groovy-bound, GPU-accelerated OpenCV routines are the holy grail we are after. Even more worrying, Java bindings for the GPU routines did not make it into the feature list for the OpenCV 3.0 release (Q3/Q4 of 2013). The mid-term idea is to write those bindings for the most interesting examples myself and contribute them to the code base. In the short term, I need to work out which example is the interesting one (this post).

A short detour


As a first reaction, I made a quick review of JCuda and JavaCV, hoping to find further insight into the Java binding issue and to check what is available there. JCuda is, more or less, a direct Java binding to the CUDA code, which means that you either program the lower-level routines yourself or use the standard cuFFT or cuBLAS libraries directly. The JavaCV examples mostly relate to the earlier OpenCV implementation and do not include the GPU routines either, although JavaCV does seem to be able to work with OpenCL.

Other interesting developments are projects that work on direct use of the GPU inside the JDK, such as Project Sumatra, the Rootbeer compiler and Aparapi. These projects still seem to be at a very basic level, though, so I'll have a closer look at them later on.

Most of the Java binding work involves the Java Native Interface (JNI), which is more or less a translation layer between Java objects and native methods that can use those objects (in both directions). This suggests that the Java bindings for the GPU-enabled OpenCV routines should not be that different from those of their non-GPU versions, as the calling conventions tend to be the same. This is one for the mid-term, though.


An interesting example


One of our more interesting CUDA applications has been template matching. In satellite image processing, template matching is typically used in the geo-referencing workflow: a newly acquired image is matched against a template taken from a known source with an exact geographical reference, in order to determine a tiepoint (or ground control point). OpenCV has a pretty good template matching routine, which is described in one of the examples.
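To make that geo-referencing use concrete, here is a small sketch (my own illustration, not part of the OpenCV example) of how a match location could be turned into a tiepoint. The geotransform values and pixel positions are made up for the purpose of the example:

// Hypothetical geotransform of the geo-referenced source the template was cut from
def originX = 500000.0d     // upper-left X of the reference image (map units)
def originY = 4650000.0d    // upper-left Y of the reference image (map units)
def pixelSize = 30.0d       // ground sampling distance, e.g. Landsat

// Pixel position of the template's upper-left corner in the reference image (made up)
def templCol = 1024
def templRow = 2048

// Best-match location in the newly acquired image, as returned by minMaxLoc (made up)
def matchCol = 262
def matchRow = 182

// The tiepoint links pixel (matchCol, matchRow) in the new image to the map
// coordinate of the template's upper-left corner in the reference image
def mapX = originX + templCol * pixelSize
def mapY = originY - templRow * pixelSize
printf "Tiepoint: pixel (%d, %d) -> map (%.1f, %.1f)%n", matchCol, matchRow, mapX, mapY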

The example code is converted to a Groovy script below. Note that the window management has been stripped out, as the relevant Highgui window routines are not yet available as Java wrappers either (the suggestion is, again, to work this out on your own; a small sketch for writing a visual check to disk follows the script). Make sure to check the OpenCV Java API docs to understand some subtle differences, especially in some of the method calls.
/**
 * @file MatchTemplate_Demo.groovy
 * @brief Sample code to use the function MatchTemplate, rewritten in Groovy
 * @author Guido Lemoine
 */

import org.opencv.core.*
import org.opencv.imgproc.*
import org.opencv.highgui.Highgui

// Always start with this one
LibLoader.load()

/// Load image and template
def img = Highgui.imread(args[0])
def templ = Highgui.imread(args[1])
def match_method = args[2] as int

/// Create the result matrix
int result_cols =  img.cols() - templ.cols() + 1
int result_rows = img.rows() - templ.rows() + 1

def result = new Mat(result_rows, result_cols, CvType.CV_32FC1) // Mat(rows, cols, type)

/// Do the Matching and Normalize
Imgproc.matchTemplate( img, templ, result, match_method )
Core.normalize( result, result, 0, 1, Core.NORM_MINMAX, -1)

/// Localizing the best match with minMaxLoc
def match = Core.minMaxLoc( result )

/// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
if ( match_method  == Imgproc.TM_SQDIFF || match_method == Imgproc.TM_SQDIFF_NORMED )
  { printf "Low value match %.2f at %s%n", match.minVal, match.minLoc }
else
  { printf "High value match %.2f at %s%n", match.maxVal, match.maxLoc  }
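Since the Highgui window routines have no Java wrappers yet, a quick way to check the result visually is to draw the match rectangle on the image and write it to disk. A small sketch, reusing the variables from the script above:

// Draw a rectangle of template size at the best-match location and save to disk
def matchLoc = match.maxLoc   // use match.minLoc for TM_SQDIFF / TM_SQDIFF_NORMED
Core.rectangle(img, matchLoc,
    new Point(matchLoc.x + templ.cols(), matchLoc.y + templ.rows()),
    new Scalar(0, 255, 0))
Highgui.imwrite("matchResult.png", img)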


Compile and run as before. We're trying to find the left eye (on the right in the picture) of the face below.


[Images: the test face photograph (NotSoAverageMaleFace.jpg) and the left-eye template (lefteye.jpg)]


groovyc -cp ../build/bin/opencv-245.jar:. MatchTemplate_Demo.groovy

java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 0
Low value match 0.00 at {262.0, 182.0}
java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 1
Low value match 0.00 at {262.0, 182.0}
java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 2
High value match 1.00 at {394.0, 24.0}
java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 3
High value match 1.00 at {262.0, 182.0}
java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 4
High value match 1.00 at {353.0, 305.0}
java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib MatchTemplate_Demo resources/NotSoAverageMaleFace.jpg resources/lefteye.jpg 5
High value match 1.00 at {262.0, 182.0} 

 
The match method given by the last parameter (range 0-5) is as in the original example. Methods 2 (CV_TM_CCORR) and 4 (CV_TM_CCOEFF) are the non-normalised versions, which fail to find the correct match location, exactly as in the example case (which uses different imagery).
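As a side note, the six java invocations above could also be collapsed into a single run by looping over the method parameter inside the script. An untested sketch, reusing img, templ and result from the script above:

(0..5).each { method ->
  Imgproc.matchTemplate(img, templ, result, method)
  Core.normalize(result, result, 0, 1, Core.NORM_MINMAX, -1)
  def m = Core.minMaxLoc(result)
  if (method == Imgproc.TM_SQDIFF || method == Imgproc.TM_SQDIFF_NORMED)
    printf "Method %d: low value match %.2f at %s%n", method, m.minVal, m.minLoc
  else
    printf "Method %d: high value match %.2f at %s%n", method, m.maxVal, m.maxLoc
}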

Each run takes approximately 0.55 s on the HP EliteBook 8760w, which has an Intel® Core™ i7-2820QM CPU @ 2.30 GHz × 8 with 16 GB RAM, running Ubuntu 12.04 LTS 64-bit.

This is not bad, given the 451 by 452 pixel size of the image and the 56 by 56 template. However, we're normally interested in finding somewhat larger templates (e.g. 256 by 256) in a large image. We'll do some scaling in the next step.
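To compare those scaled cases fairly, it will also help to time just the matchTemplate call rather than the whole JVM run (the 0.55 s above includes JVM start-up and image loading). A minimal sketch:

def t0 = System.nanoTime()
Imgproc.matchTemplate(img, templ, result, match_method)
def t1 = System.nanoTime()
printf "matchTemplate took %.3f s%n", (t1 - t0) / 1.0e9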

(to be continued)




Friday, June 14, 2013

WUBI or dual boot?

Installing Ubuntu, the hard way

 

The reasons it took me so long to get a dual boot installation to run on my new HP EliteBook 8760w were threefold. For one, our computer helpdesk guy did not want to install Linux next to the corporate Windows 7 solution. This scared me into thinking that it was "forbidden", so my first thought was to stay inside Windows instead. That did not last too long, though, because I have never been comfortable with code development on Windows, unless it is Java, Groovy, SQL or similar. I have also been test-running a MacBook Pro for a couple of months. Nice, sleek and fast as well (it has a Core i7, 16 GB RAM and a 500 GB solid state disk), but everything is packed behind flashy graphics and the package management is weird. Nothing beats Linux as a serious code development environment. And, by the way, it all comes for free as in "free beer"... It's a no-brainer, really!

Going WUBI


So, the choice was Linux, and my favourite distribution is Ubuntu. I know the SuSE, CentOS, Fedora and Debian versions as well, but have worked mostly with Ubuntu (and Knoppix for special repair jobs). Still hesitating about the dual boot option, I came across WUBI, which installs Ubuntu inside Windows. As usual, I did not read the instructions to the end, and happily installed Ubuntu 12.04. This is actually a great way to test Linux on a Windows box. You get a pseudo-dual-boot start-up, where you can choose between Ubuntu and Windows. However, there is no dedicated ext4 partition or swap space, for instance. It is not like a virtual machine either, which is good, because in a VM you would not be able to install the graphics driver you need for GPU development. Everything seemed to work fine (WiFi, network, etc.), and since I had just got used to the MacBook, I even liked the new interface of 12.04 (why not implement useful ideas from those who replicate, and then close up, your code base?). WUBI has some limitations, though: for instance, it can use a maximum of 30 GB of disk space, inside the Windows file system.

I managed to install CUDA 5 using the "all in one" installer (669 MB!) from the CUDA developer zone, which took a long time. I think this is due to the disk space allocation, which probably requires some Windows intervention. After the installation, which includes the development driver, I noticed that the touchpad no longer worked.

I went on to get the OpenCV code and tried to build it with CUDA and Java support enabled. This is a fairly long process as well, and it occasionally locked the machine, with Ubuntu going "grey" several times. After a particularly long period of non-response, I panicked! This is never a good idea during installs! Still, after Ctrl-Alt-F1 and sudo reboot now, the grub menu came back up, but Ubuntu did not. Back in Windows, I understood from reports on the web that the WUBI installation was damaged, and that the best way out was to remove the C:\ubuntu directory. Removal did not work, however, as the files in the directory were corrupted and Windows suggested running chkdsk, which required a reboot. This took several hours, staring at the three stages of chkdsk as it went through hundreds of thousands of segments, security IDs, etc. After another restart and removal of C:\ubuntu, and another reboot, the grub menu was gone, and Windows started normally as it did before the WUBI experiment.

Dual boot, alas!


Dual boot was now the only option left. I decided on a USB install. This went amazingly well and was dead simple, as I did not opt for sophisticated disk partitioning. Windows was "shrunk" in the process, but kept working well, after it got used to its reduced status on the box. Back in Linux, I decided to install CUDA 4.2 instead, because I suspected the CUDA 5 "all-in-one" installation had something to do with the WUBI crash. Also, CUDA 4.2 comes as separate installs for the CUDA toolkit, the driver and the SDK. I first installed the driver, during which I was prompted for the DKMS option (choose yes if you want to make sure the driver is included during a kernel update). This apparently went well, although I lost the touchpad functionality again after the reboot. The update manager then suggested some 250 MB of upgrades, which included a kernel update. After yet another reboot, the screen went into VGA mode (it can do the full-HD 1920 by 1080 resolution if the driver works correctly) and the NVIDIA driver was no longer found. I guess DKMS did not work properly. Re-installing the 4.2 driver did not work either, and I was left in a terminal session of the "recovery mode" version.

At this stage, I needed a plan... I would do things in the right order and read up on the specific installation steps. I re-installed Ubuntu from the USB stick. This went fine and fast, as the process found the previous version and overwrote files without needing to reformat the ext4 partition. I read "Installing CUDA 5 on Ubuntu 12.04" and followed it to the letter. This time, I ran the CUDA 5 installer in one go. Only the SDK complained about a missing dependency (libglut), but installed fine after that was resolved. Curiously, though, the driver installation never prompted for anything, not even the DKMS option. After testing the compiled CUDA code samples, I ran the system upgrade (same as previously). This time, I expected that the driver would not survive the kernel upgrade, but that was resolved by re-installing the driver only. We have to do this regularly on the workstations, which run Ubuntu 10.04, so I knew the issue. I just wish Ubuntu kernel upgrades were less frequent.

This time, too, the touchpad stopped working, but a simple re-install of the Synaptics driver resolved that issue.

Conclusions

In hindsight, I think a WUBI installation would have worked after all. The 30 GB limit might have been more of a hurdle, though, given the long-term goal of processing large satellite image data sets with GPUs. The Ubuntu partition is 150 GB, so that limit is less of a problem for now.

In the end, everything worked fine, or was at least under control. I was now able to run all the other installations (JDK, Groovy, Ant, the UbuntuGIS suite (QGIS, GDAL, etc.), Skype) and the other essentials. Stupidly, I had already done all of this in the previous installs, before I ran into the (falsely perceived) CUDA install problems. In fact, I was so happy that I decided to start this blog. If anyone ever arrives at this point, here are some simple rules for installations:

RTFM: obvious enough for many, but Mr. Expert usually thinks this is not for him. Still, reading the f#@king manual before you do anything in this context makes a lot of sense. The good news is that most of the essential material is one or two pages, usually with parse-able statements that guide you through the steps in the right order. Make sure the documentation is specific to the hardware/software components of your system (i.e. look for "Install CUDA 5 on a Quadro 3000M on Ubuntu 12.04" instructions, not "Install CUDA on Ubuntu").

Figure out the right order: do the trickiest things at the start of the process and make sure to test each step. This may require more frequent reboots, but it saves you a lot of time when things go wrong. Run the installation of the least critical components only when the essentials run correctly.

Don't panic, and persevere: Open Source is an experience, not a shrink-wrapped auto-run installation from a CD. Don't get discouraged by the occasional glitch. Someone, somewhere has likely run into the same problem, and squeezed out a solution on some forum. If the problems are discussed in old exchanges, you can be fairly sure a solution was found.
Even on today's fast processor-based systems, compilations and installs may take considerable time. For instance, while GPUs are extremely fast at executing (well-designed) code, compiling that code is a serious computational effort (on a CPU). Software packages grow in size and complexity with every upgrade and release. Give your box the time to complete these complex tasks. Don't panic, don't hit that Ctrl-C or Ctrl-Alt-F1 combination, and don't "simulate" a power failure. If you screw up the process, you lose more time than the few minutes you could have waited.
Completing a complex install successfully is a very rewarding experience, especially if you build from source (and repair a bug in the process...). It should make you pause and think about the excellence of open source software, which is built on the collective efforts of so many people who solved issues before us and were kind enough to share their ideas with the rest of us.


Today is a good day!

Introduction


Today is a good day! I finally managed to get OpenCV to work with Groovy. Finally, because it took me almost a week to get everything installed on a dual boot Ubuntu 12.04 + Windows 7 laptop (see this post).

My laptop is an HP EliteBook 8760w, which has an NVIDIA Quadro 3000M in it. That card is roughly equivalent to a Tesla computing card. It also makes the laptop more of a "schlepptop", as it weighs several kilos and looks like a stack of bricks.

I convinced management to get me this box, so that I can code for GPUs "on the road". At work, we have a couple of Dell Precision workstations, each with a Tesla card in them. Nice machines, but not easy to carry along. We have been dabbling in CUDA for 3 years now, demonstrating how GPU performance keeps pace with the increasing volume of image data we use in our daily work. CUDA kernel programming and, especially, optimisation is a pain, however. Most of our folks are not IT specialists, but rather domain experts in image use.

Our business is mostly in satellite and airborne image processing. One of my personal missions is to demonstrate processing solutions based on new approaches that one can put together with Open Source (OS) software tools. With satellite imagery like Landsat-8 now being made available under "free and open access" licensing schemes, and the similarly licensed European Sentinel data streams set to start flowing next year, the time is ripe to put those capacities in "the hands of the masses", including the step up in processing speed that is needed to handle these potentially massive data flows.

I am also a Java fan and, even more, a Groovy fan. So, one of my user requirements for a viable OS library is that it should have, at least, some wrappers for Java. Normally, if you can do Java, you can do Groovy. I am not a Java or Groovy expert either, but I can normally resolve most issues via debugging and on-line discovery. My single claim to coding fame is that I once repaired a bug in the TIFF reader of Java Advanced Imaging.

Given the above, the best of all worlds appears to be the combination of OpenCV, with its GPU-enabled routine set, and its Java bindings. OpenCV is a mature set of computer vision routines, many of which cover the basic functionality we require in satellite image processing. The GPU-enabled routines supposedly hide the complexity of the CUDA programming while unleashing the raw power of this low-cost computing hardware, and the Java binding matches all that with the fairly flat Groovy learning curve. So, here's my version of the 4th paradigm: GPU + OpenCV + Groovy, and tomorrow's satellite Big Data problem is your uncle.

But, before we get carried away, let's look at the nitty-gritty details. First we have to get Groovy to work with OpenCV (this post). In the next post, I'll demonstrate how this works with CPU and GPU versions of some of the more interesting OpenCV routines.

Setting up OpenCV with Java binding


So, once Ubuntu 12.04 was finally installed and stable (with all the dependencies resolved), I followed the OpenCV installation as suggested in the Introduction to Java Development document.

That document suggests using an Ant, Eclipse or SBT build environment to develop the code. Those environments are definitely recommendable for more complex coding frameworks, but one of my base assumptions is that image processing code is typically a series of simple sequential workflow steps, if needed organised in modules. Thus, a simple command-line based compile, debug and run cycle should be possible. In fact, it is as easy as:

$ javac -cp ../build/bin/opencv-245.jar:. SimpleSample.java
$ java -cp ../build/bin/opencv-245.jar:. -Djava.library.path=../build/lib SimpleSample

which gives you the expected output:

Welcome to OpenCV 2.4.5.0
OpenCV Mat: Mat [ 5*10*CV_8UC1, isCont=true, isSubmat=false, nativeObj=0x7f5bf02b2d80, dataAddr=0x7f5bf02b2e40 ]
OpenCV Mat data:
[0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  1, 1, 1, 1, 1, 5, 1, 1, 1, 1;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0]


Groovify that!


So far, so good. The next step is to groovify the Java code as follows:

import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.core.CvType
import org.opencv.core.Scalar

LibLoader.load()
println("Welcome to OpenCV " + Core.VERSION)
def m = new Mat(5, 10, CvType.CV_8UC1, new Scalar(0))
println("OpenCV Mat: " + m)
def mr1 = m.row(1)
mr1.setTo(new Scalar(1))
def mc5 = m.col(5)
mc5.setTo(new Scalar(5))
println("OpenCV Mat data:\n" + m.dump())

Beyond dropping the semicolons at the end of each line and using the def dynamic typing construct, this may not seem too spectacular, but the merits of Groovy should show up later, in some more complex coding artefacts.

There are two points to make here. First, the static block in the Java code has been replaced with a call to a static method of the Java class LibLoader:

import org.opencv.core.Core;

public class LibLoader {
  public static void load() {
    System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
  }
}

which is compiled as:

javac -cp ../build/bin/opencv-245.jar:. LibLoader.java

This may seem a bit clunky, but I have not found a way around it. There is apparently something fishy about calling static library loading functions from Groovy. Why this is not a problem in Java beats me. In any case, since LibLoader is needed in all future scripts, plenty of re-use is guaranteed. As soon as the Groovy issue is resolved, the Java class can be replaced by a functional Groovy equivalent (which should be a one-liner as well).
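For reference, the Groovy equivalent would presumably look like this (untested sketch, pending the resolution of the loading issue):

class LibLoader {
  static void load() { System.loadLibrary(org.opencv.core.Core.NATIVE_LIBRARY_NAME) }
}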


Second, if you try to run this as a Groovy script, you will get a java.lang.UnsatisfiedLinkError, thrown by the Mat statement (the first OpenCV call). This relates to the same issue: while the LibLoader appears to work, the actual function call can't find the entry point in the library (my explanation). The work-around is to compile first and then run the compiled class as follows:

groovyc -cp ../build/bin/opencv-245.jar:. TestGroovy.groovy

java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib TestGroovy


which will give you the expected results.

The issue was already filed as a bug request on the OpenCV Q&A forum.

Conclusions

 

Even if not perfect, I can now start looking into some more serious OpenCV/GPU/Groovy coding, which is more or less where I wanted to be last week...

As a little bonus, here is the DetectFaceDemo example that comes with the OpenCV Java example code, rewritten in Groovy. It contains some code simplifications available in Groovy (e.g. printf, a list iterator for the loop):

import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.core.MatOfRect
import org.opencv.core.Point
import org.opencv.core.Rect
import org.opencv.core.Scalar
import org.opencv.highgui.Highgui
import org.opencv.objdetect.CascadeClassifier
/*
 * Detects faces in an image, draws boxes around them, and writes the results
 * to "faceDetection.png".
 */

LibLoader.load()

println("\nRunning DetectFaceDemo")

// Create a face detector from the cascade file in the resources directory.
// Note: I took a little shortcut here: I got rid of the getClass.getResource.getPath
// construction (I don't like the idea that my data should reside in the code directory)
def faceDetector = new CascadeClassifier("resources/lbpcascade_frontalface.xml")
def image = Highgui.imread("resources/NotSoAverageMaleFace.jpg")
        
// Detect faces in the image.
// MatOfRect is a special container class for Rect.
def faceDetections = new MatOfRect()
faceDetector.detectMultiScale(image, faceDetections) 

// Note: use printf rather than println(String.format... 
printf("Detected %d faces%n", faceDetections.toArray().length)

// Draw a bounding box around each face.
// Note: I replaced the object array loop in Java with a Groovy object list iterator
faceDetections.toList().each() { rect ->
  Core.rectangle(image, new Point(rect.x, rect.y), new Point(rect.x
      + rect.width, rect.y + rect.height), new Scalar(0, 255, 0))
}
// Save the visualized detection.
def filename = "faceDetection.png"
printf("Writing %s%n", filename)
Highgui.imwrite(filename, image)
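Compile and run it the same way as the earlier scripts (assuming the file is saved as DetectFaceDemo.groovy next to the compiled LibLoader class):

groovyc -cp ../build/bin/opencv-245.jar:. DetectFaceDemo.groovy

java -cp ../build/bin/opencv-245.jar:/usr/local/groovy/embeddable/groovy-all-2.1.4.jar:. -Djava.library.path=../build/lib DetectFaceDemo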

This works as expected as well...


QED! Time for a celebratory beer!

(to be continued...)