Worry-free text writing into OpenCV images

Using cv::putText directly is cumbersome: placing the text at the correct position with the correct size is hard. Here is a wrapper function that deals with all of this for you. The text is fitted into the given image, multiple lines are supported, and everything is centered horizontally and vertically.

void ioxp::putText(cv::Mat imgROI, const std::string &text, const int fontFace = cv::FONT_HERSHEY_PLAIN,
    const cv::Scalar color = cv::Scalar::all(255), const int thickness = 1, const int lineType = cv::LINE_8)
{
    /*
     * Split the given text into its lines
     */
    std::vector<std::string> textLines;
    std::istringstream f(text);
    std::string s;
    while (std::getline(f, s, '\n')) {
        textLines.push_back(s);
    }

    /*
     * Calculate the line sizes and overall bounding box
     */
    std::vector<cv::Size> textLineSizes;
    cv::Size boundingBox(0,0);
    int baseline = 0;
    for (const std::string &line : textLines) {
        cv::Size lineSize = cv::getTextSize(line, fontFace, 1, thickness, &baseline);
        baseline += 2 * thickness;
        lineSize.width += 2 * thickness;
        lineSize.height += baseline;
        textLineSizes.push_back(lineSize);
        boundingBox.width = std::max(boundingBox.width, lineSize.width);
        boundingBox.height += lineSize.height;
    }

    /*
     * Scale all sizes so that the bounding box just fits into the image
     */
    const double scale = std::min(imgROI.rows / static_cast<double>(boundingBox.height),
                                  imgROI.cols / static_cast<double>(boundingBox.width));
    boundingBox.width *= scale;
    boundingBox.height *= scale;
    baseline *= scale;
    for (size_t i = 0; i < textLineSizes.size(); i++) {
        textLineSizes.at(i).width *= scale;
        textLineSizes.at(i).height *= scale;
    }

    /*
     * Draw the text line-by-line
     */
    int y = (imgROI.rows - boundingBox.height + baseline) / 2;
    for (size_t i = 0; i < textLines.size(); i++) {
        y += textLineSizes.at(i).height;
        // center the text horizontally
        cv::Point textOrg((imgROI.cols - textLineSizes.at(i).width) / 2, y - baseline);
        cv::putText(imgROI, textLines.at(i), textOrg, fontFace, scale, color, thickness, lineType);
    }
}

This is how you use it and what the results look like:

    cv::Mat outputImage(360, 640, CV_8UC3);
    ioxp::putText(outputImage, "Short text");
    cv::imshow("text", outputImage);

    cv::Mat outputImage(360, 640, CV_8UC3);
    ioxp::putText(outputImage,
      "Some longer text, even with\nmultiple lines spread over the whole image");
    cv::imshow("text", outputImage);

    cv::Mat outputImage(360, 640, CV_8UC3);
    ioxp::putText(outputImage, "\n\n\nEmpty\n\n\nLines\n\n\n");
    cv::imshow("text", outputImage);

By using the rectangle accessor of cv::Mat you can define exactly which part of the image the text should be placed in:

    cv::Mat outputImage(360, 640, CV_8UC3);
    ioxp::putText(outputImage(cv::Rect(0, outputImage.rows / 10, outputImage.cols, outputImage.rows / 10)),
      "Text placed in the upper\n10 percent of the image");
    cv::imshow("text", outputImage);
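
The fitting step of the wrapper can also be looked at in isolation. The following is a minimal sketch in plain C++ without any OpenCV dependency; `BoxSize`, `FittedLayout` and `fitLines` are hypothetical names introduced only for this illustration, and the sketch assumes at least one line with a non-zero size. It takes the line sizes measured at font scale 1 and returns the common scale and the horizontal offset that centers each line:

```cpp
#include <algorithm>
#include <vector>

// Stand-in for cv::Size so that the sketch compiles without OpenCV (hypothetical type).
struct BoxSize { int width; int height; };

// Result of the fitting step: the common font scale and the x offset of each line.
struct FittedLayout {
    double scale;
    std::vector<int> xOffsets;
};

// Given each line's size at font scale 1, compute the largest scale at which the
// stacked lines fit into a roiWidth x roiHeight region, plus the x offset that
// centers each scaled line horizontally.
// Assumption: lineSizes is non-empty and all sizes are non-zero.
FittedLayout fitLines(const std::vector<BoxSize>& lineSizes, int roiWidth, int roiHeight)
{
    int boxWidth = 0;
    int boxHeight = 0;
    for (const BoxSize& s : lineSizes) {
        boxWidth = std::max(boxWidth, s.width); // the widest line defines the box width
        boxHeight += s.height;                  // the lines are stacked vertically
    }

    FittedLayout layout;
    // The tighter of the two ratios decides how much the text may grow or has to shrink.
    layout.scale = std::min(roiHeight / static_cast<double>(boxHeight),
                            roiWidth / static_cast<double>(boxWidth));

    for (const BoxSize& s : lineSizes) {
        const int scaledWidth = static_cast<int>(s.width * layout.scale);
        layout.xOffsets.push_back((roiWidth - scaledWidth) / 2);
    }
    return layout;
}
```

For two lines of 100x20 and 50x20 pixels in a 400x80 region, the height is the limiting dimension, so the text is scaled by 2 and the lines start at x = 100 and x = 150.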

Combining ARCore tracking and Cardboard Spatial Audio

This week Google released ARCore, their answer to Apple’s recently published Augmented Reality framework ARKit. This is an exciting opportunity for mobile developers to enter the world of Augmented Reality, Mixed Reality, Holographic Games, … whichever buzzword you prefer.

To get to know the AR framework I wanted to test how easy it would be to combine it with another awesome Android framework: Google VR, used for their Daydream and Cardboard platform. Specifically, its Spatial Audio API. And despite never having used one of those two libraries, combining them is astonishingly simple.

Cf. https://developers.google.com/vr/concepts/spatial-audio

The results:

The goal is to add correctly rendered three-dimensional sound to an augmented reality application. As a demonstrator, we pin an audio source to each of the little Androids placed in the scene.
Well, screenshots don’t make sense to demonstrate audio but without them this post looks so lifeless 🙂 Unfortunately, I could not manage to do a screen recording which includes the audio feed.

The how-to:

  1. Set up ARCore as explained in the documentation. Currently, only the Google Pixel and the Samsung Galaxy S8 are supported, so you need one of those to test it out. The device coverage will increase in the future.
  2. The following step-by-step tutorial starts from the sample project located in /samples/java_arcore_hello_ar. It is based on the current GitHub repository’s HEAD.
  3. Open the application’s Gradle build file at /samples/java_arcore_hello_ar/app/build.gradle and add the VR library to the dependencies:
    dependencies {
        compile 'com.google.vr:sdk-audio:1.10.0'
    }
  4. Place a sound file in the asset folder. I had some trouble getting it to work until I found out that it has to be a 32-bit float mono wav file. I used Audacity for the conversion:
    1. Open your audio file in Audacity
    2. Click Tracks -> Stereo Track to Mono
    3. Click File -> Export. Select “Other uncompressed files” as type, Click Options and select “WAV” as Header and “Signed 32 bit PCM” as encoding

    I used “Sam’s Song” from the Ubuntu Touch Sound Package and you can download the correctly converted file here.

  5. We have to apply three modifications to the sample’s HelloArActivity.java: (1) bind the GvrAudioEngine to the Activity’s lifecycle, (2) add a sound object for every object placed into the scene, and (3) continuously update the audio object positions and the listener position. You will find the relevant sections below.

    public class HelloArActivity extends AppCompatActivity implements GLSurfaceView.Renderer {
        // ... (unchanged members of the sample omitted)
        private GvrAudioEngine mGvrAudioEngine;
        private ArrayList<Integer> mSounds = new ArrayList<>();
        final String SOUND_FILE = "sams_song.wav";

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            // ... (unchanged sample code omitted)
            mGvrAudioEngine = new GvrAudioEngine(this, GvrAudioEngine.RenderingMode.BINAURAL_HIGH_QUALITY);
            new Thread(
                new Runnable() {
                    @Override
                    public void run() {
                        // Prepare the audio file and set the room configuration to an office-like setting
                        // Cf. https://developers.google.com/vr/android/reference/com/google/vr/sdk/audio/GvrAudioEngine
                        mGvrAudioEngine.preloadSoundFile(SOUND_FILE);
                        mGvrAudioEngine.setRoomProperties(15, 15, 15,
                            GvrAudioEngine.MaterialName.PLASTER_SMOOTH,
                            GvrAudioEngine.MaterialName.PLASTER_SMOOTH,
                            GvrAudioEngine.MaterialName.CURTAIN_HEAVY);
                    }
                })
                .start();
        }

        @Override
        protected void onResume() {
            // ... (unchanged sample code omitted)
            mGvrAudioEngine.resume();
        }

        @Override
        public void onPause() {
            // ... (unchanged sample code omitted)
            mGvrAudioEngine.pause();
        }

        @Override
        public void onDrawFrame(GL10 gl) {
            // Clear screen to notify driver it should not load any pixels from previous frame.
            GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT | GLES20.GL_DEPTH_BUFFER_BIT);
            try {
                // Obtain the current frame from ARSession. When the configuration is set to
                // UpdateMode.BLOCKING (it is by default), this will throttle the rendering to the
                // camera framerate.
                Frame frame = mSession.update();
                // Handle taps. Handling only one tap per frame, as taps are usually low frequency
                // compared to frame rate.
                MotionEvent tap = mQueuedSingleTaps.poll();
                if (tap != null && frame.getTrackingState() == TrackingState.TRACKING) {
                    for (HitResult hit : frame.hitTest(tap)) {
                        // Check if any plane was hit, and if it was hit inside the plane polygon.
                        if (hit instanceof PlaneHitResult && ((PlaneHitResult) hit).isHitInPolygon()) {
                            // ... (sample code that creates the anchor and adds it to mTouches omitted)
                            int soundId = mGvrAudioEngine.createSoundObject(SOUND_FILE);
                            float[] translation = new float[3];
                            hit.getHitPose().getTranslation(translation, 0);
                            mGvrAudioEngine.setSoundObjectPosition(soundId, translation[0], translation[1], translation[2]);
                            mGvrAudioEngine.playSound(soundId, true /* looped playback */);
                            // Set a logarithmic rolloff model and mute after four meters to limit audio chaos
                            mGvrAudioEngine.setSoundObjectDistanceRolloffModel(soundId, GvrAudioEngine.DistanceRolloffModel.LOGARITHMIC, 0, 4);
                            // Remember the sound so that mSounds.get(i) belongs to mTouches.get(i)
                            mSounds.add(soundId);
                            // Hits are sorted by depth. Consider only closest hit on a plane.
                            break;
                        }
                    }
                }
                // ... (unchanged sample code omitted)
                // Visualize planes.
                mPlaneRenderer.drawPlanes(mSession.getAllPlanes(), frame.getPose(), projmtx);
                // Visualize anchors created by touch.
                float scaleFactor = 1.0f;
                for (int i=0; i < mTouches.size(); i++) {
                    PlaneAttachment planeAttachment = mTouches.get(i);
                    if (!planeAttachment.isTracking()) {
                        continue;
                    }
                    // Get the current combined pose of an Anchor and Plane in world space. The Anchor
                    // and Plane poses are updated during calls to session.update() as ARCore refines
                    // its estimate of the world.
                    planeAttachment.getPose().toMatrix(mAnchorMatrix, 0);
                    // Update and draw the model and its shadow.
                    mVirtualObject.updateModelMatrix(mAnchorMatrix, scaleFactor);
                    mVirtualObjectShadow.updateModelMatrix(mAnchorMatrix, scaleFactor);
                    mVirtualObject.draw(viewmtx, projmtx, lightIntensity);
                    mVirtualObjectShadow.draw(viewmtx, projmtx, lightIntensity);
                    // Update the audio source position since the anchor might have been refined
                    float[] translation = new float[3];
                    planeAttachment.getPose().getTranslation(translation, 0);
                    mGvrAudioEngine.setSoundObjectPosition(mSounds.get(i), translation[0], translation[1], translation[2]);
                }
                /*
                 * Update the listener's position in the audio world
                 */
                // Extract positional data
                float[] translation = new float[3];
                frame.getPose().getTranslation(translation, 0);
                float[] rotation = new float[4];
                frame.getPose().getRotationQuaternion(rotation, 0);
                // Update audio engine
                mGvrAudioEngine.setHeadPosition(translation[0], translation[1], translation[2]);
                mGvrAudioEngine.setHeadRotation(rotation[0], rotation[1], rotation[2], rotation[3]);
                mGvrAudioEngine.update();
            } catch (Throwable t) {
                // Avoid crashing the application due to unhandled exceptions.
                Log.e(TAG, "Exception on the OpenGL thread", t);
            }
        }
    }
  6. That’s it! Now, every Android placed into the scene also plays back audio.

Some findings:

  1. Setting up ADB via WiFi is really helpful as you will walk around a lot and don’t want to reconnect USB every time.
  2. Placing the Androids too close to each other will produce a really annoying sound chaos. You can modify the rolloff model to reduce this (cf. the setSoundObjectDistanceRolloffModel call in the code excerpt above).
  3. It matters how you hold your phone (portrait with the current code), because ARCore measures the physical orientation of the device but the audio coordinate system is not (yet) rotated accordingly. If you want to use landscape mode, it is sufficient to set the Activity in the manifest to android:screenOrientation="landscape".
  4. Ask questions tagged with the official arcore tag on Stack Overflow, the Google developers are reading them!

OpenCV: Draw epipolar lines

Drawing epipolar lines in OpenCV is not hard, but it requires a fair amount of code. Here is an out-of-the-box function for your convenience which only needs the fundamental matrix and the matching points. And it even has an option to exclude outliers during drawing!

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <cmath>
#include <string>
#include <vector>

/**
 * \brief	Distance of a point to a line given as a*x + b*y + c = 0
 * 			(defined first so that drawEpipolarLines can find it)
 */
template <typename T>
static float distancePointLine(const cv::Point_<T> point, const cv::Vec<T,3>& line)
{
	//Line is given as a*x + b*y + c = 0
	return std::fabs(line(0)*point.x + line(1)*point.y + line(2))
			/ std::sqrt(line(0)*line(0)+line(1)*line(1));
}

/**
 * \brief	Compute and draw the epipolar lines in two images
 *			associated to each other by a fundamental matrix
 *
 * \param title			Title of the window to display
 * \param F				Fundamental matrix
 * \param img1			First image
 * \param img2			Second image
 * \param points1		Set of points in the first image
 * \param points2		Set of points in the second image matching to the first set
 * \param inlierDistance	Points with a high distance to the epipolar lines are
 *						not displayed. If it is negative, all points are displayed
 */
template <typename T1, typename T2>
static void drawEpipolarLines(const std::string& title, const cv::Matx<T1,3,3> F,
							  const cv::Mat& img1, const cv::Mat& img2,
							  const std::vector<cv::Point_<T2>> points1,
							  const std::vector<cv::Point_<T2>> points2,
							  const float inlierDistance = -1)
{
	CV_Assert(img1.size() == img2.size() && img1.type() == img2.type());
	cv::Mat outImg(img1.rows, img1.cols*2, CV_8UC3);
	cv::Rect rect1(0,0, img1.cols, img1.rows);
	cv::Rect rect2(img1.cols, 0, img1.cols, img1.rows);
	/*
	 * Allow color drawing
	 */
	if (img1.type() == CV_8U)
	{
		cv::cvtColor(img1, outImg(rect1), CV_GRAY2BGR);
		cv::cvtColor(img2, outImg(rect2), CV_GRAY2BGR);
	}
	else
	{
		img1.copyTo(outImg(rect1));
		img2.copyTo(outImg(rect2));
	}
	std::vector<cv::Vec<T2,3>> epilines1, epilines2;
	cv::computeCorrespondEpilines(points1, 1, F, epilines1); //Index starts with 1
	cv::computeCorrespondEpilines(points2, 2, F, epilines2);

	CV_Assert(points1.size() == points2.size() &&
			  points2.size() == epilines1.size() &&
			  epilines1.size() == epilines2.size());

	cv::RNG rng(0);
	for(size_t i=0; i<points1.size(); i++)
	{
		if(inlierDistance > 0)
		{
			if(distancePointLine(points1[i], epilines2[i]) > inlierDistance ||
				distancePointLine(points2[i], epilines1[i]) > inlierDistance)
			{
				//The point match is no inlier
				continue;
			}
		}
		/*
		 * Epipolar lines of the 1st point set are drawn in the 2nd image and vice-versa.
		 * The end points follow from a*x + b*y + c = 0 at x = 0 and x = image width.
		 */
		cv::Scalar color(rng(256),rng(256),rng(256));

		cv::line(outImg(rect2),
			cv::Point(0, -epilines1[i][2]/epilines1[i][1]),
			cv::Point(img1.cols, -(epilines1[i][2]+epilines1[i][0]*img1.cols)/epilines1[i][1]),
			color);
		cv::circle(outImg(rect1), points1[i], 3, color, -1, CV_AA);

		cv::line(outImg(rect1),
			cv::Point(0, -epilines2[i][2]/epilines2[i][1]),
			cv::Point(img2.cols, -(epilines2[i][2]+epilines2[i][0]*img2.cols)/epilines2[i][1]),
			color);
		cv::circle(outImg(rect2), points2[i], 3, color, -1, CV_AA);
	}
	cv::imshow(title, outImg);
}
The most important concepts of epipolar geometry. Author: Arne Nordmann, CC BY-SA 3.0
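
The two geometric ingredients of the function, the point-to-line distance used for the inlier test and the end points of a line drawn across the image, can be checked without OpenCV. The following is a small sketch in plain C++; `Line2D`, `pointLineDistance` and `lineYAt` are hypothetical names used only here, and `lineYAt` assumes a line that is not vertical (b != 0):

```cpp
#include <cmath>

// A 2D line in the implicit form a*x + b*y + c = 0 (hypothetical helper type).
struct Line2D { double a, b, c; };

// Perpendicular distance of the point (x, y) to the line.
double pointLineDistance(double x, double y, const Line2D& line)
{
    return std::fabs(line.a * x + line.b * y + line.c)
           / std::sqrt(line.a * line.a + line.b * line.b);
}

// y coordinate of the line at a given x, obtained by solving a*x + b*y + c = 0 for y.
// Assumption: the line is not vertical, i.e. line.b != 0.
double lineYAt(double x, const Line2D& line)
{
    return -(line.c + line.a * x) / line.b;
}
```

Evaluating `lineYAt` at x = 0 and at x = image width gives the two end points of an epipolar line that spans the whole image. A match is treated as an outlier when `pointLineDistance` exceeds the chosen inlier distance in either image.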

Hello World for Android computer vision

Every once in a while I start a new computer vision project with Android. And I am always facing the same question: “What do I need again to retrieve a camera image ready for processing?”. While there are great tutorials around I just want a downloadable project with a minimum amount of code – not taking pictures, not setting resolutions, just the continuous retrieval of incoming camera frames.

So here they are – two different “Hello World” for computer vision. I will show you some excerpts from the code and then provide a download link for each project.

Pure Android API

The main problem to solve is how to store the camera image in a processable image format, in this case android.graphics.Bitmap.

public void surfaceChanged(SurfaceHolder holder, int format, int width,
		int height) {
	if(camera != null) {
		// Release a previously opened camera before opening it again
		camera.release();
		camera = null;
	}
	camera = Camera.open();
	try {
		camera.setPreviewDisplay(holder);
	} catch (IOException e) {
		e.printStackTrace();
	}
	camera.setPreviewCallback(new PreviewCallback() {

		public void onPreviewFrame(byte[] data, Camera camera) {
			System.out.println("Frame received!"+data.length);
			Size size = camera.getParameters().getPreviewSize();
			/*
			 * Directly constructing a bitmap from the data would be possible if the preview format
			 * had been set to RGB (params.setPreviewFormat() ) but some devices only support YUV.
			 * So we have to stick with it and convert the format
			 */
			int[] rgbData = convertYUV420_NV21toRGB8888(data, size.width, size.height);
			Bitmap bitmap = Bitmap.createBitmap(rgbData, size.width, size.height, Bitmap.Config.ARGB_8888);
			/*
			 * TODO: now process the bitmap
			 */
		}
	});
	camera.startPreview();
}

Notice the function convertYUV420_NV21toRGB8888() which is needed since the internal representation of camera frames does not match any supported Bitmap format.
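
The body of convertYUV420_NV21toRGB8888() is not part of the excerpt. To give an idea of what such a conversion involves, here is a rough sketch of the arithmetic, written in C++ rather than Java. It is not the project's actual helper: `nv21ToArgb` and `clamp255` are hypothetical names, the coefficients are the common full-range YUV to RGB approximation, and the sketch assumes an even width and height with the usual NV21 layout (all luma bytes first, then interleaved V/U pairs shared by 2x2 pixel blocks).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the valid 8-bit range.
static int clamp255(int v) { return std::min(255, std::max(0, v)); }

// Convert one NV21 frame to packed ARGB pixels (0xAARRGGBB).
// Assumed layout: width*height luma bytes, followed by interleaved V,U pairs,
// one pair per 2x2 pixel block. Width and height are assumed to be even.
std::vector<std::uint32_t> nv21ToArgb(const std::vector<std::uint8_t>& data, int width, int height)
{
    std::vector<std::uint32_t> argb(static_cast<std::size_t>(width) * height);
    const int chromaStart = width * height;

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const int y = data[row * width + col];
            // Each chroma row holds width bytes (width/2 pairs) and serves two pixel rows.
            const int chromaIndex = chromaStart + (row / 2) * width + (col / 2) * 2;
            const int v = data[chromaIndex] - 128;
            const int u = data[chromaIndex + 1] - 128;

            const int r = clamp255(static_cast<int>(y + 1.402f * v));
            const int g = clamp255(static_cast<int>(y - 0.344f * u - 0.714f * v));
            const int b = clamp255(static_cast<int>(y + 1.772f * u));

            argb[row * width + col] = 0xFF000000u
                | (static_cast<std::uint32_t>(r) << 16)
                | (static_cast<std::uint32_t>(g) << 8)
                | static_cast<std::uint32_t>(b);
        }
    }
    return argb;
}
```

A frame whose chroma bytes are all 128 carries no color, so every pixel ends up gray with R = G = B = Y.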

Using OpenCV

This is even more straightforward. We just use OpenCV’s JavaCameraView. If you are new to Android+OpenCV, here is a good tutorial for you.

cameraView = (CameraBridgeViewBase) findViewById(R.id.cameraView);

OpenCV Image Watch for cv::Matx

When developing for/with OpenCV using Visual Studio, the Image Watch plug-in is very useful. However, it does not support the better-typed cv::Matx types (e.g. cv::Matx33f which is the same as cv::Matx<float,3,3> ). Here is how I made use of Visual Studio’s debugger type visualizers to customize the plugin:

  1. Go to the folder <VS Installation Directory>\Common7\Packages\Debugger\Visualizers\ and create a new file called Matx.natvis
  2. Open the file and insert the following:
    <?xml version="1.0" encoding="utf-8"?>
    <!-- Philipp Hasper, http://www.hasper.info-->
    <AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
      <UIVisualizer ServiceId="{A452AFEA-3DF6-46BB-9177-C0B08F318025}" Id="1" MenuName="Add to Image Watch"/>
      <Type Name="cv::Matx&lt;*,*,*&gt;">
        <UIVisualizer ServiceId="{A452AFEA-3DF6-46BB-9177-C0B08F318025}" Id="1" />
      </Type>
      <Type Name="cv::Matx&lt;*,*,*&gt;">
        <DisplayString Condition='strcmp("float", "$T1") == 0'>{{FLOAT32, size = {$T3}x{$T2}}}</DisplayString>
        <DisplayString Condition='strcmp("double", "$T1") == 0'>{{FLOAT64, size = {$T3}x{$T2}}}</DisplayString>
        <Expand>
          <Synthetic Name="[type]" Condition='strcmp("float", "$T1") == 0'>
            <DisplayString>FLOAT32</DisplayString>
          </Synthetic>
          <Synthetic Name="[type]" Condition='strcmp("double", "$T1") == 0'>
            <DisplayString>FLOAT64</DisplayString>
          </Synthetic>
          <Item Name="[channels]">1</Item>
          <Item Name="[width]">$T3</Item>
          <Item Name="[height]">$T2</Item>
          <Item Name="[data]">(void*)val</Item>
          <Item Name="[stride]">$T3*sizeof($T1)</Item>
        </Expand>
      </Type>
    </AutoVisualizer>
  3. You do not even have to restart Visual Studio. Just start a new debugging session and you can look at your cv::Matx types in a nice little graphical window.

Image Watch for cv::Matx

More about customizing the Image Watch plug-in can be found on the official Image Watch documentation page.
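
Why the visualizer reports $T3 as width, $T2 as height and $T3*sizeof($T1) as stride becomes clearer when looking at the memory layout. The following sketch uses a stand-in type instead of the real class so that it compiles without OpenCV; `MatxLike` and `strideBytes` are hypothetical names, and the sketch assumes what the visualizer itself relies on: the elements live in one flat, row-major array called val, with the template arguments ordered as element type, rows, columns.

```cpp
#include <cstddef>

// Hypothetical stand-in that mirrors the storage of cv::Matx<T, m, n>:
// m rows, n columns, stored row by row in one flat array named val.
template <typename T, int m, int n>
struct MatxLike {
    T val[m * n];

    enum {
        rows = m, // second template argument, $T2 in the visualizer
        cols = n  // third template argument, $T3 in the visualizer
    };

    // Number of bytes between the starts of two consecutive rows.
    static std::size_t strideBytes() { return n * sizeof(T); }

    // Row-major element access: skip `row` complete rows, then `col` elements.
    T& at(int row, int col) { return val[row * n + col]; }
};
```

For a 3x4 float matrix this gives a width of 4, a height of 3 and a stride of 16 bytes, and element (1, 2) is stored at val[6].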

OpenCV and Visual Studio: Empty Call Stack

For a couple of years I have used OpenCV for Android and developed with Eclipse. But a while back I started a bigger project which will run on stationary machines so I began to learn how to use Visual Studio 2013. The integration of OpenCV 2.4.8 was fairly easy and I was quickly able to run my code.

(Just as a service since the library names on the given site are outdated – here are all the names for easy copying:)


But then I experienced a strange behaviour: Every time an exception or assertion was thrown inside of an OpenCV method, I would have no clue what happened since the call stack had only four entries: Something about KernelBase.dll, msvcr120d.dll, opencv_core248.dll and the last one “Frames below may be incorrect and/or missing, no symbols loaded for opencv_core248d.dll“.


Upon further examination (clicking on the opencv_core248d.dll entry) Visual Studio revealed that the .pdb file was missing: it said “opencv_core248d.pdb not loaded” and “opencv_core248d.pdb could not be found in the selected paths“.


I quickly found some .pdb files in C:\opencv248\build\x86\vc12\staticlib but since they did not match the .dlls, they did not work either. So what to do? Essentially we have to build OpenCV ourselves, but we can leave out any 3rd party libraries since we only want to debug, not to have fast code (of course you can build a complete version, but I skipped that to save time). In the following I will describe only the basic steps; for the full documentation, including pictures and how to add performance-improving 3rd party libraries, visit the original tutorial.

  1. I assume you have an OpenCV copy, e.g. under C:\opencv248\ . The folder contains a build and a sources folder.
  2. Install CMake
  3. Start CMake (cmake-gui). It should be in your start menu. Enter C:\opencv248\sources in the first field (“Where is the source code:”) and a freely chosen path e.g. C:\opencv248\ownBuild\ in the second one.
  4. Press “Configure” and select your compiler – for Visual Studio 2013 32-bit it would be “Visual Studio 12”. I then ignored a warning about Java AWT and python missing and pressed the “Generate” button.
  5. Wait for the process to finish, then open C:\opencv248\ownBuild\OpenCV.sln. Build both the Debug and the Release configuration, which will take some time.
  6. After the build, go into C:\opencv248\ownBuild\bin. There are two folders containing all files you will need. Now you have two options:
    1. Remove any directory previously leading to the OpenCV dlls from your PATH (e.g. in my case I removed C:\opencv248\build\x86\vc12\bin ) and then add C:\opencv248\ownBuild\bin\Debug and C:\opencv248\ownBuild\bin\Release to your PATH.
    2. Remove any directory previously leading to the OpenCV dlls from your PATH. Then move all .dll and .pdb files from the Debug and Release folders to a “safe” place, e.g. C:\opencv248\debuggableDLL. Add this folder to your PATH, then delete the whole C:\opencv248\ownBuild\ folder to free disk space.
  7. Restart Visual Studio and start a debug session. Now the call stack shows exactly what happened, including the correct lines.
  8. Remember to switch back to the optimized dlls when doing performance testing!

If you don’t want to build this all by yourself, here is the result of the build process for Visual Studio 2013 32-bit (the original size of >800MB is compressed to 80MB). To download the archive, activate JavaScript, enter “vsopencv” in the following field and then click the download button. Uncompress the archive with 7-zip and then perform step 6.b.

OpenCV: Reading an image sequence backwards

Here is a small code snippet for OpenCV which reads an image sequence backwards. It needs a sequence of images 000.png, 001.png, 002.png, … in the project’s folder.

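cv::Mat frame;
cv::VideoCapture capture("000.png");
capture.set(CV_CAP_PROP_POS_AVI_RATIO, 1);
while (true)
{
	capture >> frame;
	// Stop once no further image can be read
	if (frame.empty())
		break;
	capture.set(CV_CAP_PROP_POS_FRAMES, capture.get(CV_CAP_PROP_POS_FRAMES) - 2);

	cv::imshow("image", frame);
	// Give the window time to actually show the image
	cv::waitKey(30);
}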

So what does the code do?

  1. Setting the property CV_CAP_PROP_POS_AVI_RATIO to 1 means starting at the end of the sequence (0 = at the beginning).
  2. The property CV_CAP_PROP_POS_FRAMES defines the index of the next image to load. Since it is automatically increased after each image retrieval, we have to decrement it by 2 to step one image back.
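
The index arithmetic from the two points above can be modeled in a few lines of plain C++. This is only a sketch of the bookkeeping, not of the capture backend: `backwardReadOrder` is a hypothetical name, and the model assumes that after seeking to the end, the next image to be read is the last one of the sequence.

```cpp
#include <vector>

// Models the read loop and returns the image indices in the order they are shown.
// Assumption: after seeking to the end, the next index to read is frameCount - 1.
std::vector<int> backwardReadOrder(int frameCount)
{
    std::vector<int> order;
    int nextIndex = frameCount - 1;
    while (nextIndex >= 0) {
        order.push_back(nextIndex); // reading delivers the image at nextIndex ...
        nextIndex += 1;             // ... and advances the position by one
        nextIndex -= 2;             // stepping back by two selects the previous image
    }
    return order;
}
```

For a sequence of three images, the model visits the indices 2, 1 and 0 in that order. Decrementing by only 1 would cancel the automatic increment and show the same image forever.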