Recognizing Pentomino from scanned paper – Part 3

This is the 3rd part of the series on “How to recognize Pentomino from scanned paper”.

In the first part, I discussed the steps for solving the recognition problem:

  • Rotate the image to be orthogonal to x-y axis
  • Find the grid
  • For each edge on the grid, decide whether it is “black edge” or “no edge”
  • Break the grid into several parts, and process each one individually.
  • Somehow, by using the prior knowledge on the 12 pentomino pieces, find where they are.

In the second part, I discussed how to rotate the image and find the grid lines. The image below summarizes the steps so far: the grid is orthogonal to the image, and most of the lines are found. They are saved into two 1D arrays, “rows” and “cols”.

There is also a slight problem: some lines are not detected. I will skip it for now, as it is quite easy to fill them in once you know the mean distance between the lines.
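For reference, filling such gaps could look something like the sketch below (Python/NumPy for illustration only; this is not code from the series, and all names are mine). The idea is exactly the one stated above: estimate the typical spacing, and insert lines wherever a gap spans several spacings.

```python
import numpy as np

def fill_missing_lines(positions):
    """Insert grid lines wherever a gap is a multiple of the typical spacing."""
    positions = np.sort(np.asarray(positions, dtype=float))
    spacing = np.median(np.diff(positions))      # robust estimate of line spacing
    filled = [positions[0]]
    for p in positions[1:]:
        gap = p - filled[-1]
        n = int(round(gap / spacing))            # how many spacings fit in this gap
        for _ in range(1, n):                    # add the missing intermediate lines
            filled.append(filled[-1] + gap / n)
        filled.append(p)
    return np.array(filled)
```

For example, `fill_missing_lines([0, 20, 60, 80])` restores the missing line at 40.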

The next problem is to determine whether there is a drawn edge at some location. Just to make sure that I am clear, the following image shows what I call an edge. I marked the edges in green, and the non-edges in red.

The simplest approach I could think of is to find a small ROI around a possible edge, and collect some statistics. The simplest statistic is the mean value of the pixels. This makes sense, since edges should be black, and non-edges should be white. The following code section takes as input the processed image and two arrays with the row and column numbers of the grid lines. It loops through all rows and cols, and calls a procedure on a small ROI around each edge that attempts to measure the probability that it is an edge. The results are saved into an edge probability matrix.

function edgeProbabilityMatrix = ExtractHorizontalEdges(im,allCols,allRows)
	DEBUG = false;
	im = 255 - im;

	allRows = round(allRows);
	allCols = round(allCols);

	edgeProbabilityMatrix = zeros( numel(allRows),numel(allCols)-1);
	for i=1:numel(allCols)-1
		firstCol = allCols(i);
		secondCol = allCols(i+1);
		selectedCols = firstCol+2  : secondCol-2;
		selectedCols = putToLimits(selectedCols,1,size(im,2));
		for j = 1:numel(allRows)
			selectedRows = allRows(j)-4 : allRows(j)+4;
			selectedRows = putToLimits(selectedRows,1,size(im,1));
			if DEBUG
				x1 = selectedCols(1);
				x2 = selectedCols(end);
				y1 = selectedRows(1);
				y2 = selectedRows(end);
				plotc([x1 x2 x2 x1],[y1 y1 y2 y2],'k');
				hold on;
			end
			smallImage = im(selectedRows,selectedCols);
			if ~isempty(smallImage)
				p = AnalyzeHorizontalEdgeProbability(smallImage);
				edgeProbabilityMatrix(j,i) = p;
			end
		end
	end
end
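AnalyzeHorizontalEdgeProbability itself is not shown in the post. Going by the mean-value statistic described above, a minimal stand-in could look like this (a Python/NumPy sketch; the function name and the scaling to [0,1] are my assumptions):

```python
import numpy as np

def analyze_edge_probability(roi, max_val=255.0):
    """Crude edge score: the mean intensity of the ROI, scaled to [0, 1].
    The image was inverted beforehand (im = 255 - im), so a drawn
    (originally black) edge shows up as a bright ROI and scores high."""
    roi = np.asarray(roi, dtype=float)
    return roi.mean() / max_val
```

A fully drawn edge then scores close to 1, and an empty cell border close to 0.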

A few words about software engineering:

  • You might have noticed that I am using a custom function called plotc to debug the code. It draws a closed polygon. I have explained it on stackoverflow.
  • I am also using another custom function, called putToLimits. It clips the input so that it doesn’t go above or below some minimal and maximal value, i.e. get out of range. Its output is a clipped variant of the input.
  • I am using loops. There might be a vectorized solution, but in this case I prefer loops, due to the clarity of the code. Code that can be debugged is often worth more than a bunch of smart-ass one-liners. It depends on the case, but in general there is a tendency in the Matlab community to abuse vectorized solutions, as they are often considered smart, creative, etc. As Loren once said on her blog: “Vectorize, but sensibly”. I have touched here on an issue that I will discuss more thoroughly in the next posts.
  • I wrote two functions – one for detecting vertical edges, and one for horizontal ones. This is not good practice. Duplicated code causes maintenance problems in the long term. I should fix it in one of the next coding iterations.
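The source of putToLimits isn’t listed in the post; the clipping behavior described above is essentially this (Python/NumPy for illustration; the name comes from the post, the rest is mine):

```python
import numpy as np

def put_to_limits(values, min_val, max_val):
    """Clip values so they stay within [min_val, max_val]."""
    return np.clip(values, min_val, max_val)

# An index range that runs off the image edge gets clamped:
# put_to_limits(np.arange(-2, 5), 0, 3) -> [0 0 0 1 2 3 3]
```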

Now, back to the algorithm. First, here is a visualization of the ROIs being taken for the horizontal edges:

Here is a histogram of the mean values in the ROIs, both for the horizontal and the vertical edges.

That looks very good! You can spot two mixed distributions, with their mean values separated quite well. Histograms like these are called bi-modal histograms. The good thing about them is that there are several solutions in computer vision for separating them and finding a threshold automatically. One of them is Otsu’s method, which is implemented in Matlab; the relevant function is graythresh. However, it assumes that the images are in the range of [0,1], so I did a scaling. The following function detects the range of the input and scales it to [0,1]. Then it recalculates the threshold in terms of the original input.

function threshold = PickThreshold(vals)
    vals = double(vals);
    a = min(vals);
    b = max(vals) - min(vals);
    vals = (vals - a) / b;
    threshold = graythresh(vals);
    threshold = threshold * b + a;
end
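For readers without access to graythresh, the underlying criterion is easy to write down. Here is an illustrative Python/NumPy implementation of Otsu’s method (my own sketch, not the post’s code): it picks the threshold that maximizes the between-class variance of the two groups.

```python
import numpy as np

def otsu_threshold(vals, nbins=256):
    """Pick the threshold maximizing between-class variance (Otsu's method)."""
    vals = np.asarray(vals, dtype=float)
    hist, edges = np.histogram(vals, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w = hist / hist.sum()              # bin probabilities
    w0 = np.cumsum(w)                  # weight of the "low" class per split
    w1 = 1 - w0                        # weight of the "high" class per split
    mu = np.cumsum(w * centers)        # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate split
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    sigma_b[np.isnan(sigma_b)] = 0     # empty classes give 0/0
    return centers[np.argmax(sigma_b)]
```

On a bi-modal sample the returned threshold lands between the two modes, which is exactly the behavior the histogram above calls for.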

Here is the result, after thresholding with Otsu’s method:
You can see that there are some non-edges detected as edges (false positives), and vice versa. This is not perfect, but it is a good start. I will explain in the next post how it can be improved.

How would you improve it? Leave a comment.

Recognizing Pentomino from scanned paper – Part 2

In part 1, we discussed the steps to find the tiling from the scanned paper. Just to remind you, they were:

  1. Rotate the image to be orthogonal to x-y axis
  2. Find the grid
  3. For each edge on the grid, decide whether it is “black edge” or “no edge”
  4. Break the grid into several parts, and process each one individually.
  5. Somehow, by using the prior knowledge on the 12 Pentomino pieces, find where they are

One small thing: I did show the code for detecting the angle, but did not write the actual rotation command. It is as simple as:

rotatedIm = imrotate(clippedIm,90-angle);

I am skipping a small part of my code that removes white background. We can go back to it later. Here are two images, before clipping and rotation, and after:

You can see that now the lines of the grid are truly vertical.

Now, let’s focus on the second task – “Finding the grid”. Just to be clear, finding the grid means finding a vector of X and a vector of Y that represent the places where the lines in the image are. The most obvious choice for detecting periodic patterns in image processing is the DFT (Discrete Fourier Transform), which is called FFT in Matlab. I am talking about summing up the image in either the x or y direction and using `fft`, not `fft2`, which is not needed once we have found the angle and rotated.

FFT can only be used as an initial guess, since it is not exactly accurate. The reason is its quantization. For instance, imagine that the distance between the grid lines is about 20 pixels, and the size of the image in the x direction is 1024. Can you detect it? Well, FFT decomposes the signal into discrete frequencies. We need to find the frequency of 1024/20 = 51.2, which is between 51 and 52. So how accurate is our estimate? We will find the maximum correlation between the signal and the corresponding sine at a frequency of either 51 or 52. Simple math shows that 1024/51 = 20.0784 and 1024/52 = 19.6923. That means we will have an error of either 0.0784 or 0.3077 pixels per grid period. That doesn’t sound like much, right? After all, anything less than a pixel must be small. Wrong! Don’t forget that a small mistake in the distance is multiplied by the number of lines, which is about 50. That can easily become as much as 0.3077*50 ≈ 15.4 pixels. Now that is a huge error.
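The quantization effect is easy to reproduce numerically (a Python/NumPy sketch of my own, not part of the original Matlab code):

```python
import numpy as np

n = 1024
true_period = 20.0  # pixels between grid lines
x = np.arange(n)
signal = np.cos(2 * np.pi * x / true_period)  # idealized column-sum profile

spectrum = np.abs(np.fft.rfft(signal))
spectrum[0] = 0                      # remove the DC component
k = int(np.argmax(spectrum))         # best integer frequency bin
estimated_period = n / k             # quantized period estimate

# The per-period error looks tiny...
per_period_error = abs(estimated_period - true_period)
# ...but it accumulates over ~50 grid lines:
accumulated_error = per_period_error * 50
```

With these numbers the peak lands on bin 51, so the estimated period is 1024/51 ≈ 20.078 pixels, and the sub-pixel error grows to roughly 4 pixels by the last line.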

Let’s see what happens, just for fun.

function cols = DetectColumnsWithFFT(im)
s = sum(im,1);
f = fft(s);
magnitude = abs(f);
phase = angle(f);

%Remove the DC component
magnitude(1) = 0;
[~,maxFreqIndex] = max(magnitude);
%Estimate the distance between grid elements
%(Matlab is 1-based, so bin k corresponds to frequency k-1)
gridLength = size(im,2) / (maxFreqIndex-1);

gridPhase = phase(maxFreqIndex);
startIndex = 1 + (gridLength / 2) * (gridPhase/pi);
cols = startIndex : gridLength : size(im,2);
end

Here is the result:

The information from the phase allowed us to find the first column exactly. Yet the frequency cannot be found precisely, and eventually the error starts to accumulate. To sum up, FFT is valuable because it is very fast and robust, and it gives a good initial estimate. The main problem is that it cannot find the frequency accurately enough.

The way to solve this issue is to look for each of the grid lines independently. If I sum the image row-wise, I will get something like this:

The valleys correspond to the grid lines. I am going to use an ad-hoc heuristic solution. First, find a middle line and threshold by its value.

Then I get a square-like wave consisting of [0,1] values. I can find the center of each sequence of zeros. This will give me almost all of the lines.
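The same idea can be sketched in Python terms (an illustrative NumPy version of my own): threshold the projection profile at the midpoint, then take the center of every below-threshold run.

```python
import numpy as np

def valley_centers(profile):
    """Find centers of below-threshold runs in a 1D projection profile."""
    profile = np.asarray(profile, dtype=float)
    middle = (profile.max() + profile.min()) / 2   # midpoint threshold
    low = profile < middle                          # True inside the valleys
    # Locate where runs of True start and end
    padded = np.concatenate(([False], low, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[0::2], edges[1::2]         # ends are exclusive
    return (starts + ends - 1) / 2                  # center index of each run
```

For instance, a profile with two valleys, `[10, 10, 1, 1, 1, 10, 10, 2, 2, 10]`, yields centers at 3.0 and 7.5.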

There are some missing lines! If you take a look again at the graph above, you will understand why. I will show you next time how to fill them.

Here is the code. It is the same for both row and column detection. If I want to find columns, the `dim` parameter should be 1, and in the case of rows, 2. Can you figure out what the small trick that I am using in lines 3-5 does? I will leave it as something to think about.

function centers = FindFrequencyOnStraight(greenIm,dim)
peakWave = sum(greenIm,dim);
places = linspace(1,numel(peakWave),10);
places = round(places);
peakWaveClipped = peakWave( places(2):places(end-1));
nMax = max(peakWaveClipped);
nMin = min(peakWaveClipped);
nMiddle = (nMax + nMin)/2;
bThresholdedPeakWave = peakWave>nMiddle;
centers = regionprops(~bThresholdedPeakWave,'Centroid');
centers = [centers.Centroid];
%Centroid returns [x y] pairs; keep only the relevant coordinate
if dim==1
    centers(2:2:end) = [];
else
    centers(1:2:end) = [];
end
end

Next time I am going to show how to find those missing lines, and explain about how to find out the edges of the pieces.

Once again, I would like to remind you that this is one of my first posts, and feedback is very welcome!

Recognizing Pentomino from scanned paper – Part 1

[This is my first post! Please leave a comment, I would love to hear your feedback.]

A few months ago, when I visited my parents’ house, I noticed that my dad had solved a lot of Pentomino puzzles. Also, he had written them down in a notebook.
Pentomino is a puzzle game in which you have to tile a 2D shape with pieces. The pieces resemble Tetris pieces; however, they have 5 squares instead of 4. There are 12 unique pieces, excluding rotations and flips:

A typical tiling puzzle solution looks like this:

Here is a typical page from his notebook, after I scanned it:

As someone who is excited about image processing, I immediately thought about writing a program that will scan the page and recognize the puzzle solutions. In this post, and the next ones, I will show the steps that I used to accomplish this task. Also, I will try to share my thoughts about development in general.

Here is an overview of the algorithm:

  1. Rotate the image to be orthogonal to x-y axis
  2. Find the grid
  3. For each edge on the grid, decide whether it is “black edge” or “no edge”
  4. Break the grid into several parts, and process each one individually.
  5. Somehow, by using the prior knowledge on the 12 pentomino pieces, find where they are.

Let’s discuss each step individually. First, the rotation. Why do we need to rotate? When I scanned the notebook, I was kind of sloppy, and did not align it precisely on the scanner.

Thus, some compensation for the angle is needed. It is probably possible to solve the entire problem by only knowing the angle, without rotating the actual image. While that is possible, the fact that all of the edges become either perfectly horizontal or vertical makes the coding easier. Even more than that, the debugging (which usually takes more time than writing the code itself) becomes significantly easier. The actual rotation re-samples the image, which can cause a degradation of quality. Nevertheless, as long as I am able to extract the relevant information, it is good enough.

How do we find the angle of rotation? Several options popped into my head. A Hough transform to recognize the lines sounds like a reasonable choice: I can find two dominant angles in Hough space and use them. I decided to use the Radon transform, which is conceptually very close. The Radon transform scans the image at every angle and sums up the pixels into a 1D array. You can see a conceptual explanation of the transform in the next image:

I expect to have the highest contrast while scanning at the angle of the grid. Why? Because every other angle will be more blurred. Think about it: if we hit the exact grid orientation, we expect to see something like a wave of peaks. Every time we hit a grid line, the array value will be very large, and when we don’t, it will be small. I don’t need to scan all of the possible angles; I need a small range around 0 and 90 degrees, and a fairly fine quantization of these small ranges. In order to detect the angle with the maximum contrast, I can use the standard deviation to approximate the contrast.
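Here is a toy NumPy experiment of that claim (my own illustration; a horizontal shear stands in as a cheap approximation of projecting at a small angle):

```python
import numpy as np

# Synthetic grid: a vertical line every 20 pixels
im = np.zeros((200, 200))
im[:, ::20] = 1.0

def projection_contrast(image, shift_per_row):
    """Std of the column-sum profile after shearing each row sideways.
    A shear of s pixels/row roughly mimics projecting at atan(s) radians."""
    sheared = np.array([np.roll(row, int(round(i * shift_per_row)))
                        for i, row in enumerate(image)])
    return sheared.sum(axis=0).std()

aligned = projection_contrast(im, 0.0)   # summing along the true line direction
tilted = projection_contrast(im, 0.05)   # ~3 degrees off: peaks get smeared
```

The aligned projection keeps the sharp peaks (high standard deviation), while even a small tilt smears them, which is exactly why the contrast profile peaks at the grid angle.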

There is one inherent flaw in the Radon transform: different orientations result in different 1D array lengths. Thus, the transform is biased towards some angles; it gives them more weight. In order to overcome this problem, I divide by the Radon transform of an array of ones.

Here is the result of the transform, on the image above. You can see that one of the columns has a high contrast – that is the angle! It is close to zero, but not exactly.

Here is the Matlab code that detects the angle:

function angle = DetectAngle(greenIm)
    edgeIm = edge(greenIm,'canny');

    %Scan +-5 degrees around 0 and around 90, in 0.25 degree steps
    ANGLE_RAD = 5;
    STEP = (ANGLE_RAD*2) / 40;
    theta1 = -ANGLE_RAD : STEP : ANGLE_RAD;
    theta2 = theta1 + 90;

    %Normalize by the transform of an all-ones image to remove the length bias
    In = ones(size(edgeIm));
    R = radon(edgeIm,theta1)./radon( In,theta1);
    Ro = radon(edgeIm,theta2)./radon( In,theta2);
    %     figure;imagesc(theta1,1:size(R,1),R+Ro);
    Rt = R + Ro;
    Rt(isnan(Rt)) = 0;
    nContrastProfile = std(Rt);
    %     figure;plot(theta1,nContrastProfile);
    [~,nMaxContrastLocation] = max(nContrastProfile);
    angle = theta2(nMaxContrastLocation);
end


In the next post, I will show the next steps, so stay tuned.
