With the lizard hatching season underway I’m starting to get some meaningful data from my Incubator Monitor I blogged about earlier in the year.
To extend the project and to try some machine learning techniques in the “real world”, I decided to add a feature that keeps track of which eggs have hatched and detects a second egg hatching. Typically the second egg of a clutch hatches hours, or up to a few days, after the first.
To that end, I decided to write a Python module to detect the eggs in the image. They are light-coloured ellipses on a dark background, so it shouldn’t be too hard.
Features of the eggs in the images:
- Light eggs on dark background.
- Elliptical shape
- Vary in size based on species, time since laying and fish-eye distortion of the camera.
- Variable orientation (vertical, horizontal, anything in between)
Let’s follow a typical object detection process:
- Preprocessing
- Feature Extraction
- Train a learning algorithm
- Label things!
Step 1: Preprocessing
Not much. The camera is now mounted to the inside of the incubator lid with constant IR-LED lighting, so I just read the images as greyscale and leave it at that. I could add compensation for the fish-eye distortion and better lighting, but this is OK for now.
Step 2: Feature Extraction
To start with I’m using basic edge detection to find features in the image. More modern algorithms exist, but this is an easy-to-understand place to start. The edges give me two important things:
- A way to iterate through a list of potential features. Given that the eggs vary in number, location, size and orientation, the edges tell me where to look in the image for interesting things.
- Features for the learning algorithm. The edges form a basis for extracting information about any potential eggs (see below).
I decided to use edge detection and then match the resulting contours (contiguous edges) to an egg shape. I use the cv2 function fitEllipse to fit an ellipse to each contour. If the contour looks like an egg, the fitted ellipse will match it well.
edges = cv2.Canny(self.img, self.edge_low_thresh, self.edge_high_thresh, apertureSize=5)
im, contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
fit_ellipse_i = cv2.fitEllipse(contours[i])
I then apply a coarse filter to the resulting ellipses. Only those that meet some basic criteria are kept:
- length and width within a defined range,
- contour and ellipse matching well.
This gives me a list of “possible” eggs.
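The coarse filter could be sketched as below. This is not the project code: the field names and thresholds are illustrative assumptions, standing in for whatever ranges suit the camera and species.

```python
# Hypothetical sketch of the coarse filter: keep only candidate ellipses whose
# size and contour fit are plausible for an egg. Thresholds are illustrative.
def coarse_filter(candidates, min_len=20, max_len=120, max_fit_error=0.15):
    """candidates: list of dicts with 'width', 'length' (pixels) and
    'error' (normalised contour-to-ellipse mismatch)."""
    possible_eggs = []
    for c in candidates:
        w, l = sorted((c["width"], c["length"]))  # ensure w <= l
        if not (min_len <= w and l <= max_len):
            continue  # too small or too large to be an egg
        if c["error"] > max_fit_error:
            continue  # contour does not match its fitted ellipse well
        possible_eggs.append(c)
    return possible_eggs

cands = [
    {"width": 40, "length": 60, "error": 0.05},   # plausible egg
    {"width": 5,  "length": 300, "error": 0.02},  # wrong size
    {"width": 50, "length": 55, "error": 0.40},   # poor ellipse fit
]
print(coarse_filter(cands))  # only the first candidate survives
```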
I should note here that I originally thought this might be enough, and that I could just tune the various parameters to get an accurate list, but the two categories (egg, not_an_egg) were not easily separated by a series of if conditions.
This list of possible eggs provides the initial “features” for the machine learning. The algorithm needs to be fed a list of features that describe each detection in such a way that it can distinguish real eggs from false detections.
What makes an egg an egg?
- Egg shaped
- Within the size range of an egg
feature1 : the error between the detected contour and the fitted egg-shaped ellipse
feature2 : the ratio between width and length
feature3 : egg width
feature4 : egg length
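The four features above can be pulled straight out of the fitted ellipse. A minimal sketch, assuming the ellipse is stored in cv2.fitEllipse’s format of ((cx, cy), (width, length), angle) and the fit error is already computed per contour (the function name is mine, not the project’s):

```python
# Sketch: build the 4-element feature vector for one candidate egg.
def egg_features(fit_error, ellipse):
    (cx, cy), (width, length), angle = ellipse
    return [
        fit_error,       # feature1: contour vs fitted-ellipse error
        width / length,  # feature2: width/length ratio
        width,           # feature3: egg width (pixels)
        length,          # feature4: egg length (pixels)
    ]

# e.g. an ellipse centred at (120, 80), 42 px wide, 63 px long, tilted 15 deg
features = egg_features(0.08, ((120.0, 80.0), (42.0, 63.0), 15.0))
print(features)
```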
Step 3: Train a learning algorithm
I have a list of potential eggs and some features describing them. Now comes the important and tedious part: training the machine learning algorithm with training data. This involved recording a bunch of images with different egg configurations and then manually classifying each detection as egg or not_an_egg. I took a number of images of each configuration, as each gives slightly different data, and fed them through a program that displayed each candidate and let me enter y/n to classify it. The program remembered, across a sequence of images, where the eggs and not_an_eggs were, to reduce the data entry (laziness is the mother of invention).
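The carry-forward trick in that labelling tool could look something like this. A hypothetical sketch, not the actual project code: the function name and calling convention are mine, and the key presses stand in for interactive input.

```python
# Sketch of the labelling helper: take a y/n key press per candidate, and let
# an empty press re-use the label remembered from the previous image at the
# same position (the lazy default).
def label_candidates(candidates, key_presses, previous=None):
    """candidates: detections for one image; key_presses: 'y', 'n' or ''
    (empty = keep the remembered label); previous: labels from the last image."""
    labels = []
    for i, (cand, key) in enumerate(zip(candidates, key_presses)):
        if key == "" and previous is not None:
            labels.append(previous[i])  # carry the old label forward
        else:
            labels.append("EGG" if key == "y" else "NOT_AN_EGG")
        # (in the real tool, cand would be drawn on screen here)
    return labels

first = label_candidates(["c0", "c1"], ["y", "n"])
second = label_candidates(["c0", "c1"], ["", ""], previous=first)
print(second)  # carried forward: ['EGG', 'NOT_AN_EGG']
```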
From this I collected 12878 potential detections (3318 eggs and 9560 not_an_eggs) to train and test the algorithm.
I used the svm module from the sklearn Python package, with its default Radial Basis Function (RBF) kernel, to implement a Support Vector Machine classifier.
sklearn SVM example:
I create feature vectors from 80% of the data to train the SVM:
# extract 80% as training and 20% as test data
self.train = [raw[i] for i in range(len(raw)) if i % 5]
self.test = [raw[i] for i in range(len(raw)) if i % 5 == 0]
print("Train/Test data ", len(self.train), len(self.test))

# extract feature vectors from the dictionary of classified data
# ("ellipse" holds the fitEllipse result: ((cx, cy), (width, length), angle))
self.X_train = [[x["error"], x["ellipse"][1][0], x["ellipse"][1][1],
                 x["ellipse"][1][0] / x["ellipse"][1][1]] for x in self.train]
# extract classifications
Y = [x['classification'] == 'EGG' for x in self.train]
self.Y_train = np.array(Y).astype(int)
self.X_test = [[x["error"], x["ellipse"][1][0], x["ellipse"][1][1],
                x["ellipse"][1][0] / x["ellipse"][1][1]] for x in self.test]
Y = [x['classification'] == 'EGG' for x in self.test]
self.Y_test = np.array(Y).astype(int)

# scale the data to zero mean and unit standard deviation
self.scaler = preprocessing.StandardScaler().fit(self.X_train)
X_scaled = self.scaler.transform(self.X_train)

# train an SVM
self.clf = svm.SVC()
self.clf.fit(X_scaled, self.Y_train)
Then running a test with the held-back 20%:
X_test = self.scaler.transform(self.X_test)
self.pred = self.clf.predict(X_test)
self.True_test = [i for i in range(len(self.Y_test)) if self.Y_test[i] > 0]
self.False_test = [i for i in range(len(self.Y_test)) if self.Y_test[i] == 0]
print("True positive ", np.mean(self.pred[self.True_test]))
print("False positive ", np.mean(self.pred[self.False_test]))
To get this result:
Train/Test data  10302 2576
True positive  0.927244582043
False positive  0.0321243523316
This is better than I was getting with a sequence of if conditions, and not too bad considering the simplicity of the extracted features, the noisy background, and that I have done no tuning of the SVM (C or gamma).
Step 4: Label things!
Here is an interesting example. Note: green ellipses are predicted eggs and red ellipses are predicted not_an_egg.
There is one false detection, where the bottom of the container and the vermiculite seen through its side create a reasonably sized egg shape that matches an ellipse pretty well.
Some eggs are missed because the edge detection fails to separate them from the crud/container next to them; an egg needs enough separation from the background. I could use segments of contours instead, but at the moment that creates more false positives.
It is a good first step with room for a lot of improvement. Stay tuned!
Where to find it
Check out the code here on GitHub: IncubatorMonitor