Posted on 2023-05-07 20:16
The Scale-Invariant Feature Transform (SIFT) is a local feature description algorithm in the field of image processing. It was proposed in 1999 by the Canadian professor David G. Lowe and patented, with the patent held by the University of British Columbia. The SIFT patent expired in March 2020, so the algorithm is now free to use; you only need a sufficiently recent version of OpenCV (SIFT is part of the main module from 4.4.0 onward).
SIFT is not only scale invariant; it also gives good detection results when the image is rotated, when its brightness changes, and when the shooting position moves.
SIFT also shows up in everyday life. For example, when we take a panorama with a mobile phone, we rotate the phone while shooting and end up with a panoramic picture. Have you ever wondered why, when the angle of view of the phone camera is fixed, rotating it gives a wider view? In fact the angle of view has not changed: the phone captured many images during the rotation, and these images overlap one another. Stitching the images together and removing the overlapping parts produces the panorama.
In essence, SIFT finds keypoints across different scale spaces and computes an orientation for each keypoint.
An image pyramid is a structure that represents an image at multiple resolutions. It generates N images of different resolutions by repeatedly sampling the original image. The highest-resolution image sits at the bottom, and the images above it shrink step by step until the top of the pyramid contains a single pixel. This is the image pyramid in the traditional sense.
Obtaining an image pyramid generally involves two steps:
Smooth the image with a low-pass filter
Sample the smoothed image
There are two sampling methods: upsampling (gradually increasing the resolution) and downsampling (gradually reducing the resolution).
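The two steps above can be sketched in plain Python as follows. This is only an illustration: the 3x3 box average stands in for "a low-pass filter" (the article does not prescribe one at this point), and the helper names `smooth`, `downsample` and `build_pyramid` are hypothetical.

```python
def smooth(image):
    """Low-pass filter: 3x3 box average with edge replication (an assumed stand-in)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def downsample(image):
    """Keep every other row and every other column."""
    return [row[::2] for row in image[::2]]


def build_pyramid(image):
    """Smooth, then downsample, until a single pixel remains."""
    levels = [image]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        levels.append(downsample(smooth(levels[-1])))
    return levels


image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

Each level halves the number of rows and columns, so an 8x8 input produces four levels ending in a single pixel.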
What is an Image Gaussian Pyramid?
Before discussing the Gaussian pyramid, consider the human eye. Our perception of the world has two characteristics. The first is that near objects look large and far objects look small: the same object appears larger up close and smaller from a distance. The second is "blur", or more precisely "coarseness": up close we can see an object's details (and perceive it as sharper). With a leaf, for example, we can see its veins up close, but from a distance we see only its outline (and perceive it as blurred). In terms of frequency, the details of an image (such as texture and outlines) are its high-frequency components, while its smoother regions are its low-frequency components.
The image Gaussian pyramid is a kind of image scale space (scale spaces are either linear or nonlinear; only the linear case is discussed here). The concept of scale is used to simulate the distance between the observer and the object, and while simulating that distance, the coarseness of the object must also be taken into account.
In summary, the scale space of an image simulates how far away and how blurred an object appears to the human eye.
The image Gaussian pyramid takes both aspects into account: ① the distance of the image; ② the degree of blur of the image (better understood as coarseness).
So how do we simulate the distance of the image?
By sampling (upsampling or downsampling).
For example, keeping every other pixel in each row and each column of an image yields an image with half as many rows and columns as the original. This is one form of downsampling.
So how do we simulate the coarseness of the image?
The image is smoothed with a Gaussian kernel, because the Gaussian convolution kernel is the only linear kernel that can implement scale transformation.
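As a rough illustration of Gaussian smoothing, the sketch below builds a normalized 1D Gaussian kernel and applies it along rows and then columns (the 2D Gaussian is separable). The function names, the kernel radius of about 3σ and the edge-replication border handling are assumptions made for this example, not details taken from the article.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Sampled, normalized 1D Gaussian; the radius defaults to about 3 * sigma."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Convolve one row with the kernel, replicating the border pixels."""
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            xx = min(max(x + k - radius, 0), n - 1)
            acc += w * row[xx]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel_1d(sigma)
    rows = [blur_row(row, kernel) for row in image]
    cols = [blur_row(list(col), kernel) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A single bright pixel spreads into a smooth blob after blurring.
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
blurred = gaussian_blur(impulse, sigma=1.0)
print(round(blurred[4][4], 4))
```

A larger σ spreads the bright pixel over a wider area, which is the sense in which σ controls the "coarseness" of the image.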
So far we have looked at how the Gaussian pyramid is formed from an intuitive point of view. Now let's analyze its construction more formally.
The Gaussian pyramid described here is the form used by the SIFT operator. It is not a single pyramid: it consists of several groups (octaves), and each group contains several layers (intervals).
Gaussian pyramid construction process:
1. First double the size of the original image and use it as layer 1 of group 1 of the Gaussian pyramid. Applying Gaussian convolution (that is, Gaussian smoothing or Gaussian filtering) to layer 1 of group 1 gives layer 2 of group 1. The Gaussian convolution function is:
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
For the parameter σ, the SIFT operator uses a fixed value of 1.6.
2. Multiply σ by a scale factor k to obtain a new smoothing factor σ = k*σ, use it to smooth the layer 2 image of group 1, and take the result as layer 3.
3. Repeat this to obtain L layers of images. Within a group, every layer has the same size; only the smoothing coefficient differs. The corresponding smoothing coefficients are: 0, σ, kσ, k^2σ, k^3σ, ..., k^(L-2)σ.
4. Downsample the third-to-last layer of group 1 by a factor of 2 and use the result as layer 1 of group 2. Then apply Gaussian smoothing with smoothing factor σ to layer 1 of group 2 to obtain layer 2 of group 2. Proceeding as in step 2 gives the L layers of group 2. They all have the same size within the group, and the corresponding smoothing coefficients are: 0, σ, kσ, k^2σ, k^3σ, ..., k^(L-2)σ. The images of group 2, however, are half the size of those of group 1.
5. Repeating this yields O groups of L layers each, O*L images in total. Together these images form the Gaussian pyramid, whose structure is as follows:
Within a group, the images in different layers have the same size, and the Gaussian smoothing factor σ of each layer is k times that of the previous layer;
Across groups, the first image of a group is the third-to-last image of the previous group downsampled by a factor of 2, so its size is half that of the previous group;
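The layout described above (image sizes per group and smoothing coefficients per layer) can be tabulated with a small sketch. The function name `pyramid_layout` is hypothetical, and the value of k passed in the example is an assumption, since the text has not fixed k at this point.

```python
def pyramid_layout(height, width, groups, layers, sigma=1.6, k=2 ** 0.5):
    """Return the size and per-layer smoothing coefficients of every group.

    Follows the description in the text: the original image is doubled first,
    each later group is half the size of the previous one, and the smoothing
    coefficients within a group are 0, sigma, k*sigma, ..., k^(L-2)*sigma.
    """
    layout = []
    h, w = 2 * height, 2 * width  # group 1 starts from the doubled image
    for g in range(groups):
        coeffs = [0.0] + [sigma * k ** i for i in range(layers - 1)]
        layout.append({"group": g + 1, "size": (h, w), "smoothing": coeffs})
        h, w = h // 2, w // 2  # the next group is half the size
    return layout


layout = pyramid_layout(256, 256, groups=3, layers=5, k=2 ** 0.5)
for entry in layout:
    print(entry["group"], entry["size"], [round(c, 3) for c in entry["smoothing"]])
```

With 3 groups of 5 layers this describes 15 images in total, which matches the O*L count given in the text.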
The Gaussian pyramid looks as follows, showing the 4 layers of group 1 and the 4 layers of group 2:
In formula (1), M is the height (number of rows) of the original image; N is the width (number of columns) of the original image; O is the number of groups in the image's Gaussian pyramid.
In formula (2), n is the number of images from which features are to be extracted; S is the number of layers in each group of the image's Gaussian pyramid.
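(1) Suppose each group of the Gaussian pyramid has S = 5 layers; then the difference-of-Gaussian pyramid has S-1 = 4 layers,
and extrema can only be sought in the middle 2 layers of each group of the difference-of-Gaussian pyramid, so n = 2
(2) Suppose each group of the Gaussian pyramid has S = 6 layers; then the difference-of-Gaussian pyramid has S-1 = 5 layers,
and extrema can only be sought in the middle 3 layers of each group of the difference-of-Gaussian pyramid, so n = 3
(3) Suppose each group of the Gaussian pyramid has S = 7 layers; then the difference-of-Gaussian pyramid has S-1 = 6 layers,
and extrema can only be sought in the middle 4 layers of each group of the difference-of-Gaussian pyramid, so n = 4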
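The three cases above all fit the relation S = n + 3: subtracting adjacent layers loses one layer, and the top and bottom difference-of-Gaussian layers have no neighbor on one side in the scale direction, which loses two more. A minimal sketch of that bookkeeping, with a hypothetical helper name:

```python
def layer_counts(n):
    """For n layers in which extrema are sought, return (S, DoG layers, searchable layers)."""
    gaussian_layers = n + 3              # S, layers per group of the Gaussian pyramid
    dog_layers = gaussian_layers - 1     # adjacent layers are subtracted pairwise
    searchable = dog_layers - 2          # the first and last DoG layers lack a neighbor
    return gaussian_layers, dog_layers, searchable


for n in (2, 3, 4):
    print(n, layer_counts(n))
```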
For convenience of calculation, groups and layers are numbered starting from 0.
$$\sigma(o, r) = \sigma_0 \cdot 2^{o + r/n} \quad (3)$$
In formula (3), o is the group index, r is the layer index, and $\sigma(o, r)$ is the Gaussian blur coefficient of the corresponding image.
$\sigma_0$ is the initial value of the Gaussian blur. Professor David G. Lowe initially set it to 1.6. Considering that the camera has in effect already blurred the image with σ = 0.5, and that successive Gaussian blurs combine in quadrature, the value actually applied is:
$$\sigma_0 = \sqrt{1.6^2 - 0.5^2} \approx 1.52$$
Using formula (3), the Gaussian blur coefficients in the image pyramid can be calculated as follows:
Group 0, layer 0: $\sigma_0$
Group 0, layer 1: $2^{1/n}\sigma_0$
Group 0, layer 2: $2^{2/n}\sigma_0$
Group 1, layer 0: $2\sigma_0$
Group 1, layer 1: $2^{1+1/n}\sigma_0$
Group 1, layer 2: $2^{1+2/n}\sigma_0$
Group 2, layer 0: $4\sigma_0$
Group 2, layer 1: $2^{2+1/n}\sigma_0$
Group 2, layer 2: $2^{2+2/n}\sigma_0$
From the above calculations, we know that:
① Within each group, the Gaussian blur coefficients of adjacent layers differ by a factor of $2^{1/n}$;
② The Gaussian blur coefficients of layer 0 in groups 0, 1, 2, ... are $\sigma_0, 2\sigma_0, 4\sigma_0, \ldots$ respectively;
③ Layer 0 of each group is obtained by downsampling the third-to-last layer of the previous group, with no further Gaussian blur operation.
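The three observations can be checked numerically. The sketch below assumes formula (3) has the form σ(o, r) = σ0 · 2^(o + r/n), which is consistent with observations ① and ②, and takes σ0 as the quadrature difference of 1.6 and the assumed camera blur of 0.5. The function name `blur_coefficient` is hypothetical.

```python
import math


def blur_coefficient(o, r, n, sigma0):
    """Gaussian blur coefficient of layer r in group o (assumed form of formula (3))."""
    return sigma0 * 2.0 ** (o + r / n)


n = 3                                      # layers in which extrema are sought (example value)
sigma0 = math.sqrt(1.6 ** 2 - 0.5 ** 2)    # about 1.52

# (1) adjacent layers within a group differ by a factor of 2^(1/n)
ratio = blur_coefficient(0, 1, n, sigma0) / blur_coefficient(0, 0, n, sigma0)

# (2) layer 0 of groups 0, 1, 2 has sigma0, 2*sigma0, 4*sigma0
bases = [blur_coefficient(o, 0, n, sigma0) / sigma0 for o in range(3)]

# (3) with S = n + 3 layers (indices 0..n+2), the third-to-last layer is layer n,
#     and its coefficient already equals that of layer 0 of the next group
third_to_last = blur_coefficient(0, n, n, sigma0)
next_base = blur_coefficient(1, 0, n, sigma0)

print(round(sigma0, 4), round(ratio, 4), bases)
```

Observation ③ follows from the last comparison: because the third-to-last layer already carries twice the base blur, downsampling it gives the next group's layer 0 without any extra smoothing.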
The overall process is shown in Figure 2:
The scale space of an image addresses the problem of how to describe the image at all scales.
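Once the image Gaussian pyramid has been built, subtracting adjacent layers within each group gives the difference-of-Gaussian pyramid (DoG, Difference of Gaussian), which is the prerequisite for detecting image extrema later, as shown in Figure 2: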
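A minimal sketch of the subtraction step, assuming one group is given as a list of same-sized 2D lists ordered by increasing blur (the function name and the toy data are illustrative only):

```python
def difference_of_gaussians(group):
    """Subtract each layer from the next one; S Gaussian layers give S-1 DoG layers."""
    dog = []
    for lower, upper in zip(group, group[1:]):
        dog.append([
            [u - l for l, u in zip(row_lower, row_upper)]
            for row_lower, row_upper in zip(lower, upper)
        ])
    return dog


# Toy group: 5 constant 2x2 "images" with values 0, 1, 4, 9, 16.
group = [[[float(i * i)] * 2 for _ in range(2)] for i in range(5)]
dog = difference_of_gaussians(group)
print(len(dog), [layer[0][0] for layer in dog])
```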
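Here T = 0.04, and its value can be set manually; n is the number of images from which features are to be extracted; abs(val) is the absolute pixel value in the image. The pixel threshold is set in order to remove noise and other unstable pixels.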
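The thresholding formula itself is not reproduced in the text. The sketch below assumes the commonly used form abs(val) > 0.5 * T / n, with DoG values normalized to the range [0, 1]; both the inequality and the function name are assumptions rather than something the article states.

```python
def passes_contrast_threshold(val, n, T=0.04):
    """Keep a DoG sample only if its magnitude exceeds 0.5 * T / n (assumed form)."""
    return abs(val) > 0.5 * T / n


# With n = 3 the assumed threshold is about 0.0067.
print(passes_contrast_threshold(0.01, n=3), passes_contrast_threshold(0.005, n=3))
```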
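As shown in the figure below, when searching for extrema in the difference-of-Gaussian pyramid, points in the σ direction must be considered in addition to points in the x and y directions. So to decide whether a pixel is an extremum, it must be compared with its 26 surrounding points (8 in its own layer and 9 in each of the layers above and below).
① If each group of the difference-of-Gaussian pyramid has 3 layers, extrema can only be sought in the 1 middle layer,
② If each group of the difference-of-Gaussian pyramid has 5 layers, extrema can only be sought in the 3 middle layers.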
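The 26-neighbor comparison can be sketched as follows, assuming one group of the DoG pyramid is stored as a list of 2D lists. The helper names are hypothetical, and the strict comparison (ties are not extrema) is a choice made for this example.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors.

    s, y and x must be interior indices so that every neighbor exists.
    """
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)


def find_extrema(dog):
    """Scan only the middle layers and interior pixels of one DoG group."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found


# Three 3x3 DoG layers with a single peak in the middle layer.
dog_group = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog_group[1][1][1] = 1.0
print(find_extrema(dog_group))
```

Because the loop over `s` skips the first and last layers, a group with 3 DoG layers is searched in 1 layer and a group with 5 DoG layers in 3, as stated in ① and ②.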
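In essence, the goal is to remove pixels where the local curvature of the DoG is highly asymmetric. A poorly defined extremum of the difference-of-Gaussian operator has a large principal curvature across the edge and a small principal curvature in the perpendicular direction (along the edge). The principal curvatures are obtained from a 2×2 Hessian matrix H; the principal curvatures of D are proportional to the eigenvalues of H. Let α be the larger eigenvalue and β the smaller one.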
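One common way to apply this test avoids computing α and β explicitly: with r = α/β, the quantity Tr(H)²/Det(H) equals (r + 1)²/r, so a point is kept when Tr(H)²/Det(H) is below that bound for a chosen r. The sketch below estimates H with finite differences; the function name is hypothetical and the default r = 10 is an assumption (it is the value usually attributed to Lowe, not one stated in this article).

```python
def passes_edge_test(layer, y, x, r=10.0):
    """Reject edge-like points using the ratio of principal curvatures.

    layer is one DoG image as a 2D list; (y, x) must be an interior pixel.
    """
    dxx = layer[y][x + 1] + layer[y][x - 1] - 2.0 * layer[y][x]
    dyy = layer[y + 1][x] + layer[y - 1][x] - 2.0 * layer[y][x]
    dxy = (layer[y + 1][x + 1] - layer[y + 1][x - 1]
           - layer[y - 1][x + 1] + layer[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign or a flat direction: not a stable extremum.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


blob = [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]    # similar curvature both ways
ridge = [[0.0, 0.0, 0.0], [3.9, 4.0, 3.9], [0.0, 0.0, 0.0]]   # elongated, edge-like
print(passes_edge_test(blob, 1, 1), passes_edge_test(ridge, 1, 1))
```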
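The descriptor's gradient orientation histogram is computed from the Gaussian image at the scale of the keypoint. The radius of the image region is calculated by formula (17) below:
1. Build a set of keypoint descriptors for the template image (reference image) and for the real-time image (observation image) respectively. Target recognition is carried out by comparing the keypoint descriptors of the two point sets. The similarity between 128-dimensional keypoint descriptors is measured by Euclidean distance.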
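A minimal sketch of matching by Euclidean distance: for every descriptor of the reference image, pick the closest descriptor of the observation image. The function name and the toy 128-dimensional descriptors are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math


def match_descriptors(reference, observed):
    """For each reference descriptor, return (ref index, best observed index, distance)."""
    matches = []
    for i, descriptor in enumerate(reference):
        j = min(range(len(observed)),
                key=lambda idx: math.dist(descriptor, observed[idx]))
        matches.append((i, j, math.dist(descriptor, observed[j])))
    return matches


# Toy 128-dimensional descriptors.
reference = [[0.0] * 128, [1.0] * 128]
observed = [[0.9] * 128, [0.1] * 128]
matches = match_descriptors(reference, observed)
print([(i, j, round(d, 3)) for i, j, d in matches])
```

This brute-force search compares every pair of descriptors, which is fine for a small example but slow for images with thousands of keypoints.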
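    import cv2
    import matplotlib.pyplot as plt

    # 1. Read the image and convert it to grayscale
    img = cv2.imread('cat.jpg')
    if img is None:
        raise FileNotFoundError('cat.jpg could not be read')
    cat = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 2. SIFT keypoint detection
    # 2.1 Instantiate the SIFT object (cv2.SIFT_create in OpenCV 4.4.0+;
    #     older contrib builds use cv2.xfeatures2d.SIFT_create)
    sift = cv2.SIFT_create()
    # 2.2 Detect keypoints: kp holds each keypoint's orientation, scale and position;
    #     des holds the keypoint descriptors
    kp, des = sift.detectAndCompute(cat, None)
    # 2.3 Draw the detected keypoints on the image
    cv2.drawKeypoints(img, kp, img, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

    # 3. Display the image (convert BGR to RGB for matplotlib)
    plt.figure(figsize=(8, 6), dpi=100)
    plt.imshow(img[:, :, ::-1])
    plt.title('sift')
    plt.xticks([])
    plt.yticks([])
    plt.show()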