Posted on 2023-06-06 11:41
Everyone is familiar with YOLOv5. It is very versatile, but its detection results on some small targets are very poor.
During YOLOv5 training, the default image size (img-size) is 640x640 pixels. To detect small targets, simply changing img-size to 4000x4000 makes the required memory so large that training becomes almost impossible.
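As a rough back-of-the-envelope check, the jump from 640 to 4000 can be put into numbers. This sketch only compares pixel counts; that memory grows roughly in proportion to the number of input pixels is an assumption about convolutional detectors in general, not a measurement from this project, and `pixel_ratio` is a name made up for illustration.

```python
def pixel_ratio(new_size: int, base_size: int = 640) -> float:
    """Ratio of pixel counts between two square input sizes."""
    return (new_size * new_size) / (base_size * base_size)


ratio = pixel_ratio(4000)
print(f"4000x4000 has about {ratio:.1f}x the pixels of 640x640")
```

So a single 4000x4000 input carries roughly as many pixels as a few dozen default-size inputs, which is why cutting the picture is more practical than enlarging img-size.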
The following are the results of small-target detection training on 6k x 4k pictures. Eight pictures, one word: bad.
Dataset (road signs):
The easiest approach is to cut the large picture into small pictures, as the open-source framework SAHI [1] does.
Several issues have to be considered:
1. With simple cutting, every picture must have the same size after cutting;
2. Cutting will inevitably cut through some targets, so a "fusion" (overlap) area needs to be set;
3. The data set after cutting consists of small pictures, so detection can only run on small pictures, and the detected small pictures then have to be merged back together (troublesome).
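The second issue can be illustrated with a small sketch: a target lying across a seam is split between two neighbouring pictures, and neither of them sees the whole target. The function name `clip_to_tile` and the example numbers are hypothetical (a 6000x4000 picture cut 4*4 gives 1500x1000 pieces); this is not the author's code.

```python
def clip_to_tile(box, tile):
    """Clip box (xmin, ymin, xmax, ymax) to tile.

    Returns (clipped_box, kept_fraction), or (None, 0.0) if they do not intersect.
    """
    bx0, by0, bx1, by1 = box
    tx0, ty0, tx1, ty1 = tile
    x0, y0 = max(bx0, tx0), max(by0, ty0)
    x1, y1 = min(bx1, tx1), min(by1, ty1)
    if x1 <= x0 or y1 <= y0:
        return None, 0.0
    kept = ((x1 - x0) * (y1 - y0)) / ((bx1 - bx0) * (by1 - by0))
    return (x0, y0, x1, y1), kept


left_tile = (0, 0, 1500, 1000)
right_tile = (1500, 0, 3000, 1000)
sign = (1480, 400, 1540, 460)  # a 60x60 target lying across the seam at x=1500

print(clip_to_tile(sign, left_tile))   # only part of the target is visible here
print(clip_to_tile(sign, right_tile))  # and the rest is visible here
```

An extra picture centred on the seam contains the whole target, which is what the fusion area is for.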
General structure diagram:
The blue and green areas are the 4*4=16 sub-images after cutting, the areas inside the red and blue frames are the fusion images, and the mixing ratio is 0.2.
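The layout described above can be sketched as plain box arithmetic. This is a minimal sketch under two assumptions: the picture divides evenly by the grid size, and each fusion image is centred on an interior seam and extends by the mixing ratio times the sub-image size on both sides. `grid_cells` and `fusion_boxes` are illustrative names.

```python
def grid_cells(img_w, img_h, n=4):
    """Return the n*n equal sub-image boxes (xmin, ymin, xmax, ymax), row by row."""
    cw, rh = img_w // n, img_h // n
    return [(c * cw, r * rh, (c + 1) * cw, (r + 1) * rh)
            for r in range(n) for c in range(n)]


def fusion_boxes(img_w, img_h, n=4, mix_percent=0.2):
    """Return (row_fusion, col_fusion) boxes centred on the interior seams."""
    cw, rh = img_w // n, img_h // n
    half_w, half_h = round(cw * mix_percent), round(rh * mix_percent)
    # Row fusion: straddles the vertical seam between two horizontally adjacent cells.
    row_fusion = [((c + 1) * cw - half_w, r * rh, (c + 1) * cw + half_w, (r + 1) * rh)
                  for r in range(n) for c in range(n - 1)]
    # Column fusion: straddles the horizontal seam between two vertically adjacent cells.
    col_fusion = [(c * cw, (r + 1) * rh - half_h, (c + 1) * cw, (r + 1) * rh + half_h)
                  for r in range(n - 1) for c in range(n)]
    return row_fusion, col_fusion


cells = grid_cells(6000, 4000)
row_fusion, col_fusion = fusion_boxes(6000, 4000)
print(len(cells), len(row_fusion), len(col_fusion))
print(row_fusion[0], col_fusion[0])
```

For a 4*4 grid this gives 16 sub-images plus 4*3 row fusion images and 3*4 column fusion images per large picture.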
This is simple. Refer to the blog post on cutting pictures with Python and just use OpenCV to cut them; remember to cut the fusion parts of the picture at the same time.
import cv2

# Mixing ratio; assumed here to be a module-level setting (for example from config.py)
mix_percent = 0.2


# Cut out the fusion part images
def img_mix(img, row_height, col_width, save_path, file):
    mix_num = 3
    # row_height / col_width: height of each row and width of each column
    # Splitting into 4*4 gives
    # 4*3 row fusion regions
    # 3*4 column fusion regions
    # Fusion within a row (across the seam between two columns)
    row = 0
    for i in range(mix_num + 1):
        mix_height_start = i * row_height
        mix_height_end = (i + 1) * row_height
        for j in range(mix_num):
            mix_row_path = save_path + '/' + file + '_mix_row_' + str(row) + '.jpg'
            mix_row_start = int(j * col_width + col_width * (1 - mix_percent))
            mix_row_end = int(mix_row_start + col_width * mix_percent * 2)
            # print(mix_height_start, mix_height_end, mix_row_start, mix_row_end)
            mix_row_img = img[mix_height_start:mix_height_end, mix_row_start:mix_row_end]
            cv2.imwrite(mix_row_path, mix_row_img)
            row += 1
    col = 0
    # Fusion within a column (across the seam between two rows)
    for i in range(mix_num):
        mix_col_start = int(i * row_height + row_height * (1 - mix_percent))
        mix_col_end = int(mix_col_start + row_height * mix_percent * 2)
        for j in range(mix_num + 1):
            mix_col_path = save_path + '/' + file + '_mix_col_' + str(col) + '.jpg'
            mix_width_start = j * col_width
            mix_width_end = (j + 1) * col_width
            # print(mix_col_start, mix_col_end, mix_width_start, mix_width_end)
            mix_col_img = img[mix_col_start:mix_col_end, mix_width_start:mix_width_end]
            cv2.imwrite(mix_col_path, mix_col_img)
            col += 1
I read the target data directly from the xml file (code: get_xml_data.py).
After a successful read, the data is saved in txt format. The stored fields are:
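image type (0: sub-image, 1: row fusion image, 2: column fusion image)
position of the small image (0~15)
file name of the small image
width of the large image
height of the large image
target class
x minimum
x maximum
y minimum
y maximum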
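A record with these ten fields could be read back as follows. This is a minimal sketch: the on-disk layout (one record per line, fields separated by whitespace, in the order listed) is an assumption, and `TargetRecord`, `parse_record` and the sample line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetRecord:
    image_type: int    # 0: sub-image, 1: row fusion image, 2: column fusion image
    position: int      # index of the small image
    file_name: str     # file name of the small image
    big_width: int     # width of the large image
    big_height: int    # height of the large image
    target_class: str  # kept as text; could be a name or an index
    xmin: int
    xmax: int
    ymin: int
    ymax: int


def parse_record(line: str) -> TargetRecord:
    """Parse one whitespace-separated record (assumed layout)."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    return TargetRecord(int(parts[0]), int(parts[1]), parts[2],
                        int(parts[3]), int(parts[4]), parts[5],
                        int(parts[6]), int(parts[7]), int(parts[8]), int(parts[9]))


record = parse_record("0 3 road_3.jpg 6000 4000 sign 4700 4760 400 460")
print(record)
```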
The results obtained after reading are as follows.
Next, we need to process the data further. Code: txt_to_yolo.py
Now we know the position of each small image, the width and height of each small image, and the width and height of the large image, so we can locate the target on the small image.
For example: suppose the width and height of the picture below are both 100, and the small box at the upper right sits in the center of the upper-right quarter with a width and height of 10. The position of the small box in the large picture is then
xmin=70, xmax=80
ymin=20, ymax=30
On sub-image No. 1 (sub-images are numbered 0~3), which is the upper-right quarter and starts at x=50, y=0, the position of the small box is
xmin=20, xmax=30
ymin=20, ymax=30
The other data can be handled nicely in the same way.
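The worked example can be reproduced in a few lines. This is a sketch, not the contents of txt_to_yolo.py: it assumes sub-images are numbered row by row and that the picture divides evenly, and the function names are illustrative. The second function converts the result to the YOLO label layout (center x, center y, width, height, all normalized by the sub-image size).

```python
def to_tile_coords(box, position, img_w, img_h, n_rows, n_cols):
    """Shift a large-image box (xmin, xmax, ymin, ymax) into the sub-image at `position`."""
    row, col = divmod(position, n_cols)  # sub-images numbered row by row
    tile_w, tile_h = img_w // n_cols, img_h // n_rows
    xmin, xmax, ymin, ymax = box
    return (xmin - col * tile_w, xmax - col * tile_w,
            ymin - row * tile_h, ymax - row * tile_h)


def to_yolo(box, tile_w, tile_h):
    """Convert pixel (xmin, xmax, ymin, ymax) to normalized (cx, cy, w, h)."""
    xmin, xmax, ymin, ymax = box
    return ((xmin + xmax) / 2 / tile_w, (ymin + ymax) / 2 / tile_h,
            (xmax - xmin) / tile_w, (ymax - ymin) / tile_h)


local = to_tile_coords((70, 80, 20, 30), position=1,
                       img_w=100, img_h=100, n_rows=2, n_cols=2)
print(local)                   # box relative to sub-image No. 1
print(to_yolo(local, 50, 50))  # the same box as a normalized YOLO tuple
```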
There is not much to say here. After the pictures are cut, just change the training and detection picture paths of yolov5 to point to the cut pictures.
Note that the fusion images are used during training but not during detection (I did not run detection on the fusion images, because their results easily overlap with those of the sub-images, and what is being compared is the machine's detection result).
Change the path: directly change the path in the code, for example in def run() of detect.py:
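One way to keep the fusion images out of the detection set is to filter by file name. The `_mix_row_` and `_mix_col_` tags are the ones used when the fusion images are saved; the sub-image names and the function `split_for_detection` are hypothetical. The sub-image list (or a folder containing only those files) can then be given as the `source` of `run()` in detect.py.

```python
def split_for_detection(file_names):
    """Separate plain sub-images from fusion images by file name."""
    fusion_tags = ("_mix_row_", "_mix_col_")
    sub_images, fusion_images = [], []
    for name in file_names:
        if any(tag in name for tag in fusion_tags):
            fusion_images.append(name)
        else:
            sub_images.append(name)
    return sub_images, fusion_images


names = ["road_0.jpg", "road_mix_row_0.jpg", "road_1.jpg", "road_mix_col_0.jpg"]
sub_images, fusion_images = split_for_detection(names)
print("detect on:", sub_images)
print("training only:", fusion_images)
```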
test results:
This step is neither especially hard nor entirely trivial; the main thing is to keep a clear head:
1. Locate the position of each picture (for example, when cutting into 4*4 there are 16 positions in total);
2. According to each position, map the detection result (txt file) of each picture to the corresponding position in the large image. For example, if the position is the upper right corner (0, 3), add 3 * (large image width / 4) to the x values detected in that picture, and then convert the result back to the yolov5 label format.
That's about it.
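The two steps above can be sketched directly on normalized YOLO labels. Adding `col * tile_width` to a pixel x value and dividing by the large image width is the same as computing `(col + cx) / n_cols` on the normalized value, which avoids the round trip through pixels. This assumes equally sized sub-images; `tile_label_to_full` is an illustrative name, not the author's function.

```python
def tile_label_to_full(label, row, col, n_rows=4, n_cols=4):
    """Map a YOLO label (cls, cx, cy, w, h) on sub-image (row, col) to the large image."""
    cls, cx, cy, w, h = label
    return (cls,
            (col + cx) / n_cols,  # shift by `col` sub-image widths, then rescale
            (row + cy) / n_rows,  # shift by `row` sub-image heights, then rescale
            w / n_cols,
            h / n_rows)


# A detection in the middle of the upper-right sub-image (0, 3) of a 4*4 grid.
full = tile_label_to_full((0, 0.5, 0.5, 0.2, 0.2), row=0, col=3)
print(full)
```

For a 6000-pixel-wide picture the resulting center x corresponds to 3 * 1500 + 750 = 5250 pixels, which matches the rule of adding 3 * (large image width / 4).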
fusion result
As mentioned earlier, training and detection here are based on small pictures, so the results (the detection frames on the picture) cannot be observed directly on the large picture. Instead, you can use ImageDraw to draw the boxes from the fused txt file on the original image.
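Before drawing, the normalized labels have to be turned back into pixel corners. The helper below is a sketch with an illustrative name and sample values; the returned tuple is in the (x0, y0, x1, y1) order accepted by the `rectangle` method of Pillow's `ImageDraw.Draw`.

```python
def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box to integer pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = round((cx - w / 2) * img_w)
    ymin = round((cy - h / 2) * img_h)
    xmax = round((cx + w / 2) * img_w)
    ymax = round((cy + h / 2) * img_h)
    return xmin, ymin, xmax, ymax


corners = yolo_to_corners(0.875, 0.125, 0.05, 0.05, img_w=6000, img_h=4000)
print(corners)
```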
Not bad.
Now look at the training results:
You can also refer to some similar projects
yolov5-tph: https://github.com/Gumpest/YOLOv5-Multibackbone-Compression
yolov-z
You can also add a small-target detection layer (it does not seem to be widely applicable; when I tried it, it increased the training time and the effect was only mediocre).
Configuration file: config.py
Cutting image: cut_image.py
Reading xml data: get_xml_data.py
Cutting label data: txt_to_yolo.py
Fusion picture: joint_image.py
Original picture frame: draw_box.py
Main function: main.py
Download address ①
Download address ②
Author:Sweethess
link:http://www.pythonblackhole.com/blog/article/80311/09760c0d7305b855b731/
source:python black hole net
Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.