Improvement of Mask R-CNN in Edge Segmentation
Abstract
Nowadays, grasping robots play an important role in many automatic systems in industrial environments. An excellent grasping robot is one that can detect, localize, and pick objects accurately. However, achieving these tasks perfectly remains a challenge in the computer vision field. In particular, segmentation, understood as combined detection and localization, is the hardest problem. To address it, the state-of-the-art Mask R-CNN was introduced and obtained exceptional results. Yet even this model does not necessarily perform well when objects lie in harsh locations: edge and border regions are often misclassified as background, which leads to failures in localizing the object and producing a good grasping plan. In this paper, we therefore introduce a novel method that combines the original Mask R-CNN pipeline with a 3D-algorithm branch to preserve and classify edge regions, improving the performance of Mask R-CNN in detailed segmentation. The significant improvement in harsh object-location situations, especially at edge regions, is discussed in the experimental results section. Both the IoU and mAP indicators increase; in particular, mAP, which directly reflects the semantic segmentation ability of a model, rises from 0.39 to 0.46. This approach opens a better way to determine object locations and grasping plans.
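The two ideas in the abstract — scoring a mask with IoU, and recovering edge pixels that the 2D network dropped by merging in a mask from a 3D/depth branch — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names and the simple logical-OR fusion are illustrative assumptions.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)

def fuse_edge_mask(cnn_mask: np.ndarray, edge_mask: np.ndarray) -> np.ndarray:
    """Restore edge pixels (here assumed to come from a 3D/depth branch)
    that the 2D network misclassified as background, by set union."""
    return np.logical_or(cnn_mask.astype(bool), edge_mask.astype(bool))
```

Under this sketch, any edge pixel recovered by the 3D branch that belongs to the ground-truth object can only raise the mask's IoU, which is the behavior the abstract reports.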
Keywords
detection, edge, 3D segmentation, Mask Region-based Convolutional Neural Network (Mask R-CNN)
References
[1]. R. Girshick, J. Donahue, T. Darrell, J. Malik, Region-based convolutional networks for accurate object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015, pages 142–158.
[2]. R. Girshick, Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. https://doi.org/10.1109/ICCV.2015.169
[3]. S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), 2015.
[4]. K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, 2017.
[5]. Z. Huang, L. Huang, Y. Gong, C. Huang, X. Wang, Mask Scoring R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. https://doi.org/10.1109/CVPR.2019.00657
[6]. Y. Zhang, J. Chu, L. Leng, J. Miao, Mask-Refined R-CNN: A network for refining object details in instance segmentation. Sensors 2020, 20, 1010. https://doi.org/10.3390/s20041010
[7]. Y. Ioannou, B. Taati, R. Harrap, M. Greenspan, Difference of normals as a multi-scale operator in unorganized point clouds. In Proceedings of the 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, Zurich, Switzerland, 13–15 October 2012. https://doi.org/10.1109/3DIMPVT.2012.12
[8]. pcl.readthedocs.io. Available online: https://pcl.readthedocs.io/en/latest/cluster_extraction.html (accessed on 1 March 2021).
[9]. C. Rother, V. Kolmogorov, A. Blake, GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 2004, 23, 309–314. https://doi.org/10.1145/1015706.1015720
[10]. X. Wu, S. Wen, Y. Xie, Improvement of Mask R-CNN object segmentation algorithm. In ICIRA 2019: Intelligent Robotics and Applications; Springer: Cham, Switzerland, 2019.
[11]. C. Xu, G. Wang, S. Yan, J. Yu, B. Zhang, S. Dai, Y. Li, L. Xu, Fast vehicle and pedestrian detection using improved Mask R-CNN. Math. Probl. Eng. 2020, 2020, 5761414. https://doi.org/10.1155/2020/5761414
[12]. S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. https://doi.org/10.1109/CVPR.2018.00913
[13]. C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
[14]. J. R. R. Uijlings, K. E. A. Van de Sande, T. Gevers, A. W. M. Smeulders, Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171. https://doi.org/10.1007/s11263-013-0620-5
[15]. S. Albawi, T. A. Mohammed, S. Al-Zawi, Understanding of a convolutional neural network. In Proceedings of the International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017. https://doi.org/10.1109/ICEngTechnol.2017.8308186
[16]. S. Albawi, T. A. Mohammed, S. Al-Zawi, Understanding of a convolutional neural network. In Proceedings of the International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017.
[17]. D. Rao, Q. V. Le, T. Phoka, M. Quigley, A. Sudsang, A. Y. Ng, Grasping novel objects with depth segmentation. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, 18–22 October 2010.
[18]. A. Uckermann, C. Elbrechter, R. Haschke, H. Ritter, 3D scene segmentation for autonomous robot grasping. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Vilamoura-Algarve, Portugal, 7–12 October 2012. https://doi.org/10.1109/IROS.2012.6385692
[19]. R. Kurban, F. Skuka, H. Bozpolat, Plane segmentation of Kinect point clouds using RANSAC. In Proceedings of the 2015 7th International Conference on Information Technology (ICIT), Huangshan, China, 13–15 November 2015. https://doi.org/10.15849/icit.2015.0098
[20]. pcl.readthedocs.io. Available online: https://pcl.readthedocs.io/projects/tutorials/en/latest/don_segmentation.html (accessed on 1 March 2021).
[21]. H. Sarbolandi, D. Lefloch, A. Kolb, Kinect Range Sensing: Structured-Light versus Time-of-Flight Kinect. Comput. Vis. Image Underst. 2015, 139, 1–20. https://doi.org/10.1016/j.cviu.2015.05.006
[22]. T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common objects in context. In ECCV; Springer: Cham, Switzerland, 2014.
[23]. J. Lundell, F. Verdoja, V. Kyrki, Beyond Top-grasps through scene completion. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 545–551. https://doi.org/10.1109/ICRA40945.2020.9197320
[24]. M. Gualtieri, A. ten Pas, K. Saenko, R. Platt, High precision grasp pose detection in dense clutter. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016. https://doi.org/10.1109/IROS.2016.7759114