
Student: 楊惟中 (Yang, Wei-Jong)
Thesis Title: 前瞻電腦視覺於智慧生活與3D視覺系統之研究 (Researches on Smart Living and 3D Video Systems with Cutting-edge Computer Vision Methods)
Advisor: 詹寶珠 (Chung, Pau-Choo)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Institute of Computer & Communication Engineering
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 139
Keywords: Deep Learning, Computer Vision, Depth Packing
With the maturing of stereoscopic display and virtual reality technologies, everyday demand for audiovisual entertainment content has grown substantially. In recent years, developments in computer vision and artificial intelligence have effectively advanced smart living applications. Stereoscopic virtual reality and AI vision technologies will disrupt and transform content production and living services that traditionally demanded heavy manual labor, and the impact of deep learning in recent years has been especially profound: many problems that traditional computer vision struggled to solve can now be handled well by convolutional neural network models trained on large amounts of data. The boundary between intelligent content processing and visual recognition has also gradually blurred as convolutional neural networks have become more flexible and extensible. In this dissertation, we explore the development of new stereoscopic content and intelligent vision technologies, with separate studies targeting two application domains, 3D virtual reality video broadcasting and autonomous driving, in the hope of improving future multimedia smart living.

In stereoscopic virtual reality and 3D video broadcasting, many problems remain worth exploring, such as the shortage of stereoscopic content, inflexible 3D video formats, and high computational complexity. In this dissertation, we therefore propose new methods for depth map generation, 3D packing formats, and creative content value-adding. First, for generating color-plus-depth 3D content, we propose two approaches to depth map generation. One is a conventional two-view algorithm that performs low-resolution stereo matching followed by upsizing and refinement, enabling a high-resolution stereo matching system realized on a GPU. The other is a single-view depth generation system based on semi-supervised learning: trained on two-view images without any ground-truth depth maps, the network learns to predict single-view depth, and temporal reference information further improves prediction accuracy; its usage scenario also extends to relative-distance estimation for self-driving cars. For the color-plus-depth 3D packing format, we propose an assigned-pixel RGB color-depth packing method under the centralized texture-depth packing (CTDP) framework, which not only gives both 3D and 2D users a good viewing experience but also resolves the large distortion the original CTDP suffers under YCbCr 4:2:0 video compression. For content value-adding, we propose a face swapping system that comprises a face alignment subsystem, which improves recognition of the target face, together with an autoencoder based on dual encoding channels that performs the final face replacement. Together these methods address the content production, content packing, and content replacement technologies required for 3D video broadcasting, easing the shortage of 3D content and increasing system operators' willingness to adopt.

For autonomous driving systems, since neural networks hold great advantages in specific usage scenarios, this dissertation develops neural network modules for lane detection and object detection. For the particular conditions of road use, we extend the autoencoder architecture and optimize its network structure. The proposed lane detection network adopts an asymmetric autoencoder, adding high-level and low-level feature maps on the encoder side and simplifying the decoder for acceleration, thereby raising accuracy while preserving speed. For object detection, a multi-path design separates the features of large objects from those of small objects, and non-local blocks raise accuracy beyond MobileNet + SSD. Both networks achieve real-time speed on an ordinary computer with a standard graphics card, fully demonstrating their feasibility for future practical deployment.

With the maturing of 3D display and virtual reality (VR) technologies, everyday demand for visual entertainment content is increasing rapidly. Developments in computer vision and artificial intelligence (AI) likewise make improved smart living applications possible. AI-driven 3D VR and intelligent vision technologies will significantly transform content creation and smart living services that originally required substantial manual work. Owing to the strong influence of deep learning in recent years, many problems that are difficult for traditional computer vision can be solved by convolutional neural network models trained on large amounts of data, and the demarcation between smart content processing and visual recognition is gradually blurring thanks to the flexibility and scalability of convolutional neural networks. In this dissertation, we study new 3D content and smart vision technologies, focusing on 3D VR video broadcasting and autonomous driving applications, so as to improve smart multimedia daily life in the near future.

In 3D VR and 3D video broadcasting applications, many issues remain unsolved and worth further study, including 3D content production, flexible 3D packing formats, and computational complexity. In this dissertation, we therefore propose new methods to improve depth map generation, the 3D packing format, and value-added creative content. First, for 3D content generation, we propose two methods to generate depth information. The first is a low-computation stereo matching system with GPU acceleration, in which low-resolution stereo matching is followed by upsizing and refinement processes to obtain precise, high-resolution disparity maps.
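As a rough illustration of this coarse-to-fine idea, the sketch below matches at reduced resolution and then upsizes and refines the disparity map. OpenCV's StereoSGBM and guided filter (the latter from opencv-contrib-python) stand in for the dissertation's own cross-aggregation matcher and recursive refinement; the scale factor and matcher parameters are illustrative assumptions.

```python
# A minimal coarse-to-fine sketch, assuming OpenCV (cv2) plus
# opencv-contrib-python for the guided filter. StereoSGBM stands in
# for the dissertation's cross-aggregation matcher; parameters are
# illustrative only.
import cv2
import numpy as np

def coarse_to_fine_disparity(left, right, scale=4, max_disp=128):
    """Match at 1/scale resolution, then upsize and refine the result."""
    h, w = left.shape[:2]
    small = (w // scale, h // scale)
    left_s = cv2.resize(left, small, interpolation=cv2.INTER_AREA)
    right_s = cv2.resize(right, small, interpolation=cv2.INTER_AREA)

    # Low-resolution matching: the disparity search range also shrinks
    # by `scale`, which is where most of the savings come from.
    num_disp = max(16, (max_disp // scale + 15) // 16 * 16)
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=num_disp,
                                    blockSize=5)
    disp_s = matcher.compute(
        cv2.cvtColor(left_s, cv2.COLOR_BGR2GRAY),
        cv2.cvtColor(right_s, cv2.COLOR_BGR2GRAY),
    ).astype(np.float32) / 16.0  # SGBM returns fixed-point disparities

    # Upsizing: enlarge the map and rescale the disparity values.
    disp = cv2.resize(disp_s, (w, h), interpolation=cv2.INTER_LINEAR) * scale

    # Refinement: an edge-preserving filter guided by the color image,
    # a simple stand-in for the recursive refinement of Chapter 3.
    return cv2.ximgproc.guidedFilter(left, disp, radius=8, eps=1e-2)
```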
Second, we propose a monocular depth estimation system based on semi-supervised learning. The model is trained on stereo image pairs without ground-truth depth, which greatly reduces training costs, and the trained CNN can then predict depth from a single image. By further including a temporal network, we improve the accuracy of the depth prediction, and the usage scenario also extends to relative-distance estimation for self-driving car systems.
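The self-supervision signal behind such stereo-trained monocular estimation can be sketched as follows: the network predicts a disparity for the left image, the right image is warped into the left view using that disparity, and a photometric loss penalizes the reconstruction error. This is the generic left-right reconstruction idea only; the dissertation's exact loss terms and temporal network are not reproduced here.

```python
# A sketch of left-right photometric self-supervision, assuming PyTorch.
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """Sample the right image at x - d(x) to synthesize the left view."""
    b, _, h, w = right.shape
    xs = torch.linspace(-1, 1, w, device=right.device).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1, 1, h, device=right.device).view(1, h, 1).expand(b, h, w)
    # disparity is in pixels; convert it to grid_sample's [-1, 1] range
    grid_x = xs - 2.0 * disparity.squeeze(1) / (w - 1)
    grid = torch.stack([grid_x, ys], dim=-1)          # (b, h, w, 2)
    return F.grid_sample(right, grid, align_corners=True)

def photometric_loss(left, right, pred_disparity):
    reconstructed = warp_right_to_left(right, pred_disparity)
    return F.l1_loss(reconstructed, left)

# Toy usage with random tensors standing in for a stereo pair.
loss = photometric_loss(torch.rand(2, 3, 64, 128),
                        torch.rand(2, 3, 64, 128),
                        torch.rand(2, 1, 64, 128) * 10)
```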
As for the 3D packing format, we propose an assigned RGB color-depth packing method for the centralized texture and depth packing (CTDP) format, which not only provides good viewing experiences for both 3D and 2D users but also alleviates the distortion that arises when CTDP videos are encoded in the YCbCr 4:2:0 chroma format.
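The toy numerical experiment below makes the 4:2:0 issue concrete. It is not the actual CTDP subpixel assignment of Chapter 4; it only shows that depth samples packed so that they create chroma detail are corrupted by 2x2 chroma averaging, whereas a packing that keeps depth aligned with luma survives the round trip, which motivates assigning specific RGB subpixels.

```python
# Toy experiment: how 4:2:0 chroma subsampling distorts packed depth.
import numpy as np

def rgb_to_ycbcr(rgb):
    # BT.601 full-range conversion, written out for clarity.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    return np.stack([y + 1.402 * cr,
                     y - 0.344136 * cb - 0.714136 * cr,
                     y + 1.772 * cb], axis=-1)

def subsample_420(ycc):
    # Average each 2x2 chroma block and replicate it back (4:2:0).
    out, (h, w) = ycc.copy(), ycc.shape[:2]
    for c in (1, 2):
        blk = ycc[..., c].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        out[..., c] = np.repeat(np.repeat(blk, 2, axis=0), 2, axis=1)
    return out

rng = np.random.default_rng(0)
depth = rng.integers(0, 256, (8, 8)).astype(np.float64)
texture = rng.integers(0, 256, (8, 8)).astype(np.float64)

# Careless packing: depth rides in the blue channel next to texture,
# so depth differences leak into chroma and get averaged away.
careless = np.stack([texture, texture, depth], axis=-1)
lost = ycbcr_to_rgb(subsample_420(rgb_to_ycbcr(careless)))[..., 2]
print("careless packing, max depth error:", np.abs(lost - depth).max())

# Assigning the same value to all three subpixels keeps depth in luma
# (gray has constant chroma 128), so 4:2:0 leaves it untouched.
gray = np.stack([depth, depth, depth], axis=-1)
kept = ycbcr_to_rgb(subsample_420(rgb_to_ycbcr(gray)))[..., 2]
print("luma-aligned packing, max depth error:", np.abs(kept - depth).max())
```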
In terms of value-added content creation, we propose a face swapping system that comprises a face alignment subsystem, which improves recognition of the target face, followed by an autoencoder with dual encoding channels that performs the final face replacement. For 3D video broadcasting, the methods above address the needs of content production, content packing, and content transformation, easing the shortage of 3D content and improving broadcasters' willingness to adopt 3D services.
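A minimal PyTorch sketch of an autoencoder with dual encoding channels is given below. The layer sizes, the use of a shared decoder, and the training loss are illustrative assumptions rather than the dissertation's actual architecture.

```python
# Sketch of a face-swap autoencoder with two encoding branches feeding
# a shared decoder; everything here is an illustrative assumption.
import torch
import torch.nn as nn

class DualEncoderAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        self.encoder_a = encoder()  # encoding channel for identity A
        self.encoder_b = encoder()  # encoding channel for identity B
        self.decoder = nn.Sequential(  # shared decoder reconstructs a face
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x, identity):
        z = self.encoder_a(x) if identity == "a" else self.encoder_b(x)
        return self.decoder(z)

# Training reconstructs each identity through its own encoding channel;
# at inference time a face can be routed through the other identity's
# branch to perform the swap.
model = DualEncoderAutoencoder()
faces_a = torch.rand(4, 3, 64, 64)
recon = model(faces_a, identity="a")
loss = nn.functional.l1_loss(recon, faces_a)
```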

For autonomous driving systems, since convolutional neural networks hold strong advantages in specific usage scenarios, we propose lane detection and object detection network models whose architectures are optimized for road scenes. For lane detection, we build on the autoencoder architecture but adopt an asymmetric design: the encoder adds high-level and low-level feature maps together, while the decoder is simplified to raise processing speed. With this high- and low-level feature-addition technique, the proposed lane detection network improves accuracy while maintaining execution speed.
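The sketch below illustrates this asymmetric design: an encoder whose high-level features are upsampled and added to low-level ones, followed by a deliberately light decoding head. Channel counts and depths are illustrative assumptions, not the dissertation's exact network.

```python
# Sketch of an asymmetric lane-detection autoencoder, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.stage1 = nn.Sequential(  # low-level features, 1/2 resolution
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU())
        self.stage2 = nn.Sequential(  # 1/4 resolution
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU())
        self.stage3 = nn.Sequential(  # high-level features, 1/8 resolution
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU())
        # Asymmetry: the decoder is far shallower than the encoder.
        self.decoder = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        low = self.stage2(self.stage1(x))   # 1/4 resolution, 64 channels
        high = self.stage3(low)             # 1/8 resolution, 64 channels
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        fused = low + high                  # feature addition, not concat
        logits = self.decoder(fused)        # light decoding head
        return F.interpolate(logits, scale_factor=4, mode="bilinear",
                             align_corners=False)  # back to input size

mask_logits = LaneNet()(torch.rand(1, 3, 256, 512))
```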
As for object detection, we propose a multi-path detection network that separates the features of large objects from those of small objects, and we introduce the non-local block to further improve accuracy. The multi-path network achieves better accuracy than the state-of-the-art MobileNet + SSD networks.
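For reference, a standard embedded-Gaussian non-local block of the kind mentioned above can be sketched as follows; the embedding width is an illustrative assumption.

```python
# Minimal non-local block: self-attention over all spatial positions
# with a residual connection, assuming PyTorch.
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)  # query embedding
        self.phi = nn.Conv2d(channels, reduced, 1)    # key embedding
        self.g = nn.Conv2d(channels, reduced, 1)      # value embedding
        self.out = nn.Conv2d(reduced, channels, 1)    # back to input width

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, r)
        k = self.phi(x).flatten(2)                    # (b, r, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, r)
        attn = torch.softmax(q @ k, dim=-1)           # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual connection

enhanced = NonLocalBlock(64)(torch.rand(1, 64, 32, 32))
```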
Both the lane detection and object detection networks run in real time on an ordinary desktop with a common graphics card, demonstrating the feasibility of extending these methods to practical applications in the future.

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
    1.1 Research Background
    1.2 Literature Review
    1.3 Organization of Dissertation
Chapter 2. Related Work
    2.1 Image and Video Data Representation
        2.1.1 Image and Video Data Representation
        2.1.2 Centralized Texture Depth Packing (CTDP) Formats
        2.1.3 Relationship of 3D Data Representations and Research Tasks
    2.2 Traditional Computer Vision Techniques
        2.2.1 Explicit Shape Regression
        2.2.2 Local-based Stereo Matching
    2.3 Convolutional Neural Networks with Deep Learning
        2.3.1 Convolutional Neural Networks
        2.3.2 Extended Convolutional Neural Networks
        2.3.3 Object Detection
        2.3.4 Semantic Segmentation
        2.3.5 Deep Learning for Stereo Matching
Chapter 3. Stereoview and Monoview Depth Estimation
    3.1 Overview
    3.2 Fast Stereo Matching with Depth Upsizing
        3.2.1 Low-Resolution Stereo Matching
        3.2.2 Cross Aggregation
        3.2.3 Recursive Refinement and Depth Upsizing
        3.2.4 Experimental Results
    3.3 Unsupervised Monocular Depth Estimation
        3.3.1 The Proposed MDE System
        3.3.2 Training and Simulations
    3.4 Chapter Summary
Chapter 4. An Assigned Color Depth Packing Method with Centralized Texture Depth Packing Formats for 3D VR Broadcasting Services
    4.1 Overview
    4.2 Mathematics Expression and Problem Definition
    4.3 Assigned Color Depth Method
        4.3.1 Subpixel Assignment for YCbCr 4:2:0
        4.3.2 Subpixel Assignment for YCbCr 4:2:2
    4.4 Experimental Results
        4.4.1 Performances of Uncoded Packing Formats
        4.4.2 Performances of HEVC-coded Packing Formats
    4.5 CTDP-based VR Broadcasting System
    4.6 Chapter Summary
Chapter 5. An AI-based Face Swapping System with Aligned Face Recognition
    5.1 Overview
    5.2 The Proposed Robust Face Recognition System
        5.2.1 Face Alignment Method
        5.2.2 Conditional Cross Warping
        5.2.3 Experimental Results
    5.3 Face Swapping Based on Improved Autoencoders
        5.3.1 Face Swapping Based on Improved Autoencoders
        5.3.2 Experimental Results
    5.4 Chapter Summary
Chapter 6. Multiple Feature Networks for Autonomous Driving
    6.1 Overview
    6.2 Improved Lane Detection Convolutional Neural Network
        6.2.1 The Proposed Lane Detection Network
        6.2.2 Experimental Results
    6.3 Multiple Paths Detector
        6.3.1 Multiple Paths Detector
        6.3.2 Experimental Results
    6.4 Chapter Summary
Chapter 7. Conclusion and Future Work
References
Appendix A. Publications


Full text available on campus: 2024-12-31; off campus: 2025-07-01.