What is computer vision

  • Make a computer understand images and “tell a story” about them, as humans do.
  • Bridge the gap between pixels and “meaning”.
  • Treat vision as a source of semantic information.

Principles of the human visual system

  1. Hierarchical -> multi-scale fusion
  2. Center-biased -> regularization
  3. Saliency -> saliency detection
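
The three mappings above can be illustrated with a toy sketch. It assumes a grayscale image stored as a NumPy array, uses deviation from the mean intensity as a stand-in for a real saliency detector, and models the center bias as a Gaussian weight map. All function names and the `sigma` and `factors` values are illustrative assumptions, not something these notes prescribe.

```python
import numpy as np

def center_prior(height, width, sigma=0.3):
    """Gaussian weight map that peaks at the image center.

    sigma is expressed as a fraction of the image size (assumed value).
    """
    ys = (np.arange(height) - (height - 1) / 2) / height
    xs = (np.arange(width) - (width - 1) / 2) / width
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(yy**2 + xx**2) / (2 * sigma**2))

def contrast_saliency(gray):
    """Toy saliency: distance of each pixel from the mean intensity."""
    return np.abs(gray - gray.mean())

def downsample(gray, factor):
    """Average-pool by an integer factor (sides must be divisible by it)."""
    h, w = gray.shape
    return gray.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(small, factor):
    """Nearest-neighbor upsampling back to the original resolution."""
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def multiscale_saliency(gray, factors=(1, 2, 4)):
    """Fuse saliency maps from several scales, then weight by the center prior."""
    maps = [upsample(contrast_saliency(downsample(gray, f)), f) for f in factors]
    fused = np.mean(maps, axis=0)              # hierarchical: multi-scale fusion
    fused = fused * center_prior(*gray.shape)  # center bias used as a prior
    peak = fused.max()
    return fused / peak if peak > 0 else fused

# Usage: a dark 8x8 image with a bright 2x2 patch near the center.
img = np.zeros((8, 8))
img[3:5, 3:5] = 1.0
sal = multiscale_saliency(img)
print(sal.shape)  # (8, 8)
```

In this sketch the center prior acts as a soft regularizer: responses near the border are down-weighted rather than discarded, so a strong off-center object can still dominate the map.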

Covers

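  • 3D structure computation (shape and motion capture)
  • Recognition (object detection, semantic segmentation, image captioning, action recognition)
  • Image enhancement (background blurring, super-resolution reconstruction, denoising, shadow removal, deblurring)
  • Image editing (style transfer, image generation, image restoration, image inpainting)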

Applications

  • OCR (Optical Character Recognition)
  • Face detection and analysis (smile detection…)
  • Fingerprint/face unlock
  • recreation (e.g., camera effects triggered by opening the mouth, such as “spitting out” lipstick)
  • Google maps: Annotate all houses and streets
  • Amazon Go (supermarket)
  • tracking
  • autonomous vehicles
  • robotics
  • medical diagnosis
  • vision-based interaction and games (fitness bands)
  • Augmented Reality (AR)
  • Virtual Reality

Challenges

  • viewpoint variation
  • poor lighting
  • scale variation
  • intra-class variation
  • motion
  • cluttered background
  • occlusion
  • blur