Introduction to Augmented Reality

Fivos DOGANIS

Course contents

whoami

💼 linkedin.com/in/fivosdoganis
📧 fivos.doganis@gmail.com
🐦 @fdoganis

Cover photo: www.instagram.com/steveroe_

University of Hull

  • Master of Science by Research (2001)
    Augmented Reality in Archaeology: Registration Issues

IRCAD (2002 - 2003)

  • Institut de Recherche contre les Cancers de l'Appareil Digestif
  • Startup
    • Virtual-Surg team
  • Augmented Reality Research Engineer

Dassault Systèmes (2003+)

  • 3D Visualization Engineer
    • Scenegraph
    • Materials
    • Geometry, Tessellation
  • Virtual and Augmented Reality (XR) Engineer
  • XR Research Engineer
  • XR Research Manager

Dassault Systèmes

From Shape to Life

Course audience

  • Anyone looking for a simple introduction to Augmented Reality
  • Computer Science students and engineers looking for a way to create cross-platform XR prototypes or even full-fledged apps

➡️ Feel free to skim through the technical sections and use this course as a future reference

Course prerequisites

  • Math
  • Programming
    • JavaScript notions, or any similar language (HTML kept minimal)
  • 3D Web API
    • THREE.js notions strongly recommended (see Web 3D course)
    • alternatives: Babylon.js, WebGL, WebGPU
  • 3D Software (Blender, Unity, Unreal Engine, Godot Engine)

📅 Planning

  • Day 1 (6 hours)

    • 📖 Theory
    • 🍜 Lunch
    • 💻 Exercises
    • ⚗️ Explore examples + choose a personal project
  • Day 2 (6 hours)

    • 🖊️ Evaluation: Quiz, 20 questions / ~20 min
    • ⭐ Personal project / game jam ➡️ bonus points!

Project evaluation criteria

Bonus points for:

  • originality 👀
  • interactions 👋
  • physics 💥 / animations 🏃 / sounds 🎶 / eye-candy 🎆
  • GIS 🌍
  • code quality ✨, tricks 😏, performance ⏱️
  • fun 🎉
  • clever use of AR 📱 🥽

AR Applications

Consumer applications

Pokemon GO AR+

Minecraft Earth

SnapChat

IKEA Place

HomeByMe

IGN Time Machine

Professional applications

  • Industry
  • Healthcare
  • Marketing

Renault Trucks

Zeal AR

Alain Afflelou

Definitions

Definitions

Milgram, Paul; H. Takemura; A. Utsumi; F. Kishino (1994). "Augmented Reality: A class of displays on the reality-virtuality continuum".

Google's version

Properties of an AR system ⭐

(according to Azuma, 2001)

  • combines real and virtual objects in a real environment
  • runs interactively and in real time
  • registers (aligns) real and virtual objects with each other.

Not AR:

  • special effects in movies
    • technology close to AR
    • not real time ❌
    • not in a real environment ❌
  • Google Glass
    • combines real and virtual objects in a real environment ✅
    • no registration ❌
    • it's a HUD (Head-Up Display)
      • can still be useful! (maintenance, sports etc.)

Google Glass concept video (2012)

This is not AR!!!

Definitions ⭐

  • VR: Virtual Reality (Jaron Lanier, 1987)

  • AR: Augmented Reality (Thomas P. Caudell, 1990)

  • MR: Mixed Reality

    • marketing term used by Microsoft
    • ⚠️ no clear definition! ➡️ Term must be defined before use!
  • XR: X = { eXtended / Cross (+) / Any (*) / A+V } Reality

    • recent generic term which encompasses AR and VR

AR | VR

AR | VR

AR or VR?

  • Similar technologies

    • 3D rendering
    • Tracking
    • Immersive interactions
  • Different effects on the user

Effects of VR

  • Isolates the user from the real world
  • Teleports the user to another world, which is entirely virtual

Tiltbrush

ITI

The limits of VR

  • Reminder: continuum!
    • No clear boundaries
  • When the whole world is modeled and registered in 3D,
    is it still VR?
  • Photogrammetry / Lightfields
    • VR but immersion in a world entirely rebuilt in 3D

Greg Madison @ Unity

VersaillesVR

Effects of AR

  • The user stays in the real world
  • AR enhances the real world with contextual information
  • Augmented user: acquires new senses!
  • Information becomes visible
    • spatialized information overlaid on top of the real world

Google Maps AR

The browser of the future? 🧐

Audi AR manual (Metaio, 2012)

The limits of AR

  • Reminder: continuum!
    • No clear boundaries
  • When more virtual elements than real ones: Augmented Virtuality
    • Window to the real world
    • Real users visible

Augmented Virtuality

Varjo Teleport

video

Dangers of AR ⭐

  • Information overload: Hyper-reality
  • Excessive assistance, altered behaviors, surveillance
    • Black Mirror: Nosedive
  • Digital divide
    • Some people will feel handicapped, as if missing a sense (like being color-blind)
  • Privacy: Cloud Wars
    • MAMAA = Meta, Alphabet, Microsoft, Apple, Amazon

Hyper-reality (concept)

Black Mirror (fiction)

Scene Responsiveness (Meta, 2023)

paper, video

Takeaways ⭐

  • VR immerses the user in a virtual world
  • AR brings virtual objects into the real world

Choosing the right paradigm

  • Immersion useful?

    • Yes ➡️ VR
    • No ➡️ 3D
  • Immersion and real environment useful?

    • Yes ➡️ AR
    • No ➡️ VR
  • Keep in mind continuum to pick the right paradigm to create the best possible experience

Choosing the right paradigm

History

Understand technological evolutions to anticipate the future

History

Key milestones

Prehistory (1966) ⭐

Markers (1999)

  • Monochrome markers
  • ARToolKit, created by Hirokazu Kato

  • Alternatives: ARTag, ArUco
  • PC + Webcam

NFT, GPS (2005)

  • NFT: Natural Feature Tracking
    • Color photo tracking
  • Wikitude, Layar (GPS)
    • no image processing needed with GPS!
  • Vuforia
  • Marketing use-cases
  • PC, mobile phones, tablets

SLAM, 3D (2015)

  • 3D environment tracking
  • SLAM: Simultaneous Localization And Mapping ⭐
  • 3D object tracking
  • Deep Learning
  • 3D occlusion
  • ARKit, ARCore
  • Smartphones, HoloLens, Azure Kinect

Azure Kinect + HoloLens 2

HoloLens 2

Apple LiDAR

iPad Pro 2020, iPhone 12 Pro

Apple LiDAR vs FaceID

Near future

  • Form-factor: glasses 😎
  • AI
    • contextual assistance
    • understands both environment and user
  • 5G
    • application and information streaming (Edge Computing)
  • Spatialized Web: AR Cloud

Far future

  • AR will replace or complement smartphones
    • users will raise their heads again
      • but will they see better?
  • Contact Lens (Mojo Vision)
  • Ambient Computing
  • Ubiquitous Computing
  • Smart Cities

Gartner Hype Cycle

Where do we stand now?

2023

MAMAA Strategies

Meta, Alphabet, Microsoft, Apple, Amazon
and the others!

AR [is for] adding shared meaning in the interaction between people.
Johnny Lee, Google I/O 2017

  • Dropped mobile VR (Cardboard 💀, Daydream 💀)
  • Dropped Tango 💀, to reach more devices: rely on RGB camera + AI
  • ARCore API, which competes with Apple's ARKit
  • Google wants to provide cross-platform AR services
  • Google + Qualcomm + Samsung XR Headset coming in 2024

The Web connects the world's information, and AR connects information with the physical world. So together they can be applied to solve real life problems.
Andrey Doronichev, Google I/O 2017

Google Glass Enterprise Edition 2

Google Glass killed, for the second time, in 2023 💀

Gorillaz mobile AR app

I’m excited about AR [...] My view is it’s the next big thing, and it will pervade our entire lives.
Tim Cook, Apple CEO, 2020, via Silicon Republic

Hardware

  • Adds LiDAR for more robust SLAM (e.g. on textureless white walls, where camera-only tracking struggles)

  • Extends its 'wearables' category

    • AirPods
    • Apple Watch
    • Apple Vision Pro
      • unveiled in June 2023
      • released in February 2024

People Occlusion + Scene Understanding (iOS)


Eye tracking (visionOS)

"Spatial Computing", "EyeSight", Real Virtual Continuum

Avatars

Collaboration


Oculus Infinite Office


Oculus Meta Quest 2

Meta Quest Pro

Meta Quest 3: focus on AR

Reverse Passthrough prototype (CAD render)
video


Michael Abrash in 2019

Project Aria

Next?

I might get myself in trouble for saying this; I think it might be the most advanced piece of technology on the planet in its domain. In the domain of consumer electronics, it might be the most advanced thing that we’ve ever produced as a species.
Andrew 'Boz' Bosworth, Meta CTO, January 2024

  • Amazon focuses on e-commerce and its Web Services
  • AR View to see a product at home before buying it
  • Offers Sumerian as a paid tool via AWS (Amazon Web Services) to create XR experiences
  • Pushes machine learning, smart assistants (Alexa)
  • Bets on AR on demand via 5G with its Wavelength Project
    • 5G + Edge computing
    • AWS

Other players

Takeaways

  • Big tech companies invest massively in AR, which they see as a promising technology evolving fast
    • hardware
    • algorithms
    • services, data
  • Many players try to bring their users into their closed ecosystem (hardware, app store, cloud)
  • Others focus on the openness of the Web to create and share open AR experiences
    • ➡️ ultimate goal of this course! 🎉

Further reading

3 Types of AR ⭐

  • Video
    • e.g.: smartphone,
      Meta Quest 3, Apple Vision Pro, Lynx-R1*
  • Optical
  • Projective
* Lynx-R1: see next page ⬇️

Lynx-R1 (video)

PAUSE

30' ⌛

Required technologies for AR

Calibration
Tracking
Interactions
Rendering

Calibration

Goal: accurately overlay the virtual rendering onto the real image

Optical AR calibration

  • very complex
  • hardware dependent
    • projection and image formation systems
  • depends on the body metrics of the user
  • made and provided by the AR hardware manufacturer
    • possible adjustments for each user, cf. eye calibration in HoloLens

Video camera calibration

  • Goal: compute the optical parameters of the real camera
    • focal length
    • radial distortion, lens imperfections
  • Method:
    • capture images of known patterns (grids, calibration patterns) with a real camera
  • ⚠️ the focal length may be variable (autofocus)
    • update calibration data for each frame
    • calibration data is provided by the API (ARKit, ARCore, WebXR)
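
In practice you rarely compute these parameters yourself: in WebXR, each XRView exposes a ready-made projection matrix built from the platform's calibration (ARKit / ARCore underneath). A minimal sketch, assuming an already-created immersive-ar `session`, a reference space `refSpace` and a THREE.js `camera` (all names illustrative):

```js
// Sketch: per-frame calibration data comes from the platform, not from your code.
function onXRFrame(time, frame) {
  const viewerPose = frame.getViewerPose(refSpace);
  if (viewerPose) {
    for (const view of viewerPose.views) {
      // view.projectionMatrix already encodes the intrinsics (focal length, principal point)
      // and is refreshed every frame, so autofocus changes are taken care of.
      camera.matrixAutoUpdate = false;
      camera.projectionMatrix.fromArray(view.projectionMatrix);
      camera.matrixWorld.fromArray(view.transform.matrix); // camera pose (extrinsics)
      // (a full renderer would also update camera.matrixWorldInverse)
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
session.requestAnimationFrame(onXRFrame);
```

When using THREE.js with `renderer.xr.enabled = true`, this copy is handled for you; the sketch only shows where the calibration data lives.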

Video camera calibration method


Pinhole camera model

Extrinsic and intrinsic parameters

3D coordinates ➡️ Camera 3D coordinates ➡️ Image coordinates

Projection ⭐

$$s \, p = A \, [R|t] \, P$$

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

$(X_w, Y_w, Z_w)$: 3D world coordinates (world origin $O_w$)
$(u, v)$: projected coordinates (pixels)
$[R|t]$: extrinsic matrix, $A$: intrinsic matrix
$(c_x, c_y)$: principal point (pixels), center of the image in the ideal case
$f_x$ and $f_y$: focal lengths along x and y (pixels), equal in the ideal case
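
To make the formula concrete, here is a small sketch in plain JavaScript that applies $p = A[R|t]P$ to a single point; the intrinsic and extrinsic values are made-up examples, not real calibration data:

```js
// Sketch: project one 3D world point to pixel coordinates with p = A [R|t] P.
const fx = 800, fy = 800;        // focal lengths in pixels
const cx = 320, cy = 240;        // principal point (image center for a 640x480 image)

const R = [ [1, 0, 0],           // rotation: identity (camera aligned with the world)
            [0, 1, 0],
            [0, 0, 1] ];
const t = [0, 0, 2];             // translation: the point ends up 2 m in front of the camera

function project([Xw, Yw, Zw]) {
  // Camera coordinates: Pc = R * Pw + t
  const Xc = R[0][0] * Xw + R[0][1] * Yw + R[0][2] * Zw + t[0];
  const Yc = R[1][0] * Xw + R[1][1] * Yw + R[1][2] * Zw + t[1];
  const Zc = R[2][0] * Xw + R[2][1] * Yw + R[2][2] * Zw + t[2]; // Zc plays the role of s
  // Image coordinates: dividing by the depth is dividing by the scale factor s
  const u = fx * (Xc / Zc) + cx;
  const v = fy * (Yc / Zc) + cy;
  return [u, v];
}

console.log(project([0.1, 0.05, 0])); // [360, 260]
```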

Non-linear radial distortion

  • due to the lens, approximated by a polynomial expression

$$x_{distorted} = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y_{distorted} = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
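
A hedged sketch of applying this distortion model in code; the $k_i$ coefficients below are arbitrary example values, real ones come out of the calibration step:

```js
// Sketch: apply the polynomial radial distortion model to normalized coordinates (x, y).
function distort(x, y, k1 = 0.1, k2 = -0.02, k3 = 0.0) {
  const r2 = x * x + y * y; // r² = x² + y², squared distance from the image center
  const factor = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
  return [x * factor, y * factor];
}

console.log(distort(0.5, 0.25)); // points far from the center move the most
```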


Registration

Goal: find the rigid transformation $[R|t]$ between a 3D point in the world and the center of the camera

Pose estimation

• Computed from pairs of 2D/3D point correspondences
• Optimization: minimization of the projection error between the transformed 3D points $V_i$ and the 2D image points $v_i$

$$\argmin_{R,t} \displaystyle\sum_{i} \| P(R V_i + t) - v_i \|$$

$P$: projection function
$R$: rotation matrix
$t$: translation vector
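
The search for $[R|t]$ itself is usually delegated to a solver (OpenCV's solvePnP is a common choice on the native side); the sketch below only spells out the quantity being minimized, the reprojection error. All names (`project3D`, the point arrays) are illustrative, not an existing API:

```js
// Sketch: reprojection error used by pose estimation (sum of ||P(R Vi + t) - vi||).
// `project3D` stands for the projection function P of the previous slides;
// `points3D` / `points2D` are matched 3D/2D correspondences.
function reprojectionError(R, t, points3D, points2D, project3D) {
  let error = 0;
  for (let i = 0; i < points3D.length; i++) {
    const [u, v] = project3D(R, t, points3D[i]); // P(R * Vi + t)
    const du = u - points2D[i][0];
    const dv = v - points2D[i][1];
    error += Math.hypot(du, dv);                 // || . ||
  }
  return error; // an optimizer searches for the (R, t) that makes this as small as possible
}
```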

Tracking

after initial registration

Tracking

  • Degrees Of Freedom (DOF):
    • 0 DOF
      • no tracking!
      • simple information overlay, cf. HUD
    • 3 DOF
      • rotation only (gyroscope, accelerometer, compass)
        • limited experience (can be good enough, cf. planetarium; see the sketch below)
    • 6 DOF
      • rotation + position
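
As a concrete illustration of the 3 DOF case, the browser's plain orientation events are already enough for a planetarium-style experience; a minimal sketch (no WebXR needed; note that iOS requires an explicit permission request, not shown here):

```js
// Sketch: 3 DOF tracking with nothing but the device's orientation sensors.
// Rotation only: you can look around, but moving the phone sideways changes nothing.
window.addEventListener('deviceorientation', (event) => {
  const { alpha, beta, gamma } = event; // orientation angles in degrees, no position
  // e.g. convert these angles into a THREE.js camera rotation (conversion omitted here)
  console.log(`alpha: ${alpha} beta: ${beta} gamma: ${gamma}`);
});
```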

3 DOF

6 DOF

Tracking techniques ⭐

GPS
Marker
NFT
SLAM
3D

Tracking techniques

  • GPS
    • global, satellite based, no network connectivity required ✅
    • no image processing ✅
    • outdoors only ❌
    • slow ❌
    • not very accurate ❌
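
For reference, this is what raw GPS looks like from a web page through the standard Geolocation API; the `accuracy` field makes the "not very accurate" point obvious in practice:

```js
// Sketch: GPS-style tracking in the browser. Coarse, slow to update, best outdoors.
const watchId = navigator.geolocation.watchPosition(
  (position) => {
    const { latitude, longitude, accuracy } = position.coords;
    console.log(latitude, longitude, `±${accuracy} m`); // tens of meters is common
  },
  (error) => console.warn(error.message),
  { enableHighAccuracy: true }
);
// later: navigator.geolocation.clearWatch(watchId);
```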

Tracking techniques

  • Marker
    • accurate, fast ✅
    • tangible, printable ✅
    • need to display a marker to enable AR ❌
    • non-aesthetic ❌
    • can be hard to detect (low lighting, motion blur, occlusions) ❌
  • NFT: same but
    • more aesthetic, easier to embed in the real world (ads) ✅
    • more robust to occlusions ✅

Valve VR HMD early prototype

Tracking techniques

  • SLAM ⭐: evolution of NFT + reconstruction
    • more natural markerless experience ✅
    • partial scene reconstruction ✅
      • allows advanced functionalities (occlusions, collisions etc.)
    • not very accurate ❌
      • drift, loop closure
      • scene reconstructed and refined in real-time
      • difficult to define the origin of the scene
        • stable anchor points required
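
On the Web, the platform's SLAM is exposed indirectly: you do not read the reconstructed map, you ask it where a ray hits the real scene. A minimal sketch with the WebXR Hit Test API, assuming an immersive-ar `session` requested with the `'hit-test'` feature, plus `viewerSpace` and `refSpace` reference spaces (illustrative names):

```js
// Sketch: asking the underlying SLAM for a stable point on real-world geometry.
const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });

function onXRFrame(time, frame) {
  const hits = frame.getHitTestResults(hitTestSource);
  if (hits.length > 0) {
    // pose of the point where a ray from the viewer meets the reconstructed scene
    const pose = hits[0].getPose(refSpace);
    // place or keep virtual content at pose.transform (position + orientation)
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
session.requestAnimationFrame(onXRFrame);
```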

Tracking techniques

  • 3D object detection in a real scene
    • using computer vision (lighting, edges, silhouette)
      • generic algorithm ✅
      • but slow, especially during initial registration ❌
    • using Deep Learning
      • faster initial detection ✅
      • more robust regarding occlusions and lighting changes ✅
      • not generic: requires per model training ❌

Tracking techniques

Conclusion

  • No tracking technique is ideal
  • Keep them all in mind and choose the right one according to:
    • the scenario of the AR experience
      • industrial context, consumer, generic or specific
    • constraints
      • indoor, outdoor, mobile

Rendering

Rendering

  • Realistic or not
  • Lighting
    • detect the direction and intensity of real lights (see the sketch below)
    • fast environment reconstruction to simulate reflections (SLAM + AI)
  • Occlusions
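
Where the browser supports the WebXR Lighting Estimation module, the detected real-world lighting mentioned above is available every frame; a hedged sketch, again assuming an existing `session` and render loop:

```js
// Sketch: reading the platform's estimate of the real lighting (only where supported).
const lightProbe = await session.requestLightProbe();

function onXRFrame(time, frame) {
  const estimate = frame.getLightEstimate(lightProbe);
  if (estimate) {
    // dominant real light: direction + intensity, usable to light and shadow virtual objects
    console.log(estimate.primaryLightDirection, estimate.primaryLightIntensity);
    // estimate.sphericalHarmonicsCoefficients approximates the ambient environment
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
```

THREE.js ships an XREstimatedLight helper in its examples that wraps the same data.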

Interactions

  • The missing part of the equation
  • Often neglected (cf. NReal)
  • Myth of the dying mouse (p. 17)
    • each form factor has an optimal interaction technique
    • most headsets handle hand tracking, but also offer controller, keyboard and mouse support!
  • The XR equivalent of the mouse has not been invented yet!

Interaction techniques

  • Screen, when using a smartphone 📱
    • not very immersive but accurate, and provides tactile feedback
  • Controllers with buttons 🎮
    • great haptic feedback but not immersive
  • HoloLens GGV: Gaze, Gesture, Voice 👀 ✋ 👄
    • natural interactions, with no external hardware
    • great but tiring, lacks privacy ("hey Cortana!"), and accuracy
  • Tangible interactions ✏️ 🔲
    • markers or accessories to add some tactile feedback ⬇️
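
Whatever the form factor, WebXR funnels the "primary action" of most of these techniques into a single `select` event, a convenient lowest common denominator; a minimal sketch, assuming an active `session`:

```js
// Sketch: one event for tap (phone screen), trigger (controller) or pinch (hand tracking).
session.addEventListener('select', (event) => {
  const inputSource = event.inputSource;
  console.log('selected via:', inputSource.targetRayMode); // 'screen', 'tracked-pointer', 'gaze', ...
  // typically combined with a hit test along inputSource.targetRaySpace
  // to place an object exactly where the user aimed
});
```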

Interactions

Conclusion

  • Immersive AR interactions have yet to be invented!
  • No interaction paradigm has become a standard yet
  • We must guide the users and try to understand their intent

End

of part 1!

Questions?

LUNCH BREAK

🍜

back at 1:30 PM

Extra :)

3D Reconstruction

Links

http://www.ign.fr/institut/innovation/minecraft-a-carte

http://lsc.univ-evry.fr/~didier/home/lib/exe/fetch.php?media=cours:ra:ra.pdf

Photo Credits

https://unsplash.com/photos/RgPVZvA4wBM

https://unsplash.com/photos/r2CAjGQ0gSI

www.instagram.com/steveroe_


Cover

https://unsplash.com/photos/muiuZ6cKtlA https://unsplash.com/photos/6Avhuh6UP2Y https://unsplash.com/photos/UVP-NlZEf0Y https://unsplash.com/photos/Ib2e4-Qy9mQ https://unsplash.com/photos/3MjyZPUZKIQ

Project cover

https://unsplash.com/photos/msnyz9L6gs4 https://unsplash.com/photos/T6BsBZdGwbg https://unsplash.com/photos/8r3Otv1zy0s https://unsplash.com/photos/eft_khJJgug https://unsplash.com/photos/qnBMlkav-j8 https://unsplash.com/photos/QJv-TlL1T9M https://unsplash.com/photos/KBDTG8IvlpI https://unsplash.com/photos/beIw89byFlw https://unsplash.com/photos/bs4qtd2NsGI https://unsplash.com/photos/lPbq-op9zno https://unsplash.com/photos/6vEqcR8Icbs https://unsplash.com/photos/qRkImTcLVZU https://unsplash.com/photos/Evp4iNF3DHQ https://unsplash.com/photos/7wBFsHWQDlk https://unsplash.com/photos/Vq2HnMA0Bp4 https://unsplash.com/photos/V_7xg72F3ls https://unsplash.com/photos/RPFL38ZZikA https://unsplash.com/photos/Ksn5ggA3L8s https://unsplash.com/photos/9Eheu3sIgrM