Ultimate Interface Lab's 2024 Papers

(2024) MoireTag (ISS 2024, Honorable Mention Award)

Peiyu Zhang, Wen Ying, Sara Riggs, Seongkook Heo

MoiréTag: A Low-Cost Tag for High-Precision Tangible Interactions without Active Components
(Abstract) In this paper, we present MoiréTag—a novel tag-like device that magnifies displacement without active components for indirect sensing of subtle tangible interactions. The device consists of two overlapping layers of stripe patterns with distinct pattern frequencies. These layers create Moiré fringes that can move faster than the actual movement of a layer. Using a customized image processing pipeline, we show that MoiréTag can reliably detect sub-mm movement in real-time (mean error = 0.043 mm) under varying lighting conditions, camera angles, and camera distances. We also demonstrate five applications of MoiréTag to showcase its potential as a low-cost solution to capture and monitor small changes in movement and other physical properties, such as force and volume, by converting them into displacement.
(Introduction) With the advancements in computing technologies, computers are now more closely connected to the real world than ever. Computers sense and utilize interactions that happen in the real world, such as tapping and scratching of a surface [12, 13, 41], and interacting with physical objects [17], enabling tangible interactions in scenarios where traditional user interfaces are not available or not effective. Many of these are done through active sensors worn on the user’s body [17, 41] or attached to the object surface [12, 13]. Some use conductive material to cover the object surface to sense interaction with the surface [44]. While useful, these methods need wires and active components to be worn on the user’s body or active circuits to be instrumented on the object surface.

On the other hand, recent advancements in computer vision and machine learning algorithms have enabled non-contact indirect sensing of real-world input using a camera. For example, MediaPipe [1] and DeepPose [34] allow human body posture and movement to be tracked with a regular 2D camera by using deep learning algorithms, and YOLACT allows sensing and tracking of objects in a video [3] without instrumenting the user or the object with active sensors. These vision-based methods are limited in the precision and stability of the sensed movement, which makes them more suitable for sensing large-scale movements such as moving an arm or a hand. However, in the real world, people also utilize small movements, such as precisely aligning objects, and even tiny actions that may not be easily noticeable, such as applying a force on a physical object. Other studies have shown that using laser speckles can enable high-accuracy sensing of small movements, such as the movement of hands and objects [45] and the movement caused by force interaction [23]. However, methods that use laser speckle require a speckle projector and a speckle sensor [45] or a defocused camera [23].

In this paper, we present MoiréTag, a novel low-cost tag that enables camera-based sensing of precise movement from physical interactions. It utilizes the Moiré phenomenon: when two repetitive patterns with similar spacings are superimposed, dark and light fringes appear. If one of the superimposed patterns moves, the Moiré fringes also move, but at a different rate than the pattern itself. This allows the small displacement caused by a subtle interaction to be significantly magnified so that it can be easily captured by a regular camera. The Moiré phenomenon has been shown to be effective in various applications, such as sensing pose [21] and camera position [37]. With the well-established theoretical framework of Moiré patterns as its basis, MoiréTag provides an easy way for people to use the Moiré phenomenon to create and use ad-hoc interfaces. MoiréTag consists of two overlapping paper layers with stripe patterns of different grating periods; the resulting Moiré fringes serve as a displacement magnifier that lets a camera capture the displacement precisely. We implemented an image processing pipeline that recognizes the MoiréTag from a camera image and detects the movement of the Moiré fringes. Our technical evaluation showed that MoiréTag and the image processing pipeline could enable the sensing of sub-mm displacement using a regular smartphone camera with an average error of 0.043 mm. The evaluation also showed that the displacement could be reliably detected in various lighting conditions and when captured from different camera angles and distances.
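The magnification can be made concrete with the textbook beat relation for two parallel line gratings. This is a sketch that assumes MoiréTag's stripes behave like this idealized model; the formula is not quoted from the paper. With a fixed layer of period T_A and a sliding layer of period T_B, the fringe period is T_m = T_A·T_B/|T_B − T_A|. Sliding layer B by Δ_B moves the fringes by T_A·Δ_B/|T_B − T_A|. The 1.0 mm and 1.1 mm periods are hypothetical values chosen for illustration.

```python
def moire_fringe_period(t_a: float, t_b: float) -> float:
    """Period of the beat fringe formed by two parallel line gratings (same units as inputs)."""
    if t_a == t_b:
        raise ValueError("equal grating periods produce no finite fringe period")
    return t_a * t_b / abs(t_b - t_a)


def magnification(t_a: float, t_b: float) -> float:
    """Fringe displacement per unit displacement of the sliding layer B."""
    return t_a / abs(t_b - t_a)


def fringe_shift(delta_b: float, t_a: float, t_b: float) -> float:
    """Magnitude of the fringe movement produced by sliding layer B by delta_b."""
    return magnification(t_a, t_b) * abs(delta_b)


T_A, T_B = 1.0, 1.1  # hypothetical grating periods in mm
print(f"fringe period T_m = {moire_fringe_period(T_A, T_B):.2f} mm")            # 11.00 mm
print(f"magnification     = {magnification(T_A, T_B):.1f}x")                    # 10.0x
print(f"0.05 mm slide -> {fringe_shift(0.05, T_A, T_B):.2f} mm fringe movement")  # 0.50 mm
```

The relation has a practical consequence. Bringing T_B closer to T_A raises the magnification, but it also lengthens the fringe period, so the tag must be large enough to show at least one full fringe. This is one way to read the sensitivity versus sensing-range trade-off that comes from choosing the two grating periods.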

MoiréTags can be configured in various ways to be used for measuring different types of physical properties. For example, MoiréTags with a handle attached to the moving layer can be used as a tangible user interface for applications requiring multiple precise inputs, such as video editing. When coupled with an elastic material, MoiréTags can serve as a force sensor by magnifying the small displacement caused by the compression or extension of the elastic material. MoiréTags can also be attached to a strap wrapped around the human chest to measure chest volume changes during breathing. Our evaluation showed that the force and the breathing count can be accurately measured using MoiréTags.
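For the force sensor, the conversion from displacement to force can be sketched with Hooke's law, F = k·Δ_B, where k is the stiffness of the elastic block. The sketch assumes a linear elastic block and estimates k with a least-squares fit through the origin from calibration pairs. The calibration numbers are made up for illustration, and this is not the authors' calibration procedure.

```python
from statistics import fmean


def fit_spring_constant(displacements_mm, forces_n):
    """Least-squares slope through the origin for the model F = k * delta (N/mm)."""
    num = sum(d * f for d, f in zip(displacements_mm, forces_n))
    den = sum(d * d for d in displacements_mm)
    return num / den


def r_squared(displacements_mm, forces_n, k):
    """Coefficient of determination of the fitted line against the measured forces."""
    mean_f = fmean(forces_n)
    ss_res = sum((f - k * d) ** 2 for d, f in zip(displacements_mm, forces_n))
    ss_tot = sum((f - mean_f) ** 2 for f in forces_n)
    return 1.0 - ss_res / ss_tot


def force_from_displacement(delta_mm, k):
    """Hooke's law: convert a measured layer displacement into force."""
    return k * delta_mm


# Hypothetical calibration pairs: layer displacement (mm) vs. applied force (N).
delta = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
force = [0.0, 1.1, 2.3, 3.5, 4.6, 6.0]
k = fit_spring_constant(delta, force)
print(f"k ≈ {k:.2f} N/mm, R^2 = {r_squared(delta, force, k):.3f}")
print(f"0.5 mm -> {force_from_displacement(0.5, k):.2f} N")
```

A fit with an intercept would also work. Forcing the line through the origin reflects the assumption that the block is unloaded when the displacement is zero.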

In summary, the main contributions of this work include the development of MoiréTag, a novel tag device that enables low-cost and precise camera-based sensing of subtle changes in displacement for tangible interactions, and the design and implementation of force and volume change sensing mechanisms using MoiréTag.
(Conclusion) We presented a novel camera-based approach to detect small displacement changes in daily interactions by utilizing a low-cost passive tag that magnifies displacement using Moiré fringes. The validation experiments show consistent and accurate sensing of displacement under varying lighting conditions, camera angles, and camera distances. Additionally, we demonstrate the conversion of touch force and volume changes into displacement and the ability to sense them using repurposed MoiréTag prototypes and our image processing pipeline in practical applications. MoiréTag offers the following advantages over existing sensing methods:
Affordable and easy-to-make tag design. The prototype can be made with paper alone, using a printable template and machine-cut stripe patterns for the fixed layer. If the chosen grating period for the fixed paper layer is sufficiently large, it can also be printed and manually cut.
Robust sensing in various lighting and different camera placements. The customized image processing pipeline is able to detect displacement with sub-mm errors under different lighting conditions and from varying camera angles and distances.
Adjustable sensing range and sensitivity. The sensing range can be adjusted by changing the grating periods of the two overlapping paper layers of stripe patterns to meet the different sensing needs.
Versatile sensing capabilities. MoiréTag can be repurposed as a sensor for touch force or volume changes during respiration by adding an elastic block or a strap.

Like all camera-based sensing methods, MoiréTag requires uninterrupted visibility within the camera’s field of view, without any obstructions. Our validation results indicate that MoiréTag can function effectively at a camera angle of 60 degrees and a camera distance of 60 cm. A greater camera angle might cause the camera to fail to capture the Moiré fringes. Using a lens with a wider field of view may enable detection at a larger camera angle. Moreover, an extended camera distance may render MoiréTag too small for detection or cause the Moiré fringes to become too blurry for tracking. However, these issues could potentially be mitigated by employing a larger MoiréTag or a camera with a higher resolution.

In addition to the jittering of cropped Moiré fringes caused by slight variations in the contours detected by OpenCV across frames, another source of displacement-sensing error is drastic camera movement and angle change. The gap between the two layers of stripe patterns may also affect the grating period of the Moiré fringes and may cause noticeable fringe movement if the camera moves drastically. In our prototype designs, we tried to avoid creating such a gap, using the 3D-printed structures and, for the paper prototype, tape. If the camera angle changes greatly, for example from one side of the tag to the other, the structure above the screen may affect the rectification of the fringes and thus lead to detected Moiré fringe movement. Lastly, if the camera moves too quickly, motion blur can impair the detection and tracking of Moiré fringes, especially in low-light conditions. If the Moiré fringes move rapidly, the pipeline’s ability to detect displacement changes may be constrained by the frame rate. Future work is needed to investigate tracking performance in more dynamic use scenarios and to optimize the pipeline for a higher frame rate.
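A back-of-the-envelope sketch can estimate the frame-rate limit. It assumes, though the paper does not say so, that fringe movement is tracked by phase, so a shift of more than half a fringe period between consecutive frames becomes ambiguous. The fringe period equals the magnification times T_B, so the magnification cancels and the speed limit for the sliding layer reduces to T_B·fps/2. The grating periods are hypothetical. 26 fps is the laptop throughput reported in the summary further down this page. Motion blur is not modeled.

```python
def max_layer_speed(t_a: float, t_b: float, fps: float) -> float:
    """Fastest speed (mm/s) of layer B for which the fringes move less than half
    a fringe period between consecutive frames (an assumed phase-tracking limit)."""
    magnification = t_a / abs(t_b - t_a)
    fringe_period = magnification * t_b            # equals T_A*T_B/|T_B - T_A|
    max_fringe_step = fringe_period / 2            # allowed fringe movement per frame
    return max_fringe_step / magnification * fps   # simplifies to t_b * fps / 2


for fps in (26, 60):
    print(f"{fps} fps -> {max_layer_speed(1.0, 1.1, fps):.1f} mm/s")  # 14.3, then 33.0
```

Under this assumption, a faster frame rate or a coarser sliding grating raises the speed limit. A higher magnification alone does not.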

MoiréTag was designed to measure movement in only one direction. One potential future improvement that could be made to MoiréTag is the use of Moiré patterns with different shapes, such as circular ones, to enable displacement measurement in multiple directions. Another improvement could be creating Moiré fringes on multiple surfaces to allow for displacement detection from more camera angles. Overall, MoiréTag demonstrates the potential of using Moiré fringes to accurately detect small displacements and indirectly infer various physical properties during everyday interactions.

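MoireTag? How to achieve sub-millimeter (sub-mm) precision sensing with only paper patterns and a camera, without active components such as batteries or sensors
- To sense interactions between computers and the real world, earlier approaches had to attach active sensors or circuits, which need wired cables and batteries, to the user's body or to object surfaces.
- Contactless sensing with ordinary 2D cameras and deep learning (MediaPipe, YOLACT, etc.) has advanced recently. However, its resolution limits make it suitable only for tracking large-scale movements such as moving an arm or a hand, and it cannot precisely and stably capture subtle (sub-mm) interactions such as finely aligning an object or applying force.
- Precise approaches based on laser speckle have their own drawback: they require a dedicated speckle projector or specialized sensor.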
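Trade-offs of existing sensing approaches
- Active physical sensor systems: sensing accuracy is high. However, batteries, complex circuit components, and wired connections are required, so these systems are costly and constrained in deployment.
- General vision-based 2D/depth camera systems: they enable contactless sensing without extra equipment. However, pixel-resolution limits prevent them from detecting small displacements, and they are highly sensitive to slight camera shake and noise.
- Laser-speckle-based systems: they can measure small distances with high precision but require expensive dedicated hardware, such as a special projector or a defocused camera.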
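Limitations of visualizing and remotely sensing subtle interactions
- Real-world physical interactions (pressing force, volume changes, etc.) are often invisible, or too subtle for the human eye or an ordinary camera to detect.
- Existing vision-based systems can find displacement by comparing pixels across frames only when they assume a fixed background or a perfectly stable camera, and even then only barely.
- In real everyday environments, lighting is not controlled (non-light-controlled settings) and camera angle and distance vary. Under these conditions, precisely and indirectly measuring small displacements and external forces is practically infeasible.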
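Displacement magnification through Moiré fringes
- Two layers (paper) of stripe patterns with slightly different grating periods are overlapped. Even when one pattern layer moves imperceptibly, the resulting Moiré fringes move much faster and farther than the actual displacement. In other words, the fringes act as a visual displacement magnifier, so even an ordinary camera can easily capture the small movement.
- MoiréTag adapts its structure to the type of interaction, converting various physical quantities into displacement for measurement.
  - Displacement sensing (slider): a paper handle is attached to the bottom layer, creating a physical slider the user operates by hand. Small manual displacements are magnified directly into Moiré fringe movement and passed to the vision pipeline.
  - Touch force sensing (pressure sensor): an elastic material such as silicone rubber is combined with the bottom of the tag structure. When the user presses with a finger or pen tip, the elastic material compresses, and the resulting small displacement moves the sliding layer vertically or horizontally. Once the pipeline computes the displacement (∆_B), the applied force (F) is obtained from the spring constant (k) of the elastic material using Hooke's law (F = k∆_B).
  - Volume change sensing (breathing counter): a stretchable chest strap and a rubber band are connected to the end of the tag's moving layer. As the user inhales and exhales, changes in chest volume tighten and loosen the strap. This small change in length is converted into sliding displacement inside the tag, which is used to track breathing patterns.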
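One hypothetical way to turn cropped fringe images into a layer displacement is sketched below. The paper's own pipeline (YOLACT detection plus OpenCV contour-based cropping) is not reproduced here. The sketch assumes the fringe region has already been rectified into a 1D intensity profile across the stripes. It projects that profile onto a complex exponential at the expected fringe frequency (a single-bin DFT) to read the fringe phase, then inverts the idealized moiré relation to recover Δ_B. The grating periods and pixel scale are illustrative assumptions.

```python
import numpy as np


def fringe_phase(profile: np.ndarray, period_px: float) -> float:
    """Phase (rad) of the fringe component in a 1D intensity profile, obtained by
    projecting the mean-removed profile onto exp(-2*pi*i*x/period_px)."""
    x = np.arange(profile.size)
    basis = np.exp(-2j * np.pi * x / period_px)
    return float(np.angle(np.sum((profile - profile.mean()) * basis)))


def layer_displacement_mm(prev: np.ndarray, curr: np.ndarray,
                          t_a: float, t_b: float, px_per_mm: float) -> float:
    """Estimate how far layer B slid between two frames from the fringe phase change."""
    fringe_period_mm = t_a * t_b / abs(t_b - t_a)
    period_px = fringe_period_mm * px_per_mm
    dphi = fringe_phase(curr, period_px) - fringe_phase(prev, period_px)
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi          # wrap to [-pi, pi)
    fringe_shift_mm = -dphi / (2 * np.pi) * fringe_period_mm
    # In the idealized model, fringes move by -delta_b * T_A / (T_B - T_A); invert it.
    return -fringe_shift_mm * (t_b - t_a) / t_a


# Synthetic example: T_A = 1.0 mm, T_B = 1.1 mm, 4 px per mm on the rectified crop.
x = np.arange(440)
prev = 0.5 + 0.4 * np.cos(2 * np.pi * x / 44)
curr = 0.5 + 0.4 * np.cos(2 * np.pi * (x + 2) / 44)   # fringes shifted by -2 px
print(f"{layer_displacement_mm(prev, curr, 1.0, 1.1, 4.0):.3f} mm")   # 0.050 mm
```

Reading phase over the whole crop averages many pixels, which is one reason a sub-pixel fringe estimate is plausible. The wrap step also means any fringe shift larger than half a period between frames is ambiguous.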
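Validation results and applications
- In precision experiments with a linear translation stage moving in 0.01 mm steps, MoiréTag achieved sub-mm precision, with a mean error of 0.043 mm using an ordinary smartphone camera.
- The error stayed around 0.05 mm even in low light (50 lux), at steep camera angles (30°, 60°), and at longer camera distances (100 cm, 140 cm). This quantitatively demonstrated robustness to environmental changes.
- On a laptop with an RTX 3070 GPU, the pipeline ran in real time at about 26 frames per second (26 fps), including YOLACT detection.
- Applications and user validation
  - Digital video editing sliders: three paper MoiréTags served as a physical interface for finely adjusting video playback speed, brightness, and saturation without covering the smartphone screen.
  - MoiréTouch & MoiréPen: with a silicone-rubber-based prototype measuring forces from 0 to 6 N, applied force and Moiré displacement showed a strong linear relationship (coefficient of determination R^2 = 0.964). Building on this, the authors implemented pressure-sensitive game control and a virtual whiteboard drawing app in which line thickness varies with pressing force.
  - Breathing counter: in ten 1-minute breathing trials with the chest-strap tag, 8 of 10 counts matched exactly and the remaining 2 were each one breath short. This gave an average breathing-count accuracy of 99.0%, suggesting potential for telemedicine and remote fitness monitoring.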
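A breathing counter of this kind can be sketched as cycle counting on the displacement signal. The hysteresis threshold, the synthetic 15 breaths/min signal, and the per-trial accuracy definition (1 − |error|/actual) are all assumptions for illustration. The summary does not specify how breaths were counted or how the 99.0% figure was computed.

```python
import math
from statistics import fmean


def count_breaths(displacement_mm, hysteresis_mm=0.05):
    """Count breathing cycles as rises above (mean + h) that follow a dip below
    (mean - h); the hysteresis band keeps small jitter from adding counts."""
    mean = fmean(displacement_mm)
    breaths, armed = 0, False
    for d in displacement_mm:
        if d < mean - hysteresis_mm:
            armed = True                 # dipped below the band: half a cycle seen
        elif armed and d > mean + hysteresis_mm:
            breaths += 1                 # rose above the band again: one full cycle
            armed = False
    return breaths


def count_accuracy(counted, actual):
    """Per-trial accuracy as 1 - |error| / actual (an assumed definition)."""
    return 1.0 - abs(counted - actual) / actual


# Synthetic 1-minute recording at 26 fps: 15 breaths/min plus small jitter (mm).
fps = 26
t = [i / fps for i in range(60 * fps)]
signal = [-0.3 * math.cos(2 * math.pi * 0.25 * ti) + 0.02 * math.sin(2 * math.pi * 3 * ti)
          for ti in t]
n = count_breaths(signal)
print(n, count_accuracy(n, 15))
```

The hysteresis band should be wider than the fringe jitter but well below the breathing amplitude. Otherwise, noise adds counts or shallow breaths are missed.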
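Advantages
- Low cost and easy fabrication: no active components such as power supplies, batteries, or complex circuitry are needed. The tag can be made easily and economically from a paper pattern printed on an ordinary printer, a machine-cut top layer, and a 3D-printed (or thick paper) structure.
- Robustness to environmental changes: thanks to the custom image processing pipeline, sub-mm precision is maintained in everyday settings where ambient lighting changes (50–275 lux) and camera angle (0°–60°) and distance (60–140 cm) vary.
- Versatility and adjustable range: changing the grating periods of the two layers (T_A, T_B) tunes the sensing range and sensitivity to the application. Depending on the attached structure (elastic block or strap), the tag can also sense force and volume change, not only physical displacement.
- Resistance to background and camera shake: unlike conventional computer vision approaches, MoiréTag does not need to separate the object from the background. It measures stably even when the camera shakes slightly or there is no fixed background.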
Limitations
- Occlusion and angle limits: as with any camera-based sensing, the tag must be fully within the camera's field of view (FOV) with uninterrupted visibility. If the camera is tilted beyond 60°, the Moiré fringes may not be captured properly and the system may fail.
- Distance and resolution: if the camera is too far away, the MoiréTag appears too small in the image or the Moiré fringes become blurry, making tracking impossible.
- Spurious small displacements and structural gap errors (jittering & structural gap): because of limitations in OpenCV contour detection, slight frame-to-frame variation in the detected rectangular border causes jitter even when nothing moves. If there is a small gap between the two patterned paper layers, large camera movements can also make the Moiré fringes appear to move without any real displacement.
- Speed limits and motion blur (motion blur & frame rate): if the camera or tag moves too quickly, motion blur degrades tracking. When the Moiré fringes move rapidly, displacement sensing is limited by the frame rate of the current image processing pipeline (about 26 fps).
- One-dimensional sensing: the current system is designed to measure linear motion and displacement in only one direction (1D).
Future work
- Multi-directional sensing: introduce other pattern geometries, such as circular Moiré patterns, to measure displacement in multiple directions rather than only linear motion.
- Improved multi-angle recognition: create Moiré fringes on multiple surfaces so the tag's displacement can be detected from a wider range of camera angles.
- Optimization for dynamic use and higher frame rate: optimize the vision pipeline and raise the system frame rate so tracking does not degrade when users move faster and more dynamically (dynamic use scenarios).
- Standalone mobile operation: currently, smartphone video is streamed to a laptop for processing. The authors plan to adopt lightweight models such as MobileYOLACT so the system runs in real time directly on mobile devices such as smartphones or tablets.

(2024) Experiencing Thing2Reality (UIST 2024 Demo)

Erzhen Hu, Mingyi Li, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, Ruofei Du

Experiencing Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication
(Abstract) During remote communication, participants share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have facilitated users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, the conventional 2D representation of digital objects restricts users’ ability to spatially reference items in a shared immersive environment. To address these challenges, we propose Thing2Reality, an Extended Reality (XR) communication platform designed to enhance spontaneous discussions regarding both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in an immersive environment and share them as conditioned multiview renderings or 3D Gaussians. Our system enables users to interact with remote objects or discuss concepts in a collaborative manner.
(Introduction) Shared artifacts, such as physical objects, printouts, and digital images, play a crucial role in facilitating effective communication and idea generation [3]. They help bridge gaps between collaborators by providing a common spatial reference point and facilitating creative exploration [1]. In addition to physical artifacts, designers often use online platforms like Pinterest and Google to find relevant digital artifacts that can support their design processes [2]. However, using shared artifacts in remote meetings can pose several challenges, especially in scenarios that require quick and spontaneous sharing, such as brainstorming sessions. First, artifacts shared in remote meetings are often in 2D, whether they are captured using a camera or retrieved from an online repository [3]. These 2D representations may not provide the same level of understanding as interacting with a physical object or a 3D model. Second, in physical meetings, participants can easily rotate, manipulate, and interact with the artifacts, which can facilitate creative exploration and idea generation processes [1]. However, in remote meetings, this level of interaction with virtual artifacts generated on-the-fly is often unavailable or limited.

Several methods have been used to address these challenges. One is to prepare 3D models before the meeting by creating or retrieving CAD models, or by 3D-scanning an object [4]. Another is to use a special setup that can capture the physical world in real-time and reconstruct it in 3D [5]. While these methods effectively enable richer sharing of artifacts, they have their own limitations. For instance, using pre-made 3D assets does not effectively support the spontaneous sharing of objects, and using a special scanning setup may not be accessible for many people. On the other hand, recent advances in AI-driven text-to-3D and image-to-3D technologies [7] address the need for a more accessible and efficient way of creating and sharing 3D assets. These technologies can significantly lower the barriers to 3D content creation, enabling individuals without specialized skills to contribute to the co-creation process, thereby democratizing access to 3D modeling and enhancing collaboration.

To address the challenges of summoning spontaneous 3D representations into the existing information space, we seek to enable fluid communication in an XR environment comprised of paired 2D and 3D artifacts. In this demonstration, we present Thing2Reality, a distributed communication system that enables users to segment any content from any container (video streams, shared digital screens) in the XR environment (Figure 1a), explore the perspectives with multi-view renderings (Figure 1b), and transform them into shared 3D Gaussians (Figure 1c-d) for 3D manipulation.
(Conclusion) In this demonstration, we present Thing2Reality, an XR communication system that allows users to instantly materialize ideas or physical objects and share conditioned multiview renderings or 3D Gaussians for realistic 3D rendering. We believe that XR communication has tremendous promise for co-presence and for bridging distances between humans, and enabling the spontaneous creation and sharing of 3D objects and artifacts in XR will allow for a more fluid and effective exchange of ideas, beyond what is possible in real-world communication.
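The 3D Gaussians at the end of this pipeline are commonly parameterized, as in 3D Gaussian splatting, by a mean, a per-axis scale, a rotation quaternion, an opacity, and a color, with covariance Σ = R·S·Sᵀ·Rᵀ. The sketch below assumes that parameterization. It is not code from Thing2Reality or from the Gaussian model it uses, and the function names are hypothetical. It also shows why uniformly resizing a summoned object during 3D manipulation is cheap: the means scale about a pivot by s, and every covariance scales by s².

```python
import numpy as np


def quat_to_rotmat(q):
    """Rotation matrix from a (w, x, y, z) quaternion; q is normalized first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(scale, quat):
    """Sigma = R S S^T R^T for one Gaussian with per-axis scale and rotation."""
    R = quat_to_rotmat(quat)
    S = np.diag(np.asarray(scale, dtype=float))
    return R @ S @ S.T @ R.T


def scale_object(means, covs, s, pivot):
    """Uniformly resize a set of Gaussians by factor s about a pivot point."""
    return pivot + s * (means - pivot), covs * s ** 2


# Two hypothetical Gaussians, doubled in size about the origin.
means = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
covs = np.stack([covariance([0.01, 0.02, 0.03], [1, 0, 0, 0]) for _ in means])
big_means, big_covs = scale_object(means, covs, 2.0, pivot=np.zeros(3))
print(big_means)
```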

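Usefulness of Thing2Reality
- In remote collaboration and communication (meetings, brainstorming, etc.), sharing visual material in real time plays a major role in understanding each other's ideas.
- Recent advances in XR have made it faster to share 2D copies of objects from video feeds, but handling those copies in three dimensions inside an immersive environment remains difficult.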
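Trade-offs of existing approaches
- Sharing 2D assets: content is shared only as 2D images from a camera or the web, so users cannot inspect an object's three-dimensional form or manipulate it intuitively in 3D within the virtual space.
- Pre-made 3D CAD/scanned models: high-fidelity models are available, but objects must be scanned or modeled in CAD beforehand. Ideas that come up spontaneously during a meeting therefore cannot be summoned on the spot (spontaneous sharing).
- Real-time 3D reconstruction systems: a special hardware setup for creating virtual objects in real time (e.g., movable depth cameras) is required, which poses high technical and cost barriers for ordinary users.
- Generative-AI-based spontaneous 3D representation and interaction: generative AI has lowered technical barriers. However, no integrated platform existed that, during a virtual meeting, instantly extracts multiple views from a 2D source, converts it into a manipulable 3D object, and lets users move fluidly between 2D and 3D formats in the virtual space (bidirectional projection).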
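Thing2Reality? A distributed XR collaboration platform. It segments a desired object from any 2D content, such as video streams or digital screens, generates multiview images with AI, and converts them on the fly into 3D Gaussian objects that users can manipulate.
- Interactive object segmentation: while viewing a screen in the XR space, the user holds a controller button and moves a pointer over the object, and the MobileSAM algorithm segments the desired object from the background.
- 2D to 3D (pie menu and Gaussian summoning): when the user confirms the segmented image, four orthogonal views (front, back, left, right) generated by a multiview diffusion model (e.g., MVDream) are laid out on the outer ring of a 2D pie menu. A Large Gaussian Model (LGM) then summons a complete 3D object into the virtual space within 1–2 seconds, and the user can grab and resize it through a semi-transparent sphere proxy.
- 3D to 2D (whiteboard snapshot projection): the user takes snapshots of the generated 3D Gaussian object from any angle, then drags those 2D images onto a whiteboard in the virtual meeting room to project them and leave notes.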
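The 3D-to-2D snapshot step can be pictured as rendering the Gaussians from a virtual camera. As a minimal sketch, the function below assumes a pinhole camera with intrinsics fx, fy, cx, cy and a world-to-camera rotation and translation (all hypothetical names). It projects Gaussian centers into pixel coordinates and drops points behind the camera. A real splatting renderer also projects each covariance and alpha-blends in depth order; both are omitted here.

```python
import numpy as np


def project_points(points_world, R_cw, t_cw, fx, fy, cx, cy):
    """Project Nx3 world-space points into a pinhole snapshot camera.

    R_cw, t_cw map world coordinates to camera coordinates (x right, y down,
    z forward). Returns pixel coordinates and depths of the points in front of
    the camera, plus the boolean mask of which input points were kept.
    """
    pc = np.asarray(points_world, dtype=float) @ R_cw.T + t_cw
    in_front = pc[:, 2] > 1e-6
    z = pc[in_front, 2]
    uv = np.stack([fx * pc[in_front, 0] / z + cx,
                   fy * pc[in_front, 1] / z + cy], axis=1)
    return uv, z, in_front


# Hypothetical Gaussian centers and a 640x480 snapshot camera at the origin.
centers = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
uv, depth, kept = project_points(centers, np.eye(3), np.zeros(3), 500, 500, 320, 240)
order = np.argsort(-depth)   # far-to-near, the order a back-to-front compositor would draw
print(uv[order], kept)
```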
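Validation results
- The system generated high-quality 3D objects within 1–2 seconds without physical scanning equipment. It demonstrated spatial interaction scenarios such as placing a spontaneously summoned virtual 3D object (a frog hat) on the user's own avatar for a virtual try-on.
- A user study exercised combined 2D–3D interaction: users freely handled 3D assets from continuous multiple viewpoints and pinned 2D snapshots to a whiteboard for discussion when needed. The results indicated a strong sense of co-presence, as if discussing a physical object in person during a remote meeting, and an efficient environment for exchanging ideas.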
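Advantages
- Ordinary users without professional CAD modeling or scanning skills can use the generative AI pipeline to create 3D models on the spot in XR and collaborate with them.
- Beyond manipulating 3D objects, users can take 2D snapshots from any viewpoint and project them back onto 2D surfaces such as a whiteboard, enabling visual communication across 2D and 3D.
- Without prior preparation, objects in web pages or live video content can be summoned as 3D assets within 1–2 seconds during a meeting, without breaking the flow of discussion.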
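Limitations
- Running and synchronizing the multiview diffusion model and the Large Gaussian Model (LGM) in real time requires a high-performance GPU and a dedicated workstation; that is, substantial computing resources are needed.
- The approach is generative rather than a precise scan that faithfully replicates a real object: AI infers and fills in the unseen back side from a single image. Shape distortions or visual artifacts may therefore occur, depending on the source image.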

(2024) Enhancing VR Sketching (VRST 2024, Honorable Mention Award)

Wen Ying, Seongkook Heo

Enhancing VR Sketching with a Dynamic Shape Display
(Abstract) Sketching on virtual objects in Virtual Reality (VR) can be challenging due to the lack of a physical surface that constrains the movement and provides haptic feedback for contact and movement. While using a flat physical drawing surface has been proposed, it creates a significant discrepancy between the physical and virtual surfaces when sketching on non-planar virtual objects. We propose using a dynamic shape display that physically mimics the shape of a virtual surface, allowing users to sketch on a virtual surface as if they are sketching on a physical object’s surface. We demonstrate this using VRScroll, a shape-changing device that features seven independently controlled flaps to imitate the shape of a virtual surface automatically. Our user study showed that participants exhibited higher precision when tracing simple shapes with the dynamic shape display and produced clearer sketches. We also provided several design implications for dynamic shape displays aimed at enabling precise sketching in VR.
(Introduction) Leveraging spatial perception and 6-DOF manipulation, virtual reality environments provide designers with an immersive platform for 3D model interaction [3, 25]. Although typical mid-air interaction methods, such as using controllers and hands, facilitate fluid manipulations, they may not be able to support the natural and precise on-surface interactions essential for design tasks such as fine-line drawing [5]. This limitation stems from the lack of resistance and tactile feedback provided by physical contact, which is critical for accurately perceiving and manipulating virtual geometry [24].

To improve the performance of VR sketching, previous research has incorporated virtual surfaces with flat plates or tablets, which can rest the user’s hands, guide pen movement, and prevent strokes from penetrating or leaving the virtual surface [4, 5, 12, 13, 44, 48]. However, utilizing planar surfaces to represent non-planar shapes, including architectures, creatures, and characters, can be challenging because of the incongruities of their geometric features. While some systems enable interactions on non-planar surfaces by projecting the virtual surface on a touchscreen [13, 17] or a graphic tablet [4, 12, 29], the absence of geometry information and the discrepancy between the tablet and the virtual shape might result in an unintuitive [36], technically challenging, and learning-demanding [4, 6] design process in VR. 3D-printed physical proxies that exactly replicate a virtual object have also been exploited to improve drawing performance [48]. However, it may be impractical to fabricate such proxies for the diverse virtual objects users may interact with in VR.

To tackle this challenge of enabling precise sketching on virtual objects with various shapes, we propose using a dynamic shape-changing surface that can mimic the shape of virtual object surfaces. Many shape-changing devices, also known as shape displays, have been developed and shown to be effective for immersive in-hand interactions in VR [8, 20, 31, 39, 45, 49]; however, their effectiveness in supporting precise interactions, such as sketching, has rarely been studied. To investigate how using a shape-changing device affects the precision, quality, and usability of sketching in VR, we developed VRScroll, a scroll-like dynamic shape display (Figure 1). VRScroll consists of seven 3D-printed flaps that are controlled by DC motors to form a surface that mimics the surface of a virtual object in VR. We conducted a user study to understand the effect of having a physical surface using a dynamic shape display by comparing it with in-air sketching on planar and non-planar surfaces. The results showed that the participants could more precisely trace simple shapes and make smoother strokes. The free-form sketches participants made with VRScroll were rated as clearer by Amazon Mechanical Turk workers. The participants also found that sketching with VRScroll felt more realistic and natural.

In summary, the main contributions of this work include the design and implementation of VRScroll, a shape-changing device developed to support precise on-surface interactions, and the results from a user study that investigates the effect of a physical surface on the sketching performance on various surfaces.
(Conclusion) This paper proposed using a dynamic shape display to support precise sketching on virtual objects in VR. We developed VRScroll, a scroll-like shape-changing device designed to support on-surface pen interaction. Through a comparative user study, we investigated the effects of having a physical surface created using VRScroll on sketching performance. The study revealed that utilizing VRScroll could significantly improve the quality of sketches and provide a more realistic and intuitive experience. Based on our findings, we presented several design implications for equipping dynamic shape displays to facilitate precise sketching in VR. We hope our work will inspire future investigations into shape-changing interfaces, which may move beyond immersive shape manipulation and toward supporting more inventive and productive interactions in VR.

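Usefulness of VR sketching
- Drawing lines and sketching directly on the surfaces of 3D virtual objects in VR is an interaction used in many fields, including product design, 3D animation, and medical education.
- However, virtual space offers no physical surface to support the hand, which limits precise drawing.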
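Trade-offs of existing sketching methods
- In-air sketching: lines are drawn freely in mid-air with no physical (haptic) feedback at all. The hand shakes easily and fatigue is high, which makes precise curves or symmetric shapes very difficult to draw.
- Flat drawing surface: drawing against a desk or tablet provides stable friction and touch feedback. However, when the virtual object is curved (non-planar), a visual/tactile discrepancy arises between the physical plane and the virtual curved surface.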
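Limitations in reproducing curved-surface haptics and precision
- Prior work proposed robot arms and wearable haptic devices to convey the feel of virtual object surfaces, but such equipment is heavy and complex.
- No lightweight shape-changing display existed that reshapes itself in real time to support the user, so that a finger or pen can be pressed against a virtual curved surface and glided along it (tracing & drawing) as if touching a real object.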
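VRScroll (dynamic shape display)
- A scroll-shaped dynamic shape display consisting of seven 3D-printed flaps, each independently controlled by a DC motor.
- When the user touches or sketches on a curved surface or virtual object in VR, the system computes the curvature of the virtual object and automatically moves the seven flaps to physically imitate the shape (curvature) of that virtual surface in real time. The user can then sketch precisely on the virtual object as if touching a real one.
- Multi-flap deformation mechanism: seven movable flaps, independently driven by DC motors inside the device, are arranged in a row. Changing the flap angles in real time to match the virtual curvature at the pen's location forms a continuous curved surface under the pen.
- Real-time virtual-physical mapping: the pen tip position is tracked in VR, and VRScroll anticipates the height and curvature of the virtual surface ahead of the pen's direction of travel and matches the flaps' physical shape to it.
- On-surface interaction: the user sees the curvature of the object in VR while pressing the pen against the physical curved surface formed by VRScroll, drawing lines with stable friction as support.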
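How the seven flaps could be posed to follow a virtual curve is sketched below. The sketch rests on simplifying assumptions that do not come from the paper. The flaps are treated as rigid, equal-length segments hinged end to end, and the virtual surface under the pen as a 2D profile z = f(x). Bisection places each flap's far end on the profile so that its chord length equals the flap length. The output is the absolute flap angles and the hinge angles between neighbours, either of which a motor controller might target. The function name, profile, and flap length are hypothetical.

```python
import math


def fit_flaps(f, x0, flap_len, n_flaps=7, iters=60):
    """Chain n_flaps rigid flaps of length flap_len along the profile z = f(x).

    Returns (absolute flap angles, hinge angles between neighbouring flaps), in radians.
    """
    x, angles = x0, []
    for _ in range(n_flaps):
        z = f(x)
        # The flap's far end lies at some x1 in [x, x + flap_len]: the chord is
        # shorter than flap_len at x1 = x and at least flap_len at x1 = x + flap_len.
        lo, hi = x, x + flap_len
        for _ in range(iters):
            mid = (lo + hi) / 2
            if math.hypot(mid - x, f(mid) - z) < flap_len:
                lo = mid
            else:
                hi = mid
        angles.append(math.atan2(f(hi) - z, hi - x))
        x = hi
    hinges = [b - a for a, b in zip(angles, angles[1:])]
    return angles, hinges


# Hypothetical virtual profile (mm): a shallow bowl under the pen.
angles, hinges = fit_flaps(lambda x: 0.002 * (x - 70.0) ** 2, x0=0.0, flap_len=20.0)
print([round(math.degrees(a), 1) for a in angles])
print([round(math.degrees(h), 1) for h in hinges])
```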
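Validation results
- In precision experiments where participants traced basic geometric shapes (circles, squares, sine curves) with VRScroll, tracing error was substantially lower than with in-air sketching, and strokes were cleaner, with less jitter.
- In a user study (N=12), participants added design lines to a virtual car surface and carved fine patterns onto a virtual pottery surface. With the haptic support, cognitive effort decreased considerably, and participants could clearly distinguish edges and depth changes of the virtual objects by touch. Immersion and the quality of results improved markedly.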
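Tracing precision and stroke smoothness, as compared across conditions above, can be quantified in several ways. The sketch below shows two simple metrics, which are assumptions and not necessarily the ones used in the study. The first is the mean absolute distance from stroke samples to a target circle. The second is the mean magnitude of the discrete second difference, used as a jitter proxy. The sketch assumes stroke samples have already been mapped into 2D coordinates on the drawing surface.

```python
import math
from statistics import fmean


def circle_tracing_error(points, center, radius):
    """Mean absolute radial deviation of stroke samples from a target circle."""
    cx, cy = center
    return fmean(abs(math.hypot(x - cx, y - cy) - radius) for x, y in points)


def stroke_jitter(points):
    """Mean magnitude of the discrete second difference; lower means smoother.
    Assumes roughly uniform sampling along the stroke."""
    return fmean(
        math.hypot(x0 - 2 * x1 + x2, y0 - 2 * y1 + y2)
        for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:])
    )


# Hypothetical stroke: a traced unit circle with a small radial wobble.
stroke = [((1 + 0.02 * math.sin(9 * a)) * math.cos(a), (1 + 0.02 * math.sin(9 * a)) * math.sin(a))
          for a in (2 * math.pi * i / 200 for i in range(200))]
print(circle_tracing_error(stroke, (0.0, 0.0), 1.0), stroke_jitter(stroke))
```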
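Advantages
- The curved shape of the visible virtual object and the physical surface felt by the hand are matched in real time, which greatly reduces the visual/tactile discrepancy during virtual sketching.
- A shape-changing physical surface that firmly supports the hand and pen reduces the hand tremor that plagues in-air sketching and improves the geometric accuracy of sketches.
- Without large robotic infrastructure, the 7-flap independent mechanism reproduces a range of curvatures economically within a compact hardware footprint.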
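Limitations
- Curvature reproduction limited to a 1D array: the seven flaps are arranged in a single row (1D grid), so the current system can follow curvature changes only along the x-axis. It has structural limits in representing 3D compound surfaces that curve along both x and y (such as tori or spheres).
- Physical steps at flap seams: small gaps or catching can occur at the boundaries where flaps meet, so slight discontinuities may be felt at the fingertip when sliding the pen smoothly.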
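Future work
- Extension to a 2D matrix array: upgrade the flap arrangement to a grid (2D matrix display) so the hardware can physically imitate complex 3D freeform surfaces, not only single-curvature surfaces.
- Combined bidirectional haptic feedback: go beyond changing shape toward an integrated haptic interface that senses the pressing force of a brush or pen and dynamically controls frictional resistance, such as that caused by ink viscosity.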

(2024) ViObject (IMWUT)

Wenqiang Chen, Shupei Lin, Zhencan Peng, Farshid Salemi Parizi, Seongkook Heo, Shwetak Patel, Wojciech Matusik, Wei Zhao, John Stankovic

ViObject: Harness Passive Vibrations for Daily Object Recognition with Commodity Smartwatches
(Abstract) Knowing the object grabbed by a hand can offer essential contextual information for interaction between the human and the physical world. This paper presents a novel system, ViObject, for passive object recognition that uses accelerometer and gyroscope sensor data from commodity smartwatches to identify untagged everyday objects. The system relies on the vibrations caused by grabbing objects and does not require additional hardware or human effort. ViObject’s ability to recognize objects passively can have important implications for a wide range of applications, from smart home automation to healthcare and assistive technologies. In this paper, we present the design and implementation of ViObject, to address challenges such as motion interference, different object-touching positions, different grasp speeds/pressure, and model customization to new users and new objects. We evaluate the system’s performance using a dataset of 20 objects from 20 participants and show that ViObject achieves an average accuracy of 86.4%. We also customize models for new users and new objects, achieving an average accuracy of 90.1%. Overall, ViObject demonstrates a novel technology concept of passive object recognition using commodity smartwatches and opens up new avenues for research and innovation in this area.
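Customizing a model for new users and new objects can be illustrated with a nearest-centroid classifier. With this design, enrolling a new object only requires averaging a few feature vectors from that user's grabs. It is a stand-in chosen for brevity, not ViObject's model, and the feature vectors and labels below are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Few-shot stand-in classifier: each object is represented by the mean of its
    enrolled feature vectors, and a grab is labeled with the closest mean."""

    def __init__(self):
        self.centroids = {}   # label -> mean feature vector

    def enroll(self, label, samples):
        """Add or replace an object using a few feature vectors from the current user."""
        self.centroids[label] = np.mean(np.asarray(samples, dtype=float), axis=0)

    def predict(self, feature):
        """Return the label whose centroid is closest in Euclidean distance."""
        feature = np.asarray(feature, dtype=float)
        return min(self.centroids,
                   key=lambda label: np.linalg.norm(feature - self.centroids[label]))


clf = NearestCentroid()
clf.enroll("mug", [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1]])      # hypothetical features
clf.enroll("kettle", [[0.1, 0.8, 0.5], [0.2, 0.9, 0.4]])
clf.enroll("new_toy", [[0.5, 0.5, 0.9]])                     # new object, one sample
print(clf.predict([0.95, 0.15, 0.05]))
```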
(Introduction) Despite the massive advancements in technology today, there still remains a distinct divide between our physical world and the technological world. While most of us interact with electronic devices on a daily basis, we exist in two separate realms in a sort of symbiosis, where we benefit from the convenience of technology while technology builds and improves with our use over time. However, it is still easy to point at an object and identify which realm it belongs to, the technological world or our own. In this work, we explore a world heavily inspired by the idea of changing the world itself into an interface [34].

Daily object recognition is one of the technologies to help bridge the divide between our world and the technological world by using mundane objects as triggers for specific services. Attaching tags to objects has been widely proposed, where tags are used to retrieve information about the objects. QR codes, RFID tags [55], near-field communication (NFC) [22], and acoustic barcodes [29] have been utilized to recognize and automatically select the target service from the mobile devices. Vision-based solutions utilize computer vision and machine learning techniques to identify objects captured within the frame of the cameras [19, 35]. Capacitivo [58] and Tessutivo [23] recognized objects placed on customized fabrics, using capacitive sensing. Electromagnetic (EM) based sensing approaches require specialized EM sensors as well, and are applicable to only electrical appliances that emit EM signals [40, 53]. Other approaches also require customized devices to emit active signals (e.g., vibrator [46], millimeter wave [64], etc.) to identify objects. Despite these advantages, customized devices reduce accessibility and thus their deployability is limited. Some acoustic-based approaches recognize objects through different sounds by knocking on different objects [24, 43, 51]. However, these approaches require users to have extra actions to indicate users’ intentions, which disrupts people’s daily activities and sets a boundary between the technology world and the physical world.

This paper introduces ViObject, a system designed for tangible interactions that harnesses passive vibrations—originating from the grasp of everyday objects—instead of active vibrations produced by vibrators, to enable object recognition. By capturing the unique passive vibrations that propagate through the hand when an object is grabbed, ViObject can identify untagged everyday objects. Leveraging the accelerometer and gyroscope sensors embedded in commodity smartwatches, ViObject can capture and process these grab-object-induced vibrations, enabling seamless and intuitive object recognition without requiring additional hardware or user effort.

One of the key advantages of ViObject is its smooth integration with people’s daily lives. Unlike other techniques that require users to take extra actions, such as attaching sensors or taking pictures, ViObject reads data from users’ daily movements using a wrist-worn smartwatch without disrupting their habits or daily routines. This enables seamless, fluid interactions between the technological world and our daily lives. Additionally, the popularity of commodity smartwatches and the developer community surrounding them make ViObject a practical and accessible solution for a wide range of applications.

The development of ViObject poses several challenges that need to be addressed. Firstly, the vibration signal generated by grabbing an object is relatively weak and can be overwhelmed by the hand’s movement signal, particularly with a limited Inertial Measurement Unit (IMU) sampling rate. Moreover, unlike active sensing approaches that use customized signals with specific frequencies as a signal source for sensing, the passive signals captured from grabbing objects are unmodulated and have varying frequencies, which makes detection and feature extraction more challenging. Additionally, the vibration signal induced when the user touches an object can vary depending on applied pressure and speed. Objects vary in shape and size, so the induced vibration signals also depend on object-touching positions. Lastly, new end users may need to identify new objects that are unique to their homes.
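The first challenge can be made concrete with a toy example. The following sketch builds a synthetic single-axis accelerometer trace in which a weak high-frequency grasp vibration rides on a much larger low-frequency reaching motion. It then applies a simple high-pass filter. This is a minimal illustration under stated assumptions. The 200 Hz sampling rate, the 1 Hz and 60 Hz components, and the 20 Hz cutoff are invented values. The two cascaded first-order IIR stages are a generic textbook filter, not ViObject's actual preprocessing.

```python
import math

def high_pass(samples, fs_hz, cutoff_hz):
    """First-order IIR high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    out = [0.0] * len(samples)
    for n in range(1, len(samples)):
        out[n] = a * (out[n - 1] + samples[n] - samples[n - 1])
    return out

def high_pass_2(samples, fs_hz, cutoff_hz):
    """Two cascaded first-order stages for a steeper low-frequency roll-off."""
    return high_pass(high_pass(samples, fs_hz, cutoff_hz), fs_hz, cutoff_hz)

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# Synthetic 1 s single-axis trace at an ASSUMED 200 Hz IMU rate:
# a large 1 Hz reaching motion plus a weak 60 Hz grasp vibration.
FS = 200.0
t = [n / FS for n in range(int(FS))]
motion = [2.0 * math.sin(2.0 * math.pi * 1.0 * ti) for ti in t]
vibration = [0.05 * math.sin(2.0 * math.pi * 60.0 * ti) for ti in t]
raw = [m + v for m, v in zip(motion, vibration)]

CUTOFF_HZ = 20.0  # assumed cutoff between motion and vibration bands
WARMUP = 50       # samples discarded while the filter's start-up transient decays

cleaned = high_pass_2(raw, FS, CUTOFF_HZ)[WARMUP:]
# The filter is linear, so each component can be inspected separately.
motion_left = rms(high_pass_2(motion, FS, CUTOFF_HZ)[WARMUP:])
vibration_left = rms(high_pass_2(vibration, FS, CUTOFF_HZ)[WARMUP:])

print(f"raw motion/vibration RMS: {rms(motion):.3f} / {rms(vibration):.3f}")
print(f"after high-pass:          {motion_left:.4f} / {vibration_left:.4f}")
```

Before filtering, the motion component's RMS is roughly 40 times that of the vibration. After filtering, the vibration dominates, although it is also somewhat attenuated. A real pipeline would also have to handle gravity, sensor noise, and the sampling-rate limits noted above, which this toy example ignores.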

To overcome these challenges, we propose several techniques in ViObject. Firstly, we eliminate interference through signal processing and augmentation techniques. Secondly, we use interpolation to enhance samples and leverage attention-based residual networks to improve the system’s accuracy. Thirdly, we design an adversarial training regularization with center loss to mitigate the impact of orientation changes. Finally, we employ generalized few-shot learning with data synthesis for object customization, enabling the system to recognize new objects and users with minimal training data. Together, these techniques enable ViObject to recognize objects passively from smartwatch accelerometer and gyroscope data, opening new research and innovation opportunities.
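Of the techniques listed, center loss has a compact standard form. For embeddings x_i with labels y_i and per-class centers c_j, L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2, and each center is nudged toward the features of its class in the current batch. The NumPy sketch below shows only that term, using hypothetical 2-D embeddings. How ViObject weights it against the classification loss and combines it with adversarial training is not specified here. The data, the update rate alpha, and any weighting lambda are assumptions.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: 1/2 * sum_i ||x_i - c_{y_i}||^2 over a mini-batch."""
    diffs = features - centers[labels]
    return 0.5 * float(np.sum(diffs ** 2))

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the features of that class in the batch.

    delta_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j);  c_j <- c_j - alpha * delta_j
    """
    new_centers = centers.copy()
    for j in np.unique(labels):
        members = features[labels == j]
        delta = np.sum(centers[j] - members, axis=0) / (1.0 + len(members))
        new_centers[j] = centers[j] - alpha * delta
    return new_centers

# Hypothetical 2-D embeddings of grasps on two objects (class 0 and class 1).
features = np.array([[1.0, 0.0], [1.2, 0.2], [-1.0, 0.0], [-0.8, -0.2]])
labels = np.array([0, 0, 1, 1])
centers = np.zeros((2, 2))

before = center_loss(features, labels, centers)
centers = update_centers(features, labels, centers)
after = center_loss(features, labels, centers)
print(f"center loss: {before:.4f} -> {after:.4f}")
```

Here the loss drops from 2.08 to about 0.95 after one center update. In training, such a term is typically added to the classification loss (L = L_cls + lambda * L_C). This pulls embeddings of the same object together, so nuisance factors such as watch orientation contribute less to the learned features.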

The performance of ViObject was evaluated using data collected from 20 participants interacting with 20 different daily objects while wearing a smartwatch. The collected data was analyzed to assess the system’s basic performance, and the average recognition accuracy was found to be 86.4%. A subsequent experiment was conducted with 10 additional participants to test ViObject’s ability to recognize new objects customized by users. The results demonstrated excellent accuracy (90.1%) for new users and new objects, even when the objects were grabbed from different angles with varying pressure and speed, and when different parts of the object were grabbed. The performance of ViObject remained consistent across different smartwatches and over a one-week period. An end-to-end standalone app was implemented on an Android smartwatch for real-time object recognition; it uses the Android TextToSpeech module to speak the prediction aloud. A user experience study was conducted to evaluate the app, and the results showed positive user feedback. Furthermore, ViObject achieved low latency and low power consumption (see section 5), making it a practical and efficient solution for everyday use.

This system has its inherent limitations. Numerous objects have yet to be tested in real-world scenarios, especially when objects bear close resemblance to each other. As the variety of objects increases, distinguishing between some may pose challenges. Nevertheless, ViObject strives to validate a new concept: recognizing objects from daily grasp actions. Moreover, most applications only need to employ a selective and manageable set of objects. For instance, recognizing a single pill bottle can facilitate medication reminders, while identifying a singular dumbbell can support fitness tracking. A curated set of daily objects could be instrumental in assessing Alzheimer’s disease progression. A few specified daily objects can also activate corresponding smart home services. In escape room scenarios, ViObject can detect when participants pick up a few particular objects, subsequently triggering clues or furnishing additional insights via the smartwatch to aid puzzle-solving. By identifying passive vibrations from daily object interactions, ViObject bridges the gap between our physical and digital realities, transforming the physical world around us into an interactive interface.
To summarize, our main contributions are:
- To the best of our knowledge, we are the first to harness passive vibrations to recognize daily objects using the IMU sensor found in many common smartwatches. We also conducted a series of feasibility studies to understand the principle behind this new concept.
- We have designed a novel technical process to eliminate interference (e.g., hand movement, grasp speed, and strength), extract fine-grained features from various object-touching positions using a transformer-based network, and customize the model for new users and new objects.
- We have developed an end-to-end standalone system on commercial smartwatches that achieves real-time object recognition. Extensive experiments demonstrate ViObject’s accuracy, robustness, and user experience.
(Conclusion) In conclusion, we have introduced ViObject, a novel system for passive object recognition that utilizes accelerometer and gyroscope sensor data from commodity smartwatches to identify untagged everyday objects. Through our design and implementation process, we have successfully addressed challenges such as motion interference, grasp speed/pressure variations, object-touching position changes, and customization for new users and new objects. We have demonstrated the feasibility of passive object recognition using commodity smartwatches and shown promising results in terms of recognition performance and user feedback. ViObject has the potential to revolutionize the way we interact with our physical environment, from smart home automation to healthcare and assistive technologies. With the rapid development of smartwatch technology and its widespread adoption, we believe that ViObject will have a significant impact on the field of human-object interaction in the future.

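Usefulness of ViObject
- Technology that lets smart devices recognize which everyday object the user is currently holding is a key building block for providing contextual information in many domains, including smart home automation, healthcare, and assistive technology.
- However, attaching an electronic tag to each of the countless objects in daily life, or installing camera sensors in every home, is limited by cost and privacy.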
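Trade-offs of existing object recognition approaches
- Tag-based approaches (RFID, QR codes, etc.): recognition is highly accurate and stable. However, users must manually attach a tag to every object they encounter in daily life, which hurts scalability and aesthetics.
- Vision (camera)-based recognition: many objects can be distinguished without tags. However, accuracy drops sharply when the object is occluded by the hand or the lighting is dim, and always-on camera capture raises serious privacy concerns.
- Active acoustic/electromagnetic sensing: sending a signal into the object and reading its response is precise. However, it requires an additional hardware module on the user's body or device beyond a general-purpose commodity device such as a smartwatch, which reduces wearability.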
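Limitations of passive-vibration-based recognition
- When a hand grasps an object, it produces subtle passive vibrations that are characteristic of the object's material, weight, and shape.
- Capturing these vibrations with a commodity smartwatch's inertial sensors (accelerometer and gyroscope) is hard. Everyday hand-motion noise (motion interference), changes in grasp position, speed, and pressure, and grasping styles that differ across users all distort the signal. Before ViObject, there was no robust system that could overcome these factors, isolate the pure vibration, and classify objects in real time.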
ViObject
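- ViObject is a system that identifies everyday objects held in the hand through passive vibration analysis. It uses only the built-in accelerometer and gyroscope data of an off-the-shelf commodity smartwatch and needs no additional hardware tags or components.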
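- Motion interference cancellation: as the user moves the hand toward an object, a large low-frequency motion signal is produced. Signal processing such as high-pass filtering removes this signal and isolates only the high-frequency physical vibration generated at the moment of hand-object contact.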
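- Multi-sensor fusion: the linear vibration captured by the accelerometer is combined with the rotational (angular-velocity) vibration captured by the gyroscope. This lets the object's material (plastic, glass, metal) and structural shape be modeled as a multidimensional profile.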
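- Few-shot customization: ViObject combines attention-based residual networks with generalized few-shot learning and data synthesis. When new users or new objects absent from the initial training data are added, the model can be personalized from only a few samples and the list of recognizable objects can be extended.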
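The customization step described in the last item above can be illustrated with a nearest-prototype classifier. Each object is represented by the mean of a few embeddings, and a new object is enrolled simply by adding a new mean. This is a swapped-in, deliberately simple technique in the spirit of prototypical networks. It is not ViObject's generalized few-shot method. The sketch assumes that a trained encoder already maps each grasp window to an embedding vector. The class name, object names, and vectors are hypothetical.

```python
import numpy as np

class PrototypeRecognizer:
    """Hypothetical nearest-class-mean recognizer over grasp embeddings."""

    def __init__(self):
        self.prototypes = {}  # object name -> mean embedding vector

    def enroll(self, name, shots):
        """Add (or replace) an object using a few example embeddings."""
        self.prototypes[name] = np.asarray(shots, dtype=float).mean(axis=0)

    def predict(self, embedding):
        """Return the enrolled object whose prototype is closest (Euclidean)."""
        x = np.asarray(embedding, dtype=float)
        names = list(self.prototypes)
        dists = [np.linalg.norm(x - self.prototypes[n]) for n in names]
        return names[int(np.argmin(dists))]

rec = PrototypeRecognizer()
rec.enroll("mug", [[1.0, 0.0], [0.9, 0.1]])
rec.enroll("pill bottle", [[0.0, 1.0], [0.1, 0.9]])
# A user later adds a new object from three grasps, with no retraining.
rec.enroll("dumbbell", [[-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9]])

print(rec.predict([0.8, 0.2]))
print(rec.predict([-0.95, -1.0]))
```

Enrolling only stores a mean vector, so adding an object costs a few grasps and no gradient updates. The trade-off is that recognition quality depends entirely on how well the (assumed) encoder separates objects.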
Evaluation results and application implementations
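- In an experiment in which 20 participants grasped 20 everyday objects (such as cups, computer mice, books, and beverage cans), ViObject achieved an average recognition accuracy of 86.4% using smartwatch sensing alone, without any tags.
- Performance remained stable when users varied grasp speed or grasp position (e.g., the top vs. the side of an object). In a follow-up study with 10 additional participants, accuracy for new users and new objects was 90.1%.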
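- Smart appliance control: a smart home scenario was demonstrated with two interactions. Grasping a remote control with the watch-wearing hand automatically turns on the living-room TV and lights. Grasping a tumbler prompts a water dispenser to prepare hot water tailored to the user.
- Assistive and welfare services: in an assistive feature, the smartwatch gives visually impaired users vibration or voice feedback naming the object when they grasp a pill bottle or kitchen utensil. A welfare service automatically logs how many times an older adult picks up a cup or eating utensils each day in order to monitor their health. Together, these illustrate the system's practical value.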
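The application scenarios above share a simple pattern. A recognized object is mapped to an action, but the system acts only on confident, repeated predictions, so that a single noisy window does not switch on the TV. The sketch below assumes that the recognizer emits a (label, confidence) pair per sensing window. The action names, the 0.8 threshold, and the two-window rule are hypothetical choices, not part of ViObject.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from recognized objects to smart-home actions.
ACTIONS = {
    "remote control": "turn_on_tv_and_lights",
    "tumbler": "prepare_hot_water",
    "pill bottle": "log_medication_taken",
}

@dataclass
class GraspTrigger:
    threshold: float = 0.8   # minimum confidence for a window to count
    required_hits: int = 2   # consecutive agreeing windows before firing
    fired: list = field(default_factory=list)
    _last: str = ""
    _hits: int = 0

    def observe(self, label, confidence):
        """Feed one prediction; return an action name when it first qualifies."""
        if confidence < self.threshold or label not in ACTIONS:
            self._last, self._hits = "", 0  # reset on weak or unmapped predictions
            return None
        self._hits = self._hits + 1 if label == self._last else 1
        self._last = label
        if self._hits == self.required_hits:  # fire once per sustained grasp
            self.fired.append(ACTIONS[label])
            return ACTIONS[label]
        return None

trigger = GraspTrigger()
stream = [("tumbler", 0.55), ("remote control", 0.91),
          ("remote control", 0.88), ("remote control", 0.93)]
results = [trigger.observe(label, conf) for label, conf in stream]
print(results)
```

Firing only when the hit count first reaches the requirement means that a long grasp triggers its action once instead of on every window.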
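Advantages
- Zero hardware infrastructure: there is no need to attach RFID tags to objects or buy special equipment. Because ViObject runs on already-widespread commodity smartwatches (e.g., Samsung Galaxy Watch, Apple Watch), it adds no hardware cost.
- Privacy-friendly sensing: unlike camera-based systems, it records no images or audio and analyzes only the inertial waveforms at the wrist. It can therefore run continuously in everyday spaces, including bathrooms and bedrooms, with far less privacy risk.
- No user burden (passive sensing): users do not need to perform a dedicated recognition gesture or aim a sensor at an object. Recognition happens automatically as they grasp everyday objects as usual, adding no extra cognitive or physical effort.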
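Limitations
- Concurrent environmental vibration noise: strong body movement when grabbing an object while walking or running, or jolts from public transit (subway, bus) transmitted to the wrist, can bury the subtle object-specific vibration and cause recognition errors.
- Confusion between physically similar objects: two objects with nearly identical weight, size, and surface material (e.g., two plastic cups of the same design and material) produce very similar vibration frequencies when grasped. The system therefore fundamentally cannot tell them apart reliably.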
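Future work
- Context-aware fusion (GPS/time): instead of relying on vibration alone, the plan is to combine the smartphone's current location (e.g., kitchen, office) and time data and apply spatial-context filtering. For example, a cylindrical object grasped in the kitchen would be inferred to be a cup, and one grasped in the office a pen holder, lowering the misrecognition rate.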
- Lightweight on-device AI: further slim down the deep-learning model so that it runs fully in real time on the smartwatch MCU as a standalone on-device system.
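The context-aware fusion idea in the future-work list above can be expressed as a Bayesian reweighting. The vibration classifier's probability for each object is multiplied by a location-dependent prior P(object | location) and the result is renormalized. This sketch makes two assumptions. First, the vibration signal is conditionally independent of location given the object. Second, the classifier was trained on balanced classes, so its output is proportional to a likelihood. The function name, objects, locations, and prior values are hypothetical.

```python
def fuse_with_context(vibration_probs, location, location_priors):
    """Combine vibration-based class probabilities with P(object | location).

    posterior(o) is proportional to P_vib(o) * P(o | location), renormalized to sum to 1.
    """
    priors = location_priors.get(location)
    if priors is None:
        return dict(vibration_probs)  # unknown location: keep vibration-only result
    scores = {o: p * priors.get(o, 0.0) for o, p in vibration_probs.items()}
    total = sum(scores.values())
    if total == 0.0:
        return dict(vibration_probs)  # prior rules out every candidate: fall back
    return {o: s / total for o, s in scores.items()}

# Two cylindrical objects that the vibration model alone cannot separate.
vibration_probs = {"cup": 0.5, "pen holder": 0.5}
location_priors = {
    "kitchen": {"cup": 0.9, "pen holder": 0.1},
    "office": {"cup": 0.2, "pen holder": 0.8},
}

for place in ("kitchen", "office"):
    fused = fuse_with_context(vibration_probs, place, location_priors)
    print(place, max(fused, key=fused.get), round(max(fused.values()), 2))
```

A soft prior is used here rather than a hard filter. This lets a strong vibration signal still override an unusual context, such as a cup carried into the office.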
