Human activity recognition is a core task in computer vision. It has broad applications in video games, surveillance, gesture recognition, behavior analysis, etc. However, traditional camera-based activity recognition systems are intrinsically limited by occlusions, i.e., subjects must be visible to the cameras for their activities to be recognized. Prior work mitigates this problem by changing the camera viewpoint or interpolating frames over time. Such approaches, however, often fail when the camera is fixed or the person is fully occluded for a relatively long period, e.g., when the person walks into another room.
Intrinsically, cameras suffer from the same limitation that we humans do: our eyes sense only visible light and hence cannot see through walls and occlusions. Yet visible light occupies just one band of the electromagnetic spectrum. Radio signals in the WiFi frequency range can traverse walls and occlusions. Further, they reflect off the human body. If one can interpret such radio reflections, one can sense human activities even through walls and occlusions.
In this demo, we will present the first system that can estimate people's 3D poses and recognize their activities through walls and occlusions in real time. Our system takes radio frequency (RF) signals as input, uses a deep neural network to parse such radio signals and generate 3D human skeletons, and recognizes actions and interactions of multiple people over time based on the generated skeletons. Our system achieves accuracy comparable to camera-based pose estimation and activity recognition systems in visible scenarios, yet continues to work accurately when people are not visible (in dark or through-wall scenarios), hence addressing scenarios that are beyond the limit of today's vision-based human sensing systems. We believe our system could provide a significant leap in human activity analysis and enable multiple new applications in gaming, healthcare, and smart homes.
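To make the two-stage structure of the pipeline concrete, the sketch below shows one way it could be organized: a per-frame pose network maps RF frames to 3D skeletons, and a temporal classifier maps skeleton sequences to action labels. All names, shapes, and the trivial stand-in logic are our own illustrative assumptions, not the actual implementation described above.

```python
# Hypothetical sketch of the described pipeline: RF frames -> 3D
# skeletons -> action labels. Function names, the keypoint count, and
# the stand-in logic are assumptions for illustration only.
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_KEYPOINTS = 14  # assumed number of body keypoints per skeleton


@dataclass
class Skeleton:
    # 3D coordinates (x, y, z) for each body keypoint, e.g., in meters
    joints: List[Tuple[float, float, float]]


def pose_network(rf_frame: Sequence[float]) -> Skeleton:
    """Stand-in for the deep network that parses one RF frame into a
    3D skeleton. Here it simply returns a fixed dummy skeleton."""
    return Skeleton(joints=[(0.0, 0.0, 0.0)] * NUM_KEYPOINTS)


def recognize_action(skeletons: List[Skeleton]) -> str:
    """Stand-in action classifier over a window of skeletons. A real
    system would use a temporal model; this toy rule just checks the
    vertical displacement of the first joint across the window."""
    if not skeletons:
        return "unknown"
    dz = skeletons[-1].joints[0][2] - skeletons[0].joints[0][2]
    return "moving" if abs(dz) > 0.1 else "static"


def pipeline(rf_frames: List[Sequence[float]]) -> str:
    # Stage 1: per-frame 3D pose estimation from RF reflections
    skeletons = [pose_network(f) for f in rf_frames]
    # Stage 2: activity recognition over the skeleton sequence
    return recognize_action(skeletons)
```

The key design point this sketch reflects is that skeletons act as an intermediate representation: the action classifier never sees raw RF data, which is what lets recognition carry over between visible and through-wall conditions.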