# Efficient Computing For Low-Energy Robotics

Vivienne Sze (@eems\_mit)

**Massachusetts Institute of Technology** 

In collaboration with Luca Carlone, Yu-Hsin Chen, Joel Emer, Sertac Karaman, Tushar Krishna, Thomas Heldt, Theia Henderson, Peter Li, Fangchang Ma, James Noraky, Soumya Sudhakar, Amr Suleiman, Diana Wofk, Nellie Wu, Tien-Ju Yang, Zhengdong Zhang

Slides available at

https://tinyurl.com/SzeMITDL2020



### **Computing Challenge for Self-Driving Cars**

JACK STEWART TRANSPORTATION 02.06.18 08:00 AM

### SELF-DRIVING CARS USE CRAZY AMOUNTS OF POWER, AND IT'S BECOMING A PROBLEM



Shelley, a self-driving Audi TT developed by Stanford University, uses the brains in the trunk to speed around a racetrack autonomously.

Photo: Nikki Kahn/The Washington Post/Getty Images



(Feb 2018)

Cameras and radar generate ~6 gigabytes of data every 30 seconds.

Self-driving car prototypes use approximately 2,500 Watts of computing power.

Generates wasted heat and some prototypes need water-cooling!

### **Existing Processors Consume Too Much Power**







< 1 Watt

\> 10 Watts

### **Transistors Are Not Getting More Efficient**



### Slowdown of Moore's Law and Dennard Scaling

General purpose microprocessors are not getting faster or more efficient


Need specialized / domain-specific hardware for significant improvements in speed and energy efficiency

### **Efficient Computing with Cross-Layer Design**

#### **Algorithms**



#### **Systems**



#### **Architectures**



#### **Circuits**



### **Energy Dominated by Data Movement**



Memory access costs **orders of magnitude** more energy than compute

### **Autonomous Navigation Uses a Lot of Data**

#### **Semantic Understanding**

- High frame rate
- Large resolutions
- Data expansion

#### **Geometric Understanding**

Growing map size



2 million pixels





### **Visual-Inertial Localization**

Determines location/orientation of robot from images and IMU (also used by headset in Augmented Reality and Virtual Reality)



### **Localization at Under 25 mW**

*First chip* that performs *complete* Visual-Inertial Odometry

#### Front-End for camera

(Feature detection, tracking, and outlier elimination)

#### Front-End for IMU

(pre-integration of accelerometer and gyroscope data)

**Back-End Optimization of Pose Graph** 

Consumes **684×** and **1582×** less energy than mobile and desktop CPUs, respectively



[Zhang et al., RSS 2017], [Suleiman et al., VLSI 2018]

### **Key Methods to Reduce Data Size**

**Navion:** Fully integrated system – no off-chip processing or storage



Use **compression** and **exploit sparsity** to reduce memory down to 854kB



### **Understanding the Environment**

#### **Depth Estimation**



Semantic Segmentation



### **Low Power 3D Time of Flight Imaging**

- Pulsed Time of Flight: Measure distance using round trip time of laser light for each image pixel
  - Illumination + Imager Power: 2.5–20 W for range from 1–8 m
- Use computer vision techniques and passive images to estimate changes in depth without turning on laser
  - CMOS Imaging Sensor Power: < 350 mW



Real-time Performance on Embedded Processor VGA @ 30 fps on Cortex-A7 (< 0.5W active power)

### **Results of Low Power Depth ToF Imaging**



**RGB** Image

Depth Map

Ground Truth

Depth Map **Estimated** 

**Mean Relative Error**: 0.7%

**Duty Cycle (on-time of laser)**: 11%

### **Understanding the Environment**

**Depth Estimation** 




**Semantic Segmentation** 







State-of-the-art approaches use Deep Neural Networks, which require up to several hundred million operations and weights to compute!

\>100x more complex than video compression

### **Deep Neural Networks**

Deep Neural Networks (DNNs) have become a cornerstone of AI

#### **Computer Vision**



**Game Play** 



#### **Speech Recognition**



Medical



### **Properties We Can Leverage**

- Operations exhibit high parallelism
  - → high throughput possible
- Memory Access is the Bottleneck



Worst Case: all memory R/W are **DRAM** accesses

Example: AlexNet has 724M MACs

→ 2896M DRAM accesses required
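The worst-case figure follows from simple counting. A minimal sketch of the arithmetic, assuming each MAC needs three reads (filter weight, input activation, partial sum) and one write (updated partial sum); the function name and the per-MAC breakdown are ours, chosen because they reproduce the number on the slide:

```python
def worst_case_dram_accesses(num_macs, reads_per_mac=3, writes_per_mac=1):
    """DRAM accesses if no operand is ever reused from on-chip memory."""
    return num_macs * (reads_per_mac + writes_per_mac)


alexnet_macs = 724_000_000  # 724M MACs, as stated on the slide
accesses = worst_case_dram_accesses(alexnet_macs)
print(f"{accesses / 1e6:.0f}M DRAM accesses")  # 2896M DRAM accesses
```

Any data reuse from cheaper on-chip memory lowers this count, which is what the following slides exploit.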



### **Properties We Can Leverage**

- Operations exhibit high parallelism
  - → high throughput possible
- Input data reuse opportunities (up to 500x)



#### **Convolutional Reuse**

(Activations, Weights)
CONV layers only
(sliding window)



#### **Fmap Reuse**

(Activations)
CONV and FC layers



#### **Filter Reuse**

(Weights)
CONV and FC layers
(batch size > 1)

### **Exploit Data Reuse at Low-Cost Memories**





<sup>\*</sup> measured from a commercial 65nm process



### Weight Stationary (WS)



- Minimize weight read energy consumption
  - maximize convolutional and filter reuse of weights
- Broadcast activations and accumulate partial sums spatially across the PE array
- Examples: TPU [Jouppi, ISCA 2017], NVDLA



### **Output Stationary (OS)**



- Minimize partial sum R/W energy consumption
  - maximize local accumulation
- Broadcast/Multicast filter weights and reuse activations spatially across the PE array
- Examples: [Moons, VLSI 2016], [Thinker, VLSI 2017]
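The difference between the weight stationary and output stationary dataflows can be pictured as a change in loop order. The sketch below is a single-PE, 1D toy model (not a model of any of the cited chips); the function names and the two counters are hypothetical, introduced only to show which operand stays put in each case:

```python
def conv1d_weight_stationary(weights, inputs):
    """Each weight is fetched once and held while it visits every output."""
    n_out = len(inputs) - len(weights) + 1
    outputs = [0] * n_out
    stats = {"weight_fetches": 0, "psum_writebacks": 0}
    for r, w in enumerate(weights):        # the weight stays in the PE
        stats["weight_fetches"] += 1
        for x in range(n_out):             # partial sums are updated outside the PE
            outputs[x] += w * inputs[x + r]
            stats["psum_writebacks"] += 1
    return outputs, stats


def conv1d_output_stationary(weights, inputs):
    """Each output is accumulated locally and written back once."""
    n_out = len(inputs) - len(weights) + 1
    outputs = [0] * n_out
    stats = {"weight_fetches": 0, "psum_writebacks": 0}
    for x in range(n_out):                 # the partial sum stays in the PE
        acc = 0
        for r, w in enumerate(weights):    # weights stream past it
            acc += w * inputs[x + r]
            stats["weight_fetches"] += 1
        outputs[x] = acc
        stats["psum_writebacks"] += 1
    return outputs, stats


weights = [1, 2, 3]
inputs = [1, 0, 2, 1, 3]
ws_out, ws_stats = conv1d_weight_stationary(weights, inputs)
os_out, os_stats = conv1d_output_stationary(weights, inputs)
print(ws_out, ws_stats)
print(os_out, os_stats)
```

Both orderings compute the same result; they differ only in which kind of access is minimized (weight fetches for WS, partial sum write-backs for OS).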

### **Row Stationary Dataflow**



- Maximize row convolutional reuse in RF
  - Keep a filter row and fmap sliding window in RF
- Maximize row psum accumulation in RF
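A rough software analogy for the bullets above: each PE runs a 1D convolution of one filter row over one fmap row, and the resulting partial sum rows are added across filter rows to form one output row. The function names are hypothetical, and the way row results are combined into a 2D convolution is our reading of the dataflow rather than something spelled out on the slide:

```python
def pe_row_conv(filter_row, fmap_row):
    """One PE: keep a filter row locally and slide it across an fmap row."""
    n_out = len(fmap_row) - len(filter_row) + 1
    psum_row = []
    for x in range(n_out):
        window = fmap_row[x:x + len(filter_row)]   # sliding window kept in the RF
        psum_row.append(sum(w * a for w, a in zip(filter_row, window)))
    return psum_row


def conv2d_row_stationary(filt, fmap):
    """Output row y is the element-wise sum of one PE result per filter row."""
    n_rows_out = len(fmap) - len(filt) + 1
    out = []
    for y in range(n_rows_out):
        rows = [pe_row_conv(filt[r], fmap[y + r]) for r in range(len(filt))]
        out.append([sum(col) for col in zip(*rows)])
    return out


filt = [[1, 0],
        [0, 1]]
fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
result = conv2d_row_stationary(filt, fmap)
print(result)  # [[6, 8], [12, 14]]
```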






### **Dataflow Comparison: CONV Layers**



RS optimizes for the best **overall** energy efficiency



### **Deep Neural Networks at Under 0.3W**



Exploits data reuse for 100x reduction in memory accesses from global buffer and 1400x reduction in memory accesses from off-chip DRAM

Overall >10x energy reduction compared to a mobile GPU (Nvidia TK1)

Eyeriss Project Website: <a href="http://eyeriss.mit.edu">http://eyeriss.mit.edu</a>

**Results for AlexNet** 



### Features: Energy vs. Accuracy



Does not include data, classification energy, augmentation and ensemble, etc.

- DPM v5 [Girshick, 2012]
- Fast R-CNN [Girshick, CVPR 2015]

### **Energy-Efficient Processing of DNNs**

A significant amount of algorithm and hardware research on energy-efficient processing of DNNs



http://eyeriss.mit.edu/tutorial.html



V. Sze, Y.-H. Chen, T.-J. Yang, J. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, Dec. 2017

We identified various limitations to existing approaches



### **Design of Efficient DNN Algorithms**

Popular efficient DNN algorithm approaches

#### **Network Pruning**



#### **Efficient Network Architectures**



**Examples:** SqueezeNet, MobileNet

... also reduced precision

- Focus on reducing number of MACs and weights
- Does it translate to energy savings and reduced latency?

### **Number of MACs and Weights are Not Good Proxies**

Number of operations (MACs) does not approximate latency well



Source: Google

(https://ai.googleblog.com/2018/04/introducing-cvpr-2018-on-device-visual.html)

Number of weights **alone** is not a good metric for energy (all data types should be considered)



https://energyestimation.mit.edu/

[**Yang**, *CVPR* 2017]

### **Energy-Aware Pruning**

Directly target energy and incorporate it into the optimization of DNNs to provide greater energy savings

- Sort layers based on energy and prune layers that consume the most energy first
- Energy-aware pruning reduces AlexNet energy by 3.7x w/ similar accuracy
- Outperforms magnitude-based pruning by 1.7x
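The layer ordering in the first bullet can be sketched in a few lines. The per-layer energy values below are made up for illustration (in practice they would come from an energy estimation tool), and `pruning_order` is a hypothetical helper name:

```python
def pruning_order(layer_energy):
    """Return layer names sorted from most to least energy-hungry."""
    return sorted(layer_energy, key=layer_energy.get, reverse=True)


# Hypothetical per-layer energy estimates in arbitrary units.
layer_energy = {"conv1": 4.0, "conv2": 9.5, "conv3": 6.1, "fc1": 2.3}
order = pruning_order(layer_energy)
print(order)  # ['conv2', 'conv3', 'conv1', 'fc1']
```

The sketch covers only the ordering step; how each layer is pruned and fine-tuned is described in [Yang, CVPR 2017].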

[**Yang**, *CVPR* 2017]

#### Normalized Energy (AlexNet)



Pruned models available at <a href="http://eyeriss.mit.edu/energy.html">http://eyeriss.mit.edu/energy.html</a>



### **NetAdapt: Platform-Aware DNN Adaptation**

- Automatically adapt DNN to a mobile platform to reach a target latency or energy budget
- Use empirical measurements to guide optimization (avoid modeling of tool chain or platform architecture)
- Few hyperparameters to reduce tuning effort
- >1.7x speed up on MobileNet w/ similar accuracy
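The idea of adapting a network against measured latency can be illustrated with a much-simplified greedy loop. This is not the published NetAdapt algorithm; `adapt`, the cost table, and the accuracy proxy are all hypothetical stand-ins, where a real run would time the network on the target device and evaluate accuracy on held-out data:

```python
def adapt(filters, measure_latency, estimate_accuracy, target, step=8):
    """Shrink one layer per iteration until the measured latency meets the target."""
    filters = dict(filters)  # work on a copy
    while measure_latency(filters) > target:
        proposals = []
        for name, count in filters.items():
            if count - step < 1:           # never remove a layer entirely
                continue
            candidate = {**filters, name: count - step}
            proposals.append((estimate_accuracy(candidate), candidate))
        if not proposals:
            raise ValueError("target cannot be reached with this step size")
        # Keep the proposal with the highest estimated accuracy.
        filters = max(proposals, key=lambda p: p[0])[1]
    return filters


# Made-up stand-ins for on-device measurement and accuracy evaluation.
COST_MS = {"conv1": 0.10, "conv2": 0.30, "conv3": 0.20}   # ms per filter
VALUE = {"conv1": 3.0, "conv2": 1.0, "conv3": 2.0}        # accuracy proxy per filter


def measure_latency(filters):
    return sum(COST_MS[name] * count for name, count in filters.items())


def estimate_accuracy(filters):
    return sum(VALUE[name] * count for name, count in filters.items())


start = {"conv1": 32, "conv2": 64, "conv3": 64}
adapted = adapt(start, measure_latency, estimate_accuracy, target=30.0)
print(adapted)
```

Because the loop only calls `measure_latency`, it needs no model of the tool chain or platform architecture, which is the point made in the second bullet.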



[Yang, *ECCV* 2018]

Code available at <a href="http://netadapt.mit.edu">http://netadapt.mit.edu</a>



### **FastDepth: Fast Monocular Depth Estimation**

Depth estimation from a single RGB image desirable, due to the relatively low cost and size of monocular cameras.



Configuration: Batch size of one (32-bit float)





~40fps on an iPhone

Models available at <a href="http://fastdepth.mit.edu">http://fastdepth.mit.edu</a>

[Wofk\*, Ma\*, ICRA 2019]

### **DNN Accelerator Evaluation Tools**

- Require systematic way to
  - Evaluate and compare DNN accelerators
  - Rapidly explore design space
- Accelergy [Wu, ICCAD 2019]
  - Early stage estimation tool at the architecture level
    - Estimate energy based on architecture level components (e.g., # of PEs, memory size, on-chip network)
  - Evaluate architecture level impact of emerging devices
    - Plug-ins for different technologies
- Timeloop [Parashar, ISPASS 2019]
  - DNN mapping tool
  - Performance Simulator → Action counts



Open-source code available at:

http://accelergy.mit.edu



### **Accelergy Estimation Validation**

- Validation on Eyeriss [Chen, ISSCC 2016]
  - Achieves 95% accuracy compared to post-layout simulations
  - Accurately captures energy breakdown at different granularities



Open-source code available at: <a href="http://accelergy.mit.edu">http://accelergy.mit.edu</a>



### **Accelergy Infrastructure**

Open-source code available at:

http://accelergy.mit.edu






[**Wu**, *ICCAD* 2019]



## **In-Memory Computing (IMC)**

Activation is input voltage (V<sub>i</sub>)

Weight is resistor conductance (G<sub>i</sub>)



Image Source: [Shafiee, ISCA 2016]

- Reduce data movement by moving compute into memory
- Compute MAC with memory storage element
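With activations as voltages and weights as conductances, Ohm's law gives a per-element current of V<sub>i</sub>·G<sub>i</sub>, and the currents sharing a wire add up, which yields a multiply-accumulate. A minimal numerical sketch of an ideal crossbar, where the multi-column layout, the function name, and the values are illustrative assumptions and no circuit non-idealities are modeled:

```python
def column_currents(voltages, conductances):
    """Ideal crossbar: current on column j is sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


# Illustrative values: volts on two rows, siemens at each crosspoint.
voltages = [0.5, 1.0]
conductances = [[2.0, 4.0],
                [1.0, 3.0]]
currents = column_currents(voltages, conductances)
print(currents)  # [2.0, 5.0]
```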

#### Analog Compute

- Activations, weights and/or partial sums are encoded with analog voltage, current, or resistance
- Increased sensitivity to circuit non-idealities
- A/D and D/A circuits to interface with digital domain
- Leverage emerging memory device technology

## **Accelergy for IMC**

http://accelergy.mit.edu










[Wu, ISPASS 2020]

## **Designing DNNs for IMC**

- Designing DNNs for IMC may differ from DNNs for digital processors
- The highest-accuracy DNN on a digital processor may not be the highest-accuracy DNN on IMC
  - Accuracy drops based on robustness to nonidealities
- Reducing number of weights is less desirable
  - Since IMC is weight stationary, may be better to reduce number of activations
  - IMC tends to have larger arrays → fewer weights may lead to low utilization on IMC






## Where to Go Next: Planning and Mapping

**Robot Exploration:** Decide where to go by computing Shannon Mutual Information



## **Information Theoretic Mapping**



$$H(M|Z) = H(M) - I(M;Z)$$

where $H(M|Z)$ is the perspective updated map entropy, $H(M)$ is the current map entropy, and $I(M;Z)$ is the mutual information.
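A small worked example of the relation $H(M|Z) = H(M) - I(M;Z)$. It assumes a binary occupancy grid whose cells are independent, so the map entropy is the sum of per-cell entropies; that assumption, the function names, and the mutual information value are ours, for illustration only:

```python
import math


def cell_entropy(p_occupied):
    """Shannon entropy in bits of one binary occupancy cell."""
    if p_occupied in (0.0, 1.0):
        return 0.0
    q = 1.0 - p_occupied
    return -(p_occupied * math.log2(p_occupied) + q * math.log2(q))


def map_entropy(occupancy_probs):
    """H(M) under the assumption that cells are independent."""
    return sum(cell_entropy(p) for p in occupancy_probs)


current = map_entropy([0.5, 0.5, 1.0, 0.0])   # two unknown cells, two known cells
mutual_information = 1.5                       # hypothetical I(M;Z) of one scan, in bits
updated = current - mutual_information         # H(M|Z) = H(M) - I(M;Z)
print(current, updated)  # 2.0 0.5
```

A scan location with higher mutual information leaves less entropy in the updated map, which is why it is used to decide where to go next.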

## **Experimental Results (4x Real Time)**



Exploration with a mini race car using motion capture for localization

## **Building Hardware to Compute MI**

**Motivation:** Compute MI faster for faster exploration!

Approximate FSMI:

$$I(M;Z) = \sum_{j=1}^{n} \sum_{k=j-\Delta}^{j+\Delta} P(e_j) C_k G_{k,j}$$

Evaluating MI for all cells in an entire beam altogether **removes** numerical integration
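The approximate FSMI double sum maps directly onto two nested loops. In this sketch the terms $P(e_j)$, $C_k$, and $G_{k,j}$ are treated as precomputed tables with made-up values; the 0-based indexing and the clipping of $k$ to the beam are assumptions of the sketch, not details from the slide:

```python
def approx_fsmi(p_e, c, g, delta):
    """Sum over j, and over k in [j - delta, j + delta], of P(e_j) * C_k * G[k][j].

    p_e and c are length-n sequences; g is an n x n table indexed g[k][j].
    Indices are 0-based and k is clipped to the valid range.
    """
    n = len(p_e)
    total = 0.0
    for j in range(n):
        for k in range(max(0, j - delta), min(n - 1, j + delta) + 1):
            total += p_e[j] * c[k] * g[k][j]
    return total


# Made-up inputs for a 3-cell beam.
p_e = [0.5, 0.25, 0.25]
c = [1.0, 2.0, 4.0]
g = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
mi = approx_fsmi(p_e, c, g, delta=1)
print(mi)  # 3.375
```

Each beam is independent of the others, so calls like this one can be distributed across cores.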

Algorithm is *embarrassingly* parallel! High throughput *should* be possible with multiple cores.



Process beams in parallel with multiple cores



## **Challenge is Data Delivery to All Cores**

Power consumption of memory scales with number of ports.

Low power SRAM limited to two-ports!



Data delivery, specifically memory bandwidth, limits the throughput (not compute)



## **Specialized Memory Architecture**

Break up map into separate memory banks and novel storage pattern to minimize read conflicts when processing different beams in parallel.
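Why the storage pattern matters can be seen with a toy bank-mapping model. The skewed layout below is a generic scheme chosen for illustration and is not claimed to be the pattern used in [Li, RSS 2019]; all names and sizes are hypothetical:

```python
def bank_row_major(x, y, width, n_banks):
    """Naive layout: consecutive row-major addresses rotate through the banks."""
    return (y * width + x) % n_banks


def bank_skewed(x, y, width, n_banks):
    """Skewed layout: each row is shifted by one bank relative to the row above.

    width is unused; it is kept so both layouts share a signature.
    """
    return (x + y) % n_banks


def conflicts(cells, bank_of, width, n_banks):
    """Number of reads that collide with another read in the same bank."""
    banks = [bank_of(x, y, width, n_banks) for x, y in cells]
    return len(banks) - len(set(banks))


WIDTH, N_BANKS = 8, 4
# Four cores each read one cell of a vertical stripe in the same cycle.
vertical_reads = [(3, y) for y in range(4)]
naive = conflicts(vertical_reads, bank_row_major, WIDTH, N_BANKS)
skewed = conflicts(vertical_reads, bank_skewed, WIDTH, N_BANKS)
print(naive, skewed)  # 3 0
```

In the naive layout all four reads land in one bank and must be serialized, while the skewed layout serves them in a single cycle.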



Compute the mutual information for an **entire map** of 20m x 20m at 0.1m resolution in under a second  $\rightarrow$  a 100x speed up versus CPU for  $1/10^{th}$  of the power

[**Li**, *RSS* 2019]

## **Experimental Results**



Specialized banking, an efficient memory arbiter, and packing multiple values at each address result in throughput that achieves 94% of the theoretical limit (unlimited bandwidth)

#### **Extend FSMI to 3D Environments**

Computing MI on a **3D map** requires
significant amounts of storage and compute



## **Compress map** with OctoMap

[Hornung, et al., Autonomous Robots, 2013]



## **Experiments of 3D FSMI (4x Real Time)**



## **Experiments of 3D FSMI**



We achieve an average compression ratio of around $18\times$, with an acceleration ratio of $8\times$

#### **FCMI: Fast Continuous Mutual Information**

Reformulate with a *continuous* occupancy map framework and exploit recursive structure when computing MI across *entire* map  $\rightarrow$  *two orders of magnitude speed up over FSMI!* 



[Henderson, ICRA 2020]



## **Balancing Actuation and Computing Energy**

#### **Motion Planning**

Find a feasible (obstacle-free) path [typically optimize for shortest path]

#### **Low-power Robotics**

Actuation and computing energy are similar order of magnitude



#### Energy to move 1 more meter ($P_a/v$ [W/(m/s)])

#### Energy to compute 1 more second ($P_c$ [W])



## **Robots Consuming < 1 Watt for Actuation**















#### **Low Energy Robotics**

- Miniature aerial vehicles
- Lighter than air vehicles
- Micro unmanned gliders
- Miniature satellites

## **Balancing Actuation and Computing Energy**

#### Baseline

(compute 20,000 samples)







#### Compute Energy Included Motion Planning (CEIMP)

A framework to balance the energy spent on computing a path and the energy spent on moving along that path (Don't think too hard!)
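The trade-off can be made concrete with a simple energy account: moving costs $P_a/v$ per meter and planning costs $P_c$ per second. This is only a back-of-the-envelope sketch with a hypothetical robot and made-up numbers, not the CEIMP algorithm itself:

```python
def total_energy(path_length_m, compute_time_s, p_actuation_w, speed_m_s, p_compute_w):
    """Actuation energy (P_a / v per meter) plus computing energy (P_c per second)."""
    actuation = (p_actuation_w / speed_m_s) * path_length_m
    computing = p_compute_w * compute_time_s
    return actuation + computing


def worth_computing_longer(extra_seconds, meters_saved, p_actuation_w, speed_m_s, p_compute_w):
    """True if the actuation energy saved exceeds the extra computing energy spent."""
    return (p_actuation_w / speed_m_s) * meters_saved > p_compute_w * extra_seconds


# Hypothetical robot: 1 W actuation at 0.5 m/s, 2 W computing.
quick = total_energy(12.0, 1.0, 1.0, 0.5, 2.0)      # short planning, longer path
thorough = total_energy(10.0, 4.0, 1.0, 0.5, 2.0)   # long planning, shorter path
print(quick, thorough)  # 26.0 28.0
```

With these numbers the extra three seconds of planning cost more energy than the two meters they save, so the shorter path is not the lower-energy choice.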

## **Summary**

- Efficient computing is critical for advancing the progress of autonomous robots, particularly at the smaller scales. → Critical step to making autonomy ubiquitous!
- In order to meet computing demands in terms of power and speed, need to redesign computing hardware from the ground up → Focus on data movement!
- Specialized hardware opens up new opportunities for the co-design of algorithms and hardware → Innovation opportunities for the future of robotics!





## Acknowledgements





Joel Emer



Sertac Karaman

Research conducted in the MIT Energy-Efficient Multimedia Systems Group would not be possible without the support of the following organizations:





























## Low-Energy Autonomy and Navigation (LEAN) Group



A broad range of next-generation applications will be enabled by low-energy, miniature mobile robotics including insect-size flapping wing robots that can help with search and rescue, chip-size satellites that can explore nearby stars, and blimps that can stay in the air for years to provide communication services in remote locations. While the low-energy, miniature actuation, and sensing systems have already been developed in many of these cases, the processors currently used to run the algorithms for autonomous navigation are still energy-hungry. Our research addresses this challenge as well as brings together the robotics and hardware design communities.

We enable efficient computing on various key modules of autonomous navigation systems, including perception, localization, exploration, and planning. We also consider the overall system by accounting for the energy cost of computing in conjunction with actuation and sensing.



#### **Motion Planning**

Many motion planning and control algorithms aim to design trajectories and controllers that minimize actuation energy. However, in low-energy robotics, computing such trajectories and controls themselves may consume a large amount of energy. We develop algorithms that optimize this trade-off.



#### Mutual Information for Exploration

Computing mutual information between the map and future measurements is critical to efficient exploration. Unfortunately, computing mutual information is very expensive. We develop new algorithms and hardware for efficient computation of mutual information, and demonstrate real-time computation over the whole of a reasonably sized map.



#### Depth Sensing and Perception

Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. State-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. We address the problem of fast depth estimation on embedded systems.



#### Localization and Mapping

Autonomous navigation of miniaturized robots (e.g., nano/pico aerial vehicles) is currently a grand challenge for robotics research, due to the need for processing a large amount of sensor data (e.g., camera frames) with limited on-board computational resources. We focus on the design of a visual-inertial odometry (VIO) system in which the robot estimates its ego-motion (and a landmark-based map) from on-board camera and IMU data.



Group Website: <a href="http://lean.mit.edu">http://lean.mit.edu</a>

## **Book on Efficient Processing of DNNs**



- **Part I: Understanding Deep Neural Networks**
  - Introduction
  - Overview of Deep Neural Networks
- **Part II: Design of Hardware for Processing DNNs**
  - Key Metrics and Design Objectives
  - Kernel Computation
  - Designing DNN Accelerators
  - Operation Mapping on Specialized Hardware
- **Part III: Co-Design of DNN Hardware and Algorithms**
  - Reducing Precision
  - Exploiting Sparsity
  - Designing Efficient DNN Models
  - Advanced Technologies

https://tinyurl.com/EfficientDNNBook

#### **Excerpts of Book**

CHAPTER 3

#### Key Metrics and Design Objectives

Over the past few years, there has been a significant amount of research on efficient processing of DNNs. Accordingly, it is important to discuss the key metrics that one should consider when comparing and evaluating the strengths and weaknesses of different designs and proposed techniques and that should be incorporated into design considerations. While efficiency is often only associated with the number of operations per second per Watt (e.g., floating-point operations per second per Watt as FLOPS/W), it is actually composed of many more metrics including accuracy, throughput, latency, energy consumption, power consumption, cost, flexibility, and scalability. Reporting a comprehensive set of these metrics is important in order to provide a complete picture of the trade-offs made by a proposed design or technique.

In this chapter, we will

- discuss the importance of each of these metrics;
- break down the factors that affect each metric. When feasible, present equations that describe the relationship between the factors and the metrics;
- describe how these metrics can be incorporated into design considerations for both the DNN hardware and the DNN model (i.e., workload); and
- specify what should be reported for a given metric to enable proper evaluation.

Finally, we will provide a case study on how one might bring all these metrics together for a holistic evaluation of a given approach. But first, we will discuss each of the metrics.

#### 3.1 ACCURACY

Accuracy is used to indicate the quality of the result for a given task. The fact that DNNs can achieve state-of-the-art accuracy on a wide range of tasks is one of the key reasons driving the popularity and wide use of DNNs today. The units used to measure accuracy depend on the task. For instance, for image classification, accuracy is reported as the percentage of correctly classified images, while for object detection, accuracy is reported as the mean average precision (mAP), which is related to the trade off between the true positive rate and false positive rate.

CHAPTER 10

#### **Advanced Technologies**

As highlighted throughout the previous chapters, data movement dominates energy consumption. The energy is consumed both in the access to the memory as well as the transfer of the data. The associated physical factors also limit the bandwidth available to deliver data between memory and compute, and thus limit the throughput of the overall system. This is commonly referred to by computer architects as the "memory wall."

To address the challenges associated with data movement, there have been various efforts to bring compute and memory closer together. Chapters 5 and 6 primarily focus on how to design spatial architectures that distribute the on-chip memory closer to the computation (e.g., scratch pad memory in the PE). This chapter will describe various other architectures that use advanced memory, process, and fabrication technologies to bring compute and memory closer together.

First, we will describe efforts to bring the off-chip memory closer to the computation. These approaches are often referred to as near-data processing, and include memory technologies such as stacked DRAM.

Next, we will describe efforts to integrate the computation into the memory itself. These approaches are often referred to as *processing in memory*, and make use of memory technologies such as Static Random Access Memories (SRAM), Dynamic Random Access Memories (DRAM), and emerging non-volatile memory (NVM). Since these approaches rely on mixed-signal circuit design to enable processing in the analog domain, we will also discuss the design challenges related to handling the increased sensitivity to circuit and device non-idealities (e.g., nonlinearity, process and temperature variations), as well as the impact on area density, which is critical for memory.

Significant data movement also occurs between the sensor that collects the data and the DNN processor. The same principles that are used to bring compute near the memory, where the weights are stored, can be used to bring the compute near the sensor, where the input data is collected. Therefore, we will also discuss how to integrate some of the compute into the sensor.

Finally, since photons travel much faster than electrons and the cost of moving a photon can be *independent* of distance, processing in the optical domain using light may provide significant improvements in energy efficiency and throughput over the electrical domain. Accordingly, we will conclude this chapter by discussing the recent work that performs DNN processing in the optical domain, referred to as *Optical Neural Networks*.

<sup>1</sup>Specifically, the memory wall refers to data moving between the off-chip memory (e.g., DRAM) and the processor.

Available on DNN tutorial website <a href="http://eyeriss.mit.edu/tutorial.html">http://eyeriss.mit.edu/tutorial.html</a>

#### **Additional Resources**

#### Talks and Tutorial Available Online

https://www.rle.mit.edu/eems/publications/tutorials/





YouTube Channel
EEMS Group – PI: Vivienne Sze





#### Efficient Processing for Deep Neural Networks

- Project website: <a href="http://eyeriss.mit.edu">http://eyeriss.mit.edu</a>
- Y.-H. Chen, T.-J. Yang, J. Emer, V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices,"
   IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), Vol. 9, No. 2, pp. 292-308, June 2019.
- Y.-H. Chen, T. Krishna, J. Emer, V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid State Circuits (JSSC), ISSCC Special Issue, Vol. 52, No. 1, pp. 127-138, January 2017.
- Y.-H. Chen, J. Emer, V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,"
   International Symposium on Computer Architecture (ISCA), pp. 367-379, June 2016.
- Y.-H. Chen\*, T.-J. Yang\*, J. Emer, V. Sze, "Understanding the Limitations of Existing Energy-Efficient Design Approaches for Deep Neural Networks," SysML Conference, February 2018.
- V. Sze, Y.-H. Chen, T.-J. Yang, J. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, December 2017.
- Y. N. Wu, J. S. Emer, V. Sze, "Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs,"
   International Conference on Computer Aided Design (ICCAD), November 2019. <a href="http://accelergy.mit.edu/">http://accelergy.mit.edu/</a>
- Y. N. Wu, V. Sze, J. S. Emer, "An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs," to appear in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2020.
- A. Suleiman\*, Y.-H. Chen\*, J. Emer, V. Sze, "Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision," IEEE International Symposium of Circuits and Systems (ISCAS), Invited Paper, May 2017.
- Hardware Architecture for Deep Neural Networks: <a href="http://eyeriss.mit.edu/tutorial.html">http://eyeriss.mit.edu/tutorial.html</a>



#### Co-Design of Algorithms and Hardware for Deep Neural Networks

- T.-J. Yang, Y.-H. Chen, V. Sze, "Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning," IEEE
   Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Energy estimation tool: <a href="http://eyeriss.mit.edu/energy.html">http://eyeriss.mit.edu/energy.html</a>
- T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, V. Sze, H. Adam, "NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications," European Conference on Computer Vision (ECCV), 2018. <a href="http://netadapt.mit.edu">http://netadapt.mit.edu</a>
- D. Wofk\*, F. Ma\*, T.-J. Yang, S. Karaman, V. Sze, "FastDepth: Fast Monocular Depth Estimation on Embedded Systems,"
   IEEE International Conference on Robotics and Automation (ICRA), May 2019. <a href="http://fastdepth.mit.edu/">http://fastdepth.mit.edu/</a>
- T.-J. Yang, V. Sze, "Design Considerations for Efficient Deep Neural Networks on Processing-in-Memory Accelerators," IEEE
   International Electron Devices Meeting (IEDM), Invited Paper, December 2019.

#### Low Power Time of Flight Imaging

- J. Noraky, V. Sze, "Low Power Depth Estimation of Rigid Objects for Time-of-Flight Imaging," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2019.
- J. Noraky, V. Sze, "Depth Map Estimation of Dynamic Scenes Using Prior Depth Information," arXiv, February 2020.
   https://arxiv.org/abs/2002.00297
- J. Noraky, V. Sze, "Depth Estimation of Non-Rigid Objects For Time-Of-Flight Imaging," IEEE International Conference on Image Processing (ICIP), October 2018.
- J. Noraky, V. Sze, "Low Power Depth Estimation for Time-of-Flight Imaging," IEEE International Conference on Image Processing (ICIP), September 2017.



#### Energy-Efficient Visual Inertial Localization

- Project website: <a href="http://navion.mit.edu">http://navion.mit.edu</a>
- A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, V. Sze, "Navion: A Fully Integrated Energy-Efficient Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones," IEEE Symposium on VLSI Circuits (VLSI-Circuits), June 2018.
- Z. Zhang\*, A. Suleiman\*, L. Carlone, V. Sze, S. Karaman, "Visual-Inertial Odometry on Chip: An Algorithm-and-Hardware Codesign Approach," Robotics: Science and Systems (RSS), July 2017.
- A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, V. Sze, "Navion: A 2mW Fully Integrated Real-Time Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones," IEEE Journal of Solid State Circuits (JSSC), VLSI Symposia Special Issue, Vol. 54, No. 4, pp. 1106-1119, April 2019.



#### Fast Shannon Mutual Information for Robot Exploration

- Project website: <a href="http://lean.mit.edu">http://lean.mit.edu</a>
- Z. Zhang, T. Henderson, V. Sze, S. Karaman, "FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping," IEEE International Conference on Robotics and Automation (ICRA), May 2019.
- P. Li\*, Z. Zhang\*, S. Karaman, V. Sze, "High-throughput Computation of Shannon Mutual Information on Chip," Robotics:
   Science and Systems (RSS), June 2019.
- Z. Zhang, T. Henderson, S. Karaman, V. Sze, "FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping," to appear in International Journal of Robotics Research (IJRR). <a href="http://arxiv.org/abs/1905.02238">http://arxiv.org/abs/1905.02238</a>
- T. Henderson, V. Sze, S. Karaman, "An Efficient and Continuous Approach to Information-Theoretic Exploration," IEEE
   International Conference on Robotics and Automation (ICRA), May 2020.

#### Balancing Actuation and Computation

- Project website: <a href="http://lean.mit.edu">http://lean.mit.edu</a>
- S. Sudhakar, S. Karaman, V. Sze, "Balancing Actuation and Computing Energy in Motion Planning," IEEE International Conference on Robotics and Automation (ICRA), May 2020.

