
Abstract—With enhancements in neural networks and deep learning technologies, problems like face recognition and live scene detection have come into the limelight.

From simple ANNs to GANs, neural networks can be used to solve simple to complex problems. One such application is the always-on Haar-like face detector implemented in wearable devices. Because wearable devices are small and cannot support the large power consumption and large chip designs that advanced CNN face detection techniques demand, this paper introduces an ultra-low-power CNN FR processor and a CIS-integrated always-on Haar-like face detector that can be implemented in smart wearable devices. Earlier work on the same problem produced less efficient results, with less than 10 hours of operation time on a 190 mAh coin battery.

Index Terms—face recognition; feedforward neural nets; learning (artificial intelligence); low-power electronics; microprocessor chips; power aware computing; CIS; CNN processors; always-on Haar-like face detector; convolutional neural network; deep learning; device intelligence; power demand; program processors

I. INTRODUCTION

Recent trends in artificial intelligence and deep learning have focused on training machines to make decisions themselves, without any human interference. This remains an open problem, approached with neural network models such as CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks), ANNs (Artificial Neural Networks), GANs, et cetera. To implement these models in wearable devices (such as smart watches and smart glasses), power efficiency and processing speed play vital roles. Wearable devices have limited battery capacity, and high recognition accuracy is required along with low power consumption. To overcome these hurdles, an ultra-low-power CNN Face Recognition (FR) processor and a CIS integrated with an always-on Haar-like face detector have been introduced for smart wearable devices. The system works in three steps, with the first step being the recognition algorithm and the generation of the Haar cascade learning XML file, to produce efficient and accurate results.

Haar features needed for face recognition are shown in Fig. 1, while the results of a simple application of the Haar cascade classifier are shown in Fig. 2.

Fig. 1. “Haar Features”

Fig. 2. “Output of Haar Classifier”
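As a point of reference, the kind of simple Haar cascade application whose output is shown in Fig. 2 can be reproduced in a few lines with OpenCV. The sketch below is illustrative only and assumes OpenCV's bundled haarcascade_frontalface_default.xml rather than the cascade XML file trained in this work; the file name input.jpg is a placeholder.

import cv2

# Load OpenCV's pretrained frontal-face Haar cascade (an assumption;
# the paper generates its own Haar cascade learning XML file).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read an input frame and convert it to grayscale, since Haar features
# are computed on intensity values.
frame = cv2.imread("input.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection with the cascade classifier.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a bounding box around each detected face (cf. Fig. 2).
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("output.jpg", frame)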

The other steps include an ultra-low-power CNNP with wide-I/O local distributed memory (DM-CNNP), a separable filter approximation for convolutional layers (SF-CONV), and a transpose-read SRAM (T-SRAM) for low-power CNN processing [1]. Fig. 3 shows the overall proposed FR system, consisting of two chips: the Face Image Sensor (FIS) and the CNNP. Once face detection is done using the Haar cascade classifier, the FIS transmits only the face image to the CNNP, and the CNNP then completes the face recognition task.
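The separable filter approximation (SF-CONV) mentioned above replaces a full 2-D convolution kernel with a pair of 1-D filters, cutting the number of multiplications per output pixel. The sketch below is only a rank-1 illustration of the idea using an SVD factorization; the actual SF-CONV hardware mapping is not reproduced here, and the function and variable names are purely illustrative.

import numpy as np

def separable_approx(kernel):
    """Approximate a KxK kernel by the outer product of two 1-D filters."""
    u, s, vt = np.linalg.svd(kernel)
    col = u[:, 0] * np.sqrt(s[0])   # vertical 1-D filter
    row = vt[0, :] * np.sqrt(s[0])  # horizontal 1-D filter
    return col, row

k = 5
kernel = np.random.randn(k, k)
col, row = separable_approx(kernel)

# A KxK kernel needs K*K multiplies per output pixel; the separable
# version needs only 2*K (one vertical pass plus one horizontal pass).
print("multiplies per pixel:", k * k, "->", 2 * k)
print("rank-1 approximation error:",
      np.linalg.norm(kernel - np.outer(col, row)) / np.linalg.norm(kernel))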

Fig. 3. “Overall Architecture as proposed.”

It is very important to appreciate the role of hardware acceleration, as many models may give higher accuracy but may not be able to run on wearable devices.

Thus, the given architecture addresses both power efficiency and accuracy for face recognition. The outputs of the face recognition task are shown in Fig. 4.

Fig. 4. “The result of the proposed architecture.”

The architecture of the DM-CNNP, containing the components necessary to speed up MAC operations, is shown in Fig. 5. The process proceeds in the following steps: each PE fetches 32 words per cycle from its local T-SRAM to support 4 convolution units, where each unit has a 64-MAC array. Thus, the CNNP with 16 PEs (4 × 4) can fetch 512 words per cycle (16 × 32) from the wide-I/O local distributed memory.

This allows 1,024 MAC operations per cycle to be executed simultaneously. This much wider memory bandwidth and massively parallel MAC operation per cycle enable high-throughput operation at a low clock frequency (5 MHz) and a near-threshold voltage (NTV) of 0.46 V. Thus, when a convolution operation is performed, the MAC input registers shift the words by one column each cycle to accumulate the partial sums in the accumulation registers.
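To make the numbers above concrete, the following back-of-the-envelope sketch reproduces the fetch and MAC throughput figures and gives a simplified behavioral model of the column-shifting, partial-sum-accumulating dataflow described for the MAC input registers; it is not a model of the actual RTL, and the function name is illustrative.

import numpy as np

# Throughput figures quoted in the text.
pes            = 16      # 4 x 4 PE array
words_per_pe   = 32      # words fetched per PE per cycle
macs_per_cycle = 1024    # parallel MAC operations per cycle
clock_hz       = 5e6     # near-threshold clock frequency (5 MHz)

print("words fetched per cycle:", pes * words_per_pe)                      # 512
print("MAC throughput: %.2f GMAC/s" % (macs_per_cycle * clock_hz / 1e9))  # 5.12

# Behavioral model of the shift-and-accumulate dataflow: each cycle the
# input window shifts by one column and every accumulation register adds
# one weighted partial sum, yielding a correlation-style 1-D convolution.
def shift_accumulate_conv(row, weights):
    taps = len(weights)
    n_out = len(row) - taps + 1
    acc = np.zeros(n_out)                      # accumulation registers
    for t in range(taps):                      # one column shift per cycle
        acc += weights[t] * row[t:t + n_out]   # add this cycle's partial sums
    return acc

row = np.arange(8, dtype=float)
weights = np.array([1.0, 0.5, 0.25])
print(shift_accumulate_conv(row, weights))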

The PEs connected to the same row can transfer data to other PEs to reduce the overhead in processing cycles due to inter-PE data communication. Also, the MAC units can be clock-gated with mask bits to reduce unnecessary power consumption.

Fig. 5. “CNN Architecture”
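As a rough functional analogy for the mask-bit clock gating mentioned above, the sketch below simply skips MAC lanes whose mask bit is cleared; in silicon the clock to those MAC units would be gated so they consume no dynamic power, so this software branch is only an illustration of the behavior.

import numpy as np

def masked_mac(inputs, weights, mask):
    """Accumulate only over MAC lanes whose mask bit is set."""
    acc = 0.0
    for x, w, m in zip(inputs, weights, mask):
        if m:                    # mask bit = 1 -> lane is active
            acc += x * w         # MAC operation
        # mask bit = 0 -> lane is clock-gated and contributes nothing
    return acc

inputs  = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.5, 0.0, 0.25, 0.0])
mask    = np.array([1, 0, 1, 0])   # gate lanes whose weights are zero
print(masked_mac(inputs, weights, mask))   # 1.25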

II. CONCLUSION

We can conclude from the literature survey that, along with algorithms, power efficiency and hardware acceleration play major roles. From the 1980s to 2017, there have been major advances in hardware circuitry, and this progress enables the technology to advance further. The CNN processor, the Face Image Sensor, and circuits such as the analog Haar-like filtering circuit are new to us and give us good insight into these trending technologies.