KPU

KPU is a general-purpose neural network processor. It can perform convolutional neural network computation with low power consumption, obtain the size, coordinates and class of detected objects in real time, and detect and classify faces or objects.

  • KPU has the following characteristics:
    • Support fixed-point models trained by mainstream training frameworks according to specific restriction rules
    • There is no direct limit on the number of network layers; the parameters of each convolution layer can be configured separately, including the number of input and output channels and the input and output width and height
    • Support two convolution kernels 1x1 and 3x3
    • Support any form of activation function
    • When working in real time, the maximum supported neural network parameter size is 5.5MiB to 5.9MiB
    • The maximum supported network parameter size for non-real-time work is (flash capacity - firmware size)

Routine

Run face detection

Model download address: http://dl.sipeed.com/MAIX/MaixPy/model, download face_model_at_0x300000.kfpkg

Complete example: face_find

Running feature map

Model download address: http://dl.sipeed.com/MAIX/MaixPy/model, download face_model_at_0x300000.kfpkg

The model is an 8-bit fixed-point model, about 380 KiB in size. The layer output sizes are:

Layers 1-2: 160x120
Layers 3-6: 80x60
Layers 7-10: 40x30
Layers 11-16: 20x15
import sensor
import image
import lcd
import KPU as kpu

index = 3                              # layer whose feature maps will be displayed
lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)
task = kpu.load(0x300000)              # model burned at flash offset 0x300000
info = kpu.netinfo(task)
layer = info[index]
w = layer.wo()                         # output width of the selected layer
h = layer.ho()                         # output height of the selected layer
num = int(320*240/w/h)                 # number of channel maps that fit on a QVGA screen
fmap_list = [None]*num
x_step = int(320/w)                    # feature maps per row on the LCD
y_step = int(240/h)                    # feature maps per column on the LCD
img_lcd = image.Image()
while True:
    img = sensor.snapshot()
    fmap = kpu.forward(task, img, index)       # run the network up to the target layer
    for i in range(0, num):
        fmap_list[i] = kpu.fmap(fmap, i)       # extract channel i as a grayscale image
    for i in range(0, num):
        fmap_list[i].stretch(64, 255)          # stretch contrast for display
    for i in range(0, num):
        img_lcd.draw_image(fmap_list[i], ((i % x_step) * w, (int(i / x_step)) * h))
    lcd.display(img_lcd)
    kpu.fmap_free(fmap)                        # release the feature map memory

Module methods

load

Load model from flash or file system

KPU.load(offset, file_path)

Parameters

The offset and file_path parameters are mutually exclusive: pass only one of them, directly as a positional argument (no keyword is needed)

  • offset: the offset of the model in flash. For example, 0xd00000 means the model is burned starting at 13 MiB, and 0x300000 means it starts at 3 MiB in flash
  • file_path: the path of the model file in the file system, such as “/sd/xxx.kmodel”
Return

If the model is loaded correctly, a kpu_net object is returned; otherwise an error is thrown. Check the error message that is thrown; in addition, please refer to the error code description

If the reported error code is less than 2000, the firmware version is too old and needs to be updated

  • kpu_net: kpu network object
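
For example (a minimal sketch; the flash offset and the file name are placeholders that must match where your model was actually burned or stored):

import KPU as kpu

# load a model burned at flash offset 0x300000
task = kpu.load(0x300000)

# or load a kmodel file from the SD card
# task = kpu.load("/sd/xxx.kmodel")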

load_flash

Same function as the load method, except that the model file is loaded dynamically from flash while running, so less RAM is used; the RAM/speed trade-off is controlled by the parameters below.

kpu.load_flash(model_addr, is_dual_buf, batch_size, spi_speed)

Parameters

  • model_addr: the offset address in flash at which the pre-processed model has been burned. Note that the model file needs to be pre-processed as described here.
  • is_dual_buf: 0 = single-buffer loading, which uses less RAM but loads the model more slowly; 1 = double-buffer loading, which requires more RAM but runs faster.
  • batch_size: when is_dual_buf is 1, the load batch_size must be set. The recommended range is 0x4000~0x10000; you can test which value works best for your model. If is_dual_buf is 0, set it to 0.
  • spi_speed: when loading the model from SPI flash, the flash is temporarily switched to high-speed mode with the SPI clock frequency set here. The value should be <= 80000000 (this is the requested frequency; the actual frequency may differ).

return value

  • kpu_net: kpu network object
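
For example (a minimal sketch; the flash offset, batch size and SPI clock are illustrative values, not tuned recommendations for any particular model):

import KPU as kpu

# double-buffered loading of a model burned at 0x300000,
# with a 0x8000-byte load batch and an 80 MHz SPI clock
task = kpu.load_flash(0x300000, 1, 0x8000, 80000000)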

init_yolo2

Pass initialization parameters to the yolo2 network model; only needed when a yolo2 model is used

KPU.init_yolo2(kpu_net, threshold, nms_value, anchor_num, anchor)

such as:

import KPU as kpu
task = kpu.load(0x300000)
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987, 5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)

Parameters

  • kpu_net: kpu network object, that is, the loaded model object, the return value of KPU.load()
  • threshold: probability threshold; a detection is output only when its probability is greater than this value. Range: [0, 1]
  • nms_value: box IoU threshold, used to prevent the same object from being framed by multiple boxes. When two boxes cover the same object and the ratio of their intersection area to the total area they cover exceeds this value, only the box with the highest probability is kept
  • anchor_num: the number of anchors, fixed as len(anchor)//2
  • anchor: anchor parameters, consistent with the model. For a given model this parameter is fixed (it is determined when the model is trained) and cannot be changed to other values

return value

  • success: bool type, whether initialization succeeded

deinit

Release the memory occupied by the model immediately. The variable itself still exists afterwards and can be removed with del kpu_net_object.
Alternatively, you can simply use del kpu_net_object to mark the object as deleted; the memory will then be released automatically the next time GC collects memory or when gc.collect() is called manually

KPU.deinit(kpu_net)

such as:

import KPU as kpu
import gc
task = kpu.load(0x300000)
kpu.deinit(task)
del task
gc.collect()

or:

import KPU as kpu
import gc
task = kpu.load(0x300000)
del task
gc.collect()

Parameters

kpu_net: kpu_net object returned by KPU.load()

return value

  • success: bool type, whether the release succeeded

run_yolo2

Run the yolo2 network model on an image and return a list of detected objects.

such as:

import KPU as kpu
import image
task = kpu.load(offset or file_path)
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987, 5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)
img = image.Image()
kpu.run_yolo2(task, img) # img here is an empty image, so this call alone is not a working example; please refer to the routine

Parameters

  • kpu_net: kpu_net object returned by KPU.load()
  • image_t: the image collected from the sensor
Return
  • list: list of kpu_yolo2_find
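
For example, a minimal working detection loop based on the face_find routine (the flash offset and anchors are those of the face detection model above; the rect() accessor on each detection is assumed from the face_find example):

import sensor
import image
import lcd
import KPU as kpu

lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)

task = kpu.load(0x300000)            # face detection model from the routine above
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987, 5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)
while True:
    img = sensor.snapshot()
    objects = kpu.run_yolo2(task, img)   # list of kpu_yolo2_find results, or None
    if objects:
        for obj in objects:
            img.draw_rectangle(obj.rect())   # draw a box around each detection
    lcd.display(img)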

forward

Run the loaded network model up to the specified layer and output the feature map of that target layer

fmap = KPU.forward(kpu_net, img, end_layer)

such as:

import KPU as kpu
task = kpu.load(offset or file_path)
...
fmap = kpu.forward(task, img, 3)

Parameters

  • kpu_net: kpu_net object
  • img: image image.Image object
  • end_layer: the layer up to which the network is computed; numbering starts from 0
Return
  • fmap: Feature map object, containing feature maps of all channels in the current layer

fmap

Extract the data of the specified channel of the feature map into an image.Image object

img=KPU.fmap(fmap, channel)

Parameters

  • fmap: Feature map object
  • channel: Specify the channel number of the feature map, starting from 0
Return
  • img: the grayscale image generated from the feature map of the specified channel, type image.Image

fmap_free

Release feature map object

KPU.fmap_free(fmap)

Parameters

  • fmap: Feature map object
Return
  • None

netinfo

Get the network structure information of the model

info_list = kpu.netinfo(task)

Parameters

  • kpu_net: kpu_net object, KPU.load() return value
Return
  • info_list: information list of all layers, including information:
    • index: the number of the current layer in the network
    • wi: input width
    • hi: input height
    • wo: output width
    • ho: output height
    • chi: Number of input channels
    • cho: Number of output channels
    • dw: whether the layer is a depth-wise convolution layer
    • kernel_type: Convolution kernel type, 0 is 1x1, 1 is 3x3
    • pool_type: Pooling type, 0 no pooling; 1: 2x2 max pooling; 2:...
    • para_size: the number of bytes of the convolution parameter of the current layer
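
For example, the layer fields can be read as accessor methods on each list entry, as the feature-map routine above does with wo() and ho() (a minimal sketch; the flash offset is simply the address used in the routine):

import KPU as kpu

task = kpu.load(0x300000)
info = kpu.netinfo(task)
for layer in info:
    # output width/height of every layer, read the same way as in the routine above
    print(layer.wo(), layer.ho())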

set_outputs

Manually set the shape of an output layer. For a V4 kmodel converted with nncase v0.2.0,
this function must be called after load to set the output layer shape manually; V3 models do not need it.

success = KPU.set_outputs(kpu_net, out_idx, width, height, channel)

Parameters

  • kpu_net: kpu_net object
  • out_idx: the index of the output layer, starting from 0; for example, the first output layer is 0
  • width: layer width, if it is a one-dimensional output, it is 1
  • height: layer height, if it is a one-dimensional output, it is 1
  • channel: the number of channels of the layer; if the output is one-dimensional, this is the length of the one-dimensional output
Return
  • success: whether the setting succeeded; if not, check the printed message and refer to the error codes
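
For instance, a sketch for a V4 kmodel whose single output is a one-dimensional vector of length 2 (the file name and output length are placeholder values for illustration):

import KPU as kpu

task = kpu.load("/sd/classifier.kmodel")   # hypothetical V4 kmodel on the SD card
# output layer 0 is treated as a 1x1 feature map with 2 channels
kpu.set_outputs(task, 0, 1, 1, 2)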

memtest

Print memory usage, including GC memory and system heap memory

  • Note that executing this function will automatically execute gc.collect() to collect memory once, and then print the remaining memory of GC
  • The system heap memory figure is for reference only and may not be exact; sometimes memory that has already been released is still shown as used. The amount of memory that can actually be allocated is what counts.

KPU.memtest()

face_encode

Quantize the feature map returned by forward. For more details, please see: kpu issue

feature = kpu.face_encode(fmap[:])

Parameters

fmap[:]: list type, the return value of the forward function converted into a list

return value

feature: list type, the quantized feature list

face_compare

Compare the quantized feature returned by face_encode with a recorded face

score = kpu.face_compare(record_ftrs[j], feature)

Parameters

record_ftrs[j]: list type, the recorded face data
feature: list type, the face data to be compared, i.e. the return value of face_encode

return value

score: int type, comparison score (0~100); the higher the score, the greater the similarity
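
For example, the two calls fit together as follows (a minimal sketch: task is assumed to be a loaded face-feature model, img_face an aligned face image obtained beforehand, record_ftrs a list of previously enrolled features, and 85 an arbitrary example threshold; see the face_find routine for the detection part):

import KPU as kpu

# record_ftrs holds the quantized features of previously enrolled faces
record_ftrs = []

# ... after obtaining an aligned face image img_face from the detection step:
fmap = kpu.forward(task, img_face)       # run the feature-extraction model (end_layer omitted here, as an assumption)
feature = kpu.face_encode(fmap[:])       # quantize the feature map
for j in range(len(record_ftrs)):
    score = kpu.face_compare(record_ftrs[j], feature)
    if score > 85:                       # 85 is just an example threshold
        print("matched face", j, "score:", score)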