Introduction


The EZVIZ BD-2402B1 is a surveillance DVR built around a HiSilicon SoC. My goal was to repurpose the hardware to run a custom fork of snes9x. Achieving this required a serious amount of reverse engineering and learning. It was not only a hardware project but also a software-heavy one.

After reviewing many articles detailing serious security vulnerabilities in products built on HiSilicon SoCs, I decided to run my own snes9x fork on this hardware rather than do security research. I felt there were already enough eyes on the software running on these boards that another set would add little of value. In addition, HiSilicon has recently been banned in the USA, so the number of people using these camera systems is in decline. The low chance of finding a new vulnerability, combined with its declining impact (usage), pushed me toward repurposing the hardware instead.

So instead, I took the opportunity to expand my hardware hacking skill set and familiarise myself with an SNES emulator I've used for many years (snes9x).

Hardware Details


To begin, I'll describe the components and peripherals on the board. Only the components relevant to porting snes9x and obtaining code execution on the device are covered; the rest are well documented in the processor data sheet PDF hosted here.

The pin headers on the board are NC (no connect) because their resistors have been removed from the PCB. Some pins, such as ground, are still connected, since they are tied directly to a ground plane.

Notable Components


A full, detailed list of the other I/O can be found in the Brief data sheet (which is publicly available). You can find a re-hosted version of that PDF here.

SoC - Hi3520DV300


“Hi3520D V300 is a professional SoC targeted for the multi-channel HD (1080p/720p) or SD (D1/960H) DVR. Hi3520D V300 provides an ARM A7 processor, a high-performance H.264 video encoding/decoding engine, a high-performance video/graphics processing engine with various complicated graphics processing algorithms, HDMI/VGA HD outputs, and various peripheral interfaces. These features enable Hi3520D V300 to provide high-performance, high-picture-quality, and low-cost analog HD/SDI solutions for customers' products while reducing the eBOM cost”

TDE - Two Dimensional Engine


The TDE is a graphics processing engine that runs independently of the ARM processor, which allows complex bitmap operations to be offloaded from the main processor. It is documented in the TDE programmer's user manual. The engine accelerates operations such as copying one bitmap into another while respecting the pitch (the number of bytes per horizontal line of pixels). It can also perform pixel format conversions (which, as you will see later in the post, are required) and bitmap scaling.

UART - Universal asynchronous receiver-transmitter


When I received the board, its uboot configured the UART at 115200 baud, and the TX pin was intact/connected. I was able to interrupt uboot easily, which allowed me to set rdinit=/bin/sh on the kernel command line (bootargs).

For a while I was setting init=/bin/sh and seeing no effect, because I didn't realise the kernel was using an initramfs/initrd configuration. With an initramfs, the kernel runs the program named by rdinit= (default /init) from the initramfs, and init= is only consulted if that program does not exist. The entire file system is contained inside the kernel image and lives uncompressed in memory. This also means no persistent changes to the root filesystem were possible without recompiling the entire Linux kernel and flashing it.

setenv bootargs mem=128M console=ttyAMA0,115200n8 rdinit=/bin/sh

This gave me a root shell after booting the kernel, instead of the PSH shell. PSH is a restricted shell that doesn't let you interact with the operating system normally; it is mostly for checking statuses and similar tasks.

Having root access to the device gave me useful information such as the SoC product number. This information allowed me to locate leaked source code pertaining to this SoC.

JTAG - Hardware Debugging


The SoC does have JTAG; however, none of the pin headers are connected, because the manufacturer removed the resistors. The SoC uses pins 85-91 for JTAG-related functionality. This is of further interest, but it isn't directly useful for this project since I already have full control over everything.

Hisilicon Source Code


The leaked SDK can be found here, rehosted on my servers.

Note:

If HiSilicon reaches out and requests that I remove this content, I will. If the link is dead, you can assume that has happened.

Some parts of the SDK are already on GitHub.

These repos contain almost everything:

This project would not have been possible without a Chinese friend of mine who was able to download the entire leaked SDK from https://www.csdn.net. The website requires a Chinese (non-VoIP) phone number to register an account, and an account is needed to download source code. His contribution to the project was invaluable. For his sake (and at his request), he will remain unnamed.

The leaked source code contains uboot, Linux kernel 3.10, extremely detailed documentation for the SoC, schematics, MPP source code (which is publicly available), and much more. This SDK is mentioned in the Brief Data Sheet. Without this source code and the documentation of the GPIO/pinmux memory layout, neither UART communication nor many other peripherals (including HDMI output) could have been configured.

Bricking The Device & Bus Pirate


The SDK comes with precompiled binaries for two platforms. I wanted to make sure I had found the correct SDK for my hardware, so I decided to overwrite uboot with one of the precompiled uboot images from the SDK, using uboot's own commands.

These commands do the following:

  • Read the precompiled uboot image over Ethernet using TFTP into memory at 0x82000000.
  • Probe the first SPI flash device and set it as the default (this initialises functionality such as sf read and sf write).
  • Write the uboot image from memory to the flash at offset 0; the ARM processor begins execution at address 0 (the reset vector). Finally, reset the processor.
tftp uboot-from-sdk.bin 0x82000000 
sf probe 0
sf write 0x82000000 0 0x100000 
reset

After running these uboot commands I received nothing over UART, so I concluded I had flashed the wrong uboot image; there were actually two precompiled uboot images in the SDK. I then started working with the SPI NOR flash directly using my Bus Pirate.

However, the firmware on my Bus Pirate had known issues with flashrom, so it needed upgrading.

During the firmware and bootloader upgrade, I unplugged the Bus Pirate out of habit: I usually unplug it to close the screen session I use to talk to it.

# if you unplug the bus pirate the screen program terminates
screen /dev/ttyUSB0 115200

So my bad habit of unplugging the device to close screen ended up costing me $35 and a few days on the project.

Flashing SPI NOR Flash Chip (MX25L12835F)


Connections must be made between the Bus Pirate and the flash chip for VCC, CS#, SCLK, SO, SI, and GND. The WP# pin must be held high to disable hardware write protection. This allows flashrom to send commands that change the flash's status register, which in turn lets it disable write protection on all blocks of memory.
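To make the status-register step concrete, here is a minimal C sketch of the two commands a tool like flashrom sends to clear block protection: Write Enable (0x06), then Write Status Register (0x01) with the BP bits cleared. It assumes the MX25L12835F status register layout from the Macronix datasheet (BP0..BP3 in bits 2-5, SRWD in bit 7). The function name is hypothetical and this is not flashrom's code. WP# matters because while SRWD is set, the chip ignores WRSR whenever WP# is low.

```c
#include <assert.h>
#include <stdint.h>

#define SPI_NOR_WREN 0x06 /* Write Enable: sets the WEL latch */
#define SPI_NOR_WRSR 0x01 /* Write Status Register */

/* Status register bits (assumed MX25L12835F layout). */
#define SR_WIP     (1u << 0)   /* write in progress (read-only) */
#define SR_WEL     (1u << 1)   /* write enable latch (read-only) */
#define SR_BP_MASK (0xFu << 2) /* BP0..BP3: block protection level */
#define SR_SRWD    (1u << 7)   /* when set, WP# low blocks WRSR */

/* Fill in the two SPI transactions that remove block protection.
 * wren gets 1 byte, wrsr gets 2 bytes. Returns the new status value;
 * other bits such as QE (bit 6) are preserved. */
uint8_t build_unprotect_cmds(uint8_t current_sr, uint8_t wren[1], uint8_t wrsr[2])
{
    uint8_t new_sr =
        (uint8_t)(current_sr & ~(SR_BP_MASK | SR_SRWD | SR_WIP | SR_WEL));
    wren[0] = SPI_NOR_WREN; /* WRSR is ignored unless WEL was set first */
    wrsr[0] = SPI_NOR_WRSR;
    wrsr[1] = new_sr;
    assert((new_sr & SR_BP_MASK) == 0);
    return new_sr;
}
```

Each array is sent as its own CS#-framed transaction; the chip then sets WIP until the status write completes.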

After configuration it should look something like this:

After flashing the second precompiled uboot image onto the NOR flash chip, I finally saw UART output. This confirmed my original suspicion that I had unintentionally flashed the wrong precompiled uboot image.

Now that I know I can flash uboot images to the NOR flash, it's time to start developing a custom uboot and kernel image.

Custom UBoot & Custom Kernel


To load a custom kernel image I need more control over the boot sequence, so I edited a few variables in the uboot configuration. First, I will not store a CRAMFS filesystem on the flash chip; instead, the kernel is stored at a predefined location in flash. The only reason the kernel needs recompiling is that the file system Linux uses is stored inside the kernel image. Init.d scripts are therefore compiled into the kernel, so any change to the file system requires recompiling the entire kernel.

Uboot is only ~200KB, so for simplicity it is padded (with truncate) to 1MB. The custom kernel image is padded to 15MB. The two are concatenated into one 16MB file (u-boot-done.bin), matching the 16MB (128Mbit) capacity of the flash chip, and that file is flashed.

truncate --size=1M u-boot.bin
truncate --size=15M uImage
cat u-boot.bin uImage > u-boot-done.bin
flashrom -V -p buspirate_spi:dev=/dev/ttyUSB0,spispeed=2M \
	-c MX25L12835F/MX25L12845E/MX25L12865E -w u-boot-done.bin
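The truncate/cat steps can be expressed as the C sketch below, which makes the 16MB layout explicit (u-boot.bin at offset 0, uImage at 0x100000). It also guards against one caveat: truncate silently cuts off any file larger than the target size, so an oversized image would yield a corrupt flash image without warning. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MiB (1024u * 1024u)
#define UBOOT_SLOT  (1u * MiB)                 /* u-boot.bin, offset 0 */
#define KERNEL_SLOT (15u * MiB)                /* uImage, offset 0x100000 */
#define FLASH_SIZE  (UBOOT_SLOT + KERNEL_SLOT) /* 16 MiB total */

/* Equivalent of "truncate + cat": out must hold FLASH_SIZE bytes.
 * Returns 0 on success, -1 if a part would not fit in its slot. */
int build_flash_image(const uint8_t *uboot, size_t uboot_len,
                      const uint8_t *kernel, size_t kernel_len,
                      uint8_t *out)
{
    assert(out != NULL);
    if (uboot_len > UBOOT_SLOT || kernel_len > KERNEL_SLOT)
        return -1; /* truncate would have cut these files short */
    memset(out, 0x00, FLASH_SIZE); /* truncate pads with zero bytes */
    memcpy(out, uboot, uboot_len);
    memcpy(out + UBOOT_SLOT, kernel, kernel_len);
    return 0;
}
```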

The custom boot command for uboot is the following:

sf probe 0 # sf initialization
sf read 0x82000000 0x100000 0x700000 # reads 7MB (0x700000 bytes) from flash offset 1MB (0x100000) into RAM at 0x82000000
bootm 0x82000000 # boot the custom kernel

This allows me to remove the need for a CRAMFS file system or any file system stored on the flash memory itself.
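Because the boot command copies only 0x700000 bytes (7MB), the uImage must fit in that window even though 15MB of flash is reserved for it. The sketch below checks this against the legacy uImage header, which is 64 bytes long and stores its fields big-endian, with the magic 0x27051956 at offset 0 and the payload size (ih_size) at offset 12. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_MAGIC     0x27051956u /* legacy uImage magic number */
#define UIMAGE_HDR_SIZE  64u         /* legacy header length in bytes */
#define BOOT_READ_WINDOW 0x700000u   /* bytes copied by "sf read ... 0x700000" */

/* Decode a big-endian 32-bit field. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* 1: header + payload fit in the read window, 0: too large,
 * -1: buf does not start with a legacy uImage header. */
int uimage_fits_window(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < UIMAGE_HDR_SIZE || be32(buf) != UIMAGE_MAGIC)
        return -1;
    uint32_t data_size = be32(buf + 12); /* ih_size */
    uint64_t total = (uint64_t)UIMAGE_HDR_SIZE + data_size;
    return total <= BOOT_READ_WINDOW ? 1 : 0;
}
```

If the image is larger than the window, bootm would be handed a truncated payload, so raising the sf read length is the fix.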

System Startup


The SDK contains a preconfigured BusyBox, which bundles essentially all the executables required to operate the system into a single binary. BusyBox acts as init, the first user-mode program to run (PID 1). Init parses and executes commands from the /etc/inittab file.

/etc/inittab runs the shell script /etc/init.d/rcS, which executes mount -a and then runs the rest of the scripts in the same folder. These scripts must follow the naming convention SXX, where XX is a number that determines execution order.
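The ordering rule can be illustrated with a small C sketch. It assumes rcS iterates over a glob such as /etc/init.d/S[0-9][0-9]*, which the shell expands in sorted order, so S10... runs before S80.... The helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A startup script name is "S" followed by two digits, e.g. S10network. */
int is_startup_script(const char *name)
{
    return name[0] == 'S' && isdigit((unsigned char)name[1]) &&
           isdigit((unsigned char)name[2]);
}

int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Keep only SXX* names, sort them into execution order,
 * and return how many were kept (at the front of names). */
size_t order_startup_scripts(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (is_startup_script(names[i]))
            names[kept++] = names[i];
    qsort(names, kept, sizeof *names, cmp_names);
    return kept;
}
```

For example, the hypothetical names S80snes9x, rcS, S00devs, S10pinmux, README reduce to S00devs, S10pinmux, S80snes9x, in that order.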

System Startup Overview

  • Uboot
  • uImage loaded
  • Kernel decompressed
  • Control passed to the kernel
  • Kernel startup
  • Execution passed to /init
  • /init parses and executes commands from /etc/inittab
  • /etc/inittab says to execute /etc/init.d/rcS
  • /etc/init.d/rcS executes mount -a and then executes other bash scripts.
  • mount -a will mount file systems according to /etc/fstab.
  • echo /sbin/mdev > /proc/sys/kernel/hotplug registers mdev as the kernel's hotplug helper.
  • mdev -s scans /sys and populates /dev.
  • Pinmuxes are configured and the HiSilicon kernel modules are loaded.
  • A getty executable is launched and the root user is logged in without authentication.
  • Finally, the root user's /root/.profile configures environment variables and executes the snes9x fork.

The first USB drive plugged into the device appears as /dev/sda, with its first recognisable partition at /dev/sda1. The startup scripts mount it via mount /dev/sda1 /mnt/usb-drive/, and the last command they run executes a script named “snes9x” on the USB drive. This lets me change what runs at startup without recompiling the kernel image and reflashing it.

Configuring Pinmux’s & Loading Kernel Modules


When the system comes up, certain pins are configured in a way that leaves kernel module functionality inoperable; HDMI output is one example. The SDK contains shell scripts that set the pinmux configuration so that the kernel modules work once they are loaded. These are important, so I will list them all here:

# pinmux configuration
#I2C
himm 0x120F00E0 0x1; # 0:GPIO12_6  1:I2C_SDA
himm 0x120F00E4 0x1; # 0:GPIO12_7  1:I2C_SCL

#VICAP
himm 0x120F0000 0x2;  # 00:GPIO5_7  01:VI0_CLK   10:VI_ADC_REFCLK0

himm 0x120F0004 0x1;  #  0:GPIO1_0   1:VI0_DAT7
himm 0x120F0008 0x1;  #  0:GPIO1_1   1:VI0_DAT6
himm 0x120F000C 0x1;  #  0:GPIO1_2   1:VI0_DAT5
himm 0x120F0010 0x1;  #  0:GPIO1_3   1:VI0_DAT4
himm 0x120F0014 0x1;  #  0:GPIO1_4   1:VI0_DAT3
himm 0x120F0018 0x1;  #  0:GPIO1_5   1:VI0_DAT2
himm 0x120F001C 0x1;  #  0:GPIO1_6   1:VI0_DAT1
himm 0x120F0020 0x1;  #  0:GPIO1_7   1:VI0_DAT0

himm 0x120F0024 0x2;  # 00:GPIO10_6 01:VI1_CLK   10:VI0_CLK

himm 0x120F0028 0x1   #  0:GPIO2_0   1:VI1_DAT7
himm 0x120F002C 0x1   #  0:GPIO2_1   1:VI1_DAT6
himm 0x120F0030 0x1   #  0:GPIO2_2   1:VI1_DAT5
himm 0x120F0034 0x1   #  0:GPIO2_3   1:VI1_DAT4
himm 0x120F0038 0x1   #  0:GPIO2_4   1:VI1_DAT3
himm 0x120F003C 0x1   #  0:GPIO2_5   1:VI1_DAT2
himm 0x120F0040 0x1   #  0:GPIO2_6   1:VI1_DAT1
himm 0x120F0044 0x1   #  0:GPIO2_7   1:VI1_DAT0

himm 0x120F0048 0x2;  # 00:GPIO6_0   01:VI_ADC_REFCLK0   10:VI1_CLK
himm 0x120F004C 0x2;  # 00:GPIO11_7  01:VI2_CLK          10:VI_ADC_REFCLK1

himm 0x120F0050 0x1   #  0:GPIO3_0   1:VI2_DAT7
himm 0x120F0054 0x1   #  0:GPIO3_1   1:VI2_DAT6
himm 0x120F0058 0x1   #  0:GPIO3_2   1:VI2_DAT5
himm 0x120F005C 0x1   #  0:GPIO3_3   1:VI2_DAT4
himm 0x120F0060 0x1   #  0:GPIO3_4   1:VI2_DAT3
himm 0x120F0064 0x1   #  0:GPIO3_5   1:VI2_DAT2
himm 0x120F0068 0x1   #  0:GPIO3_6   1:VI2_DAT1
himm 0x120F006C 0x1   #  0:GPIO3_7   1:VI2_DAT0

himm 0x120F0070 0x2;  # 00:GPIO10_5   01:VI3_CLK   10:VI2_CLK

himm 0x120F0074 0x1   #  0:GPIO4_0   1:VI3_DAT7
himm 0x120F0078 0x1   #  0:GPIO4_1   1:VI3_DAT6
himm 0x120F007C 0x1   #  0:GPIO4_2   1:VI3_DAT5
himm 0x120F0080 0x1   #  0:GPIO4_3   1:VI3_DAT4
himm 0x120F0084 0x1   #  0:GPIO4_4   1:VI3_DAT3
himm 0x120F0088 0x1   #  0:GPIO4_5   1:VI3_DAT2
himm 0x120F008C 0x1   #  0:GPIO4_6   1:VI3_DAT1
himm 0x120F0090 0x1   #  0:GPIO4_7   1:VI3_DAT0

himm 0x120F0094 0x2;  # 00:GPIO6_1  01:VI_ADC_REFCLK1     10:VI3_CLK

#VGA
himm 0x120F0098 0x1;   # 0: GPIO11_6  1: VGA_HS
himm 0x120F009C 0x1;   # 0: GPIO11_3  1: VGA_VS

#HDMI
himm 0x120F0174 0x1;   # 0: GPIO13_4  1:HDMI_HOTPLUG
himm 0x120F0178 0x1;   # 0: GPIO13_5  1:HDMI_CEC
himm 0x120F017C 0x1;   # 0: GPIO13_6  1:HDMI_SDA
himm 0x120F0180 0x1;   # 0: GPIO13_7  1:HDMI_SCL

#SPI
himm 0x120F00C4 0x1;   # 00:TEST_CLK 01:SPI_SCLK 10:GPIO5_0
himm 0x120F00C8 0x1;   # 0: GPIO5_1   1:SPI_SDO
himm 0x120F00CC 0x1;   # 0: GPIO5_2   1:SPI_SDI
himm 0x120F00D0 0x1;   # 0: GPIO5_3   1:SPI_CSN0
himm 0x120F00D4 0x1;   # 0: GPIO5_4   1:SPI_CSN1

#I2C                       
himm 0x120F00E0 0x1; # 0:GPIO12_6  1:I2C_SDA
himm 0x120F00E4 0x1; # 0:GPIO12_7  1:I2C_SCL

#I2S
himm 0x120F00A0 0x1; # 0: GPIO9_0   1: I2S0_BCLK_RX
himm 0x120F00A4 0x1; # 0: GPIO9_1   1: I2S0_WS_RX
himm 0x120F00A8 0x1; # 0: GPIO9_2   1: I2S0_SD_RX

himm 0x120F00AC 0x2; # 00: GPIO9_3  01: I2S1_BCLK_RX  10:I2S2_MCLK
himm 0x120F00B0 0x1; # 0: GPIO9_4   1: I2S1_WS_RX
himm 0x120F00B4 0x1; # 0: GPIO9_5   1: I2S1_SD_RX

himm 0x120F00B8 0x1; # 0: GPIO9_6   1: I2S2_BCLK_TX
himm 0x120F00BC 0x1; # 0: GPIO9_7   1: I2S2_WS_TX
himm 0x120F00C0 0x1; # 0: GPIO5_4   1: I2S2_SD_TX  

# crg configuration
#VI(0x5c001111--150M,0x3c001111--324M,0x1c001111--300M)
himm 0x1204002c 0x1c001111 #720p

#VI ADC REF0/REF1
himm 0x120400B4 0x00000035

#TDE
himm 0x12040048 0x00000002

#IVE
himm 0x1204005C 0x00000002

#CIPHER
himm 0x12040060 0x00000002

# system configuration
######### MISC QOS setting! ######

himm 0x1212007c 0x44443201  ## VGS-JPGD-IVE -TDE -AVC0- A7 - VO -VI
himm 0x12120080 0x26334444  ## GSF-DDRT-AVC1-VPSS-VOIE-JPGE-AIO-MDU
himm 0x12120084 0x66666426  ## ###-DMAm0-DMAm1-FMC-USB2-CIPHER-SCD-SATA

#######VIVO bus priority (7 = highest priority)###########
himm 0x12120094 0x65   ### [2:0] VI   [6:4] VO
 
###############################
## mddrc0 pri&timeout setting #
###############################
himm  0x12110020  0x00000001  # AXI_ACTION[19:8]:wr_rcv_mode=0,12ports

himm  0x12110200  0x00370000  # ports0: select in-band (per-transaction) QoS mode
himm  0x12110210  0x00370000  # ports1
himm  0x1211021c  0x08300830  # ports1 read/write adaptive priority
himm  0x12110220  0x00370000  # ports2
himm  0x1211022c  0x08300830  # port2 read/write adaptive priority
himm  0x12110230  0x00370000  # ports3
himm  0x1211023c  0x08300830  # port3 read/write adaptive priority
himm  0x12110240  0x00370000  # ports4
himm  0x12110250  0x00370000  # ports5
himm  0x12110260  0x00370000  # ports6
himm  0x12110270  0x00370000  # ports7
##DDRC AXI pri ports0 - 7
############## WR pri ##############
himm  0x12110204  0x76543210  # ports0         
himm  0x12110214  0x76543210  # ports1         
himm  0x12110224  0x76543210  # ports2
himm  0x12110234  0x76543210  # ports3
himm  0x12110244  0x76543210  # ports4
himm  0x12110254  0x76543210  # ports5
himm  0x12110264  0x76543210  # ports6   
himm  0x12110274  0x76543210  # ports7
############## RD pri ##############
himm  0x12110208  0x76543210  # ports0         
himm  0x12110218  0x76543210  # ports1         
himm  0x12110228  0x76543210  # ports2
himm  0x12110238  0x76543210  # ports3
himm  0x12110248  0x76543210  # ports4
himm  0x12110258  0x76543210  # ports5
himm  0x12110268  0x76543210  # ports6
himm  0x12110278  0x76543210  # ports7
##############  qosbuf #############
himm  0x12114000  0x00000002   #qosb_push_ctrl
himm  0x12114004  0x000007F1   #cycle
himm  0x1211410c  0x0000000a   #qosb_dmc_lvl
himm  0x12114110  0x0000000a   #qosb_dmc_lvl
himm  0x1211408c  0xb3032010   #qosb_wbuf_ctrl
himm  0x12114090  0xb3032010   #qosb_wbuf_ctrl
himm  0x121140f4  0x00000033   #row-hit enable
himm  0x121140ec  0x00000044   #row-hit
himm  0x121140f0  0x00003333   #row-hit
himm  0x121141f4  0x00000000   #qosb_wbuf_pri_ctrl

himm  0x121141f0  0x00000001   #enable qosbuf timeout,through prilvl to remap timeout level
############## WR timeout ###########
himm  0x1211409c  0x00000010  # wr_tout3 ~wr_tout0         
himm  0x121140a0  0x00000000  # wr_tout7 ~wr_tout4         
himm  0x121140a4  0x00000000  # wr_tout11~wr_tout8
himm  0x121140a8  0x00000000  # wr_tout15~wr_tout12

############## RD timeout ###########
himm  0x121140ac  0x00000010  # rd_tout3 ~rd_tout0          
himm  0x121140b0  0x00000000  # rd_tout7 ~rd_tout4          
himm  0x121140b4  0x00000000  # rd_tout11~rd_tout8
himm  0x121140b8  0x00000000  # rd_tout15~rd_tout12

himm  0x121141f8  0x00800002  # qosb_rhit_ctrl,open_window=128,close_window=2

The function of each of these registers is documented in the SoC PDF. himm is simply a wrapper around mmap that lets you write physical memory (here, hardware registers) from the command line. Its source code is included in the SDK.
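The core of a himm-style tool is small: open /dev/mem, mmap the page that contains the register, and write through a volatile pointer. The sketch below is not the SDK's himm source; phys_write32 and split_phys are hypothetical names, and writing requires root and a kernel that permits /dev/mem access to that range.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Split a physical address into the page-aligned base that mmap needs
 * and the register's offset within that page. */
void split_phys(uint32_t phys, uint32_t page_size,
                uint32_t *base, uint32_t *offset)
{
    assert((page_size & (page_size - 1)) == 0); /* power of two */
    *base = phys & ~(page_size - 1);
    *offset = phys - *base;
}

/* Write a 32-bit value to a physical address. Returns 0 or -1. */
int phys_write32(uint32_t phys, uint32_t value)
{
    uint32_t page = (uint32_t)sysconf(_SC_PAGESIZE), base, off;
    split_phys(phys, page, &base, &off);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                     (off_t)base);
    close(fd); /* the mapping remains valid after the fd is closed */
    if (map == MAP_FAILED)
        return -1;
    *(volatile uint32_t *)((uint8_t *)map + off) = value;
    munmap(map, page);
    return 0;
}
```

For example, phys_write32(0x120F0174, 0x1) performs the same register write as himm 0x120F0174 0x1 (HDMI_HOTPLUG).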

SNES9X Port


The last part of this project was writing the snes9x port itself. A good read of porting.html is the place to start. It lists the functions a port must implement, though most of them can simply be empty stubs.

To make things easier, snes9x already has a Unix port. However, that code contains X11 code, which is not applicable to my port.

The following routines were simply copied from the unix.cpp file:

bool8 S9xMapInput(const char *n, s9xcommand_t *cmd);
void InitJoysticks(void);
bool8 ReadJoysticks(void);
const char *S9xStringInput(const char *message);
const char *S9xGetFilename(const char *ex, s9x_getdirtype dirtype);
const char *S9xGetFilenameInc(const char *ex, enum s9x_getdirtype dirtype);
const char *S9xGetDirectory(enum s9x_getdirtype dirtype);
const char *S9xBasename(const char *path);

The only function of real interest is S9xDeinitUpdate, which is called every time a frame has been computed and is ready to be rendered. All of the scaling and pixel conversion logic must be performed inside this function.

SNES9X Rendering


Porting X11/Xorg to this embedded device is a no-go due to size constraints, and there are no graphics libraries on the system either (no SDL, OpenGL, etc.). My snes9x fork simply renders directly to the framebuffer device /dev/fb0. The HiSilicon framebuffer documentation is detailed, and the SDK even contains examples of how to operate the framebuffer device (HiFB API Reference.pdf, HiFB Development Guide).
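Rendering to /dev/fb0 starts with the standard Linux fbdev interface, which HiFB implements alongside its own ioctls such as FBIO_REFRESH. This minimal sketch (fb_open_map and struct fb_map are hypothetical names) queries the resolution and line length (pitch) and maps the framebuffer memory. It does not cover HiFB-specific layer or canvas setup.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

struct fb_map {
    int fd;
    uint8_t *pixels;        /* start of the mapped framebuffer memory */
    size_t size;            /* bytes mapped (fix.smem_len) */
    uint32_t width, height; /* visible resolution */
    uint32_t bpp;           /* bits per pixel */
    uint32_t pitch;         /* bytes per row (fix.line_length) */
};

/* Open an fbdev node and map its memory. Returns 0 on success, -1 on error. */
int fb_open_map(const char *path, struct fb_map *fb)
{
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;

    assert(path != NULL && fb != NULL);
    fb->fd = open(path, O_RDWR);
    if (fb->fd < 0)
        return -1;
    if (ioctl(fb->fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fb->fd, FBIOGET_FSCREENINFO, &fix) < 0) {
        close(fb->fd);
        return -1;
    }
    fb->width = var.xres;
    fb->height = var.yres;
    fb->bpp = var.bits_per_pixel;
    fb->pitch = fix.line_length;
    fb->size = fix.smem_len;
    void *p = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fb->fd, 0);
    if (p == MAP_FAILED) {
        close(fb->fd);
        return -1;
    }
    fb->pixels = p;
    return 0;
}
```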

The HiSilicon SDK contains example code showing how to display images on the framebuffer. The steps are: initialise the display device, then set up a display layer. The game I want to emulate (Super Mario World) renders at 256x224 pixels. My idea was to initialise the video output device at the lowest possible resolution so that upscaling the SNES output would require less work. The lowest resolution the HiSilicon HDMI output supports is 640x480 at 60Hz, so the upscale is only 2.5x horizontally and about 2.14x vertically.

The following code is used to setup HDMI at 640x480 resolution:

/******************************************
 step  1: init variable
******************************************/
memset(&stVbConf, 0, sizeof(VB_CONF_S));

u32BlkSize = CEILING_2_POWER(u32PicWidth, SAMPLE_SYS_ALIGN_WIDTH) *
             CEILING_2_POWER(u32PicHeight, SAMPLE_SYS_ALIGN_WIDTH) * 2;

stVbConf.u32MaxPoolCnt = 128;
stVbConf.astCommPool[0].u32BlkSize = u32BlkSize;
stVbConf.astCommPool[0].u32BlkCnt = 6;

/******************************************
 step 2: mpp system init.
******************************************/
s32Ret = SAMPLE_COMM_SYS_Init(&stVbConf);
if (HI_SUCCESS != s32Ret) {
  SAMPLE_PRT("system init failed with %d!\n", s32Ret);
  return HI_FALSE;
}

/******************************************
 step 3:  start vo hd0.
*****************************************/
s32Ret = HI_MPI_VO_UnBindGraphicLayer(GRAPHICS_LAYER_HC0, SAMPLE_VO_DEV_DHD0);
if (HI_SUCCESS != s32Ret) {
  SAMPLE_PRT("UnBindGraphicLayer failed with %d!\n", s32Ret);
  return HI_FALSE;
}

s32Ret = HI_MPI_VO_BindGraphicLayer(GRAPHICS_LAYER_HC0, SAMPLE_VO_DEV_DHD0);
if (HI_SUCCESS != s32Ret) {
  SAMPLE_PRT("BindGraphicLayer failed with %d!\n", s32Ret);
  return HI_FALSE;
}

stPubAttr.enIntfSync = VO_OUTPUT_640x480_60;
stPubAttr.enIntfType = VO_INTF_HDMI;
stPubAttr.stSyncInfo.bSynm = HI_TRUE;
stPubAttr.u32BgColor = 0x000000;  // background will be black

stLayerAttr.bClusterMode = HI_FALSE;
stLayerAttr.bDoubleFrame = HI_FALSE;
stLayerAttr.enPixFormat = PIXEL_FORMAT_YUV_SEMIPLANAR_420;

u32VoFrmRate = 60;
stSize.u32Width = SCREEN_WIDTH;
stSize.u32Height = SCREEN_HEIGHT;

memcpy(&stLayerAttr.stImageSize, &stSize, sizeof(stSize));

stLayerAttr.u32DispFrmRt = 60;
stLayerAttr.stDispRect.s32X = 0;
stLayerAttr.stDispRect.s32Y = 0;
stLayerAttr.stDispRect.u32Width = stSize.u32Width;
stLayerAttr.stDispRect.u32Height = stSize.u32Height;

s32Ret = SAMPLE_COMM_VO_StartDev(SAMPLE_VO_DEV_DHD0, &stPubAttr);
if (HI_SUCCESS != s32Ret) {
  SAMPLE_PRT("start vo dev failed with %d!\n", s32Ret);
  return HI_FALSE;
}

s32Ret = SAMPLE_COMM_VO_StartLayer(VoLayer, &stLayerAttr);
if (HI_SUCCESS != s32Ret) {
  SAMPLE_PRT("start vo layer failed with %d!\n", s32Ret);
  return HI_FALSE;
}

if (stPubAttr.enIntfType & VO_INTF_HDMI) {
  s32Ret = SAMPLE_COMM_VO_HdmiStart(stPubAttr.enIntfSync);
  if (HI_SUCCESS != s32Ret) {
    SAMPLE_PRT("start HDMI failed with %d!\n", s32Ret);
    return HI_FALSE;
  }
}

Relevant Graphics Concepts


Before jumping in, it's best to explain some basics. In graphics rendering, the “pitch” (also called stride) is the number of bytes in a single horizontal line of pixels. This matters when copying one bitmap into another of a different size: each row of a bitmap starts pitch bytes after the previous one, so you must skip ahead by the pitch before writing the next row. HiSilicon's TDE handles overlapping and deals with the pitch for us, so we don't need to!
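As a CPU-side illustration of pitch, the sketch below copies a block of 16-bit pixels into a larger surface, computing each row's start address from the pitch rather than the width. blit16 is a hypothetical helper, not part of the TDE API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block of 16-bit pixels from src into dst at (dx, dy).
 * Pitches are in bytes: row y of a surface starts at base + y * pitch. */
void blit16(const uint8_t *src, size_t src_pitch,
            uint8_t *dst, size_t dst_pitch,
            size_t dx, size_t dy, size_t w, size_t h)
{
    assert(w * 2 <= src_pitch); /* a row cannot be wider than its pitch */
    for (size_t y = 0; y < h; ++y)
        memcpy(dst + (dy + y) * dst_pitch + dx * 2, /* skip to dst row */
               src + y * src_pitch,                 /* skip to src row */
               w * 2);
}
```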

The pixel format is also important to understand. SNES9X uses RGB565 by default, while the HiSilicon framebuffer device uses ARGB1555 by default, so a pixel conversion is required. The TDE can also handle pixel conversion, to a certain extent.
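The conversion itself only moves bit fields around, as this sketch shows: RGB565 keeps 6 bits of green while ARGB1555 keeps 5, so the lowest green bit is dropped. Setting the alpha bit is an assumption; HiFB's layer alpha settings decide what the 1-bit alpha actually maps to. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RGB565:   RRRRR GGGGGG BBBBB   (bits 15..11, 10..5, 4..0)
 * ARGB1555: A RRRRR GGGGG BBBBB  (bit 15, 14..10, 9..5, 4..0) */
uint16_t rgb565_to_argb1555(uint16_t p)
{
    uint16_t r = (p >> 11) & 0x1F;
    uint16_t g = (p >> 6) & 0x1F; /* top 5 of the 6 green bits */
    uint16_t b = p & 0x1F;
    return (uint16_t)(0x8000u | (r << 10) | (g << 5) | b); /* alpha bit set */
}
```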

Hardware Accelerated SNES9X with the TDE


Now that those basics are out of the way, we can look at the TDE. As explained earlier, the TDE is a two-dimensional graphics engine that can offload scaling, overlapping, and pixel conversion so that the main ARM processor does not have to. The following API functions are of interest. They live in libtde.a, whose code is essentially a wrapper around ioctl calls. The TDE driver must be loaded into the Linux kernel, and it depends on the MMZ (media memory zone) driver.

HI_S32 HI_TDE2_Open(HI_VOID);

TDE_HANDLE HI_TDE2_BeginJob(HI_VOID);

HI_S32 HI_TDE2_QuickCopy(TDE_HANDLE s32Handle,
	TDE2_SURFACE_S *pstSrc,
	TDE2_RECT_S *pstSrcRect,
	TDE2_SURFACE_S *pstDst,
	TDE2_RECT_S *pstDstRect);

HI_S32 HI_TDE2_Bitblit(TDE_HANDLE s32Handle,
	TDE2_SURFACE_S *pstBackGround,
	TDE2_RECT_S *pstBackGroundRect,
	TDE2_SURFACE_S *pstForeGround,
	TDE2_RECT_S *pstForeGroundRect,
	TDE2_SURFACE_S *pstDst,
	TDE2_RECT_S *pstDstRect,
	TDE2_OPT_S *pstOpt);

HI_S32 HI_TDE2_EndJob(TDE_HANDLE s32Handle, 
	HI_BOOL bSync, HI_BOOL bBlock,
	HI_U32 u32TimeOut);

HI_S32 HI_TDE2_WaitAllDone(HI_VOID);

Full documentation of these functions is provided in the TDE API Reference Manual.

The most notable of these is HI_TDE2_Bitblit, which supports a wide range of operations. The most important for the snes9x port are pixel conversion and scaling, and HI_TDE2_Bitblit can do both in a single call.

stSrcRect.s32Xpos = 0;
stSrcRect.s32Ypos = 0;
stSrcRect.u32Height = SNES_HEIGHT;
stSrcRect.u32Width = SNES_WIDTH;

stSrc.enColorFmt = TDE2_COLOR_FMT_RGB565;
stSrc.u32Width = SNES_WIDTH;
stSrc.u32Height = SNES_HEIGHT;
stSrc.u32Stride = 2 * SNES_WIDTH;
stSrc.u32PhyAddr = g_pSnesBackBufferPhys;

stScaleRect.s32Xpos = 0;
stScaleRect.s32Ypos = 0;
stScaleRect.u32Height = SCALE_HEIGHT;
stScaleRect.u32Width = SCALE_WIDTH;

stScale.enColorFmt = TDE2_COLOR_FMT_ARGB1555;
stScale.u32Width = SCALE_WIDTH;
stScale.u32Height = SCALE_HEIGHT;
stScale.u32Stride = SCALE_WIDTH * 2;
stScale.u32PhyAddr = g_pScaleBufferPhys;

s32Ret = HI_TDE2_Bitblit(s32Handle, &stScale, &stScaleRect, &stSrc, &stSrcRect,
                         &stScale, &stScaleRect, &stOpt);
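For comparison, this is roughly the resizing work that HI_TDE2_Bitblit takes off the CPU: a nearest-neighbour scaler that maps each destination pixel back to a source pixel. It is only a sketch (scale_nearest16 is a hypothetical name), and the TDE's own resize filter may produce smoother output. With a 256x224 source and a 640x480 destination, each source pixel is repeated 2 or 3 times in each direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour scale of 16-bit pixels. Pitches are in pixels here. */
void scale_nearest16(const uint16_t *src, size_t sw, size_t sh, size_t spitch,
                     uint16_t *dst, size_t dw, size_t dh, size_t dpitch)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    for (size_t y = 0; y < dh; ++y) {
        /* source row that destination row y samples from */
        const uint16_t *row = src + (y * sh / dh) * spitch;
        for (size_t x = 0; x < dw; ++x)
            dst[y * dpitch + x] = row[x * sw / dw];
    }
}
```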

When the SNES draws black pixels, it sets R, G, and B to 0, so the pixel's entire value is 0. The TDE appeared unable to convert RGB565 to ARGB1555 when a pixel's value was 0. I am unsure why, but to work around it, I added a little error handling:

// If the screen is black, every pixel value will be 0. This causes the
// bitblit to fail, because it doesn't seem to handle pixel conversion of
// all-zero pixels... so I just set R = 1, G = 1, and B = 1. Simple fix.
if (s32Ret < 0) {
  uint16_t *pixels = (uint16_t *)g_pSnesBackBufferVirt;
  for (int i = 0; i < SNES_WIDTH * SNES_HEIGHT; ++i) {
    if (pixels[i] == 0) {  // compare with 0, not NULL: this is a pixel value
      pixels[i] = BUILD_PIXEL2_RGB565(1, 1, 1);
    }
  }

  s32Ret = HI_TDE2_Bitblit(s32Handle, &stScale, &stScaleRect, &stSrc,
                           &stSrcRect, &stScale, &stScaleRect, &stOpt);

  // if we fail here then its a legit issue and we should print the reason and
  // cancel the job…
  if (s32Ret < 0) {
    SAMPLE_PRT("HI_TDE2_Bitblit:%d failed,ret=0x%x!\n", __LINE__, s32Ret);
    HI_TDE2_CancelJob(s32Handle);
    return false;
  }
}

Once the pixel conversion and upscaling are done, HI_TDE2_QuickCopy copies the upscaled, pixel-converted SNES frame onto the display surface. The following code copies the upscaled frame and presents it on screen:

s32Ret =
    HI_TDE2_QuickCopy(s32Handle, &stScale, &stScaleRect, &stDst, &stDstRect);

if (s32Ret < 0) {
  SAMPLE_PRT("HI_TDE2_QuickCopy:%d failed,ret=0x%x!\n", __LINE__, s32Ret);
  HI_TDE2_CancelJob(s32Handle);
  return false;
}

/* 3. submit job */
s32Ret = HI_TDE2_EndJob(s32Handle, HI_FALSE, HI_TRUE, 10);
if (s32Ret < 0) {
  SAMPLE_PRT("Line:%d,HI_TDE2_EndJob failed,ret=0x%x!\n", __LINE__, s32Ret);
  HI_TDE2_CancelJob(s32Handle);
  return false;
}

HI_TDE2_WaitAllDone();  // just in case EndJob returns
                        // before the TDE computation is finished…

ioctl(g_hFrameBuffer, FBIO_REFRESH, &g_stCanvasBuf);

Conclusion


The end result is an SNES9X port that runs Super Mario World, without sound. Joysticks are supported out of the box, since the Linux kernel is compiled with support for these devices; /dev/js0 is configured as player 1's controller. This code was copied from the snes9x Unix port.

This SNES9X fork is extremely bare-bones and doesn't even support SRAM (game saves), but it was an extremely fun project.

Here is the very first frame rendered, to check whether the display was working. Note that no pixel conversion or upscaling code had been written yet.