Porting NuttX to the Raspberry Pi 4B

When I was getting into RTOSs, the first RTOS I experimented with was BlackBerry's QNX. It's a microkernel, POSIX-compliant, real-time operating system for embedded applications. However, I discovered soon after that its microkernel and security features came with the requirement of a system with an MMU. The only hobbyist device that QNX ran on at the time was a Raspberry Pi 4B, which, for my application of a rocketry flight computer, wasn't exactly the low-power, small form factor device I was looking for. It was at that point that I started exploring alternatives, and came across Apache NuttX, which was scalable from 8-bit to 64-bit systems, and could sometimes run in as little as 16KB of flash!

Although NuttX came with support for so many cheap, hobbyist boards, I noticed that it was missing support for the Raspberry Pi 4B. I was interested to see if I could port NuttX to the Pi and run my old, POSIX compliant software that was already written for QNX on it. I knew the Pi 4B was a popular hobbyist board, and also came with a lot of great peripherals that could be useful for benchmarking NuttX and trying out all its different features. I also knew that NuttX already had support for the ARMv8-A architecture, so I had some groundwork already laid out for me. Thus began the endeavour of adding support for the BCM2711 on NuttX.

Most of the documentation about the BCM2711 online comes from a peripheral datasheet released by Raspberry Pi (which does not cover many of the peripherals already on the Pi 4B). I wanted to try to document my process a little bit to both make it easier for NuttX contributors to do future ports, and also to add a little more documentation about the BCM2711 online. There were not many resources about bare-metal programming the Pi 4B outside of the few I stumbled on (such as the rpi4-osdev project).

Below are the journal entries I made while working on the port. I have also included a more refined version of these notes in the NuttX guides documentation, but I wrote out all this information to be of use, so I may as well post it somewhere accessible. You can also see the initial support pull request to the NuttX kernel here. While you read, please remember that I was learning along the way while writing these entries, and some of the assumptions or statements I make about existing NuttX functionality and the BCM2711 may be misinformed, incorrect, or no longer true. Hopefully these entries are of some utility!

Day 1

Day 2

Day 3

Day 4

Day 5

Day 6

Day 7

Day 8

Day 9

Day 10

Day 11

Day 12

Day 13

Day 14

Day 1: 2024-07-25

I am beginning work on porting NuttX to the Raspberry Pi 4B. I had originally been using QNX on this platform, but after learning about NuttX and the range of different systems it supports, I wanted to start experimenting with it instead. Unfortunately the Raspberry Pi 4B was not one of the systems with existing NuttX support. I thought it would make a nice development board because it's very densely featured.

I first checked the NuttX issues listing in case someone else had already begun work on the port. I found an existing issue which referenced Gregory Nutt's previous work on a port, but at the time he was writing it for the BCM2708 (and the BCM2835) chip. The Raspberry Pi 4B uses a BCM2711, so I would be able to use his prior work as a reference point but not as a completely started implementation.

I began by researching the BCM2711 chip. I knew from prior experience that it was part of the aarch64 architecture. Looking at the data sheet provided by Raspberry Pi, I learned about the different available interfaces and their register interfaces. I looked through Gregory's previous work and the source code for the RP2040 implementation on NuttX (because of prior familiarity), and saw that there were quite a few headers defined for the memory map, register locations and bit masks for accessing the different sections of each register. This seemed like the logical place to start for me, because I would need to be able to access all the different registers in order to do anything with the chip. I began creating these header files in a similar style to what I had already seen done:

bcm2711_armtimer.h
bcm2711_aux.h
bcm2711_bsc.h
bcm2711_dma.h
bcm2711_gpclk.h
bcm2711_gpio.h
bcm2711_irq.h
bcm2711_mailbox.h
bcm2711_memmap.h
bcm2711_pcm.h
bcm2711_pwm.h
bcm2711_spi.h
bcm2711_systimer.h
bcm2711_uart.h

The memory map header contained base addresses for each of the different interfaces that I had seen listed in the datasheet. Then, I created a header file for each section of the data sheet (covering a different interface) in which I defined the register addresses for the interface by using the base address + some offset. Once I had all the registers listed, I also created bitmasks for all of the individual parts of each register. Some registers just contained 32 bit numbers or were intended to be accessed as a single 32-bit word, so I did not create masks for those.
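As a rough illustration of the pattern described above (a base address, per-register offsets, then masks for individual fields), here is a minimal sketch covering two Mini UART registers. The macro names are hypothetical and are not the identifiers used in the port. The base address assumes low-peripheral mode, and the offsets and bit positions are assumptions taken from the BCM2711 peripherals datasheet that should be checked against it.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers used in the port. */

/* Peripheral base as seen by the ARM cores in low-peripheral mode (assumed). */
#define BCM_PERIPHERAL_BASE      0xFE000000UL

/* Base address of the auxiliary block (Mini UART, SPI1, SPI2). */
#define BCM_AUX_OFFSET           0x00215000UL
#define BCM_AUX_BASE             (BCM_PERIPHERAL_BASE + BCM_AUX_OFFSET)

/* Register offsets within the auxiliary block. */
#define BCM_AUX_MU_IO_OFFSET     0x40UL  /* Mini UART I/O data */
#define BCM_AUX_MU_LSR_OFFSET    0x54UL  /* Mini UART line status */

/* Absolute register addresses: base + offset. */
#define BCM_AUX_MU_IO            (BCM_AUX_BASE + BCM_AUX_MU_IO_OFFSET)
#define BCM_AUX_MU_LSR           (BCM_AUX_BASE + BCM_AUX_MU_LSR_OFFSET)

/* Bit masks for fields of the line status register. */
#define BCM_AUX_MU_LSR_DATAREADY (1UL << 0)  /* RX FIFO holds at least one byte */
#define BCM_AUX_MU_LSR_TXEMPTY   (1UL << 5)  /* TX FIFO can accept at least one byte */
#define BCM_AUX_MU_LSR_TXIDLE    (1UL << 6)  /* TX FIFO empty and transmitter idle */

/* Returns nonzero if the given line status value says a byte can be written. */
static inline int bcm_mu_can_transmit(uint32_t lsr)
{
  return (lsr & BCM_AUX_MU_LSR_TXEMPTY) != 0;
}
```

Registers that are read or written as a whole 32-bit word only need the address definition and no masks.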

Once I had completed this copying work into header files, I took a look through Gregory's prior implementation and started getting unsure of where to start. There are a lot of interfaces to implement for NuttX, and I wasn't quite sure which ones needed to be implemented first to build off of. I decided that this was probably because I was still relatively new to NuttX, and I would need to read through the documentation to get a better grasp of the inner workings of the kernel.

NuttX Documentation Website

I ended up reading through quite a bit of the NuttX website's documentation: parts of the "OS Components" section and all of the "Implementation Details" section. After reading these, I did learn more about the inner workings of NuttX, but I still didn't know what was required to start my port. At this point, I remembered hearing about a porting guide existing somewhere. I decided I would reach out on the developer email forum to ask for assistance in finding resources about porting NuttX. I received quite a few responses very quickly, within just a few hours. Among the resources I was recommended were a Confluence page updated in 2020 with a porting guide for NuttX, and the blog of a NuttX developer named Lup Yuen Lee, who had ported NuttX to the Pinephone and documented his work quite extensively. It's at this point that I am writing this entry, and I am now taking some time to read the porting guide and blog posts to see if they give me a bearing on where to start.

What information I've collected so far:

The BCM2711 uses a quad-core Cortex-A72, 64-bit SoC at 1.5GHz. This uses the Armv8-A architecture (which I gather is the same as aarch64). This information is from the Raspberry Pi Processor Documentation. NuttX appears to already have support for Armv8-A, under the name a64. This significantly reduces my workload. All of the up_* functions are already written.

The BCM2711 will need an irq.h header defining the number of interrupts (NR_IRQS) for that chip.

Watchdog interfaces are not implemented, and neither are some of the system timer interfaces. Nothing implemented for address environments.

Looks like SMP is implemented, not shared memory though.

Day 2: 2024-07-26

I had a chance to look through Lup's blog post about porting to the Pinephone, and noticed something interesting. One of the supported "boards" for NuttX is an aarch64 version of QEMU. In his blog post, Lup tested booting on a QEMU instance with a simulated Cortex A53, both single-core and SMP. I decided to verify that the common code for aarch64 would work with the Cortex A72 (the BCM2711's core(s)) by booting the QEMU instance using that core instead.

Following the same steps from Lup's blog post, I booted the QEMU instance with the only modification being the selected core. Sure enough, the system booted without issue (on both single-core and SMP with 4 cores). This somewhat confirms that I should not encounter too many issues regarding the use of an A72 core when performing my port. Of course the real test is getting a minimal boot on the real hardware. Lup had done something similar for the Pinephone after writing some serial support, so that is where I'll start next.

Day 3: 2024-07-27

Reading through the article written by Lup on his serial driver, I remembered that the data sheet for the BCM2711 mentions a Mini-UART which is meant for use as a console. This will be the first thing I target my UART driver for, so that I will have the recommended console.

Before writing the serial driver, I wanted to make sure that I could create a bootable image. There's no point in getting serial output ready if it can't boot!

I created a minimal raspberrypi-4b:nsh config under the board's config/nsh directory, copied from the Pinephone NSH config. I removed any config flags that were specific to the A64 or Pinephone because I don't have them implemented for the BCM2711. I used the tools/configure.sh script with the -L option to confirm that my configuration showed up in the list. It did!

After this, I tried to build the new configuration to see if I was missing anything that would cause build system errors. When I used tools/configure.sh raspberrypi-4b:nsh, I got the output that the Make.defs file could not be found. Comparing the directory structure between my Raspberry Pi 4B directory and the Pinephone's directory, I noticed that the scripts folder on the Pinephone contained a linker script and a Make.defs file, which I was missing.

I added the two files to the Raspberry Pi 4B directory, directly copying from the Pinephone. I just made sure to change the comment headers to reflect the board name and chip (RPi 4B and BCM2711). I am not at all familiar with linker scripts, so I did not modify the linker script to work with the Pi 4. I know it will need modification eventually because the Raspberry Pi 4B will not have UBoot and will therefore likely need a different start address, but I am leaving this task for later once I can learn more about linker scripts. In the meantime, I just want to try to make the build work.

After compiling, I received the error that I am missing the chip.h header. I once again copied the chip.h header from the src/a64 directory in the arch folder to the matching location in my BCM2711 source tree and tried again. Even more missing chip.h error messages. I was missing the header file that goes in the <chip>/include directory as well.

I copied the include/a64/chip.h header to the corresponding location on the BCM2711's source tree (include/bcm2711/chip.h) once again. I noticed that this file contains definitions for the GIC interrupt controller location and the size of RAM. This is something that will most likely be different between the two chips. It also contained device IO base addresses and the load address for the NuttX kernel by UBoot (once again, will need to be changed). I left a "TODO" comment to change these values and tried compilation again.

The last thing to fix in the build errors was an undefined NR_IRQS. This definition is the number of IRQs supported by the chip, and should be defined in arch/arm64/include/bcm2711/irq.h (I had read this in the porting guide). Looking at the BCM2711 data sheet, there are 63 Videocore interrupts and 15 ARMC interrupts. There are also 5 interrupts per ARM core. I assume that I need to account for all 20 (5 IRQs * 4 cores). That gives a total of 98 IRQs. I'm not sure if this is right, but it's good enough to get something compiled.
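Written out as definitions, the tally above looks roughly like the sketch below. The component macro names are hypothetical; only NR_IRQS is a name NuttX expects, and the counts are the first-guess numbers from this entry rather than verified values.

```c
/* Hypothetical component names; the counts are the ones tallied above. */
#define BCM_IRQ_NVIDEOCORE  63  /* VideoCore interrupts */
#define BCM_IRQ_NARMC       15  /* ARMC interrupts */
#define BCM_IRQ_NPERCORE     5  /* interrupts local to each ARM core */
#define BCM_NCORES           4  /* Cortex-A72 cores */

/* First guess for the total: 63 + 15 + (5 * 4) = 98 */
#define NR_IRQS \
  (BCM_IRQ_NVIDEOCORE + BCM_IRQ_NARMC + (BCM_IRQ_NPERCORE * BCM_NCORES))
```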

Now we get further, but I receive the error of an implicit declaration of MPID_TO_CLUSTER_ID. It seems this is defined in most of the arm64 chip.h include headers, but not for the a64. The definition across all of the files I've seen is the same, so I'll copy-paste it for now and see if it works. Same for CONFIG_GICR_OFFSET (another missing symbol). This one depends on the GIC version but I'll worry about it later.

This finally seems to be the end of the errors, but now I'm told that there's no rule to make the dramboot.ld linker script. It seems that the BOARD_DIR variable is not being set correctly for the new Raspberry Pi 4B board directory. I found that this variable is defined in tools/Config.mk, and is composed of the configured architecture and chip. For some reason the chip is being configured as QEMU, which means somewhere I have my configuration set up wrong. I double checked my defconfig for the NSH configuration and it's not that one.

It seems that CONFIG_ARCH_BOARD_CUSTOM is being set when it shouldn't be. I changed the generated config file to specify the Raspberry Pi 4B board instead of a custom board (set to QEMU), but now I receive linker errors with the arm64 common files and some boardctl functions. I'll need to investigate the root issue of the board not being set properly in the config.

I watched a lecture about linker scripts to determine how to write a linker script for the Raspberry Pi 4B.

Day 4: 2024-07-28

It turns out the issue with the generated configuration file that I was having earlier was because I had forgotten to list the BCM2711 chip in the arch/arm64/Kconfig file. I discovered this by comparing with the Raspberry Pi Pico configuration. When I looked at the "Board Selection" menuconfig option, I noticed that the options listed were only boards using the RP2040 or a custom board. With the Raspberry Pi 4B hand-modified config, the boards that showed up were just QEMU (and one other that I forget) and custom board. It seemed from looking at the board selection Kconfig option that Custom Board was the default option (which explains why that option was being selected). I tracked down the chip option from arch/arm/Kconfig for the RP2040, which led me to then make the appropriate modifications in arch/arm64/Kconfig.

After learning more about linker scripts I now want to move on to changing the script for the board. I know that there is some SRAM and cache in the BCM2711 itself, and that the Pi 4B can come with 1, 2, 4 or 8GB of DDR RAM. I don't quite know how to represent this in the linker script, and I'm not sure how to describe the SD card as disk memory (I don't think this goes in the linker script because it is a variable size depending on what the person has on hand).

I have been looking at the LowLevelDevel BCM2711 for ideas about how to implement some of the hardware interaction and do a bare-metal boot of the Raspberry Pi.

Instead of modifying the linker script I decided to work on some more configuration options. I made the BCM2711 a multi-CPU chip. The Cortex A72 now also defines that it uses the GICv3. I got started a little on the Mini UART interface following Lup's blog post about doing the same for the Pinephone, but I can't go further with booting until I figure out the GIC addresses which are preventing me from doing so.

I looked at the GIC400 manual and confirmed with this GitHub issue I found what the GIC400 base address and distributor offset are. I am still not sure about the redistributor offset. I found on this documentation page that the GIC400 is actually GICv2 architecture. The AllWinner64 chip used for the Pinephone uses the GICv2 PL400. When I performed a Google search for that, I got the GIC400 results only. I will copy the GICR_BASE definition from there for now. Same with GICR_OFFSET, which seems to be consistent across all the implementations so far.

I ended up writing some more of the serial driver interface for the Mini UART, using the a64 implementation by Lup as a guide. Once I complete an interface for the Mini-UART, I'll move on to implementing some of the other missing functions the linker is complaining about. A lot of the boot functions I copied from the a64 and left empty, as they involved setting up the MMU and I don't quite know how to do that yet. It wasn't too bad to set up most of the UART operations table for the driver since they involve simple tasks (returning true if the TX FIFO is empty, etc). Setting up some of the structures for device registration will take some more time and thinking. I will likely need two different sets of UART ops, one for the Mini UART and the other for the regular UART interfaces.
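To make the idea of two separate sets of UART operations concrete, here is a deliberately simplified sketch. The struct below is a hypothetical stand-in with only two entries (the real NuttX operations table has many more), and the register offsets and bit positions are assumptions based on the BCM2711 datasheet for the Mini UART and on the PL011 flag register layout for the regular UARTs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical ops table; not the NuttX structure. */
struct demo_uart_ops
{
  bool (*txempty)(volatile uint32_t *regs);      /* true if the TX FIFO is empty */
  bool (*rxavailable)(volatile uint32_t *regs);  /* true if a byte is waiting */
};

/* Mini UART: line status register at offset 0x54 of the AUX block (assumed). */
static bool demo_mu_txempty(volatile uint32_t *regs)
{
  return (regs[0x54 / 4] & (1u << 6)) != 0;  /* transmitter idle */
}

static bool demo_mu_rxavailable(volatile uint32_t *regs)
{
  return (regs[0x54 / 4] & (1u << 0)) != 0;  /* data ready */
}

/* PL011: flag register at offset 0x18 of the UART block (assumed). */
static bool demo_pl011_txempty(volatile uint32_t *regs)
{
  return (regs[0x18 / 4] & (1u << 7)) != 0;  /* TXFE: TX FIFO empty */
}

static bool demo_pl011_rxavailable(volatile uint32_t *regs)
{
  return (regs[0x18 / 4] & (1u << 4)) == 0;  /* RXFE clear: RX FIFO not empty */
}

/* One table per UART flavour; the driver core only sees the table. */
static const struct demo_uart_ops g_demo_mu_ops =
{
  .txempty     = demo_mu_txempty,
  .rxavailable = demo_mu_rxavailable,
};

static const struct demo_uart_ops g_demo_pl011_ops =
{
  .txempty     = demo_pl011_txempty,
  .rxavailable = demo_pl011_rxavailable,
};
```

The point of the indirection is that the upper half of the serial driver can stay identical while each device supplies its own register-level checks.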

I got most of the implementation for the Mini UART serial driver complete except for the interrupts. I will complete that later once I can get some more of the project compiled. The next thing preventing compilation is the board initialization function, which must be added in the Raspberry Pi 4B board source tree. I will just implement a stub for now since I'm more concerned about just booting and seeing a UART output first.

After adding the board initialization stub, I am onto the next linker error: missing g_mmu_config. This one is actually going to take some effort since the MMU will likely need to be set up properly in order to boot. I have to read about the MMU for the ARM Cortex A72.

In doing so, I stumbled across a tutorial for writing an RPi 4B OS . I am going to read through it and see if I learn anything useful for porting.

It turns out that I can still compile with an empty g_mmu_regions variable, so I am now writing the board_app_initialize function which is the next linker error. I am once again using the Pinephone implementation as a guide. With those stubs implemented, the build fully compiles into an executable and a .bin file. This is a great sign, because now I have a list of "TODO" comments scattered in places where I need to implement features to actually get this image running.

Just to test the debug output, I uploaded the required files to an SD card (config.txt, fixup4.dat, etc.) with the debug UART enabled. This way I could see if the Pi would run its initial bootloader and recognize the generated nuttx.bin file (even though I know it won't work). Sure enough, I was able to boot the Pi and the debug output showed that nuttx.bin was correctly loaded as the kernel. Nothing else happened, however. I need to finish implementing some NuttX interfaces.

Reading through Lup Yuen's blogs again, I see that the common logic for arm64 calls a PRINT macro during the initial boot sequence. It needs an up_lowputc function to be defined in order to work, which I haven't defined for the BCM2711. Implementing that next would help see the boot output.

After implementing the lowputc functions required for the PRINT macro (in C, not assembly), I re-configured the nsh preset defconfig to use ARCH_EARLY_PRINT. I didn't receive any errors when re-building, so I tried copying the binary to the SD card and booting the Pi again, just in case I would see something. No luck still, which means something is surely wrong with how the binary is being loaded.

Day 5: 2024-07-29

I noticed while reading through the Raspberry Pi OS tutorial that my base addresses for the peripherals were using the legacy addressing mode on the BCM2711. I need to change this to be configurable. I now allow the peripheral base to be selected based on configuration options (not included in Kconfig files yet), and add the other offsets to this base. Maybe this will allow the early print to run (as I was booting in 35-bit addressing mode but using the legacy addressing).
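The relationship between the addressing modes can be sketched as a simple rebasing of the addresses printed in the datasheet. The helper below is hypothetical and selects the base at run time for clarity, whereas a port would more likely pick it at compile time from a configuration option. The three base values are assumptions drawn from the BCM2711 datasheet's description of legacy, low-peripheral and full 35-bit addressing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of NuttX. */

#define LEGACY_PERIPH_BASE   UINT64_C(0x07E000000)  /* as printed in the datasheet */
#define LOWPERI_PERIPH_BASE  UINT64_C(0x0FE000000)  /* low-peripheral mode */
#define FULL35_PERIPH_BASE   UINT64_C(0x47E000000)  /* full 35-bit address map */

/* Translate a datasheet (legacy) peripheral address into the address the
 * ARM cores would use in the selected mode.
 */
static inline uint64_t periph_addr(uint64_t legacy, bool low_peripheral)
{
  uint64_t offset = legacy - LEGACY_PERIPH_BASE;
  uint64_t base = low_peripheral ? LOWPERI_PERIPH_BASE : FULL35_PERIPH_BASE;

  return base + offset;
}
```

For example, the Mini UART data register listed at 0x7E215040 would be accessed at 0xFE215040 in low-peripheral mode and at 0x47E215040 in the full 35-bit map.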

Reading through Lup's blogs again, I read his post about the A64 interrupt controller being a GICv2 controller. It turns out that the GICR address is actually the GIC CPU Interface address, so I had this address correct.

I will likely need some post-build logic in order to create a proper config.txt file for the NuttX image. There are lots of boot options that get specified in config.txt. I saw that the RP2040 has a post-build Config.mk file for generating a .uf2 file for NuttX. I can do the same for the Raspberry Pi's config.txt. I also saw in the rpi4-osdev Makefile that aarch64-none-elf-objcopy is used to create a .img file from the output binary in order to be loaded as the kernel. I'll add this conversion step to the post-build as well.

I added the boiler plate for a post build generation of the config.txt file (just an echo message for now), and also created the boiler plate files for the RPi 4B's port documentation on the NuttX website.

Day 6: 2024-07-30

It's clear from my attempts at booting with early printing enabled that I'm not going to get very far unless I actually gain an understanding of the boot process on the Pi and within NuttX. I've tried a few different things cobbled together from online tutorials, guides and hardware manuals, but since none of them miraculously worked, I need to try again with a better understanding.

I've downloaded a copy of the ARMv8 Programmer's Guide on my eReader, and I hope to finish reading that on my commutes to work so I can get a better understanding of the aarch64 architecture. In the meantime, I've also got access to all the public information from Broadcom about the BCM2711 (which I've finished reading, so that's checked off), which isn't much. I should probably start with reading the common source files for the NuttX aarch64 implementation so I can see what happens at boot.

What I don't understand is:

How to configure the MMU
If my linker script is done properly
How to configure interrupts properly
How the NuttX boot process works

Once I can figure these things out, I should be better equipped to get something that prints UART output up and running. Then I can worry about secondary things like writing device drivers for the board hardware, testing if SMP works out of the box (like it did for QEMU) and if I should bother supporting things like the 32-bit execution mode for the processor.

I looked at Lup's blog, which has a call graph of the aarch64 boot sequence on NuttX. Everything starts at arm64_head.S, which is where I'll look first.

The start requirements are that MMU, D-cache and I-cache must be disabled. I will have to check to make sure that the Raspberry Pi 4B start.elf does not enable these. From skimming online it appears not, but I am unsure. The arm64_earlyprintinit function and arm64_lowputc functions are called very early in the start code. It seems other implementations are doing straight register manipulations and not using any variables in their lowputc logic, which is something I didn't adhere to in my original implementation. I am going to try changing that and seeing if I can get serial output this time around.
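A register-only character output routine of the kind described above might look like the following sketch. The function name and macros are hypothetical, and the register offsets and the status bit are assumptions from the BCM2711 datasheet. The register block is passed in as a pointer only so the logic can be exercised against a plain array; on hardware it would be a fixed address.

```c
#include <stdint.h>

/* Hypothetical sketch; names are illustrative, not the port's identifiers. */

#define MU_IO_INDEX     (0x40 / 4)  /* Mini UART I/O register, word index */
#define MU_LSR_INDEX    (0x54 / 4)  /* Mini UART line status, word index */
#define MU_LSR_TXEMPTY  (1u << 5)   /* TX FIFO can accept at least one byte */

/* Busy-wait until the TX FIFO has room, then write one character.
 * Nothing is kept in global or static variables; the only state touched
 * is the hardware registers themselves.
 */
static inline void mu_lowputc(volatile uint32_t *aux, char ch)
{
  while ((aux[MU_LSR_INDEX] & MU_LSR_TXEMPTY) == 0)
    {
      /* spin until the transmitter can take another byte */
    }

  aux[MU_IO_INDEX] = (uint32_t)(unsigned char)ch;
}
```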

I have re-written the serial driver implementation and I can confirm that some garbled serial output is appearing (not a success, nothing is decipherable). I opted to turn GPIO26 high in my arm64_earlyprintinit function so I could get that as an indication of booting. The GPIO pin does go high, which indicates to me that the arm64_earlyprintinit function is being called correctly. Something is wrong with my UART configuration.
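Driving a pin high as a "did we get here" indicator takes two register writes: one to make the pin an output and one to set it. The sketch below is hypothetical and assumes the GPIO register layout from the BCM2711 datasheet (ten pins per function-select register with three bits each, and set registers starting at offset 0x1c).

```c
#include <stdint.h>

/* Hypothetical sketch; register layout assumed from the BCM2711 datasheet. */

#define GPFSEL_INDEX(pin)  ((pin) / 10)               /* GPFSEL0..5, word index */
#define GPFSEL_SHIFT(pin)  (((pin) % 10) * 3)         /* 3 select bits per pin */
#define GPFSEL_OUTPUT      1u                         /* select value: output */
#define GPSET_INDEX(pin)   ((0x1c / 4) + ((pin) / 32)) /* GPSET0/1, word index */

/* Configure 'pin' as an output and drive it high. 'gpio' points at the
 * start of the GPIO register block.
 */
static inline void gpio_drive_high(volatile uint32_t *gpio, unsigned int pin)
{
  uint32_t fsel = gpio[GPFSEL_INDEX(pin)];

  fsel &= ~(7u << GPFSEL_SHIFT(pin));            /* clear the pin's field */
  fsel |= (GPFSEL_OUTPUT << GPFSEL_SHIFT(pin));  /* select output */
  gpio[GPFSEL_INDEX(pin)] = fsel;

  gpio[GPSET_INDEX(pin)] = 1u << (pin % 32);     /* writing 1 sets the pin */
}
```

For GPIO26 this touches bits 20:18 of the third function-select register and bit 26 of the first set register.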

After trying out some more things, namely compiling the "kernel" from the rpi4-osdev project with the NuttX linker script and running it (no issues), I discovered something interesting. I tried calling some register write functions to print the message "hi" via the Mini UART transmit register. I knew the FIFO would have enough space on reset to hold at least those four characters (hi + newline and carriage return). Sure enough, on boot, "hi" appeared on the UART lines, followed by the same garbled output I had witnessed earlier. This is interesting, and it means that the char parameter being passed to the arm64_lowputc function is pointing to garbage memory. This makes me wonder if I'm not setting up stack memory properly in the linker script. It's time to have another look at the linker script.

In a desperate attempt for a quick solution, I tried modifying the dramboot.ld script that the Allwinner 64 used to just have a different load address of 0x8000 (where start.elf loads the kernel). I booted this image, and I'm no worse off but no better off; "hi" still appears but it's followed by garbled output again.

It seems that within the arm64_earlyprint function I can make multiple calls to arm64_lowputc and they print successfully. If I do the same in a for-loop over a string, calling arm64_lowputc for each character, it does not. I should check whether the compiler is optimizing the individual calls so that each one is passed its character as an immediate value, which it cannot do in the string case; that would explain why only the loop fails. There are two prior implementations of lowputc in C from the imx8 and imx9 chips, but the other implementations are all in assembly. I wonder if that has something to do with it. I need to find a way around this issue with the char ch parameter.

After some more experimentation, I discovered that I can call arm64_lowputc from the assembly startup without any problems. My example code right after arm64_earlyprintinit was:

mov w0, 'h'
bl arm64_lowputc
mov w0, 'i'
bl arm64_lowputc
mov w0, '\n'
bl arm64_lowputc

This did successfully print "hi" to the screen after the "Hello World" in arm64_earlyprintinit. This means calling the C function from assembly isn't what's causing problems (makes sense since others have done this before me). If not that, then what? Is the PRINT macro garbling the output?

I added a PRINT macro call immediately after the arm64_earlyprintinit call and got garbled output again. This surely means that the stack does not work. From the call graph, I can see that arm64_boot_el1_init and arm64_boot_primary_c_routine are both called before the early print. The first call should be okay empty (as I left it) since currently I don't have anything to initialize at EL1. The second call is more complex.

On top of that, I see that prior to the primary C routine call there is an instruction which loads a stack and entry point:

/* load stack and entry point */
ldr    x24, =(g_idle_stack + CONFIG_IDLETHREAD_STACKSIZE)

The primary C routine first calls arm64_chip_boot(), which does various things such as board and serial initialization. Given the fact the first early print statement appears after this stage, I think we shouldn't be experiencing these issues with the lowputc garbled output.

With some more experimenting, printing a string in a for loop works as long as the string is declared as static . Something is definitely up with uninitialized data. I will need to do some more reading about the boot process to help me figure out what's going on.

Day 7: 2024-07-31

Looking at the low-peripheral mode memory map from the BCM2711 data sheet again, I see that most of the RAM starts at 0x0 (besides one SDRAM bank which is for the extra expansion for the 8GB Pi variant as far as I can tell). I'm modifying the MMU defines to reflect the RAM and device IO addresses. I wonder if this may resolve the stack issue. In any case it should help get me further along in the boot process.

I'm thinking of reaching out to the NuttX community if I can't figure out these PRINT macro issues. It seems strange to me that the whole arm64_chip_boot process would complete before any early printing is done. I can't see why this would be helpful.

Once I was able to experiment some more, it appears trying to print a non-static string before printing the static one results in a program crash (I don't see any screen output). This means that the boot sequence is likely crashing elsewhere during a normal start because of these issues (which I believe are related to the stack).

There is no difference when booting nuttx.img or nuttx.bin, with or without the debug UART for start.elf enabled.

Day 8: 2024-08-02

I am beginning to get frustrated by the early printing issues. I've checked the disassembly of the generated binary and confirmed that the strings passed to the PRINT macro are kept in the .rodata section, which is the same place as the static string from arm64_earlyprintinit that I was using to test.

From what I can tell by reading the disassembly, the correct addresses for the string are being loaded and iterated over in the PRINT macro. It must work too because it's the same for all the other arm64 implementations on NuttX. This leads me to think that something is wrong with my linker script or the way the program is being loaded, or possibly the initial value of the stack pointer. I am not sure.

Day 9: 2024-08-04

Finally, after much troubleshooting, I managed to boot the NuttX kernel on the Raspberry Pi 4 to the point where I can actually see the early print messages.

It turns out that a linker script load address of 0x80000 was not correct, and actually the load address is 0x480000. I don't know why all the resources I saw online say otherwise, but after seeing the debug output of the Pi boot loader say "Kernel relocated to 0x480000", I decided to try that address in my linker script again. I could have sworn I already tried that start address with no luck, but this is why you write down your troubleshooting steps I suppose.

Now I can see this error output during the boot process:

----gic_validate_dist_version: No GIC version detect
arm64_gic_initialize: no distributor detected, giving up ret=-19
_assert: Current Version: NuttX  12.6.0-RC0 6791d4a1c4-dirty Aug  4 2024 00:38:21 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481418

It looks like the way I configured the GIC-400 in my changes was not enough to allow NuttX to find the GIC. By grepping for the "No GIC version detect" message, I can see it appears in the gic_validate_dist_version function for the GICv3. The GIC400 is a GICv2. I also see in the .config file that the GIC version is listed as 3. I need to change that.

After modifying the Kconfig files to select GICv2 with the BCM2711, I rebooted the image. The GIC error messages are gone now (I take that as a success). What I see now is:

MESS:00:00:06.144520:0:----_assert: Current Version: NuttX  12.6.0-RC0 f81fb7a076-dirty Aug  4 2024 16:16:30 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x4811e4

I enabled further debug information in the kernel. This included all the info, error and warning options that I thought would be relevant for the earlier booting stages (the MMU, GPIO, timers, interrupts, etc.). I now have error output which looks like this:

arm64_oneshot_initialize: cycle_per_tick 54000
arm64_fatal_error: reason = 0
arm64_fatal_error: CurrentEL: MODE_EL1
arm64_fatal_error: ESR_ELn: 0xbf000002
arm64_fatal_error: FAR_ELn: 0x0
arm64_fatal_error: ELR_ELn: 0x48a458
print_ec_cause: SError interrupt

It seems like right after initializing a one-shot timer, a fatal error occurs. This smells like an unhandled interrupt.

Looking at the arm64_oneshot_initialize function, right after the log of "cycle_per_tick 54000", there is a call to irq_attach, followed by up_enable_irq, followed by arm64_arch_timer_enable. One of these calls must be failing before we reach the next log statement, or an unhandled interrupt is causing a problem.

I added another log statement right after irq_attach, booted again and observed that the new log statement was not executed. This narrows down the search space. The debug output says the reason is 0, which is the value given to K_ERR_CPU_EXCEPTION. The function arm64_fatal_error is only called in arm64_vector.S, and it is only called twice with that reason argument. Of the two call sites that pass that reason, one is in the arm64_serror_handler function, and the print_ec_cause information output says the cause was an "SError interrupt".

The arm64_serror_handler function is used as the default handler for 3 vector table entries. I now need to figure out what triggered the SError interrupt.

Adding logging information inside of the irq_attach function, I can see that the failure is probably occurring inside spin_lock_irqsave:

mm_malloc: Allocated 0x4c13b0, size 48
arm64_oneshot_initialize: cycle_per_tick 54000
irq_attach: In irq_attach
irq_attach: before spin_lock_irqsave
arm64_fatal_error: reason = 0

Repeating this process of adding logging statements, the failure is now narrowed down to within spin_lock, in up_testset.

I found a StackOverflow article about SErrors.

Day 10: 2024-08-05

It seems the SError is a tricky one to reason about. It occurs when the system detects an unrecoverable fault, and it is asynchronous.

I have found this blog post which had some information regarding SErrors. It has a handy excerpt from the ARM documentation breaking down the information encoded in the ESR_ELn register.

In my case, the top 6 bits (0b101111, bits 31 to 26) are the exception class and indicate an SError (as is picked up and printed by print_ec_cause). The following bit (bit 25, 0b1) indicates that the trapped instruction was 32 bits. The remaining 25 bits, 0b1000000000000000000000010, are the ISS, or "Instruction Specific Syndrome" field. This should tell me more about the exception.

Unfortunately, if this StackOverflow article is to be believed, then the ISS field is implementation defined. This means that unless Broadcom documents what this ISS means, I will have a lot of trouble decoding it myself. And Broadcom does not like to share information about their processor.

I found this ARM documentation page for SError ISS encoding.

In my case, IDS (bit 24) is set to 1, which means bits 23 to 0 contain implementation defined syndrome information. This means I have to figure out if anyone has reverse-engineered the Broadcom encoding.
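
The field split described above can be written down as a few helpers. This is only a sketch for checking values by hand; the macro and function names are made up for illustration and are not taken from the NuttX sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of ESR_ELx as described in the text:
 * EC in bits 31..26, IL in bit 25, ISS in bits 24..0.
 * All names below are hypothetical.
 */

#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     0x3fu
#define ESR_IL_BIT      (1u << 25)
#define ESR_ISS_MASK    0x01ffffffu
#define ESR_EC_SERROR   0x2fu        /* 0b101111 */
#define ESR_SERROR_IDS  (1u << 24)   /* ISS is implementation defined */

/* Exception class */

static inline uint32_t esr_ec(uint32_t esr)
{
  return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* True when the trapped instruction was a 32-bit instruction */

static inline bool esr_il(uint32_t esr)
{
  return (esr & ESR_IL_BIT) != 0;
}

/* Instruction Specific Syndrome, bits 24..0 */

static inline uint32_t esr_iss(uint32_t esr)
{
  return esr & ESR_ISS_MASK;
}

/* Only meaningful when esr_ec() == ESR_EC_SERROR */

static inline bool esr_serror_ids(uint32_t esr)
{
  return (esr & ESR_SERROR_IDS) != 0;
}
```

For the value from the log, esr_ec(0xbf000002) gives 0x2f (SError), esr_il() is true, esr_iss() gives 0x1000002 and esr_serror_ids() is true.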

Continuing to read through the first blog post about ARM exceptions, I see that the ELR_ELn register is supposed to contain the address to return to at the end of the exception. It is set to 0x48a4ac, pointing to the following instruction of up_testset (where I traced the error):

ldaxr x2, [x0] /* Test if spinlock is locked or not */

This address seems to consistently be the address in ELR_ELn after running the boot a few times. I was worried that because SError is an asynchronous exception, the address may change.

I did some searching online for ldaxr causing an exception, and found this interesting post on the ARM support forums for the Cortex-A72.

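It seems like the MMU may need to be enabled in order to use these instructions. As far as I understand, with the MMU off all data accesses are treated as Device memory, and exclusive loads and stores like ldaxr are not guaranteed to work on that memory type. I am going to search to confirm this, but in my case the MMU was not enabled. Looking at the other arm64 implementations, most of them do enable the MMU unconditionally.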
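
For context on why ldaxr shows up in a spinlock at all: a test-and-set primitive like up_testset atomically reads the old lock value and marks the lock as taken. The portable sketch below uses C11 atomics rather than the arm64 assembly; on an ARMv8.0 core, compilers typically lower this kind of operation to an exclusive load/store loop. The names are hypothetical and this is not the NuttX implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; not the NuttX spinlock API. */

typedef struct
{
  atomic_flag locked;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Atomically mark the lock as taken and return its previous state:
 * true means somebody else already held it.
 */

static inline bool sketch_testset(sketch_spinlock_t *lock)
{
  return atomic_flag_test_and_set_explicit(&lock->locked,
                                           memory_order_acquire);
}

static inline void sketch_spin_lock(sketch_spinlock_t *lock)
{
  while (sketch_testset(lock))
    {
      /* Busy-wait until the current holder releases the lock. */
    }
}

static inline void sketch_spin_unlock(sketch_spinlock_t *lock)
{
  atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The first sketch_testset() on a fresh lock returns false (the lock was free), and a second call returns true until sketch_spin_unlock() is called.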

I enabled the MMU initialization, re-compiled and rebooted. Here is the new output with CONFIG_MMU_DEBUG enabled:

MESS:00:00:06.174977:0:----arm64_mmu_init: xlat tables:
arm64_mmu_init: base table(L1): 0x4cb000, 64 entries
arm64_mmu_init: 0: 0x4c4000
arm64_mmu_init: 1: 0x4c5000
arm64_mmu_init: 2: 0x4c6000
arm64_mmu_init: 3: 0x4c7000
arm64_mmu_init: 4: 0x4c8000
arm64_mmu_init: 5: 0x4c9000
arm64_mmu_init: 6: 0x4ca000
init_xlat_tables: mmap: virt 4227858432x phys 4227858432x size 67108864x
set_pte_table_desc:   
set_pte_table_desc: 0x4cb018: [Table] 0x4c4000
init_xlat_tables: mmap: virt 0x phys 0x size 1006632960x
set_pte_table_desc:   
set_pte_table_desc: 0x4cb000: [Table] 0x4c5000
init_xlat_tables: mmap: virt 4718592x phys 4718592x size 192512x
split_pte_block_desc: Splitting existing PTE 0x4c5010(L2)
set_pte_table_desc:     
set_pte_table_desc: 0x4c5010: [Table] 0x4c6000
init_xlat_tables: mmap: virt 4911104x phys 4911104x size 81920x
init_xlat_tables: mmap: virt 4993024x phys 4993024x size 65536x
enable_mmu_el1: MMU enabled with dcache
nx_start: Entry
up_allocate_heap: heap_start=0x0x4d3000, heap_size=0x47b2d000
mm_initialize: Heap: name=Umem, start=0x4d3000 size=1202900992
mm_addregion: [Umem] Region 1: base=0x4d32a8 size=1202900304
arm64_fatal_error: reason = 0
arm64_fatal_error: CurrentEL: MODE_EL1
arm64_fatal_error: ESR_ELn: 0x96000045
arm64_fatal_error: FAR_ELn: 0x47fffff8
arm64_fatal_error: ELR_ELn: 0x489d28
print_ec_cause: Data Abort taken without a change in Exception level
_assert: Current Version: NuttX  12.6.0-RC0 96be557b64-dirty Aug  5 2024 14:56:42 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481a34
up_dump_register: stack = 0x4d2e10
up_dump_register: x0:   0x13                x1:   0x4d32c0
up_dump_register: x2:   0xfe215040          x3:   0xfe215040
up_dump_register: x4:   0x0                 x5:   0x0
up_dump_register: x6:   0x1                 x7:   0xdba53f65cc808a8
up_dump_register: x8:   0xc4276feb17c016ba  x9:   0xecbcfeb328124450
up_dump_register: x10:  0xb7989dd7d34a1280  x11:  0x5ebf5f572386fdee
up_dump_register: x12:  0x6f7c07d067f6e38   x13:  0x3f7b5adaf798b4d5
up_dump_register: x14:  0xf3dffbe2e4cff736  x15:  0xd76b1c050c964ea0
up_dump_register: x16:  0x6d6fa9cfeeb0eff8  x17:  0x1a051d808a830286
up_dump_register: x18:  0x3f7b5adaf798b4bf  x19:  0x4d3000
up_dump_register: x20:  0x47fffff0          x21:  0x4d32d0
up_dump_register: x22:  0x47b2cd30          x23:  0x4d32a8
up_dump_register: x24:  0x4d32b0            x25:  0x4806f4
up_dump_register: x26:  0x2f56f66b2df71556  x27:  0x74ee6bbfb5d438f4
up_dump_register: x28:  0x7ef57ab47b85f74f  x29:  0x9a7fa1cb06923003
up_dump_register: x30:  0x489cf8          
up_dump_register: 
up_dump_register: STATUS Registers:
up_dump_register: SPSR:      0x600002c5        
up_dump_register: ELR:       0x489d28          
up_dump_register: SP_EL0:    0x4d3000          
up_dump_register: SP_ELX:    0x4d2f40          
up_dump_register: TPIDR_EL0: 0x0               
up_dump_register: TPIDR_EL1: 0x0               
up_dump_register: EXE_DEPTH: 0x1

Seems like we have a new exception happening to figure out.

Looking at the ARM documentation for exceptions again, the ESR_ELn tells us that this is a Data Abort exception taken without a change in the exception level. It is used for MMU faults generated by data accesses (and some other things, but this is the most likely cause).

Looking at registers x2 and x3, both hold the address of AUX_MU_IO_REG. It seems we might be getting in trouble for trying to access peripheral IO addresses? The exception happens in the mm_addregion function.
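
One way to sanity-check a fault like this is to compare FAR_ELn against the regions that init_xlat_tables reported mapping. The sketch below copies the base/size pairs from the log above (they are printed in decimal there); the struct and function names are made up. Read this way, FAR_ELn (0x47fffff8) sits just below heap_start + heap_size = 0x48000000, while the mapping that starts at 0 is only 1006632960 bytes (0x3c000000) long.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one identity-mapped region. */

struct sketch_region
{
  uint64_t base;
  uint64_t size;
};

/* Values copied from the init_xlat_tables lines in the log. */

static const struct sketch_region g_sketch_mapped[] =
{
  { 4227858432ull, 67108864ull   },  /* 0xfc000000, contains 0xfe215040 */
  { 0ull,          1006632960ull },  /* 0x00000000, size 0x3c000000 */
  { 4718592ull,    192512ull     },
  { 4911104ull,    81920ull      },
  { 4993024ull,    65536ull      },
};

/* True if addr falls inside any of the regions listed above. */

static inline bool sketch_addr_is_mapped(uint64_t addr)
{
  size_t i;

  for (i = 0; i < sizeof(g_sketch_mapped) / sizeof(g_sketch_mapped[0]); i++)
    {
      if (addr >= g_sketch_mapped[i].base &&
          addr - g_sketch_mapped[i].base < g_sketch_mapped[i].size)
        {
          return true;
        }
    }

  return false;
}
```

With these values, the Mini UART register address 0xfe215040 is covered by a mapping, but the faulting address 0x47fffff8 is not, which hints at the RAM and heap bounds rather than the peripheral access.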

While searching for clues online, I came across this handy utility for decoding the ESR register called aarch64-esr-decoder. Here is the output for the error I've encountered:

ESR 0x00000000000000000000000096000045:
37..63 RES0: 0x0000000 0b000000000000000000000000000
32..36 ISS2: 0x00 0b00000
26..31 EC: 0x25 0b100101
# Data Abort taken without a change in Exception level
25     IL: true
# 32-bit instruction trapped
00..24 ISS: 0x0000045 0b0000000000000000001000101
24     ISV: false
# No valid instruction syndrome
14..23 RES0: 0x000 0b0000000000
13     VNCR: false
11..12 RES0: 0x0 0b00
10     FnV: false
# FAR is valid
09     EA: false
08     CM: false
07     S1PTW: false
06     WnR: true
# Abort caused by writing to memory
00..05 DFSC: 0x05 0b000101
# Translation fault, level 1.
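
The last two fields of that output can also be worked out by hand. The sketch below assumes the usual Data Abort encoding, where the low two bits of DFSC give the translation table level for the four fault classes listed; the names are hypothetical and the function only makes sense when the exception class is a Data Abort.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ISS_WNR        (1u << 6)  /* Write-not-Read */
#define SKETCH_ISS_DFSC_MASK  0x3fu      /* Data Fault Status Code */

enum sketch_dfsc_class
{
  SKETCH_DFSC_ADDRESS_SIZE,
  SKETCH_DFSC_TRANSLATION,
  SKETCH_DFSC_ACCESS_FLAG,
  SKETCH_DFSC_PERMISSION,
  SKETCH_DFSC_OTHER
};

/* Classify the DFSC field. *level is only meaningful when the result
 * is not SKETCH_DFSC_OTHER.
 */

static inline enum sketch_dfsc_class sketch_dfsc_classify(uint32_t esr,
                                                          unsigned int *level)
{
  uint32_t dfsc = esr & SKETCH_ISS_DFSC_MASK;

  *level = dfsc & 0x3u;

  switch (dfsc >> 2)
    {
      case 0x0:
        return SKETCH_DFSC_ADDRESS_SIZE;
      case 0x1:
        return SKETCH_DFSC_TRANSLATION;
      case 0x2:
        return SKETCH_DFSC_ACCESS_FLAG;
      case 0x3:
        return SKETCH_DFSC_PERMISSION;
      default:
        return SKETCH_DFSC_OTHER;
    }
}

/* True if the abort was caused by a write. */

static inline bool sketch_abort_is_write(uint32_t esr)
{
  return (esr & SKETCH_ISS_WNR) != 0;
}
```

For 0x96000045 this gives a translation fault at level 1 caused by a write, matching the decoder output.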

After poking around in the MMU initialization code, I noticed that the pre-processor definitions CONFIG_RAM_START and CONFIG_RAM_SIZE were being used to start up the MMU and allocate the heap. These were set to the wrong value in my nsh defconfig preset from before. I changed these to match the correct values (starting at 0, with a size of 4GB - 64MB), and now I get much further in the boot process:

MESS:00:00:06.211786:0:----irq_attach: In irq_attach
irq_attach: before spin_lock_irqsave
spin_lock_irqsave: me: 0
spin_lock_irqsave: before spin_lock
spin_lock: about to enter loop
spin_lock: loop over
spin_lock_irqsave: after spin_lock
irq_attach: after spin_lock_irqsave
irq_attach: before spin_unlock_irqrestore
irq_attach: after spin_unlock_irqrestore
arm64_serialinit: arm64_serialinit not implemented
group_setupidlefiles: ERROR: Failed to open stdin: -38
_assert: Current Version: NuttX  12.6.0-RC0 be262c7ad3-dirty Aug  5 2024 17:16:27 arm64
_assert: Assertion failed : at file: init/nx_start.c:728 task: Idle_Task process: Kernel 0x48162c
up_dump_register: stack = 0x4c0170
up_dump_register: x0:   0x4c0170            x1:   0x0
up_dump_register: x2:   0x0                 x3:   0x0
up_dump_register: x4:   0x0                 x5:   0x0
up_dump_register: x6:   0x3                 x7:   0x0
up_dump_register: x8:   0x4c7468            x9:   0x0
up_dump_register: x10:  0x4c7000            x11:  0x4
up_dump_register: x12:  0x4b8000            x13:  0x4b7000
up_dump_register: x14:  0x1                 x15:  0xfffffff7
up_dump_register: x16:  0x48a654            x17:  0x0
up_dump_register: x18:  0x1                 x19:  0x0
up_dump_register: x20:  0x4ac181            x21:  0x4bf430
up_dump_register: x22:  0x0                 x23:  0x4c0170
up_dump_register: x24:  0x4c0170            x25:  0x2d8
up_dump_register: x26:  0x240               x27:  0x4b7000
up_dump_register: x28:  0xfdc3ed41d6862df6  x29:  0xbf8e8f7280a0100
up_dump_register: x30:  0x481bf8          
up_dump_register: 
up_dump_register: STATUS Registers:
up_dump_register: SPSR:      0x20000245        
up_dump_register: ELR:       0x480230          
up_dump_register: SP_EL0:    0x4c7000          
up_dump_register: SP_ELX:    0x4c6e90          
up_dump_register: TPIDR_EL0: 0x4bf430          
up_dump_register: TPIDR_EL1: 0x4bf430          
up_dump_register: EXE_DEPTH: 0x0               
dump_tasks:    PID GROUP PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED    COMMAND
dump_tasks:   ----   --- --- -------- ------- --- ------- ---------- ---------------- 0x4c4000      4096       144     3.5%    irq
dump_task:       0     0   0 FIFO     Kthread - Running            0000000000000000 0x4c5010      8176      1200    14.6%    Idle_Task

My logging messages from before with the irq_attach function are now being executed! It seems we get up to the serial driver initialization, which means I'm now past the nitty gritty boot stage (I think & hope) and into writing the serial driver all the way. I think failing to open stdin is a result of the unimplemented serial driver initialization. The error code is -38, which is the ENOSYS error being returned from the bcm2711_miniuart_attach and bcm2711_miniuart_ioctl functions.

I changed the return value of the attach function to 0 for success just to see what would happen. It looks like we actually get to NSH before crashing:

mm_initialize: Heap: name=Umem, start=0x4cc000 size=4222828544
mm_addregion: [Umem] Region 1: base=0x4cc2a8 size=4222827856
mm_malloc: Allocated 0x4cc2d0, size 144
mm_malloc: Allocated 0x4cc360, size 80
gic_validate_dist_version: GICv2 detected
up_timer_initialize: up_timer_initialize: cp15 timer(s) running at 54.0MHz
arm64_oneshot_initialize: oneshot_initialize
mm_malloc: Allocated 0x4cc3b0, size 48
arm64_oneshot_initialize: cycle_per_tick 54000
uart_register: Registering /dev/console
mm_malloc: Allocated 0x4cc3e0, size 80
mm_malloc: Allocated 0x4cc430, size 80
uart_register: Registering /dev/ttys0
mm_malloc: Allocated 0x4cc480, size 80
mm_malloc: Allocated 0x4cc4d0, size 80
mm_malloc: Allocated 0x4cc520, size 80
mm_malloc: Allocated 0x4cc570, size 32
mm_malloc: Allocated 0x4cc590, size 64
work_start_highpri: Starting high-priority kernel worker thread(s)
mm_malloc: Allocated 0x4cc5d0, size 336
mm_malloc: Allocated 0x4cc720, size 8208
nxtask_activate: hpwork pid=1,TCB=0x4cc5d0
nx_start_application: Starting init thread
task_spawn: name=nsh_main entry=0x48b24c file_actions=0 attr=0x4cbfa0 argv=0x4cbf98
mm_malloc: Allocated 0x4ce730, size 1536
mm_malloc: Allocated 0x4ced30, size 64
mm_malloc: Allocated 0x4ced70, size 32
mm_malloc: Allocated 0x4ced90, size 8208
nxtask_activate: nsh_main pid=2,TCB=0x4ce730
lib_cxx_initialize: _sinit: 0x4ad000 _einit: 0x4ad000
mm_malloc: Allocated 0x4d0da0, size 848
mm_free: Freeing 0x4d0da0
mm_free: Freeing 0x4ced70
mm_free: Freeing 0x4ced30
nxtask_exit: nsh_main pid=2,TCB=0x4ce730
mm_free: Freeing 0x4ced90
mm_free: Freeing 0x4ce730
nx_start: CPU0: Beginning Idle Loop

The idle loop might be because we're waiting for interrupts that never arrive, or because there's just no interrupt handler yet.

After writing the interrupt handling logic, I'm encountering a strange issue where there is a constant TX interrupt despite it being disabled.

mm_malloc: Allocated 0x4d1da0, size 848
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT enabled, 00000003
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT enabled, 00000003
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT enabled, 00000003
bcm2711_miniuart_rxint: RXINT disabled, 00000001
bcm2711_miniuart_rxint: RXINT enabled, 00000003
nx_start: CPU0: Beginning Idle Loop

NuttShell (NSH) NuttX-12.6.0-RC0
nsh> bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT disabled, 00000002
bcm2711_miniuart_txint: TXINT disabled, 00000002

The transmit interrupt is being disabled properly, but the interrupt handler is still constantly being called and the interrupt ID has a value indicating that the transmit FIFO is empty (which technically is true, but the interrupt shouldn't be triggered with nothing happening). I'm not really sure how to clear the interrupt pending bit for the Mini UART since it's not mentioned in the manual and the bit is marked as read-only.

I tried holding down a key during the boot process and the key is successfully read and re-displayed. This means the receive logic does work. There is just something broken that is causing the interrupts to constantly be triggered.

Interestingly, putting a logging statement in the receive part of the interrupt handler allows me to actually interact with the NuttX shell. Without the statement my keyboard inputs appear to do nothing.

If I then run ostest, the output doesn't appear until I generate some kind of input (like holding 'enter').

Day 11: 2024-08-06

After much annoyance, I discovered some online forum posts about the BCM2835 having its TX and RX interrupts swapped in the data sheet. I initially dismissed this as not being applicable to the BCM2711 processor because surely they would have fixed that error by now. I eventually got desperate and decided to give swapping the interrupts a shot. As it turns out, that worked and now I can interact with NSH properly.
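
The swap can be illustrated with the interrupt enable bits. This sketch assumes the hex value in the earlier TXINT/RXINT log lines is the interrupt enable register after each update, which would mean the first attempt treated bit 0 as the TX enable and bit 1 as the RX enable. The macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit assignment that the earlier log output implies the first attempt
 * used (assumption: the logged value is the interrupt enable register).
 */

#define SKETCH_IER_TX_FIRST_TRY  (1u << 0)
#define SKETCH_IER_RX_FIRST_TRY  (1u << 1)

/* Swapped assignment, which is what made NSH usable. */

#define SKETCH_IER_RX_SWAPPED    (1u << 0)
#define SKETCH_IER_TX_SWAPPED    (1u << 1)

/* Return the new register value with the given bit set or cleared. */

static inline uint32_t sketch_ier_update(uint32_t ier, uint32_t bit,
                                         bool enable)
{
  return enable ? (ier | bit) : (ier & ~bit);
}
```

Starting from 0x3, "disabling TX" with the first-try bit leaves 0x2, as in the log. Under the swapped assignment 0x2 still has the TX enable bit set, which would explain the constant transmit-FIFO-empty interrupts.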

And with that, I finally have NuttX booted with a proper shell on the Pi 4B! It's time to run the ostest and see how it fares.

It seems that the ostest passed which is great. Now that I have a working shell to experiment with, it's time to implement some more drivers. The first thing I noticed was the /proc file system was missing even though the configuration says to enable it. I think that will be useful for debugging, so I might want to implement that next. It's tempting to implement SMP, but I think I'll let that wait until I have the system a little more fleshed out.

Turns out that adding procfs was easier than I thought; I just needed to conditionally mount it in the board_app_initialize function for the Raspberry Pi 4B.

In my opinion, the next two obvious choices for drivers to implement would be the PL011 UARTs or the I2C functionality. This is because I have all the equipment necessary to test those features, and I know a fair bit about those two protocols. Both of these require GPIO control, and I don't want to have to continue hacking out putreg and modreg calls to change the GPIO settings. If I implement the GPIO framework first, I can write the remaining drivers more easily with some helper functions for switching pin function, configuring pull-ups/pull-downs, etc. So logically, I think the next thing to implement would be GPIO.

Day 12: 2024-08-11

I have implemented the basic functionality for controlling GPIO pins. This includes a bunch of utilities in arch/arm64/src/bcm2711/bcm2711_gpio.c and the high level GPIO driver in boards/arm64/bcm2711/raspberrypi-4b/src/rpi4b_gpio.c. Right now the interrupt handling for the pins is entirely untested.

While implementing the high level GPIO driver (using the RP2040's as a guideline), I noticed some shortcomings.

First, you must specify single pins as being an input, output or interrupt pin. You cannot have one pin which can be configured to do all three depending on what the user wants. So I have to pick a reasonable set of GPIO pins with some as inputs, some as outputs, etc.

Second, input GPIOs can typically have pull-up, pull-down or no resistors. The GPIO device interface does not allow this to be configured, so I had to pick a default of pull-up resistors enabled.

Finally, the interrupt pins do not allow you to select what kind of interrupt trigger each one has. The Raspberry Pi 4B has 6 different interrupt event triggers (rising edge, falling edge, high level, low level, async rising/falling edge). It is only possible to choose one type of event as a default when devices are registered, or a combination of these events, but all pins will have a pre-configured event which cannot be changed.

As far as the implementation goes, the GPIO utilities for configuring resistors and input/output seem to work just fine, and same with the high level driver. I'm able to turn off and on outputs and read the correct values on inputs. I will tackle interrupts later because I may discover something that allows more customization for the user.

As of right now, even the low-level GPIO utilities for interrupts could be optimized. The Raspberry Pi 4B has four GPIO IRQs (GPIO 0-3, or IRQs 49-52). I would imagine that some subset of GPIO pins trigger the GPIO 0 IRQ, another subset GPIO 1, etc. Unfortunately I can't confirm this because there is no indication in the BCM2711 data sheet. For now I've attached the same interrupt handler to all four IRQs, and it will perform a search of all 58 pins (GPIO 0 through 57) to see which have an event detected and then call their respective handlers.
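
The "search every pin" approach can be sketched as below. This is a host-side illustration only: the event-detect status is passed in as two plain 32-bit words instead of being read from hardware registers, pins are assumed to be numbered GPIO0 to GPIO57, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NPINS 58  /* Assumes pins GPIO0..GPIO57 */

typedef void (*sketch_gpio_handler_t)(unsigned int pin, void *arg);

static sketch_gpio_handler_t g_sketch_handlers[SKETCH_NPINS];
static void *g_sketch_args[SKETCH_NPINS];

/* Example handler: counts how often it ran. arg points to an int. */

static inline void sketch_count_handler(unsigned int pin, void *arg)
{
  (void)pin;
  (*(int *)arg)++;
}

/* Shared handler body for all GPIO IRQs. events[] is a snapshot of the
 * event-detect status, with pin n in bit (n % 32) of events[n / 32].
 * Returns the number of pin handlers that were called.
 */

static inline unsigned int sketch_gpio_dispatch(const uint32_t events[2])
{
  unsigned int pin;
  unsigned int called = 0;

  for (pin = 0; pin < SKETCH_NPINS; pin++)
    {
      if ((events[pin / 32] & (1u << (pin % 32))) != 0 &&
          g_sketch_handlers[pin] != NULL)
        {
          g_sketch_handlers[pin](pin, g_sketch_args[pin]);
          called++;
        }
    }

  return called;
}
```

If the mapping from pins to the four IRQs were known, each IRQ's handler could scan only its own subset instead of every pin.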

At this point I think I want to move on to I2C drivers so that I can get more of this board's functionality working.

I've added most of the implementation for the I2C driver at this point, using the RP2040 implementation as a reference. I cannot figure out how to properly read/send using the hardware interface though, as I am continually getting ACK errors when scanning the I2C bus.

Right now, I am able to successfully attach the I2C IRQ handler, initialize one I2C device (I2C1 for now) and register the I2C character driver for any configured interface. This is a pretty good start, although I'm still missing the core functionality of sending and receiving data. I'm unsure how to implement the NOSTART option (I saw the RP2040 implementation did not have that feature either) and how to best implement NOSTOP.

Day 12: 2024-08-13

Since I haven't made much headway debugging the I2C driver I was working on, I decided to take a break by implementing something easier: the other UART interfaces on the Raspberry Pi 4B.

The Pi has 6 UART interfaces in total: 1 Mini UART (which I wrote the driver for to get console output) and 5 PL011 UART interfaces. I know from looking around the NuttX source tree that there is already a PL011 UART driver implementation because it's a pretty standard UART interface for ARM chips. After looking at the source, however, I noticed that the driver is meant as a standalone driver with its own config file; it's not a library to be used to create PL011 drivers like the NuttX UART device driver code is (the code I used to implement and register the Mini UART driver).

I grepped through the kernel source code and noticed that there are only a couple of implementations that actually use the PL011 functions. QEMU and Goldfish use it for both arm and arm64. This makes sense since they are ARM virtualizations, so they would simulate standard ARM UART interfaces.

The current PL011 implementation is not a good fit for the Raspberry Pi. It has too few configurable PL011 interfaces in its code and Kconfig file, and it numbers them 0 through 3. The Pi UARTs are numbered 0 through 5, and UART 1 is not a PL011 UART (so this messes with the existing numbering scheme). Finally, each UART base address must be configured through the Kconfig options. I already have the Raspberry Pi UART base addresses configured based on the Pi's memory setup configuration (low peripheral, legacy mode, etc.). I don't want to code this logic again in Kconfig when it's already implemented in pre-processor logic.

My idea was to turn the PL011 code into a library that could be included by board-specific PL011 driver implementations. This would allow a board-specific driver implementation to define as many or as few PL011 UART devices as the board supports, and number them however they want. The driver code would have full control over setting the device fields like baud rate, base address, etc, using custom logic. Once all the devices are statically initialized, they could be registered using pl011_uart_register() which is a wrapper around the standard uart_register() and which initializes the device's ops member with the PL011 driver ops which are static and contained within the uart_pl011.c file. This way developers don't need to duplicate that code but can still get more control over their PL011 driver implementation.
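
The shape of that idea might look roughly like the sketch below. Everything here is a mock so it can compile on its own: the types, the sketch_ prefixed functions and the placeholder base addresses are all hypothetical, and sketch_uart_register() merely stands in for the real uart_register(). It is not the interface that ended up in NuttX.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of a driver operations table (hypothetical, heavily reduced). */

struct sketch_uart_ops
{
  int (*setup)(void *priv);
};

/* Mock PL011 device: the board code fills in base and baud, the library
 * fills in ops when the device is registered.
 */

struct sketch_pl011_dev
{
  const struct sketch_uart_ops *ops;
  uintptr_t base;
  uint32_t baud;
};

/* Library side: one static ops table shared by every PL011 device. */

static inline int sketch_pl011_setup(void *priv)
{
  (void)priv;
  return 0;
}

static const struct sketch_uart_ops g_sketch_pl011_ops =
{
  .setup = sketch_pl011_setup,
};

/* Stand-in for uart_register(); only validates its arguments. */

static inline int sketch_uart_register(const char *path,
                                       struct sketch_pl011_dev *dev)
{
  return (path != NULL && dev != NULL) ? 0 : -1;
}

/* Library entry point: attach the shared ops, then register. */

static inline int sketch_pl011_uart_register(const char *path,
                                             struct sketch_pl011_dev *dev)
{
  dev->ops = &g_sketch_pl011_ops;
  return sketch_uart_register(path, dev);
}

/* Board side: as many devices as the board has, numbered as it likes.
 * The base addresses are placeholders, not real register addresses.
 */

static struct sketch_pl011_dev g_sketch_board_uarts[] =
{
  { .base = 0x1000, .baud = 115200 },
  { .base = 0x2000, .baud = 115200 },
};
```

The board code keeps control over how many devices exist and how their fields are computed, while the ops table lives in one place.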

This change would be good to make now before too many boards rely on the PL011 driver that's already implemented. Since PL011 is pretty standard, it would be useful for future boards/chip implementations. I raised this issue on the NuttX GitHub repository to get some initial feedback here . It seemed positively received, so I will now be spending some time changing the logic over to a library and modifying the QEMU/Goldfish code to construct their own drivers using the new library. Those changes should be relatively small since the driver isn't very involved for those boards. Most of the work will be changing the Kconfig/defconfig files.

I changed the implementation and re-wrote the arm64 serial driver for QEMU, and everything booted just fine. I repeated the same process for the ARMv7 QEMU serial driver, but it doesn't work correctly. I am trying to debug why.

Day 13: 2024-08-21

While I wait on my changes that make the PL011 driver more extensible, I went back to developing an I2C driver. I am successfully able to detect the sensors on the I2C bus using the i2c tool provided by the NuttX apps. This is a great first step.

However, if I then try to use i2c dump to read the contents of an EEPROM (which was detected) on the I2C bus, I get a failure and then the devices are no longer visible after executing the i2c dev command until after reboot. This indicates to me that there's probably something wrong with my write operations.

Day 14: 2024-08-23

After playing around some more, I've actually been able to dump the EEPROM contents over I2C using the driver I implemented. I'm not really sure why, but when I provide the -r register option, the transfer fails and then I'm no longer able to dump anything. This means my I2C driver works, but there must be some kind of internal state problem that isn't being reset enough to continue writing/reading.

I noticed that the send operations are finishing prematurely because the TXW (TX FIFO needs writing) interrupts were posting the wait-for-interrupt semaphore, when instead I needed to wait for the DONE interrupt. After fixing this issue, I still encounter the problem where all of the receive operations after a send operation do nothing but return an error. I'll have to debug further with more logging.
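
The TXW versus DONE distinction can be sketched as an interrupt handler that only reports "wake the waiter" once the transfer has finished or failed. This is a host-side sketch under a few assumptions: the status bit positions follow the BSC status register layout in the Broadcom peripheral data sheet, the FIFO is simulated as a 16-byte array, and all names are hypothetical rather than taken from my driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Status bits (assumed from the BSC status register layout). */

#define SK_I2C_S_DONE  (1u << 1)  /* Transfer done */
#define SK_I2C_S_TXW   (1u << 2)  /* TX FIFO needs writing */
#define SK_I2C_S_ERR   (1u << 8)  /* ACK error */
#define SK_I2C_S_CLKT  (1u << 9)  /* Clock stretch timeout */

struct sk_i2c_xfer
{
  const uint8_t *buf;  /* Bytes to send */
  size_t len;          /* Total number of bytes */
  size_t sent;         /* Bytes already pushed into the FIFO */
  bool failed;         /* Set when an error status was seen */
};

/* Simulated TX FIFO so the sketch can run off-target. */

static uint8_t g_sk_fifo[16];
static size_t g_sk_fifo_used;

static inline bool sk_fifo_push(uint8_t byte)
{
  if (g_sk_fifo_used >= sizeof(g_sk_fifo))
    {
      return false;
    }

  g_sk_fifo[g_sk_fifo_used++] = byte;
  return true;
}

/* Returns true when the thread waiting on the transfer should be woken
 * (where the driver would post its semaphore).
 */

static inline bool sk_i2c_on_irq(struct sk_i2c_xfer *x, uint32_t status)
{
  if ((status & (SK_I2C_S_ERR | SK_I2C_S_CLKT)) != 0)
    {
      x->failed = true;
      return true;
    }

  if ((status & SK_I2C_S_TXW) != 0)
    {
      /* Refill the FIFO, but do not wake the waiter yet. */

      while (x->sent < x->len && sk_fifo_push(x->buf[x->sent]))
        {
          x->sent++;
        }
    }

  return (status & SK_I2C_S_DONE) != 0;
}
```

A TXW-only interrupt refills the FIFO and returns false; the waiter is only released by DONE or by an error status.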

Conclusion

The implementation for the Raspberry Pi 4B does not end here; there are significantly more peripherals that need code written. However, getting NSH to show up on boot and the system passing OSTest was enough to successfully merge my implementation into the NuttX kernel. You can see the pull request here .

I may continue to make blog posts as I implement some of the remaining features. Hopefully now that the implementation is in the kernel, other people with Raspberry Pis available to them may start working on adding more support for the Pi 4B. I know this blog post was a little chaotic with me hopping between tasks any time I hit a roadblock, but hopefully it's still of some use and gave you an indication of how to troubleshoot the porting process. I anticipate future posts will be a little less chaotic since I can now pick a single peripheral to work on and hack away at it.

Matteo Golin