
14    PalmOS Garnet ARM Programming

Palm OS® Programmer's Companion

Volume I


This chapter describes how to write portions of your application using ARM-native code. Most PalmOS® Garnet applications do not need native ARM code and will not benefit from using native ARM code. If you have an application that performs adequately on PalmOS Garnet, then you do not need to read this chapter.

This chapter is intended for developers who have applications that require a performance improvement in order to perform adequately on PalmOS Garnet. It is not intended for all Palm OS application developers.

Understanding PalmOS Garnet and ARM

PalmOS Garnet is a complete port of the Palm operating system from a 68K processor to an ARM processor.

The term 68K processor refers to the family of Motorola 68000 processors.
The term ARM processor refers to the family of Advanced RISC Machine processors. An ARM processor is a type of 32-bit (4-byte word) RISC processor, and is available from many sources.

Starting in PalmOS Garnet, the entire operating system runs natively on the ARM processor. When an application calls a Palm OS API function, the API function runs at the full speed of the ARM processor. Because most applications spend the bulk of their time executing operating system functions, they get the performance benefit of the ARM processor with no effort.

Palm Application Compatibility Environment

Palm OS Garnet includes the Palm Application Compatibility Environment (PACE), within which all Palm OS applications run. PACE emulates the 68K-family processor traditionally used in Palm Powered handhelds, enabling both new and existing applications to run on Palm Powered handhelds that employ an ARM processor.

Because Palm OS functions are native and not emulated, PACE provides excellent performance for most 68K applications. As a result, most 68K applications will not benefit significantly from being rewritten for ARM.

Figure 14.1 shows how PACE provides a compatibility layer between 68K applications and PalmOS Garnet running natively on ARM.

Figure 14.1  Palm Application Compatibility Environment

Because an application's 68K code is emulated in PACE, certain algorithms—such as those performing data encryption or compression—may benefit from being rewritten in native ARM instructions.

Using ARM-Native Subroutines

If you have a processor-intensive 68K algorithm, writing an ARM-native subroutine may improve the performance of your 68K application on PalmOS Garnet.

An ARM-native subroutine (also called a PACE Native Object or PNO) is not a self-contained application; it is a native ARM function that the 68K application can call as a subroutine. The ARM-native subroutine allows the application to use the full processing power of the ARM-based hardware.

Figure 14.2 shows how your 68K application calls your ARM-native subroutine.

Figure 14.2  Using PceNativeCall to Call an ARM-Native Subroutine

Calling ARM-Native Subroutines

To call an ARM-native subroutine from your 68K application, you use the new function PceNativeCall.

The PceNativeCall function takes two arguments:

  1. A pointer to the ARM-native subroutine, generally but not necessarily stored in a code resource.

    If the ARM-native subroutine is stored in a resource, the 68K application can simply obtain the resource using DmGetResource and lock it with MemHandleLock to get a pointer to the ARM-native subroutine.

  2. A pointer to a data block, allowing the 68K application to exchange data with the ARM-native subroutine.

Before calling PceNativeCall you must test the processor type:

  • If the processor is ARM, the 68K application should call the ARM function.
  • If the processor is an x86 family processor (that is, the application is running in Palm OS Simulator on Windows), the 68K application should call the Windows DLL that represents the ARM function.
  • Otherwise, the 68K application should either call a 68K version of the function or fail gracefully if the functionality cannot be reasonably incorporated into a 68K application. Note that in this instance your application may be running on a version of Palm OS earlier than Garnet.

Listing 14.1 illustrates this process. For simplicity, in this example no parameters are passed to the ARM function.

Listing 14.1  Calling an ARM function

static UInt32 PceNativeResourceCall(DmResType resType, DmResID resID,
   char *DLLEntryPointP, void *userCPB) {
   UInt32    processorType;
   MemHandle armH;
   MemPtr    armP;
   UInt32    result;

   // get the processor type
   FtrGet(sysFileCSystem, sysFtrNumProcessorID, &processorType);

   if (sysFtrNumProcessorIsARM(processorType)) {
      // running on ARM; call the actual ARM resource
      armH = DmGetResource(resType, resID);
      armP = MemHandleLock(armH);

      result = PceNativeCall(armP, userCPB);

      // unlock and release the code resource once the call returns
      MemHandleUnlock(armH);
      DmReleaseResource(armH);
   } else if (processorType == sysFtrNumProcessorx86) {
      // running on Simulator; call the DLL
      result = PceNativeCall((NativeFuncType *)DLLEntryPointP, userCPB);
   } else {
      // some other processor; fail gracefully
      ErrNonFatalDisplay("Unsupported processor type");
      result = -1;
   }

   return result;
}

The #defines for the various processor types (all of which are named sysFtrNumProcessor...) can be found in SystemMgr.h.

Writing ARM-Native Subroutines

The ARM-native subroutine needs to include the proper prototype and the ARM code you want it to contain. A sample ARM-native subroutine, called armlet-simple.c, is included in the SDK to show the minimum amount of code required.

The ARM-native subroutine can also call Palm OS API functions, and can call back into 68K code. See "Calling Palm OS Functions From ARM Code." The file armlet-oscall.c, also included in the SDK, provides an example of calling a Palm OS function.

The following sections explain the steps for writing an ARM-native subroutine:

  1. "Isolate the Performance-Critical Area in Your 68K Application"
  2. "Convert the ARM-Native Subroutine to Take One Argument"
  3. "Handle 68K and ARM Technical Differences"
  4. "Test the ARM-Native Subroutine"
  5. "Build the ARM-Native Subroutine"
  6. "Embed the ARM Code in a 68K Application"

Isolate the Performance-Critical Area in Your 68K Application

To decide which algorithms will benefit from being written as an ARM-native subroutine, you should start by doing a performance analysis of your 68K application. If your 68K application runs "fast enough" when you do your performance testing, then there is no reason to write an ARM-native subroutine.

  • Test your 68K application using Palm OS Simulator. Palm OS Simulator is the easiest and best way to test your application for PalmOS Garnet compatibility. Running your application on Palm OS Simulator will show you whether any algorithms behave differently on PalmOS Garnet.

    Any algorithms that do extensive calculations, such as data encryption or compression, may run slower on Palm OS Simulator. If you notice a performance difference, then you have found a candidate algorithm that might benefit from being rewritten as an ARM-native subroutine.

  • Test your 68K application using the profiling version of Palm OS Emulator. The profiling version of Palm OS Emulator monitors your application's execution, generating statistics that show which algorithms take the most time.

    Emulator can help you pinpoint slow algorithms, but performance on Emulator will not indicate performance on PalmOS Garnet. Emulator does not include the PalmOS Garnet PACE component, but Simulator does.

Functions that make many calls to the operating system typically aren't good candidates to make ARM-native because it is difficult to debug both the "68K side" and the "ARM side" of an application, and because of the overhead involved in byte-swapping the parameters.

Convert the ARM-Native Subroutine to Take One Argument

The function PceNativeCall, which you use to call an ARM-native subroutine from your 68K application, takes only two arguments: a pointer to the ARM-native subroutine and a pointer to a data block. As a result, it will be easier to write your subroutine if it takes a single input argument.

ARM functions that are to be called from the 68K side—that is, functions that serve as entry points into your ARM code—must use the following function prototype (defined in PceNativeCall.h):

typedef unsigned long NativeFuncType(const void *emulStateP,
   void *userData68KP, Call68KFuncType *call68KFuncP);

The function parameters are defined as follows:

  • emulStateP

    A pointer to the (opaque) PACE emulation state. This pointer is used when calling Palm OS functions and application callbacks; see "Calling Palm OS Functions From ARM Code" for more information.

  • userData68KP

    The userDataP argument that was passed in to PceNativeCall, byte-swapped so it can be dereferenced directly by the ARM code. See "Handle 68K and ARM Technical Differences" for tips on accessing the data block indicated by this pointer.

  • call68KFuncP

    A hook to call back into the PACE emulated environment from ARM code. It is used for both OS function calls and application callbacks. See "Calling a Function Using a Function Pointer" for more information.

The name of your ARM entry-point function must be PNOMain.

IMPORTANT: If your ARM entry-point function isn't named PNOMain, the compiler will generate a message such as "***[ARMC1000.bin.elf]Error 1". If you use Palm OS Developer Suite and create a "Managed Make 68K PNO C/C++ Project" with the "Simple" template, do not change the name of the entry point function.

ARM entry-point functions should return a value that is meaningful to the 68K side, since that value is passed back to the calling code. If no value is meaningful, return 0. Both register A0 and D0 are set to this return value, making it meaningful to code that is expecting either a pointer or an immediate result.

Handle 68K and ARM Technical Differences

When implementing the ARM-native subroutine, you should be aware of how the 68K processor and the ARM processor are different. The following sections describe some technical considerations that you need to handle in your ARM-native subroutine:

Big Endian and Little Endian

The 68K processor uses big-endian integers; the ARM processor uses little endian. Big and little refer to the order in which the bytes are stored in a multi-byte integer. In big-endian integers, the most significant byte is the first; in little-endian integers, the most significant byte is the last byte.

This means 2- and 4-byte integers are stored in reverse byte order, and thus must be byte-swapped when exchanged between the ARM and 68K processors. Endianness is only relevant in the context of 2- and 4-byte integers (including pointers). Other types of data, such as strings, don't need to be byte-swapped.
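To make the byte-order difference concrete, the following portable C sketch writes the same 32-bit value in each order. The helper names (store_be32, store_le32) are illustrative only and are not part of the Palm OS SDK.

```c
#include <stdint.h>

/* Store a 32-bit value most significant byte first (68K order). */
static void store_be32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Store a 32-bit value least significant byte first (ARM order). */
static void store_le32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}
```

For the value 0x12345678, the 68K order produces the bytes 12 34 56 78, and the ARM order produces 78 56 34 12.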

PACE automatically byte-swaps the PceNativeCall function's userData68KP argument, so it can be de-referenced immediately from within the ARM function with no modification. PACE also automatically byte-swaps the 4-byte return value that is passed back to the calling function.

PACE doesn't byte-swap any of the data pointed to by the userData68KP argument because PACE doesn't know anything about what kind of data is being passed. (Remember, only 2- and 4-byte integers need to be byte-swapped, and the userData68KP argument is simply a pointer to arbitrary data.)

Byte-Swapping Macros for Use in ARM-Native Subroutines

Endianutils.h contains convenience macros to byte-swap 2- and 4-byte integers in your ARM-native subroutine:

  • ByteSwap16(integer)

    Byte-swaps a 2-byte (16-bit) integer value.

  • ByteSwap32(integer)

    Byte-swaps a 4-byte (32-bit) integer value.

ARM-native subroutines are responsible for byte-swapping integers in the data block as necessary.
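As a rough illustration of that responsibility, the sketch below swaps the integer fields of a data block in place and leaves the string bytes alone. It is written in portable C so it can be tried on any host. The functions swap16 and swap32 are hypothetical stand-ins for the byte-swapping macros in Endianutils.h, and DemoParams is an invented data block, not an SDK type.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SDK's byte-swapping macros. */
static uint16_t swap16(uint16_t v) {
    return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical data block shared between the 68K and ARM sides. */
typedef struct {
    uint32_t length;   /* 4-byte integer: must be swapped */
    uint16_t flags;    /* 2-byte integer: must be swapped */
    char     name[6];  /* string bytes: never swapped     */
} DemoParams;

/* Swap the integer fields in place. Call once on entry, and once more
 * before returning if the 68K side will read the fields again. */
static void demo_swap_params(DemoParams *p) {
    p->length = swap32(p->length);
    p->flags  = swap16(p->flags);
}
```

Swapping is its own inverse, so the same routine converts in both directions.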

Integer Alignment

ARM processors require that 4-byte integers be aligned on a 4-byte boundary. 68K processors require only even address (2-byte) alignment.

To handle integer alignment differences, you have the two following options:

  1. Allocate data using MemPtrNew, carefully declaring data structures with appropriate integer alignment.

    MemPtrNew always returns a 4-byte aligned address, so you can be sure that the data starts on a 4-byte boundary. However, you must also be careful that the data itself is properly aligned. When aligning data objects, recognize that 68K and ARM processors align 4-byte objects differently, as shown in Table 14.1.

Table 14.1  Default Data Object Alignment

Data Object Size    68K Processor Alignment            ARM Processor Alignment
1 byte              Any address                        Any address
2 bytes             2-byte alignment (even address)    2-byte alignment (even address)
4 bytes             2-byte alignment (even address)    4-byte alignment (address is a multiple of 4)

If a 4-byte data object is not properly aligned, the ARM processor may attempt to access the object using an address that is a multiple of 4, resulting in a loss of data.

  2. Copy 4-byte integers into local variables before using them.

    Endianutils.h contains convenience macros that you can use to read and write 4-byte values to and from local variables while simultaneously byte-swapping them:

    • Read68KUnaligned32(address)

      Reads a value from a specified address.

    • Write68KUnaligned32(address, value)

      Writes a specified value to a specified address.
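The second option can be pictured with the following portable C sketch, which reads and writes a big-endian 4-byte value one byte at a time so that no 4-byte access ever touches a misaligned address. The names demo_read68k_unaligned32 and demo_write68k_unaligned32 are hypothetical; this is not the SDK's implementation of the macros above, only one way such helpers could work.

```c
#include <stdint.h>

/* Read a big-endian 32-bit value from any address, one byte at a time. */
static uint32_t demo_read68k_unaligned32(const void *address) {
    const uint8_t *p = (const uint8_t *)address;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a native 32-bit value to any address in big-endian order. */
static void demo_write68k_unaligned32(void *address, uint32_t value) {
    uint8_t *p = (uint8_t *)address;
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Because the value is assembled from individual bytes, the same code handles both the alignment difference and the byte-order difference in one step.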

Structure Packing

Different compilers handle the automatic padding of structures differently. Some compilers automatically add padding bytes to align structures on a given byte boundary depending on the compiler options specified. Use care when declaring structures, or make a local copy of any structure that you use.
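The sketch below shows one way to make a local copy of a structure without depending on how either compiler pads it. It assumes a hypothetical record that the 68K side lays out as a 2-byte field followed immediately by a 4-byte field (6 bytes, no padding); DemoRecord, the offset constants, and the function names are all invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural declaration. A compiler that aligns 4-byte fields on 4-byte
 * boundaries inserts 2 padding bytes after `id`. */
typedef struct {
    uint16_t id;
    uint32_t count;
} DemoRecord;

/* Offsets assumed for the 68K layout: 2-byte alignment, no padding. */
enum { DEMO_68K_ID_OFFSET = 0, DEMO_68K_COUNT_OFFSET = 2, DEMO_68K_SIZE = 6 };

/* Report whether this compiler happens to use the assumed 68K layout. */
static int demo_layout_matches_68k(void) {
    return offsetof(DemoRecord, count) == (size_t)DEMO_68K_COUNT_OFFSET &&
           sizeof(DemoRecord) == (size_t)DEMO_68K_SIZE;
}

/* Build a local, natively laid-out copy from a 68K-format record,
 * reading each big-endian field at its explicit offset. */
static DemoRecord demo_copy_from_68k(const uint8_t *raw) {
    const uint8_t *idP    = raw + DEMO_68K_ID_OFFSET;
    const uint8_t *countP = raw + DEMO_68K_COUNT_OFFSET;
    DemoRecord r;
    r.id    = (uint16_t)((idP[0] << 8) | idP[1]);
    r.count = ((uint32_t)countP[0] << 24) | ((uint32_t)countP[1] << 16) |
              ((uint32_t)countP[2] << 8)  |  (uint32_t)countP[3];
    return r;
}
```

Copying field by field costs a few instructions per field, but it keeps the ARM code correct regardless of the packing options used to build either side.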

Test the ARM-Native Subroutine

The ARM-native subroutine will run on PalmOS Garnet on ARM hardware. However, Palm OS Simulator does not run ARM-native code. Instead, Simulator provides an implementation of PalmOS Garnet running on Microsoft Windows. As a result, to test your ARM-native subroutine on Simulator, you need to build the subroutine as a Windows DLL. Simulator's implementation of PACE is built to recognize a subroutine call as a call into a DLL.

The SDK includes a sample application that builds a DLL with one entry point which has the same function as the sample ARM-native subroutine also included in the SDK.

When calling a DLL, the first argument passed to PceNativeCall is a pointer to the name of a DLL and the name of the entry point within that DLL that is to be executed, separated by a null character and terminated with a null character (for example, a pointer to the character string "test.dll\0EntryPoint").

By default, Simulator will look for the DLL in the directory where PalmSim.exe is running. If you want to place the DLL in a different location, you should specify the full path of the ARM-native subroutine DLL name (for example, "c:\\projects\\armletdll\\test.dll\0EntryPoint").
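The embedded null character in that string is easy to get wrong, so here is a small portable C sketch of how the two names sit in memory. The DLL and entry-point names are the placeholder names from the example above, and demo_entry_point is a hypothetical helper.

```c
#include <string.h>

/* DLL name and entry-point name, joined by an embedded null character.
 * The compiler appends the final terminating null automatically. */
static const char demo_dll_entry[] = "test.dll\0EntryPoint";

/* Return the entry-point name that follows the DLL name. */
static const char *demo_entry_point(const char *dllAndEntry) {
    return dllAndEntry + strlen(dllAndEntry) + 1;
}
```

Functions such as strlen stop at the first null and therefore see only the DLL name. One caution about C escape rules: if an entry-point name began with a digit from 0 to 7, the compiler would read that digit as part of the \0 octal escape, so write such a literal as two adjacent strings ("test.dll\0" "1stEntry").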

Your 68K application should check the processor type:

  • If the processor is ARM, the 68K application should call the ARM-native subroutine.
  • If the processor is an x86-family processor (that is, the application is running in Palm OS Simulator on Windows), the 68K application should call the Windows DLL.

Otherwise, the 68K application should call the 68K version of the subroutine, which assumes the application is running on an earlier version of Palm OS.

Build the ARM-Native Subroutine

You will need to use an ARM compiler to build the ARM-native subroutine. PalmSource, Inc. does not provide or support an ARM compiler or development environment, but several are available, such as ARM Developer Suite (ADS) and gcc.

The compiled object file for the ARM-native subroutine must be linked with the 68K application as a raw binary file. For calculating address offsets, it is generally easiest to put the entry point first in the raw binary file.

Embed the ARM Code in a 68K Application

Regardless of the mechanism that you use to generate the ARM binary, there are several ways to get it into a .prc file:

  • Using Palm OS Developer Suite, create a 68K PNO C/C++ project. A Managed Make project will automatically incorporate the ARM binary into your application. A Standard Make project can be set up to do the same.
  • Use CodeWarrior or a tool such as PilRC to place the raw ARM binary into a resource file. Then simply include this resource file in your 68K project.
  • Copy the resulting binary data into a different resource file as hex data. Use a hex dump utility to process the ARM binary file into a resource.
  • Include the ARM code directly in your application's source as integer arrays. Note, however, that the arrays are interpreted as big-endian by the 68K compiler, and as little-endian by the ARM processor. Thus you must byte-swap the integer values to get appropriate opcodes. Also, the array itself must be 4-byte aligned in your source, so ensure that your compiler settings are appropriate to produce this.

Calling Palm OS Functions From ARM Code

In Palm OS Garnet, native ARM code can call back into the 68K world, either to call Palm OS functions or to call developer-provided callbacks. A single entry point, Call68KFuncType, provides the mechanism for calling both developer-specified 68K functions and OS functions through traps. This function is declared in PceNativeCall.h as follows:

typedef unsigned long Call68KFuncType(const void *emulStateP,
   unsigned long trapOrFunction, const void *argsOnStackP,
   unsigned long argsSizeAndwantA0);

The function parameters are defined as follows:

  • emulStateP

    Pointer to the PACE emulation state. Supply the pointer that was passed to your ARM function by PACE.

  • trapOrFunction

    The trap number AND'ed with kPceNativeTrapNoMask, or a pointer to the function to call. Any value less than kPceNativeTrapNoMask is treated as a trap number.

  • argsOnStackP

    Native (little-endian) pointer to a block of memory to be copied to the 68K stack prior to the function call. This memory normally contains the arguments for the 68K function being called. Call68KFuncType pops these values from the 68K stack before returning.

  • argsSizeAndwantA0

    The number of bytes, in little-endian format, from argsOnStackP that are to be copied to the 68K emulator stack. If the function or trap returns its result in 68K register A0 (as when the result is a pointer type), you must OR the byte count with kPceNativeWantA0.

The return value from the 68K function (passed either in the 68K register D0 or A0) is returned as the result of this function, based on argsSizeAndwantA0. It is returned in native (little-endian) form.
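When the 68K function takes more than one argument, the block passed as argsOnStackP has to hold all of them in big-endian form. The portable C sketch below packs the arguments for a hypothetical 68K function taking a 2-byte value followed by a 4-byte value. It assumes that arguments appear in declaration order at increasing addresses with 2-byte alignment; check this against your 68K compiler's calling convention. DEMO_WANT_A0 is a placeholder defined here only so the sketch compiles on any host; real code should use kPceNativeWantA0 from PceNativeCall.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for illustration; use kPceNativeWantA0 in real code. */
#define DEMO_WANT_A0 0x10000000ul

/* Append a 16-bit argument in big-endian order; return the new offset. */
static size_t demo_push16(uint8_t *stack, size_t off, uint16_t v) {
    stack[off]     = (uint8_t)(v >> 8);
    stack[off + 1] = (uint8_t)v;
    return off + 2;
}

/* Append a 32-bit argument in big-endian order; return the new offset. */
static size_t demo_push32(uint8_t *stack, size_t off, uint32_t v) {
    stack[off]     = (uint8_t)(v >> 24);
    stack[off + 1] = (uint8_t)(v >> 16);
    stack[off + 2] = (uint8_t)(v >> 8);
    stack[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Pack (flags, size) and return the value for argsSizeAndwantA0. */
static unsigned long demo_pack_args(uint8_t stack[6], uint16_t flags,
                                    uint32_t size, int resultIsPointer) {
    size_t n = 0;
    n = demo_push16(stack, n, flags);
    n = demo_push32(stack, n, size);
    return (unsigned long)n | (resultIsPointer ? DEMO_WANT_A0 : 0ul);
}
```

Writing the arguments byte by byte avoids both the byte-swapping step and any alignment concern for the 4-byte argument, which here starts at offset 2.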

Because of the amount of effort involved in getting parameters byte-swapped and properly aligned, if your ARM code routinely needs to call a series of operating system functions you may find it easier to write a small 68K callback function that calls the operating system functions, and then call this 68K function instead.

Calling a Trap

Listing 14.2 shows how to call an operating system function from ARM native code using the function's trap number. This sample calls MemPtrNew to allocate a block of 10 bytes, initializes that block, and returns it as the result of the ARM function.

Listing 14.2  Calling a Palm OS function from ARM code

/* This armlet makes a call (through PACE) to MemPtrNew to allocate a buffer.
 * The arguments to the OS function (here just size) must be on the stack, and
 * must be in big-endian format.
 */
#include "PceNativeCall.h"
#include "endianutils.h" // byte-swapping macros

// from CoreTraps.h
#define sysTrapMemPtrNew 0xA013 // we need this in order to call into MemPtrNew

// prototype for our OS call convenience function
void *PalmOS_MemPtrNew(const void *emulStateP, Call68KFuncType *call68KFuncP,
   unsigned long sizeLE);

// This is the main entry point into the armlet.  It's the first function in
// the file so we can calculate its address easily.
unsigned long NativeFunction(const void *emulStateP, void *userData68KP,
   Call68KFuncType *call68KFuncP) {
   unsigned char *bufferP;
   int i;

   // allocate 10 bytes of memory using a convenience function
   bufferP = (unsigned char *)PalmOS_MemPtrNew(emulStateP, call68KFuncP, 10);
   if (!bufferP) return 0; // allocation failed

   // Do something with the bytes in the buffer
   for (i = 0; i < 9; i++) bufferP[i] = i + 'A'; // write in "ABCDEFGHI"
   bufferP[9] = 0; // terminate the string

   return (unsigned long)bufferP;
}

// Convenience function for calling MemPtrNew within ARM code
void *PalmOS_MemPtrNew(const void *emulStateP, Call68KFuncType *call68KFuncP,
   unsigned long sizeLE) {
   // First, declare the argument(s) that will be passed to the OS call.
   // In this case, we're calling MemPtrNew, so we need a size argument.
   // Because this code is compiled by an ARM compiler (little endian),
   // and MemPtrNew expects its argument to be big endian, swap it.
   unsigned long sizeBE = ByteSwap32(sizeLE);

   // Call the trap. Note that because MemPtrNew returns a pointer, the byte
   // count (the last parameter) must be OR'd with kPceNativeWantA0.
   return ((void *)((call68KFuncP)(emulStateP,
      PceNativeTrapNo(sysTrapMemPtrNew), &sizeBE, 4 | kPceNativeWantA0)));
}

Calling a Function Using a Function Pointer

The code excerpt in Listing 14.3 shows how to pass a 68K callback function pointer to ARM native code, and Listing 14.4 shows how to call that 68K function from within the ARM code. The ARM code ultimately accomplishes the same result as in the previous example (calling MemPtrNew), but this example lets the 68K-side callback function do the allocation. Note that the pointer to the callback function is passed to the ARM code as data, embedded in a structure.

The following is implemented on the 68K side:

Listing 14.3  Calling PACE application code from ARM code (68K side)

typedef struct MyParamsTag {
   void *myAllocateFunctionP;
   UInt32 anotherValue;
} MyParamsType;

// function to allocate 10 bytes
void *MyAllocateFunction() {
   return MemPtrNew(10);
}

// code to call the native function, defined in the next listing
MyParamsType myParams;
MemHandle armChunkH;
void *myNativeFuncP;
Byte *result;

armChunkH = DmGetResource('armc', 0);
myNativeFuncP = MemHandleLock(armChunkH);

// pass the callback's address to the ARM code as data
myParams.myAllocateFunctionP = (void *)&MyAllocateFunction;
result = (Byte *)PceNativeCall(myNativeFuncP, &myParams);

// unlock and release the code resource once the call returns
MemHandleUnlock(armChunkH);
DmReleaseResource(armChunkH);

In the ARM file, the following code accepts the callback function pointer and uses the 68K function to allocate the 10-byte memory block:

Listing 14.4  Calling PACE application code from ARM code (ARM side)

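#include "PceNativeCall.h"
#include "endianutils.h" // byte-swapping macros

typedef struct MyParamsTag {
   void *myAllocateFunctionP;
   unsigned long anotherValue;
} MyParamsType;

unsigned long MyNativeFunc(const void *emulStateP,
   void *userData68KP, Call68KFuncType *call68KFunc) {
   MyParamsType *paramsP = (MyParamsType *)userData68KP;
   unsigned char *buffer68K; // array of Byte
   unsigned char i; // Byte
   unsigned long my68KFuncP;

   // get the function pointer out of the passed parameter block;
   // it was stored by 68K code, so byte-swap it before use
   my68KFuncP = ByteSwap32((unsigned long)paramsP->myAllocateFunctionP);

   // invoke the callback function to allocate 10 bytes; the callback
   // takes no arguments, so nothing is copied to the 68K stack
   buffer68K = (unsigned char *)((call68KFunc)(emulStateP, my68KFuncP,
      0, 0 | kPceNativeWantA0));
   if (!buffer68K) return 0; // allocation failed

   // do something with the bytes in the buffer (indexes 0 through 9)
   for (i = 0; i < 10; i++)
      buffer68K[i] = i;

   return (unsigned long)buffer68K;
}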

Overview of Sample Files

The following ARM-native programming samples are included as part of the SDK.

ARM-Native Subroutine Sample Files

Table 14.2 shows the sample files that call ARM code from a 68K application.

Table 14.2  Calling ARM from 68K Sample Files





armlet-simple.c                      A trivial ARM-native subroutine showing how to pass a pointer from a 68K application.
armlet-oscall.c                      An ARM-native subroutine showing you how to call a Palm OS API function, using MemPtrNew as an example.
armlet-endianness_and_alignment.c    An ARM-native subroutine showing you how to make sure your data is correctly 4-byte aligned.
endianutils.h                        Macros for doing endian byte-swapping and 4-byte alignment correction. Used by the armlet-oscall.c and armlet-endianness_and_alignment.c files.

The samples also include an example showing a user-defined structure, used by the armlet-endianness_and_alignment.c file.

Windows DLL Sample Files

Table 14.3 shows the sample files that you can use to build a DLL for testing an ARM subroutine with Palm OS Simulator. For background information, see "Test the ARM-Native Subroutine".

Table 14.3  Windows DLL Sample Files - For Testing with Palm OS Simulator




Visual Studio project file for building a DLL file.


The main DLL source file.


Header file which defines the exports from the DLL file.


C++ source file used to build a precompiler header file and precompiled types file.


Header file used by StdAfx.cpp.