This month's code walk continues last month's presentation of object oriented principles by demonstrating how to apply encapsulation and polymorphism to standard, procedural software.

Level: Application

Encapsulation

Any project can benefit from encapsulation, the hiding of a data structure behind a set of functions and macros. In fact, I believe it to be one of the most important techniques for creating good software. Whether you are writing a standalone program which processes a few files or a piece of a large, integrated system, always encapsulate.

I generally use one module (a single C source file with an associated header) for each data structure of a program, or for a group of very closely related structures. The header file defines the interface to the structure, including the struct statement and the prototypes of its functions. Of course the structure isn't really hidden, and is completely unprotected other than by your own discipline. Any access to the actual data members anywhere but the source file for that structure must be strictly taboo.

Let's jump directly to an example and then I'll discuss why encapsulation is so important. Consider the following header file.

/* namenode.h */
#define NAME_SIZE 50
typedef struct _NameNode
{
    struct _NameNode *pNext;
    char acName [NAME_SIZE];
    int iValue;
} NameNode;

NameNode *pNewNameNode (char *pcName, int iValue);
void SetName (NameNode *pNode, char *pcName);
#define SetValue(p,i) ((p)->iValue=(i))
#define GetName(p) ((p)->acName)
#define GetValue(p) ((p)->iValue)
void InsertName (NameNode **ppList, NameNode *pNode);
void FreeNames (void);

The first function generates NameNode structures. It might look like this:

NameNode *pNewNameNode (char *pcName, int iValue)
{
    NameNode *pNew = calloc (1, sizeof (NameNode));
    if ( pNew == 0 ) return 0;
    strcpy (pNew->acName, pcName);  /* no length check yet */
    pNew->iValue = iValue;
    return pNew;
}

This may work well until you need several hundred of these and you find that you are spending a good deal of time in the calloc function. Another solution is to create a set of arrays of nodes.

/* namenode.c */
#include <stdlib.h>
#include "namenode.h"
#define NODES_PER_ARRAY 500

typedef struct _NodePage
{
    struct _NodePage *pNextPage;
    NameNode NodeArray[NODES_PER_ARRAY];
} NodePage;

static NodePage *globalPageList=0;

NameNode *pNewNameNode (char *pcName, int iValue)
{
    static NodePage *pPage=0;
    static int iNextFree = -1;
    NameNode *pNew;
    /* Do we need to allocate a new page? */
    if ( iNextFree == -1 )
        {
        pPage = calloc (1, sizeof (NodePage));
        if ( pPage == 0 ) return 0;
        pPage->pNextPage = globalPageList;
        globalPageList = pPage;
        iNextFree = 0;
        }
    pNew = &pPage->NodeArray[iNextFree++];
    if ( iNextFree == NODES_PER_ARRAY ) iNextFree = -1;
    SetName (pNew, pcName);
    SetValue (pNew, iValue);
    return pNew;
}

This version creates a linked list of pages, each of which holds 500 nodes. When the function is called, the next node in the array is used. During the first call, or whenever all the nodes have been used up, a new page is allocated.

We will also need to check that the name doesn't overflow the node's character array. Instead of bothering with that in pNewNameNode, we let the SetName function handle it for us. This means that if we need to change the implementation of the name, for example to allow strings of unlimited length, not only does the user code never change, but we have one less thing to worry about within the structure's implementation module itself.

We can now write:

void SetName (NameNode *pNode, char *pcName)
{
    int i;
    for (i=0; *pcName && i < NAME_SIZE-1; i++, pcName++ )
          pNode->acName[i] = *pcName;
    pNode->acName[i]=0;
}

The next three functions are actually macros whose only purpose is to hide the names of the internal data members. This may seem a trivial and useless step; however, it is actually quite important. If you ever need to modify the value in any way, or derive the value instead of storing it, you can very easily replace the #define statement with a new one or with a prototype for an actual function. It's a free global replace in all the source code that accesses your structure.
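To make that concrete, here is a minimal sketch of what such a swap might look like. It assumes a revised NameNode in which the value is derived from two fields, iBase and iBonus; both fields and the iDoubleValue caller are hypothetical and are not part of the original header.

```c
#include <assert.h>

#define NAME_SIZE 50

/* Hypothetical revision of NameNode: the value is no longer stored
   directly but derived from two assumed fields. */
typedef struct _NameNode
{
    struct _NameNode *pNext;
    char acName [NAME_SIZE];
    int iBase;      /* assumed field, not in the original struct */
    int iBonus;     /* assumed field, not in the original struct */
} NameNode;

/* Was: #define GetValue(p) ((p)->iValue) */
int GetValue (const NameNode *pNode)
{
    assert (pNode != 0);
    return pNode->iBase + pNode->iBonus;
}

/* Was: #define SetValue(p,i) ((p)->iValue=(i)) */
void SetValue (NameNode *pNode, int iValue)
{
    assert (pNode != 0);
    pNode->iBase = iValue;
    pNode->iBonus = 0;
}

/* Calling code: reads exactly the same before and after the change. */
int iDoubleValue (NameNode *pNode)
{
    SetValue (pNode, GetValue (pNode) * 2);
    return GetValue (pNode);
}
```

The caller never mentions a field name, so it only needs to be recompiled, not edited.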

We can then go on to write the insert and free functions:

void InsertName (NameNode **ppList, NameNode *pNode)
{
    while ( *ppList && (*ppList)->iValue < pNode->iValue)
        ppList = &(*ppList)->pNext;
    pNode->pNext = *ppList;
    *ppList = pNode;
}
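void FreeNames (void)
{
    NodePage *pFree;
    /* Unlink each page from the list before releasing it. */
    while ( (pFree = globalPageList) != 0 )
        {
        globalPageList = pFree->pNextPage;
        free (pFree);
        }
    /* Note: pNewNameNode keeps its own static page pointer, so it
       must not be called again unless that state is reset as well. */
}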

(We could also add a function to free an individual node. We would add a static global free list pointer which the pNewNameNode function and the FreeNameNode function would share. Freed nodes would then be reused without having to call the underlying free and calloc functions.)
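A rough sketch of that free list idea follows. To stay self-contained it repeats the struct and SetName, and it uses a plain calloc call as a stand-in for the page allocator; the globalFreeList name is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define NAME_SIZE 50

typedef struct _NameNode
{
    struct _NameNode *pNext;
    char acName [NAME_SIZE];
    int iValue;
} NameNode;

/* Freed nodes wait here, chained through pNext, until they are reused.
   (globalFreeList is a name assumed for this sketch.) */
static NameNode *globalFreeList = 0;

/* Same bounded copy as the SetName shown earlier. */
void SetName (NameNode *pNode, const char *pcName)
{
    int i;
    for (i=0; *pcName && i < NAME_SIZE-1; i++, pcName++ )
        pNode->acName[i] = *pcName;
    pNode->acName[i]=0;
}

/* The node must already be unlinked from any name list. */
void FreeNameNode (NameNode *pNode)
{
    assert (pNode != 0);
    pNode->pNext = globalFreeList;
    globalFreeList = pNode;
}

NameNode *pNewNameNode (const char *pcName, int iValue)
{
    NameNode *pNew = globalFreeList;
    if ( pNew )
        globalFreeList = pNew->pNext;           /* reuse a freed node */
    else
        pNew = calloc (1, sizeof (NameNode));   /* stand-in for the page allocator */
    if ( pNew == 0 ) return 0;
    pNew->pNext = 0;
    SetName (pNew, pcName);
    pNew->iValue = iValue;
    return pNew;
}
```

Because both functions live in the same module, the free list stays invisible to the calling code.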

And now for the two big reasons for using encapsulation. First, it reduces errors by confining all access to the structure to a single module. If there is an error, it's going to be here and not in the middle of someone else's code which did not properly apply some check or some necessary modification. If everyone had free access to the internal components, there might be errors all over the place.

The second reason to use encapsulation is that it allows you to modify the implementation and even the behavior of the data structure without having to touch the user code which calls it. You can interchange macros and functions, switch from internal arrays to linked lists, or change the name of the structure's fields. For example, if we replaced the SetName function with one that used malloc or realloc instead of using an array, the name field might change from acName to pcName and the function which frees a node would properly account for this. None of this would affect any of the calling code.
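As an illustration of that last change, the sketch below stores the name in allocated memory. The int return value of SetName and the ClearName helper are assumptions of this sketch, added so that an allocation failure can be reported and the name released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _NameNode
{
    struct _NameNode *pNext;
    char *pcName;   /* was: char acName [NAME_SIZE]; */
    int iValue;
} NameNode;

/* Returns 1 on success, 0 if memory ran out (the old name is kept).
   The return value is an addition of this sketch; callers that
   ignore it still compile unchanged. */
int SetName (NameNode *pNode, const char *pcName)
{
    char *pcCopy;
    assert (pNode != 0 && pcName != 0);
    pcCopy = malloc (strlen (pcName) + 1);
    if ( pcCopy == 0 ) return 0;
    strcpy (pcCopy, pcName);
    free (pNode->pcName);   /* free (0) is harmless for a new node */
    pNode->pcName = pcCopy;
    return 1;
}

/* The accessor changes with the field; callers still write GetName(p). */
#define GetName(p) ((p)->pcName ? (p)->pcName : "")

/* Hypothetical helper for whatever function frees a node:
   releases the name, not the node itself. */
void ClearName (NameNode *pNode)
{
    assert (pNode != 0);
    free (pNode->pcName);
    pNode->pcName = 0;
}
```

Code that only ever called SetName and GetName keeps working, and the 50 character limit simply disappears.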

Always, always, encapsulate your data structures. Don't think about it, or consider taking a shortcut, just do it.

Polymorphism

Polymorphism, on the other hand, is not an always technique. There's little point in using it other than when it is necessary, especially since you need to circumvent the naturally strong type checking that C likes to enforce.

The basic idea behind polymorphism is that you place function pointers inside the data structure itself and then always use those functions to interact with the data. What stays the same is the interface to those functions. This is really the central concept upon which object oriented programming is built. This technique allows you to create a whole group of data structures which may be very different internally, and even do different things, but which are all interchangeable.

You can create something called a handle structure, which contains a set of pointers to functions and a void pointer, which can be used to point to any type of internal structure.

I'll describe some real-life uses after the following example, which demonstrates how to build two spaceships which move, fire, and exercise a special ability.

Again we have a single header file to define the interface.

/* spaceship.h */
typedef struct _SpaceShip
{
    int  (*FireFcn) (struct _SpaceShip *pShip, int iRounds);
    void (*MoveFcn) (struct _SpaceShip *pShip);
    void (*SpecialFcn) (struct _SpaceShip *pShip);
    void *pObj;
} SpaceShip;

/* No space between the macro name and its parameter list. */
#define Fire(s,n) ((s)->FireFcn (s,n))
#define Move(s) ((s)->MoveFcn(s))
#define Special(s) ((s)->SpecialFcn(s))

The defines make the special polymorphic functions look like plain old function calls. Now we can create the source file to define the spacecraft.

/* spaceship.c */
#include <stdlib.h>
#include "spaceship.h"
/* The first spaceship is light, fast moving, but has little firepower */
typedef struct _Ship1
{
    int iWarpFactor;
    int iBulletCount;
} Ship1;
/* The second spaceship is heavy, slow moving, and has big guns */
typedef struct _Ship2
{
    char cSpeed;
    int iBombCount;
} Ship2;

/* Functions for ship 1 */
int Ship1Fire (SpaceShip *pShip, int iRounds);
void Ship1Move (SpaceShip *pShip);
void Ship1Special (SpaceShip *pShip);

SpaceShip *pNewShip1 (void)
{
    SpaceShip *pNew=calloc (1, sizeof(SpaceShip));
    Ship1 *pNewObj=calloc (1, sizeof(Ship1));
    if ( pNew == 0 || pNewObj == 0 )
        {
        free (pNew);
        free (pNewObj);
        return 0;
        }
    /* Initialize the data */
    pNewObj->iWarpFactor=0;
    pNewObj->iBulletCount=1000;
    /* Attach the functions and the object*/
    pNew->FireFcn=Ship1Fire;
    pNew->MoveFcn=Ship1Move;
    pNew->SpecialFcn=Ship1Special;
    pNew->pObj=pNewObj;
    return pNew;
}
/* Returns the damage done, or 0 if there are not enough bullets */
int Ship1Fire (SpaceShip *pShip, int iRounds)
{
    Ship1 *pShip1 = (Ship1 *)pShip->pObj;
    if ( pShip1->iBulletCount > iRounds )
        {
        pShip1->iBulletCount -= iRounds;
        return iRounds * 10;
        }
    return 0;
}
void Ship1Move (SpaceShip *pShip)
{
    Ship1 *pShip1 = (Ship1 *)pShip->pObj;
    pShip1->iWarpFactor++;
}
/* Special ability: reload */
void Ship1Special (SpaceShip *pShip)
{
    Ship1 *pShip1 = (Ship1 *)pShip->pObj;
    pShip1->iBulletCount = 1000;
}

/* Functions for ship 2 */
int Ship2Fire (SpaceShip *pShip, int iRounds);
void Ship2Move (SpaceShip *pShip);
void Ship2Special (SpaceShip *pShip);

SpaceShip *pNewShip2(void)
{
    SpaceShip *pNew=calloc (1, sizeof(SpaceShip));
    Ship2 *pNewObj=calloc (1, sizeof(Ship2));
    if ( pNew == 0 || pNewObj == 0 )
        {
        free (pNew);
        free (pNewObj);
        return 0;
        }

    /* Initialize the data */
    pNewObj->cSpeed=0;
    pNewObj->iBombCount=20;

    /* Attach the functions and the object*/
    pNew->FireFcn=Ship2Fire;
    pNew->MoveFcn=Ship2Move;
    pNew->SpecialFcn=Ship2Special;
    pNew->pObj=pNewObj;
    return pNew;
}

/* Returns the damage done, or 0 if there are not enough bombs */
int Ship2Fire (SpaceShip *pShip, int iRounds)
{
    Ship2 *pShip2 = (Ship2 *)pShip->pObj;
    if ( pShip2->iBombCount > iRounds )
        {
        pShip2->iBombCount -= iRounds;
        return iRounds * 100;
        }
    return 0;
}
void Ship2Move (SpaceShip *pShip)
{
    Ship2 *pShip2 = (Ship2 *)pShip->pObj;
    pShip2->cSpeed = pShip2->cSpeed == 5 ? 5 : pShip2->cSpeed+1;
}

/* Special ability: a burst of speed */
void Ship2Special (SpaceShip *pShip)
{
    Ship2 *pShip2 = (Ship2 *)pShip->pObj;
    pShip2->cSpeed = 10;
}

Although this example is trivial and useless, it demonstrates the following critical points:

The underlying structures can be very different.
The parameters and return value of a particular function must be consistent across all types.
The initialization must be explicitly selected by the caller based on what it needs to do. If today is Wednesday, create a Wednesday object, for example.
An explicit cast is required at the beginning of each function to access the internal structure.
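What these points buy you shows up in the calling code. The following sketch trims the handle down to a single Fire function and uses two hypothetical ship types, Gunboat and Bomber, so that it stands on its own; iFleetFire, pWrap, and FreeShip are likewise names invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _SpaceShip
{
    int  (*FireFcn) (struct _SpaceShip *pShip, int iRounds);
    void *pObj;
} SpaceShip;

#define Fire(s,n) ((s)->FireFcn (s,n))

/* Two deliberately different internals (hypothetical, trimmed down). */
typedef struct { int iBulletCount; } Gunboat;
typedef struct { char cBombs; } Bomber;

static int GunboatFire (SpaceShip *pShip, int iRounds)
{
    Gunboat *pBoat = (Gunboat *)pShip->pObj;
    if ( pBoat->iBulletCount < iRounds ) return 0;
    pBoat->iBulletCount -= iRounds;
    return iRounds * 10;
}

static int BomberFire (SpaceShip *pShip, int iRounds)
{
    Bomber *pBomber = (Bomber *)pShip->pObj;
    if ( pBomber->cBombs < iRounds ) return 0;
    pBomber->cBombs -= (char)iRounds;
    return iRounds * 100;
}

/* Shared helper: wraps an internal object in a handle. */
static SpaceShip *pWrap (void *pObj, int (*FireFcn) (SpaceShip *, int))
{
    SpaceShip *pNew = calloc (1, sizeof (SpaceShip));
    if ( pNew == 0 || pObj == 0 ) { free (pNew); free (pObj); return 0; }
    pNew->FireFcn = FireFcn;
    pNew->pObj = pObj;
    return pNew;
}

SpaceShip *pNewGunboat (void)
{
    Gunboat *pBoat = calloc (1, sizeof (Gunboat));
    if ( pBoat ) pBoat->iBulletCount = 1000;
    return pWrap (pBoat, GunboatFire);
}

SpaceShip *pNewBomber (void)
{
    Bomber *pBomber = calloc (1, sizeof (Bomber));
    if ( pBomber ) pBomber->cBombs = 20;
    return pWrap (pBomber, BomberFire);
}

void FreeShip (SpaceShip *pShip)
{
    if ( pShip ) { free (pShip->pObj); free (pShip); }
}

/* The polymorphic caller: it knows nothing about Gunboat or Bomber. */
int iFleetFire (SpaceShip **apFleet, int iCount, int iRounds)
{
    int i, iDamage = 0;
    assert (apFleet != 0);
    for (i = 0; i < iCount; i++)
        iDamage += Fire (apFleet[i], iRounds);
    return iDamage;
}
```

Only the two constructors name a concrete ship type; iFleetFire would work unchanged with a third kind of ship added later.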

When should you use polymorphic handles? One rule of thumb is to minimize them as much as possible. This technique is very powerful, but not always the best approach. However, when you need it, it is indispensable.

It is particularly useful if you need to handle data from various input formats. You might want a loader for TIFF, BMP, or JPEG pictures, with functions for reading, writing, and setting options like resolution or number of colors.

You might need to process a variety of different records. A student record might have a different internal structure from a graduate student or even a faculty record, yet support similar functions.

You might also want to use polymorphic handles to interact with various databases. You may need to access data locally, or from a SQL server on the network. The initializations and access functions would be very different, but they both provide the same type of information.

A function may benefit from being able to interact directly with a file, or with memory. A stream handle could support seek, read, write, rewind, and flush operations.
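A minimal sketch of such a stream handle might look like the following. It supports only read, rewind, and close, and every name in it (Stream, pNewMemStream, pNewFileStream, nCountByte) is an assumption made for the illustration rather than an established interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _Stream
{
    size_t (*ReadFcn)   (struct _Stream *pStream, void *pvBuf, size_t n);
    void   (*RewindFcn) (struct _Stream *pStream);
    void   (*CloseFcn)  (struct _Stream *pStream);
    void *pObj;
} Stream;

#define StreamRead(s,b,n) ((s)->ReadFcn (s,b,n))
#define StreamRewind(s)   ((s)->RewindFcn (s))
#define StreamClose(s)    ((s)->CloseFcn (s))

/* Memory flavor: a read position over a caller-owned buffer. */
typedef struct { const char *pcData; size_t nSize; size_t nPos; } MemObj;

static size_t MemRead (Stream *pStream, void *pvBuf, size_t n)
{
    MemObj *pMem = (MemObj *)pStream->pObj;
    size_t nLeft = pMem->nSize - pMem->nPos;
    if ( n > nLeft ) n = nLeft;
    memcpy (pvBuf, pMem->pcData + pMem->nPos, n);
    pMem->nPos += n;
    return n;
}
static void MemRewind (Stream *pStream) { ((MemObj *)pStream->pObj)->nPos = 0; }
static void MemClose (Stream *pStream) { free (pStream->pObj); free (pStream); }

/* File flavor: pObj is simply the FILE pointer. */
static size_t FileRead (Stream *pStream, void *pvBuf, size_t n)
{
    return fread (pvBuf, 1, n, (FILE *)pStream->pObj);
}
static void FileRewind (Stream *pStream) { rewind ((FILE *)pStream->pObj); }
static void FileClose (Stream *pStream) { fclose ((FILE *)pStream->pObj); free (pStream); }

Stream *pNewMemStream (const char *pcData, size_t nSize)
{
    Stream *pNew = calloc (1, sizeof (Stream));
    MemObj *pMem = calloc (1, sizeof (MemObj));
    if ( pNew == 0 || pMem == 0 ) { free (pNew); free (pMem); return 0; }
    pMem->pcData = pcData;
    pMem->nSize = nSize;
    pNew->ReadFcn = MemRead;
    pNew->RewindFcn = MemRewind;
    pNew->CloseFcn = MemClose;
    pNew->pObj = pMem;
    return pNew;
}

/* Takes ownership of an already opened file. */
Stream *pNewFileStream (FILE *pFile)
{
    Stream *pNew;
    assert (pFile != 0);
    pNew = calloc (1, sizeof (Stream));
    if ( pNew == 0 ) return 0;
    pNew->ReadFcn = FileRead;
    pNew->RewindFcn = FileRewind;
    pNew->CloseFcn = FileClose;
    pNew->pObj = pFile;
    return pNew;
}

/* The caller: counts occurrences of a byte, from file or memory alike. */
size_t nCountByte (Stream *pStream, char c)
{
    char ac[16];
    size_t n, i, nCount = 0;
    StreamRewind (pStream);
    while ( (n = StreamRead (pStream, ac, sizeof ac)) > 0 )
        for (i = 0; i < n; i++)
            if ( ac[i] == c ) nCount++;
    return nCount;
}
```

nCountByte is written once and never learns where its bytes come from; seek, write, and flush could be added as further function pointers in the same way.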

I'm sure you'll think of plenty more.

:^D