Wednesday, November 17, 2004

Object Orientation

An object is an encapsulation of state and behavior. However, because most OOP languages are class-based, the behavior depends on the class, and thus the type, of the object. In other words, if an object is transmitted over the wire as a sequence of bits along with its type information, then at the receiving end an in-memory object can be constructed with its virtual table entries pointing to the type-implied method implementations.
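
As a rough sketch of that idea in C: the receiver reads a type tag from the wire and uses it to attach the matching table of function pointers to the rebuilt object. The shape types, the one-byte-tag wire layout, and all the names here are assumptions made up for illustration, and both ends are assumed to share the same representation of double.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags sent on the wire alongside the object's state. */
enum type_tag { TYPE_CIRCLE = 1, TYPE_SQUARE = 2 };

/* The "virtual table": one function pointer per method. */
struct shape_vtable {
    double (*area)(double size);
};

static double circle_area(double r) { return 3.14159265358979 * r * r; }
static double square_area(double s) { return s * s; }

static const struct shape_vtable circle_vt = { circle_area };
static const struct shape_vtable square_vt = { square_area };

/* In-memory object: state plus a pointer to its type's method table. */
struct shape {
    const struct shape_vtable *vt;
    double size;
};

/* Wire layout assumed for this sketch: one tag byte, then a raw double. */
#define WIRE_SIZE (1 + sizeof(double))

/* Sender side: write the type tag and the state into wire. */
size_t shape_encode(enum type_tag tag, double size, unsigned char *wire)
{
    assert(wire != NULL);
    wire[0] = (unsigned char)tag;
    memcpy(wire + 1, &size, sizeof size);
    return WIRE_SIZE;
}

/* Receiver side: rebuild the object and point it at the methods implied
   by its type. Returns 0 on success, -1 on a short buffer or unknown type. */
int shape_decode(const unsigned char *wire, size_t len, struct shape *out)
{
    assert(wire != NULL && out != NULL);
    if (len < WIRE_SIZE)
        return -1;
    switch (wire[0]) {
    case TYPE_CIRCLE: out->vt = &circle_vt; break;
    case TYPE_SQUARE: out->vt = &square_vt; break;
    default:          return -1;
    }
    memcpy(&out->size, wire + 1, sizeof out->size);
    return 0;
}
```

After decoding, a call such as s.vt->area(s.size) runs whichever implementation the type tag selected; no code is ever transmitted, only the state and the type.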

Polymorphism is more useful for developing and using components. For example, a simple mechanism of well-defined interfaces and a shared library (or DLL) can provide a good scheme for polymorphism. An example is libc. One can choose a different implementation of libc based on price, performance, licensing agreements, memory usage, etc. Another example is ODBC. Various ODBC drivers can be provided to connect to the same database or to different database servers, local versus remote, and so on. The important thing is that the interface remains the same while the implementation is chosen based on non-functional requirements.
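
A minimal sketch of the same idea inside one program, assuming a made-up bit-counting interface: two implementations give identical results but differ in memory use and in the number of steps, and the caller picks one by name. In a real component scheme the selection would happen by loading a different shared library rather than by a lookup like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The interface: callers depend only on this struct (hypothetical). */
struct bitcount_impl {
    const char *name;
    unsigned (*count)(unsigned char byte);
};

/* Implementation 1: no extra memory, loops over the bits. */
static unsigned count_loop(unsigned char b)
{
    unsigned n = 0;
    while (b) {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/* Implementation 2: a 16-entry lookup table, trading memory for fewer steps. */
static unsigned count_table(unsigned char b)
{
    static const unsigned char nibble[16] =
        { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
    return nibble[b & 0x0F] + nibble[b >> 4];
}

static const struct bitcount_impl impls[] = {
    { "small", count_loop },
    { "fast",  count_table },
};

/* Choose an implementation by name; returns NULL if the name is unknown. */
const struct bitcount_impl *bitcount_select(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++)
        if (strcmp(impls[i].name, name) == 0)
            return &impls[i];
    return NULL;
}
```

Code written against struct bitcount_impl does not change when the choice of implementation changes, which is the property the libc and ODBC examples rely on.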

Polymorphism is also good for use in collections. It provides a way of defining a generic collection mechanism that holds objects of different types. For example, an array of integers and an array of strings work the same way in the 'C' language.
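
One way to see this in C is the standard qsort function, which sorts an array of any element type as long as it is given the element size and a comparison function. The wrapper names sort_ints and sort_strings below are made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison for int elements; qsort passes pointers to the elements. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Comparison for string elements; each element is itself a char pointer. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* The same generic routine handles both arrays. */
void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 0)
        qsort(v, n, sizeof v[0], cmp_int);
}

void sort_strings(const char **v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 0)
        qsort(v, n, sizeof v[0], cmp_str);
}
```

Only the element size and the comparison function differ between the two calls; the collection mechanism itself is shared.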

For navigating data, an OODBMS is a very bad choice, as it means going back to the bad old days of navigational databases. Relational databases were developed precisely to solve the issues with navigational databases. Using OOP together with an RDBMS means an expensive conversion whenever data is stored into or retrieved from the RDBMS. Thus, for in-memory storage of data we need a simplified table-driven DBMS that is lightweight and not transaction-oriented. There is no point in hard-wiring entities and their relationships into a tight, application-specific navigational structure. This makes change and maintenance very difficult. It also forces a static classification on data and behavior instead of responding to the dynamic requirements of applications.

Also, as in SNMP, we can achieve specialization using an 'extends' clause for a table. That way, a new table can be created to hold extended attributes for each entry in the primary table, or to hold any specialized entries in the primary table. Also, using stored procedures we can achieve behavior encapsulation. In fact, we can also use local "stored procedures". For example, let's say we have a 'dateOfBirth' field in the Employee table and we want to calculate the age. The procedure getAge can be tied to the Employee table as a local stored procedure. Similarly, if the 'dateOfBirth' field is changed to hold a date in a different calendar, we can simply change the procedure or add a new stored procedure for calculating the age.
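
A local stored procedure of this kind could be sketched in C as a function pointer kept with the table. The struct layouts and the get_age_gregorian function are assumptions for illustration; only the names Employee, dateOfBirth and getAge come from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct date { int year, month, day; };

struct employee {
    int id;
    struct date dateOfBirth;
};

/* Signature of a procedure that can be attached to the Employee table. */
typedef int (*employee_proc)(const struct employee *row, struct date today);

struct employee_table {
    struct employee *rows;
    size_t count;
    employee_proc getAge;   /* replace this to change the calendar logic */
};

/* Age in whole years, assuming dateOfBirth is a Gregorian date. */
int get_age_gregorian(const struct employee *row, struct date today)
{
    assert(row != NULL);
    int age = today.year - row->dateOfBirth.year;
    /* Not yet reached this year's birthday: one year less. */
    if (today.month < row->dateOfBirth.month ||
        (today.month == row->dateOfBirth.month &&
         today.day < row->dateOfBirth.day))
        age--;
    return age;
}
```

Callers always go through table.getAge, so switching dateOfBirth to another calendar means attaching a different function to the table and leaves the callers untouched.
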
Each base table (one that does not extend any other table) can be implemented by creating a virtual table containing only the primary keys and the name of the table to which each row belongs. For example, let's say we have a Customer table, and to handle domestic and foreign customers we have the DomCust and ForCust extended tables. To implement this relationship we create a fourth table, CustBase (not directly visible to queries and applications), which contains only the foreign key (custID) and the name of the table to which the row belongs. Thus, if a new row is created through the Customer table, CustBase will contain its custID with the instaTable field set to Customer. Likewise, if a DomCust row was created, we can always navigate from the custID to the corresponding row of DomCust and access the extra or specialized values of that table. Also, each extended (or derived) table contains the names of the tables that it derives from. Thus, we can also go from each row of a derived table to the corresponding row of the Customer table.
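
The CustBase arrangement might be sketched in C as follows. The column names other than custID and instaTable, the sample rows, and the lookup functions are all invented for this illustration, and the tables are plain static arrays rather than a real table engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct customer  { int custID; const char *name; };
struct dom_cust  { int custID; const char *state; };    /* extends Customer */
struct for_cust  { int custID; const char *country; };  /* extends Customer */

/* Hidden base table: the key plus the table each row was created in. */
struct cust_base { int custID; const char *instaTable; };

static const struct customer  customers[]  = { { 1, "Acme" }, { 2, "Globex" } };
static const struct dom_cust  dom_custs[]  = { { 1, "Ohio" } };
static const struct for_cust  for_custs[]  = { { 2, "France" } };
static const struct cust_base cust_bases[] = { { 1, "DomCust" }, { 2, "ForCust" } };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Which table does this custID belong to? NULL if the key is unknown. */
const char *cust_table_of(int custID)
{
    for (size_t i = 0; i < COUNT(cust_bases); i++)
        if (cust_bases[i].custID == custID)
            return cust_bases[i].instaTable;
    return NULL;
}

static int row_is_in(int custID, const char *table)
{
    assert(table != NULL);
    const char *actual = cust_table_of(custID);
    return actual != NULL && strcmp(actual, table) == 0;
}

/* Navigate from a custID down to the specialized row, if there is one. */
const struct dom_cust *dom_cust_find(int custID)
{
    if (!row_is_in(custID, "DomCust"))
        return NULL;
    for (size_t i = 0; i < COUNT(dom_custs); i++)
        if (dom_custs[i].custID == custID)
            return &dom_custs[i];
    return NULL;
}

const struct for_cust *for_cust_find(int custID)
{
    if (!row_is_in(custID, "ForCust"))
        return NULL;
    for (size_t i = 0; i < COUNT(for_custs); i++)
        if (for_custs[i].custID == custID)
            return &for_custs[i];
    return NULL;
}

/* Navigate from a derived row's key back up to the Customer row. */
const struct customer *customer_find(int custID)
{
    for (size_t i = 0; i < COUNT(customers); i++)
        if (customers[i].custID == custID)
            return &customers[i];
    return NULL;
}
```
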

Also, a one-to-many relationship can be seen as a static join operation. Thus, the one-to-many relationship between Customers and Orders can be easily modelled by putting a foreign key CustomerID in the Order table and a static join operation behind a field called 'orders' in the Customer table. Whenever the 'orders' field is accessed, the results of the join operation are returned. Now, whenever a new relationship is added between a customer and an order, we can update both tables and update the results of the virtual join operation. This increases efficiency, as the join does not have to be performed on demand; its results are kept ready.
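
A minimal in-memory sketch of such a maintained join, assuming fixed-size arrays and invented names (db_add_order, MAX_ORDERS, MAX_PER_CUSTOMER): adding an order updates the Order table and the customer's precomputed 'orders' field in the same step.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDERS       16  /* capacity of the Order table (arbitrary) */
#define MAX_PER_CUSTOMER  8  /* capacity of one customer's join result */

struct order {
    int orderID;
    int customerID;          /* foreign key into the Customer table */
};

struct customer {
    int customerID;
    int orders[MAX_PER_CUSTOMER];  /* precomputed join result: order IDs */
    size_t order_count;
};

struct db {
    struct customer *customers;
    size_t customer_count;
    struct order orders[MAX_ORDERS];
    size_t order_count;
};

/* Insert an order and refresh the stored join in the same operation.
   Returns 0 on success, -1 if a table is full or the customer is unknown. */
int db_add_order(struct db *db, int orderID, int customerID)
{
    assert(db != NULL);
    if (db->order_count == MAX_ORDERS)
        return -1;
    for (size_t i = 0; i < db->customer_count; i++) {
        struct customer *c = &db->customers[i];
        if (c->customerID != customerID)
            continue;
        if (c->order_count == MAX_PER_CUSTOMER)
            return -1;
        db->orders[db->order_count++] = (struct order){ orderID, customerID };
        c->orders[c->order_count++] = orderID;
        return 0;
    }
    return -1;  /* no such customer: the foreign key would dangle */
}
```

Reading a customer's orders is then a direct array access. The cost has moved to insertion time, and deletions would need the same bookkeeping, which this sketch leaves out.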



Thus, the conclusion is that we should use OO/polymorphism for components and containers, while using an RDBMS or an in-memory, relational, table-based scheme for managing relationships between entities.

Monday, July 26, 2004

Encoded Assumptions

While coding, programmers invariably have to make lots of assumptions. Because of changes in business logic or external circumstances, those assumptions may not remain valid. Since the same assumption is often coded in multiple places in a piece of software, it becomes a maintenance nightmare to locate and correct the code.

For example, code that checks whether a given phone number is local may simply compare its area code with the local area code. However, this assumption stops being true if the local phone company starts treating some calls within the same area code as local toll calls.

One solution is to code assumptions explicitly. For example, in the above case, the assumption was All_Calls_Within_Same_Area_Code_Are_Local. Give each assumption a name, and define a C macro by the same name. The macro should currently expand to TRUE. Assert the macro wherever the assumption is made, like this: assert(All_Calls_Within_Same_Area_Code_Are_Local). Since it is currently defined as TRUE, the assertion always passes and has no effect on the program's behavior.

It becomes very easy to find all the places in the code where a particular assumption is made by simply searching for uses of the corresponding macro. To find, at run time, the places that rely on an assumption that has become false, simply redefine the corresponding macro to FALSE. The modified code will then fail an assertion wherever execution reaches code that makes that assumption.
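
The scheme described above might look like this in C. The TRUE and FALSE definitions, the is_local_call function, and the LOCAL_AREA_CODE value are assumptions for this sketch; only the assumption macro's name comes from the text.

```c
#include <assert.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

/* Named assumption. Redefine to FALSE to make every site that relies
   on it fail its assertion when executed. */
#define All_Calls_Within_Same_Area_Code_Are_Local TRUE

#define LOCAL_AREA_CODE "408"   /* hypothetical local area code */

/* Returns 1 if phone_number (digits only, area code first) is local. */
int is_local_call(const char *phone_number)
{
    /* Documents, and checks, the assumption this function depends on. */
    assert(All_Calls_Within_Same_Area_Code_Are_Local);
    return strncmp(phone_number, LOCAL_AREA_CODE,
                   strlen(LOCAL_AREA_CODE)) == 0;
}
```

Searching the source for All_Calls_Within_Same_Area_Code_Are_Local lists every function that depends on the assumption. Note that assert is compiled out when NDEBUG is defined, so the run-time check only works in builds without it.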

Thus, this method allows easy documentation, detection, and testing of all the critical assumptions that a piece of software makes. This aids maintenance greatly.

Friday, July 23, 2004

Not NULL Terminating strings in C

In the 'C' programming language, a string is nothing but a null-terminated character array (char * or char []), that is, an array whose contents end with the null character '\0'. Functions dealing with strings (including standard library functions such as strlen) iterate over the characters until they reach the null character. If the terminating null character is absent, they will process beyond the end of the string, and even beyond the array itself, which is undefined behavior and often causes a segmentation fault.

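Many of the standard C library functions, such as strdup and strcat, return null-terminated strings. But some, such as strncpy, sometimes do not: strncpy leaves the destination unterminated when the source string is at least as long as the count passed to it.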
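
One common way to guard against this is to copy one character fewer than the buffer holds and write the terminator explicitly. The helper name copy_terminated is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which has room for dst_size characters, and always
   terminate the result. Returns 1 if the whole string fit, 0 if it had
   to be truncated. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* strncpy alone would leave dst unterminated for a long src. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return strlen(src) < dst_size;
}
```

With a 4-character buffer, copying "abcdef" this way stores "abc" plus the terminator, whereas a bare strncpy(buf, "abcdef", 4) would store 'a', 'b', 'c', 'd' and no terminator at all.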

Thursday, July 22, 2004

Common Programming mistakes are simple

The most common programming mistakes are surprisingly simple. The following posts will list some of them.