10 Reasons why OSF DCE sucks: A programmer's viewpoint

by Bruce Ediger
Bruce Ediger's home page
Bruce Ediger's source code offerings


What is OSF DCE?

The acronym "OSF DCE" stands for "Open Software Foundation Distributed Computing Environment".

Essentially, DCE comprises a remote procedure call (RPC) mechanism and subsidiary systems to aid distributed computation. Programs that use RPC perform some preliminary setup work. After that setup, predetermined programming language function calls get sent to a server somewhere else on the network. That server performs the function, then returns the results to the client. See this introduction for further explanation of RPC.

RPC is just one of many methods to organize a distributed system. People often choose RPC to take advantage of programmers' current skills. When writing a distributed system, a programmer can structure the source code of the distributed system quite a bit like a non-distributed version of the same system. Network communications hide behind what look like familiar function calls.
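
To make "network communications hide behind function calls" concrete, here is a minimal sketch in C. It is not DCE code. The names remote_add, server_dispatch and add_impl, and the 8-byte request layout, are all invented for illustration, and the "network" is just a direct call within the same process.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: two 32-bit operands in, one 32-bit result out. */
enum { REQUEST_SIZE = 8, REPLY_SIZE = 4 };

/* The function the server actually performs. */
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }

/* Server side: unpack the request, run the real function, pack the reply. */
static void server_dispatch(const unsigned char *request, unsigned char *reply)
{
    int32_t a, b, sum;

    memcpy(&a, request, sizeof a);
    memcpy(&b, request + sizeof a, sizeof b);
    sum = add_impl(a, b);
    memcpy(reply, &sum, sizeof sum);
}

/* Client stub: looks like an ordinary function to the caller. A real RPC
 * runtime would send `request` over the network at the marked point; this
 * sketch calls server_dispatch() directly instead. */
int32_t remote_add(int32_t a, int32_t b)
{
    unsigned char request[REQUEST_SIZE], reply[REPLY_SIZE];
    int32_t result;

    assert(sizeof request == 2 * sizeof a);

    memcpy(request, &a, sizeof a);              /* marshal arguments */
    memcpy(request + sizeof a, &b, sizeof b);
    server_dispatch(request, reply);            /* stand-in for the network */
    memcpy(&result, reply, sizeof result);      /* unmarshal the result */
    return result;
}
```

A caller writes remote_add(2, 3) exactly as it would write a local call. Everything a real system adds (byte order, transport, binding, failure handling) lives behind that signature.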

Since the framework of RPC exists mainly to ease difficulties programmers might have with distributed systems, I think that a critique of DCE must perform its work on the level that a programmer would see.

All criticism of documentation is based on:

OSF DCE Application Development Guide Revision 1.0, ISBN 0-13-643826-1, apparently the first printing, copyright 1993, OSF.

OSF DCE Application Development Reference Revision 1.0, ISBN 0-13-643834-2, apparently the first printing, copyright 1993, OSF.

Send me comments or ask me questions about this strongly opinionated piece.

  1. Silly naming conventions.

    What is the difference between rpc_binding_vector_p_t and rpc_binding_vector_t? The silly naming conventions lead to embedding significant letters deep inside long strings. There are only small lexical and visual distinctions between constants like rpc_c_authn_dce_secret and functions with names like rpc_server_register_auth_info.

    The word "object" is bandied around in multiple contexts, each of which has its own subtle meanings and distinctions. Pthreads "objects" are not the same as CDS "objects", which are subtly different from XDS "objects".

    The whole system makes arbitrary and subtle distinctions between "protocol sequences", "endpoints", "objects", "mappings" and "bindings".


  2. Proliferation and inconsistent use of typedefs and symbolic constants.

    rpc_s_ok, error_status_ok, uuid_s_ok and sec_rgy_status_ok are all constants that end up being a 32-bit 0, and all indicate a successful function call of one sort or another.

    error_status_t and unsigned32 are typedefs used in different places for the same thing: the storage type of something to hold a status return.
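
    A sketch of what that redundancy looks like from the programmer's chair. The definitions below are stand-ins written for this example, not copied from the DCE headers. The only things taken from the text above are the names and the fact that each constant is a 32-bit 0. The underlying uint32_t type and the helper function names are my assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in definitions for illustration only. The real ones live in the
 * DCE headers, which are not reproduced here. */
typedef uint32_t unsigned32;
typedef uint32_t error_status_t;

#define rpc_s_ok           0u
#define error_status_ok    0u
#define uuid_s_ok          0u
#define sec_rgy_status_ok  0u

/* Four spellings, one value: every one of these comparisons holds. */
void check_ok_constants_agree(void)
{
    assert(rpc_s_ok == error_status_ok);
    assert(error_status_ok == uuid_s_ok);
    assert(uuid_s_ok == sec_rgy_status_ok);
}

/* Returns 1 when a status means success, no matter which of the four
 * constants the man page told the caller to compare against. */
int status_is_ok(unsigned32 status)
{
    return status == rpc_s_ok;
}
```

    With these stand-ins, a variable declared as error_status_t and one declared as unsigned32 are interchangeable, which is exactly why having both names buys nothing.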


  3. Stinky Documentation.

    The examples don't match the man pages. Section 11.3 of the OSF DCE Application Development Guide contains an example, called "binop". Throughout that example, a variable of type error_status_t is used to get completion status from RPC runtime routines. Throughout the appropriate pages of the OSF DCE Application Development Reference, the definition of the RPC runtime routines shows the use of a variable of type unsigned32 to return completion status.

    Occasionally, the index entry doesn't match where the entity being indexed is. The "BLISS compiler, generating reentrant code" index entry says it's on page 6-14, when in fact, the "OSF DCE Application Development Guide" has that info on page 6-13. The index entry in question is on page "Index-5".

    The docs don't make clear that the DCE RPC runtime raises exceptions to indicate error conditions. The documentation buries this important fact in discussion of "attribute configuration language".

    Example code is of poor quality and/or very often just plain wrong. See teldir.c code, section 28.4.5 of Application Development Guide. That example has multi-line macros without backslash continuations, code that inserts ASCII blanks into strings instead of ASCII NULs, and several places where following the code slavishly will produce compilable, yet wrong, results.
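
    For contrast, here is a sketch of the two mechanical points done correctly. This is not the teldir.c code. The macro COPY_FIELD and the function fill_name are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A multi-line macro needs a backslash at the end of every line but the
 * last. Without them the preprocessor ends the macro at the first newline
 * and the remaining lines become ordinary (usually broken) source text. */
#define COPY_FIELD(dst, src, size)            \
    do {                                      \
        strncpy((dst), (src), (size) - 1);    \
        (dst)[(size) - 1] = '\0';             \
    } while (0)

/* Copies `name` into a fixed-size field and guarantees NUL termination.
 * Terminating with ' ' (ASCII blank) instead of '\0' compiles just as
 * well, but leaves a field that strlen(), strcmp() and printf("%s")
 * will run off the end of. */
void fill_name(char *field, size_t field_size, const char *name)
{
    assert(field_size > 0);
    COPY_FIELD(field, name, field_size);
}
```

    Names longer than the field are truncated rather than overflowing it, and the result is always a valid C string.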

    Apparently the documentation hasn't been proofread. Gems like the following abound.

    OSF DCE Application Development Reference, page 2-18:

    "The rpccp control program accesses RPCCP, the RPC control program. This program provides a set of commands for accessing the operations of the RPC Name Service Interface (NSI operations)."

    OSF DCE Application Development Reference, page 2-28 (about environmental variables used by DCE RPC runtime):

    "RPC_DEBUG Appears for the sole purpose of telling you not to set it or use it."

    OSF DCE Application Development Guide, page 24-18:

    "Since this is a simple RPC NSI entry, there is not very much in the entry that is interesting to read, but this entry is used as an example anyway as a simple demonstration."

    Pitiful sentence structure, but I think I understand: even though it's a poor example, we're going to use it. Of note is the fact that it takes 17 pages of intermixed source code and explanation for this so-called simple demonstration.

    Many instances of "TBDs" like "Figure ?" and "Section ?" are still in the text of some of the more esoteric manuals, like the ENCINA manuals.


  4. Grotesque Complexity.

    As a sickening immediate introduction to the complexity, look at this function prototype from page 5-224 of Application Development Reference:

    void sec_rgy_login_get_effective (
    sec_rgy_handle_t context,
    sec_rgy_login_name_t *login_name,
    sec_rgy_acct_key_t *key_parts,
    sec_rgy_sid_t *sid,
    sec_rgy_unix_sid_t *unix_sid,
    sec_rgy_acct_user_t *user_part,
    sec_rgy_acct_admin_t *admin_part,
    sec_rgy_acct_plcy_t *policy_data,
    signed32 max_number,
    signed32 *supplied_number,
    uuid_t id_projlist[],
    sec_rgy_name_t cell_name,
    uuid_t *cell_uuid,
    sec_override_fields_t *overridden,
    error_status_t *status);

    Seventeen arguments, 12 types, all of them typedefs, none simple, some pointers, one array.

    Proof that OSF DCE is complex: the documentation is enormous.

    The Application Development Reference has:
    83 pages documenting pthreads.
    409 pages documenting remote procedure call.
    116 pages documenting directory services.
    75 pages documenting time service.
    312 pages documenting security service.
    201 pages documenting distributed file service.

    The Application Development Reference is only an alphabetical listing of the C language API man pages. There is another book of equal thickness, the Application Development Guide, that describes how to use the API.

    Figure 26-12, Data Type OM_descriptor_struct in the OSF DCE Application Development Guide gives an example of the enormous complexity inherent in the system.

    Despite the enormous and complex documentation, some portions are incomplete, notably the security service. The man page for the function sec_login_certify_identity() for instance, has many "and then a miracle occurs" sort of ellipses in the example code.

    Another incomplete portion is the XOM abstract object manipulation stuff. XOM has a grotesquely complex schema, complete with superfluous layers of indirection, yet convenience routines are not included. The convenience routines are, however, laid out in the documentation. Use them and violate OSF's copyright. This has got to be one of the most confusing implementations of anything possible. Obvious convenience routines (string to XDS name, and vice versa) are lacking, example code is wrong, and extra levels of indirection are present. Essentially, XOM is an implementation of an object-oriented system, done in C, with the runtime implemented as a set of C-preprocessor macros.

    A bizarre fault in at least one vendor's xomi.h header file has an element of several "C" structs being named "class". This would be entirely acceptable for "C" programs, yet the header files have "ifdef __cplusplus" preprocessor directives, which indicates the header file could be used in "C++" programs. Unfortunately, "class" is a reserved keyword in "C++".
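
    A sketch of the problem, using made-up structs rather than anything copied from a vendor's xomi.h. The struct names, the member name om_class and the accessor function are all hypothetical.

```c
#include <assert.h>

/* Legal C. The same declaration is a syntax error when the header is
 * included from a C++ source file, because `class` is a keyword there. */
struct descriptor {
    int class;
    int value;
};

/* One common workaround: choose a member name that is legal in both
 * languages, so a single header can serve C and C++ callers. */
struct portable_descriptor {
    int om_class;
    int value;
};

/* Reads the offending member. This compiles only as C. */
int descriptor_class(const struct descriptor *d)
{
    assert(d != 0);
    return d->class;
}
```

    Wrapping the header in extern "C" does not help, since that changes linkage and not the set of reserved words.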


  5. Inconsistency gets taken to bizarre extremes.

    As an example, let's take the various methods of error returns.

    Pthreads calls (functions with names beginning with pthread_) follow the UN*X convention of returning 0 on success, -1 on failure, with a magical, global variable errno to give some indication of what went haywire.

    Global variable errno is actually supposed to be a per-thread "lvalue" but at least under one vendor's implementation, it's per process.

    DCE RPC calls (functions with names beginning with rpc_) almost always return void (nothing), but have a pointer to an int passed in as a formal argument. That int gets filled in to indicate success or type of failure.

    Just for added complexity, the DCE RPC runtime will also "raise exceptions" for some of the error conditions. If you don't "catch the exceptions" your program aborts. By macro substitution, you can also specify 'exception' returns for the pthreads calls. This can cause troubles if different source code modules are compiled with different #include files.

    The directory service routines return opaque objects (anonymous structs) that must be further manipulated to determine success or failure, and what failed. Since some of the opaque object's internal structures are unknown, you must use the magic routine om_get() to do much of the work.

    The distributed file service takes a pointer to a special structure, some elements of which indicate success or failure, others indicate what kind. The structures are documented, not opaque, and must be manually manipulated.

    OSF DCE uses about 4 and a half different methods for indicating failure.
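
    Three of those styles, side by side, in a sketch. None of this is DCE code and every function name is invented. The "exception" here is faked with standard C setjmp/longjmp purely as a stand-in. How the DCE runtime actually implements its exceptions is not something this example claims to show.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>

/* Style 1: return 0 or -1, with details in errno (the pthread_
 * convention described above). */
int style_errno(int fail)
{
    if (fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Style 2: return void and report through a status out-parameter (the
 * rpc_ convention described above). 0 means success. */
void style_status(int fail, unsigned int *status)
{
    assert(status != 0);
    *status = fail ? 1u : 0u;
}

/* Style 3: "raise an exception". If nobody has set up `handler`, a real
 * program would simply die here. */
static jmp_buf handler;

void style_exception(int fail)
{
    if (fail)
        longjmp(handler, 1);
}

/* Calls all three and counts the failures. Note that the caller needs a
 * different detection mechanism for each one. */
int count_failures(int fail)
{
    unsigned int status;
    volatile int failures = 0;

    if (style_errno(fail) == -1)
        failures++;

    style_status(fail, &status);
    if (status != 0)
        failures++;

    if (setjmp(handler) == 0)
        style_exception(fail);
    else
        failures++;

    return failures;
}
```

    Add the opaque-object and documented-structure styles from the directory and file services and you have the full menagerie, all in one API.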


  6. Consistency is taken to bizarre extremes.

    Those "orthogonality" weenies are at it again in the case of the Pthreads API. Apparently because a pthread needs "attributes" in the form of stack size, priority and scheduling algorithm, the other pthreads "objects" are given "attributes" on creation too. Unfortunately, no way of setting or examining the "attributes" for "objects" like condition variables is available. Condition variable "attributes objects" serve as a placeholder so that a matrix of operations/objects will be completely filled in, and "orthogonality" will be achieved, at least in the hearts and minds of those for whom totally filled-in matrices mean something.


  7. The command line "administration" utilities are crude.

    They seem to be incomplete implementations of what they ought to be. That is, error handling is incomplete, uninformative and inconsistent. Command keyword usage is not consistent, and varies in format and quantity of output produced.

    As an example, the "Object UUID" is printed in a sort of big-endian mode by rpccp, while cdscp prints it in little-endian mode.

    As another example, rpccp will print a help message when the user enters "?" or "help". cdscp only accepts "help": entering "?" yields an error.
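
    To show why the byte-order mismatch matters, here is a sketch that formats the same 32-bit value both ways. The function format_field is hypothetical, and the exact output formats of rpccp and cdscp are not reproduced here. This only illustrates the general effect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats a 32-bit field as 8 hex digits into `out`, either most
 * significant byte first (big_endian != 0) or least significant byte
 * first (big_endian == 0). */
void format_field(uint32_t field, int big_endian, char out[9])
{
    int i;

    assert(out != 0);
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        sprintf(out + 2 * i, "%02x", (unsigned)((field >> shift) & 0xffu));
    }
    assert(strlen(out) == 8);
}
```

    The value 0x12345678 comes out as 12345678 one way and 78563412 the other, so a UUID pasted from one tool's output into the other will not match unless the user reorders the bytes by hand.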


  8. The command line "administration" utilities cheat.

    They do things for which no API is defined, or they do things by some extra-API methods. For instance, the only way to create a CDS "directory" is via the command-line utility cdscp. The implication is that despite its enormous complexity, the DCE API is incomplete, and that since there are "secret" APIs, the "open" in OSF is confirmed as marketing spin talk.


  9. It's proprietary.

    Let's face it, the "Open" in OSF is marketing spin talk. OSF DCE is all proprietary. There isn't even an RFC for the host-independent data format, much less for the guts of the RPC stuff. Of course, there's an MIT implementation of X.500, possibly the ickiest part of the whole DCE mess, but it's not really the same as having the source code for an implementation of XDR and ONC RPC lying around, and a couple of real RFCs to appeal to should a vendor be reluctant to admit to bugs. For this reason alone, OSF DCE is going to cost a lot of money.


  10. OSF DCE duplicates well-known and widely implemented standards.

    The most prominent case of this is the DCE Distributed Time Service (DTS). A well known, widely implemented standard, complete with real RFCs and supporting mathematics and research, exists in Network Time Protocol (NTP). NTP doesn't have the faults that other DCE stuff alleges to correct by duplicating well-known and widely implemented standards.


Related Links

Open Software Foundation

DCE 1.1 client code.

Maybe all forms of RPC suffer from heinous defects, at some basic, information-theoretic level.

DCE 1.2.2 license and download.

Brown University Computer Science Department 1992 whitepaper and 1993 whitepaper on Open Software, Unix, DCE, etc. These two papers interest me in a historical sense, primarily because the author makes predictions which failed to come true. Secondarily, the author includes Windows NT in the second paper, apparently because he believed the early hype about it. The 1993 whitepaper actually contains a very coherent explanation of DCE, remote procedure calls, and how programmers implement them.

Carnegie Mellon University's Coda filesystem and the accompanying RPC2 (manual) remote procedure call system.

Source code for Sun Microsystems' ONC RPC remote procedure call system.

General introduction to Remote Procedure Call concepts.

ONC RPC and NIS+ Development Guide.

Pithy commentary on the concept of remote procedure call.

RFC 707, known as one of the first public descriptions of remote procedure call.


$Id: anti_dce.html,v 1.7 1999/10/14 05:25:02 bediger Exp bediger $