Thursday, November 02, 2006

Integrating Lucene with Spring Framework & Hibernate

While looking for support for integrating Lucene with Spring Framework & Hibernate, I came across a full-blown open source Java search engine framework called Compass Framework, which is built on top of the Lucene search engine and provides seamless integration with popular development frameworks such as Hibernate and Spring Framework.

Why do we need yet another framework for implementing search functionality?

Lucene is a low-level API, which means that coding against it directly can easily cause coupling problems, especially with the domain objects. Coding the Lucene API directly into the application may be a performance killer and can also become a maintenance nightmare in the future (as the domain model changes). While looking for other options for integrating Lucene with our Spring-based application, I came across two alternatives in the open-source arena:

1. Lucene Spring Modules

One option is to use the "Lucene Spring Modules", which is part of the "Spring Modules" project. Spring Modules extends the functionality of Spring Framework to include other open-source tools, and is intended to facilitate integration between Spring Framework and other projects without cluttering or expanding the Spring core.

2. Compass Framework

Another option is to use Compass Framework, which provides a high-level abstraction on top of Lucene's low-level API and lets you map the domain model to the search engine declaratively. It externalizes all dependencies and coupling into a Compass metadata file, so the domain objects themselves stay free of search-engine code. Compass also implements fast index operations and optimization, which improves application performance.

Compass Framework provides a module named "Compass::Spring", which is intended to provide closer integration with the Spring Framework. It supports IoC through Spring's application context and provides support for the Hibernate SessionFactory. Compass claims to support complex applications with large domain models easily, and to reduce maintenance and performance overhead to negligible levels. Compass comes with a sample project (the old petclinic sample, extended with search functionality using Compass Framework) that demonstrates its integration with Spring Framework & Hibernate. The product is also quite mature, with elaborate documentation. The current stable version is Compass 1.1M2.
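To make the idea of declarative mapping concrete, here is a minimal pure-Java sketch. It assumes a hypothetical `@SearchableField` annotation and a hypothetical `DeclarativeMapper` helper; neither is Compass's actual mapping API, which this post does not show. The domain class only carries metadata, and a generic mapper turns any annotated object into a flat field map of the kind a search engine indexes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation used for illustration; NOT Compass's mapping API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SearchableField {
    String name() default ""; // index field name; empty means "use the Java field name"
}

// Domain object: it carries mapping metadata but no search-engine code.
class Book {
    @SearchableField String title;
    @SearchableField(name = "writer") String author;
    int stockCount; // not annotated, so it is never indexed

    Book(String title, String author, int stockCount) {
        this.title = title;
        this.author = author;
        this.stockCount = stockCount;
    }
}

// Generic mapper: reads the metadata at runtime and builds a flat field map.
class DeclarativeMapper {
    static Map<String, String> toDocument(Object entity) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("_class", entity.getClass().getSimpleName()); // tag naming the source class
        for (Field f : entity.getClass().getDeclaredFields()) {
            SearchableField meta = f.getAnnotation(SearchableField.class);
            if (meta == null) continue; // unmapped fields are skipped
            f.setAccessible(true);
            String key = meta.name().isEmpty() ? f.getName() : meta.name();
            try {
                Object value = f.get(entity);
                if (value != null) doc.put(key, value.toString());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + f, e);
            }
        }
        return doc;
    }
}
```

When the domain model changes, only the annotations (or, in Compass's case, the metadata file) change; the indexing code stays generic.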

More about Compass Framework

Compass is a first-class open source Java search engine framework that brings the power of search engine semantics to your application stack declaratively. Compass is a powerful, transactional Object to Search Engine Mapping (OSEM) Java framework that allows you to declaratively map your object domain model to the underlying search engine, synchronizing data changes between the index and different data sources. Compass provides a high-level abstraction on top of the Lucene low-level API. Compass also implements fast index operations and optimization, and introduces transactional capabilities to the search engine.

In recent versions, Compass provides a Lucene JDBC Directory implementation, which allows a Lucene index to be stored in a database for both pure Lucene applications and Compass-enabled applications. Compass also supports the SpringHibernate Gps Device (configured in the Spring context file using IoC), which combines the Compass OSEM feature (Object to Search Engine Mapping) with the Hibernate ORM feature (Object to Relational Mapping) to provide simple database indexing. All the OSEM mappings are defined in a Compass metadata file, and the SpringHibernate Gps Device intercepts the Hibernate session factory to index data transparently. The Gps Device also mirrors data changes made through Hibernate in real time, so you don't have to explicitly re-index data after a store/update/delete. The path data travels through the system is: Database -- Hibernate -- Objects -- Compass::Gps -- Compass::Core (Search Engine). Compass returns the ids of the matched objects along with a tag that identifies the class each object belongs to.
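The real-time mirroring described above is essentially an observer hooked into the persistence layer. The sketch below is a toy model of that idea, not the Compass::Gps or Hibernate API: `ToyRepository`, `ChangeListener` and `MirroringIndex` are hypothetical names, and hits come back as "Type#id" strings, echoing the id-plus-class-tag results Compass returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical hook interface, standing in for the listener a Gps device registers with the ORM.
interface ChangeListener {
    void saved(String type, long id, String text);
    void deleted(String type, long id);
}

// Toy persistence layer that notifies listeners after every store/update/delete.
class ToyRepository {
    private final Map<String, String> rows = new HashMap<>();
    private final List<ChangeListener> listeners = new ArrayList<>();

    void addListener(ChangeListener listener) { listeners.add(listener); }

    void save(String type, long id, String text) {
        rows.put(type + "#" + id, text);
        for (ChangeListener l : listeners) l.saved(type, id, text);
    }

    void delete(String type, long id) {
        rows.remove(type + "#" + id);
        for (ChangeListener l : listeners) l.deleted(type, id);
    }
}

// Mirrors every change into an in-memory term index; hits are "Type#id" (class tag plus id).
class MirroringIndex implements ChangeListener {
    private final Map<String, Set<String>> postings = new HashMap<>();

    @Override
    public void saved(String type, long id, String text) {
        deleted(type, id); // an update first drops the stale entry
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) postings.computeIfAbsent(term, k -> new TreeSet<>()).add(type + "#" + id);
        }
    }

    @Override
    public void deleted(String type, long id) {
        String key = type + "#" + id;
        for (Set<String> hits : postings.values()) hits.remove(key);
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}
```

Because the index is updated inside the same save/delete calls, application code never has to trigger a re-index explicitly, which is the property the Gps Device is described as providing.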

Dear readers, don’t forget to read about the origin of the Compass Framework as described by its author, Shay Banon, on his blog. It is well written, and I bet you will enjoy the narration!

References:

Open Symphony's Page
Shay Banon’s Blog


Wednesday, November 01, 2006

Full Text Search

In this article I evaluate some of the options for integrating full-text search features into Java applications.

MySQL’s built-in Full Text Search engine

From my initial research, MySQL’s built-in full-text search engine does surprisingly effective full-text searching when the dataset is small. It also has the lowest implementation cost, since the search criteria can be specified as part of the query itself. But as the dataset grows, its efficiency becomes increasingly dependent on system resources such as CPU and RAM.
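For comparison, with MySQL's built-in engine the search really is just part of the query. A minimal JDBC sketch, assuming a hypothetical `articles(id, title, body)` table that already has a FULLTEXT index on `(title, body)`; `MATCH ... AGAINST` is MySQL's full-text syntax, while the table, columns and the `ArticleSearch` class are illustrative only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ArticleSearch {
    // Assumes a hypothetical table articles(id, title, body) with FULLTEXT(title, body).
    static final String SQL =
            "SELECT id, title FROM articles WHERE MATCH(title, body) AGAINST (?)";

    // Returns the titles of matching articles; the search words are bound like any other parameter.
    static List<String> search(Connection conn, String terms) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, terms);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) titles.add(rs.getString("title"));
            }
        }
        return titles;
    }
}
```

As I understand the MySQL manual, when `MATCH()` is used in a `WHERE` clause the rows come back sorted by relevance, which is why no `ORDER BY` is added here; all of the matching work, however, happens on the database server.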

Open Source Full Text Search engines

Most external full-text search engines work by keeping a separate index of the table data, which is updated at frequent intervals (possibly with some caching), so that less time is spent on the database server searching for information. This approach certainly reduces the load on the database server.
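The "separate index" these engines keep is usually an inverted index: a map from each term to the rows that contain it. The following is a simplified in-memory sketch (the `InvertedIndex` class and its split-on-non-word-characters tokenizer are assumptions for illustration, far simpler than what Sphinx or Lucene actually do) showing a periodic rebuild from a table snapshot and an AND query answered without touching the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal inverted index kept outside the database: term -> ids of rows containing it.
class InvertedIndex {
    private volatile Map<String, Set<Long>> postings = new HashMap<>();

    // Rebuild from a snapshot of the table (id -> text), e.g. on a schedule.
    void rebuild(Map<Long, String> rows) {
        Map<String, Set<Long>> fresh = new HashMap<>();
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            for (String term : tokenize(row.getValue())) {
                fresh.computeIfAbsent(term, k -> new TreeSet<>()).add(row.getKey());
            }
        }
        postings = fresh; // swap the reference so readers see either the old or the new index
    }

    // AND query: ids of rows containing every query term.
    Set<Long> search(String query) {
        Map<String, Set<Long>> snapshot = postings;
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = snapshot.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        if (result == null) return Collections.emptySet();
        return result;
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

The trade-off is freshness: between rebuilds the index lags behind the table, which is where the "frequent intervals" and caching mentioned above come in.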

A list of popular full-text search engines is available on the Wikimedia site.

1. Sphinx

Sphinx is a full-text search engine, distributed under GPL version 2. Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, the built-in data source drivers support fetching data either via a direct connection to MySQL or PostgreSQL, or from a pipe in a custom XML format.

2. Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java, with features such as scalable, high-performance indexing and powerful, accurate, and efficient search algorithms.

As a full-text search engine, Lucene needs little introduction. Lucene, an open source project hosted by Apache, aims to produce high-performance full-text indexing and search software. The Java Lucene product itself is a high-performance, high capacity, full-text search tool used by many popular Websites such as the Wikipedia online encyclopedia and TheServerSide.com, as well as in many, many Java applications. It is a fast, reliable tool that has proved its value in countless demanding production environments.

Although Lucene is well known for its full-text indexing, many developers are less aware that it can also provide powerful complementary searching, filtering, and sorting functionalities. Indeed, many searches involve combining full-text searches with filters on different fields or criteria. For example, you may want to search a database of books or articles using a full-text search, but with the possibility to limit the results to certain types of books. Traditionally, this type of criteria-based searching is in the realm of the relational database. However, Lucene offers numerous powerful features that let you efficiently combine full-text searches with criteria-based searches and sorts.
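Conceptually, such a combined search runs three independent steps: a full-text match, a criteria filter, and a sort. Lucene models these as separate query, filter, and sort objects; the sketch below only mimics the same pipeline in plain Java over an in-memory list, so `Doc`, `FilteredSearch` and the sample fields are hypothetical and are not Lucene classes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an indexed document with extra criteria fields.
class Doc {
    final int id;
    final String text;
    final String category;
    final int year;

    Doc(int id, String text, String category, int year) {
        this.id = id;
        this.text = text;
        this.category = category;
        this.year = year;
    }
}

class FilteredSearch {
    // Full-text match on one word, then a category filter, then newest-first sorting.
    static List<Integer> search(List<Doc> docs, String word, String category) {
        String w = word.toLowerCase();
        return docs.stream()
                .filter(d -> Arrays.asList(d.text.toLowerCase().split("\\W+")).contains(w)) // full-text part
                .filter(d -> d.category.equals(category))                                    // criteria filter
                .sorted(Comparator.comparingInt((Doc d) -> d.year).reversed())               // year, descending
                .map(d -> d.id)
                .collect(Collectors.toList());
    }
}
```

Keeping the filter separate from the text match is the point: the same "books only" restriction can be reused across many different full-text queries.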

Benchmarks


The results of benchmarking the most popular full-text search engines (MySQL’s built-in full-text search engine, the Sphinx search engine plug-in for MySQL, and Lucene) are published on the PlanetMySQL site.

Conclusion

In my evaluation, Lucene is “the most” popular solution available today for efficient full-text searches over database content, compared to MySQL’s built-in full-text search engine and the Sphinx plug-in for MySQL. Lucene is a pure Java API, so it integrates seamlessly with other Java programs, whereas Sphinx is written in C++. Lucene is a set of tools that allows us to create an index and then search it, so we need to handle index creation, index updates, and searches ourselves using the API. The good news is that Spring & Hibernate support the integration of Lucene through various support classes. So go ahead and use it in your Java projects. Have a great time searching inside your applications with Lucene!

Reference:

JavaWorld article on integrating Lucene

Friday, April 14, 2006

Part III - Open Source Java Frameworks for Web Development

This is the last part of the article on frameworks. In the previous parts I introduced the concept of frameworks and described how the foundations of modern Java frameworks were laid. In this part, I will compare several production-quality frameworks used for Web development, such as Struts, Spring, and Hibernate, and go over their basic similarities and underlying concepts.

Basic Concepts

Almost all modern Web development frameworks follow the Model-View-Controller (MVC) design. Business logic and presentation are separated, and a controller coordinates the flow between client requests and the actions taken on the server. This approach has become the de facto standard for Web development. The underlying mechanics of each framework are of course different, but the APIs that developers use to design and implement their Web applications are very similar. The frameworks also differ in the extensions they provide, such as tag libraries, JavaServer Faces, or JavaBean wrappers.

Frameworks use different techniques to coordinate navigation within the Web application, such as XML configuration files, Java property files, or custom properties. They also differ in the way the controller module is implemented. For instance, EJBs may instantiate the classes needed for each request, or Java reflection can be used to dynamically invoke the appropriate action class. Frameworks may also differ conceptually. For example, one framework may define only the user request and response (and error) scenario, while another may define a complete flow from one request to multiple responses and subsequent requests.
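As an illustration of configuration-driven, reflection-based dispatch, here is a minimal front-controller sketch. The `Action` interface, `HelloAction` and `FrontController` are hypothetical names (this is not the Struts API): the controller reads a path-to-class mapping in Java properties format, instantiates the configured action class by reflection, and lets it populate a model map for the view.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;

// Contract every action implements (hypothetical; not Struts' Action class).
interface Action {
    String execute(Map<String, Object> model); // returns a logical view name
}

class HelloAction implements Action {
    public String execute(Map<String, Object> model) {
        model.put("message", "Hello"); // data-populated model handed to the view layer
        return "hello.jsp";
    }
}

// Front controller: maps a request path to an action class named in configuration.
class FrontController {
    private final Properties mappings = new Properties();

    FrontController(String config) {
        try {
            mappings.load(new StringReader(config)); // e.g. "/hello=HelloAction"
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    String handle(String path, Map<String, Object> model) {
        String className = mappings.getProperty(path);
        if (className == null) return "404.jsp"; // no mapping for this path
        try {
            // Reflection: the controller never references concrete action classes in code.
            Action action = (Action) Class.forName(className).getDeclaredConstructor().newInstance();
            return action.execute(model);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create action " + className, e);
        }
    }
}
```

For example, a configuration line such as `/hello=HelloAction` (using the fully qualified class name in a packaged application) routes `/hello` to `HelloAction`; keeping the mapping in configuration is what lets navigation change without recompiling the controller.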

Java frameworks are similar in the way they structure data flow. After a request, some action takes place on the application server, and data-populated objects are always sent to the JSP layer with the response. Data is then extracted from those objects, which could be simple classes with setter and getter methods, JavaBeans, value objects, or collection objects. Modern Java frameworks also simplify a developer's tasks by providing automatic session tracking with easy APIs, database connection pools, and even database call wrappers. Some frameworks either provide hooks into other J2EE technologies, such as JMS (Java Message Service) or JMX, or have these technologies integrated. Server-side data persistence and logging could also be part of a framework.

Popular Web Frameworks

Apache Struts Framework
The Struts framework is an open-source product for building Web applications based on the model-view-controller (MVC) design paradigm. It uses and extends the Java Servlet API and was originally created by Craig McClanahan. In May 2000, it was donated to the Apache Foundation. It features a powerful custom tag library, tiled displays, form validation, and I18N (internationalization). Also, Struts supports a variety of presentation layers, including JSP, XML/XSLT, JavaServer Faces (JSF), and Velocity, as well as a variety of model layers, including JavaBeans and EJB.

Spring Framework
The Spring Framework is a layered Java/J2EE application framework based on code published in Expert One-on-One J2EE Design and Development. The Spring Framework provides a simple approach to development that does away with numerous properties files and helper classes that litter projects. Key features of the Spring Framework include:

Powerful JavaBeans-based configuration management, applying Inversion-of-Control (IoC) principles.

A core bean factory, usable in any environment, from applets to J2EE containers.

Generic abstraction layer for database transaction management, allowing for pluggable transaction managers, and making it easy to demarcate transactions without dealing with low-level issues.

JDBC abstraction layer with a meaningful exception hierarchy.

Integration with Hibernate, DAO implementation support, and transaction strategies.
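The IoC principle behind Spring's bean factory can be illustrated with a tiny container. This is a hypothetical sketch, not Spring's `BeanFactory` API: beans are registered as factory functions, created once on first request, and receive their collaborators through constructors instead of looking them up or instantiating them themselves. `MiniContainer`, `OwnerDao` and `OwnerService` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal IoC container sketch (hypothetical; not Spring's API).
class MiniContainer {
    private final Map<String, Function<MiniContainer, Object>> definitions = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();

    void register(String name, Function<MiniContainer, Object> factory) {
        definitions.put(name, factory);
    }

    // Creates each bean once, resolving its dependencies through the container itself.
    <T> T getBean(String name, Class<T> type) {
        Object bean = singletons.get(name);
        if (bean == null) {
            Function<MiniContainer, Object> factory = definitions.get(name);
            if (factory == null) throw new IllegalArgumentException("No bean named " + name);
            bean = factory.apply(this);
            singletons.put(name, bean);
        }
        return type.cast(bean);
    }
}

interface OwnerDao {
    String findName(long id);
}

class InMemoryOwnerDao implements OwnerDao {
    public String findName(long id) { return id == 1 ? "George Franklin" : null; }
}

// The service never creates its DAO; the container passes it in.
class OwnerService {
    private final OwnerDao dao;

    OwnerService(OwnerDao dao) { this.dao = dao; }

    String greet(long id) { return "Hello, " + dao.findName(id); }
}
```

In Spring the same wiring would normally be declared in an XML bean definition file rather than in code; the point here is only that `OwnerService` never names a concrete DAO class.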

Hibernate Framework
Hibernate is an object-relational mapping (ORM) solution for the Java language. It is also open source software, as is Struts, and is distributed under the LGPL. Hibernate was developed by a team of Java software developers around the world. It provides an easy to use framework for mapping an object-oriented domain model to a traditional relational database. It not only takes care of the mapping from Java classes to database tables (and from Java data types to SQL data types), but also provides data query and retrieval facilities and can significantly reduce development time otherwise spent with manual data handling in SQL and JDBC.

Hibernate's goal is to relieve the developer from a significant amount of common persistence-related programming tasks. Hibernate adapts to the development process, whether it starts with a design from scratch or from a legacy database. Hibernate generates the SQL, relieves the developer from manual result set handling and object conversion, and keeps the application portable across SQL databases. It provides transparent persistence; the only strict requirement for a persistent class is a no-argument constructor.
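To illustrate what "Hibernate generates the SQL" and the no-argument constructor requirement mean in practice, here is a toy reflection-based mapper. It is emphatically not how Hibernate is implemented or configured; `ToyMapper` and `Pet` are hypothetical. The sketch derives an INSERT statement from a class's fields and creates instances through the no-argument constructor, the way an ORM must when materializing rows.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

// Toy mapper for illustration only; Hibernate works from mapping metadata and is far richer.
class ToyMapper {
    // Derives "INSERT INTO <table> (<columns>) VALUES (?, ...)" from the entity's fields.
    static String insertSql(Class<?> type) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner placeholders = new StringJoiner(", ");
        for (Field f : persistentFields(type)) {
            columns.add(f.getName());
            placeholders.add("?");
        }
        return "INSERT INTO " + type.getSimpleName().toLowerCase()
                + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    // Creates an empty entity via its no-argument constructor, as an ORM does before filling it from a row.
    static <T> T newInstance(Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(type.getName() + " needs a no-argument constructor", e);
        }
    }

    // Non-static, non-synthetic fields, sorted by name so the column order is deterministic.
    private static List<Field> persistentFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) fields.add(f);
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }
}

// Hypothetical entity.
class Pet {
    Long id;
    String name;

    Pet() {} // the no-argument constructor the mapper relies on
}
```

The application code only deals with `Pet` objects; the SQL text is derived from the class, which is the essence of the productivity claim above.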

There are more frameworks than I have described here, of course, both open-source and commercial, such as

WebWork - http://www.opensymphony.com/webwork/
Tapestry - http://jakarta.apache.org/tapestry/

and many frameworks were developed in-house by extending other MVC frameworks.

Enterprise Development Environments

Some of these frameworks became very popular within the Web developer community and the enterprise development space. As these frameworks matured into stable releases, commercial IDE (integrated development environment) toolmakers started to build support for them into their products. Some even went as far as to build whole products around the concepts of a framework. For example, BEA WebLogic Workshop is built around the Struts framework. Borland JBuilder has built-in support for Struts and features JSF and JSTL support as well.

The Eclipse platform became a very popular development tool, partly because of its plug-in architecture and partly because of its Web framework support. Numerous Eclipse plug-ins and even entire Eclipse-based IDE distributions appeared. Many of the plug-ins were designed for Struts framework development, such as MyEclipse (www.myeclipse.org) or M7 (www.m7.com).

As the Web development arena continues to evolve its tools and programming methodologies, so will the Java application frameworks continue to grow. The future seems very bright for the Java Web-development frameworks.

End of Part III

Part II - Java Frameworks (The evolution of Java development)

This is the second part of the article on frameworks, which illustrates the evolution of Java development. A major part of it has been gleaned from the article “Java Frameworks Take Hold” by Rene Bonvanie.

Java 2 Platform, Enterprise Edition (J2EE) is an incredibly powerful technology. It is designed to be flexible enough to adapt to many different types of applications without requiring developers to invent new approaches.

Start that first project, and the questions come fast and furious. What combination of JavaServer Pages (JSP), Enterprise JavaBeans (EJB), and servlet components should you use to build each part of the system? How will performance be ensured? Is one approach more scalable than another? And finally, how can the choices, once selected, be enforced consistently across a development team?

These questions are at the core of one of the most important discussions in the Java community today. And the mass adoption of Java in internet development projects has resulted in a flood of solutions in the shape of best practices, frameworks, and development tools.

In The Beginning: Design Patterns

The Java community recognized early on that guidelines were necessary to help developers deal with the myriad J2EE-related choices. Gradually, a set of best practices emerged, usually called the J2EE Design Patterns.

J2EE Design Patterns generalize proven, high-quality approaches for frequently encountered design issues with the J2EE application model in a format that all developers can use. Typically, a design pattern is a written description of the problem domain followed by some sample code implementing a solution.

Take, for example, the Web tier in a typical J2EE application. JSP and servlets do a great job at increasing developer productivity when building individual dynamic Web pages but provide little support for managing page-to-page flow. Furthermore, on their own, JSP and servlets do not enforce separation of the Web presentation and business logic.

Here is where design patterns fit in. The basic problem just described is resolved by a pattern called the Model-View-Controller (MVC) design pattern. This pattern specifies a way to build an application so there is a consistent way to control page flow and to separate presentation and business logic layers. The MVC approach naturally builds on JSP and servlets, using the strength of these core specifications.

Next Generation: Frameworks

Developers have gravitated to the J2EE Design Patterns en masse because they represent some of the best-known practices for J2EE application development. Incorporating design patterns into applications promises high-quality, high-performance implementations.

Yet the problem most developers face when working with design patterns is that they are exactly as their name implies: a set of patterns that tend to be academically rigorous but are not easily enforceable or automated on their own. Design patterns are merely coding templates and recommendations that developers are expected to follow, with no guarantees of consistency, understanding, or enforceability. Vendors and developers are now moving to the next generation: developing frameworks based on the J2EE Design Patterns.

At the lowest level, J2EE frameworks automate the easily repeatable coding aspects of the patterns with techniques such as automatic code generation or a metadata-driven approach. At the highest level, J2EE frameworks turn into visual design and declarative programming environments.

An example framework is Apache Struts, implementing the MVC design pattern. It is a popular open source Web-tier framework that originated in an effort to provide a standard implementation of the MVC design pattern. Struts took the major concepts of MVC and created a consistent, reusable metadata layer into which J2EE developers plug the specifics of their applications.

With Struts, J2EE developers no longer have to worry about building the MVC design pattern "plumbing" in every project; rather, they can focus on applying their creative thinking to the presentation layer of the business application itself. Struts—and frameworks in general—bring other benefits too: reduced training costs, faster project delivery, and consistency across application implementations.

One can work across a standard J2EE architecture and find representative frameworks implementing design patterns in each tier. For example, Web-tier frameworks such as Apache Struts are easily combined with business-tier frameworks such as Business Components for Java (BC4J), to write entire applications.

There are many options in the data tier. For instance, BC4J provides a highly scalable implementation of the data access object pattern for persistence. Such business-tier frameworks are also frequently paired with persistence layers such as Oracle9iAS TopLink that help developers map general-purpose business-domain models to data stores such as relational databases.
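The data access object pattern mentioned above boils down to hiding persistence behind an interface. A minimal sketch with hypothetical names (`Customer`, `CustomerDao`, `InMemoryCustomerDao`, `CustomerService`): the business code depends only on the interface, so an in-memory implementation used in tests could later be swapped for a JDBC- or TopLink-backed one without touching the service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical domain object.
class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// DAO contract: business code sees only this interface, never JDBC or a mapping tool.
interface CustomerDao {
    void save(Customer c);
    Optional<Customer> findById(long id);
    List<Customer> findByNamePrefix(String prefix);
}

// In-memory implementation; a database-backed class could implement the same contract.
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Long, Customer> rows = new TreeMap<>();

    public void save(Customer c) { rows.put(c.id, c); }

    public Optional<Customer> findById(long id) { return Optional.ofNullable(rows.get(id)); }

    public List<Customer> findByNamePrefix(String prefix) {
        List<Customer> out = new ArrayList<>();
        for (Customer c : rows.values()) {
            if (c.name.startsWith(prefix)) out.add(c);
        }
        return out;
    }
}

// Business-tier code written against the interface only.
class CustomerService {
    private final CustomerDao dao;

    CustomerService(CustomerDao dao) { this.dao = dao; }

    String describe(long id) {
        return dao.findById(id).map(c -> "Customer " + c.name).orElse("unknown");
    }
}
```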

Frameworks such as Apache Struts and BC4J represent a growing trend in the framework world: implement a set of J2EE Design Patterns and ensure the framework is open and flexible enough to easily plug into other popular frameworks. The goal is to give J2EE developers choice and productivity at the same time.

Making Choices

In the open source and commercial space, dozens of frameworks are emerging. Logical questions follow. What makes a successful framework? How does a developer choose the right framework? Which ones will survive?

One answer is that the surviving, widely adopted frameworks will likely be those that cleanly and elegantly solve architectural problems and significantly increase productivity over straight programming. Leading J2EE frameworks will be judged on quality of implementation, maturity, usability, cost, performance, and reliability.

As the core J2EE specifications evolve to incorporate framework features, the J2EE containers will provide developers with best practices and design consistency already built in. When this happens, developers will focus their selection criteria on the second major area frameworks tend to feature: productivity and ease of use for developers.

Open source Java Frameworks

Below is the list of some popular frameworks in the open source space.

Open Source J2EE Application Frameworks

Spring - Spring is a layered Java/J2EE application framework, based on code published in Expert One-on-One J2EE Design and Development

Jeenius - Jeenius is a framework to simplify the creation of J2EE applications. It has a strong focus on building web-based applications.

Open Source Web Frameworks in Java

Struts - The core of the Struts framework is a flexible control layer based on standard technologies like Java Servlets, JavaBeans, ResourceBundles, and XML, as well as various Jakarta Commons packages. Struts encourages application architectures based on the Model 2 approach, a variation of the classic Model-View-Controller (MVC) design paradigm.

Spring MVC - The MVC framework provided by Spring is similar to Struts in many ways, but is more powerful and easier to use.

WebWork - WebWork is a web application framework for J2EE. It is based on a concept called "Pull HMVC" (Pull Hierarchical Model View Controller).

Cocoon - Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development. Cocoon implements these concepts around the notion of 'component pipelines', each component on the pipeline specializing on a particular operation. This makes it possible to use a Lego(tm)-like approach in building web solutions, hooking together components into pipelines without any required programming.

Turbine - Turbine is a servlet based framework that allows experienced Java developers to quickly build secure web applications. Turbine is an excellent choice for developing applications that make use of a services-oriented architecture. Some of the functionality provided with Turbine includes a security management system, a scheduling service, an XML-defined form validation service, and an XML-RPC service for web services. It is a simple task to create new services particular to your application.

Tapestry - Tapestry is a powerful, open-source, all-Java framework for creating leading edge web applications in Java. Tapestry reconceptualizes web application development in terms of objects, methods and properties instead of URLs and query parameters. Tapestry is an alternative to scripting environments such as JavaServer Pages or Velocity. Tapestry goes far further, providing a complete framework for creating extremely dynamic applications with minimal amounts of coding.

Open Source Persistence Frameworks in Java

Hibernate - Hibernate is a powerful, ultra-high performance object/relational persistence and query service for Java. Hibernate lets you develop persistent objects following common Java idiom - including association, inheritance, polymorphism, composition and the Java collections framework. Extremely fine-grained, richly typed object models are possible. The Hibernate Query Language, designed as a "minimal" object-oriented extension to SQL, provides an elegant bridge between the object and relational worlds. Hibernate is now the most popular ORM solution for Java.

OJB - ObJectRelationalBridge (OJB) is an Object/Relational mapping tool that allows transparent persistence for Java Objects against relational databases.

Ibatis SQL Maps - The SQL Maps framework will help to significantly reduce the amount of Java code that is normally needed to access a relational database. This framework maps JavaBeans to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of SQL Maps over other frameworks and object relational mapping tools. To use SQL Maps you need only be familiar with JavaBeans, XML and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using SQL Maps you have the full power of real SQL at your fingertips. The SQL Maps framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.
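The core idea of SQL Maps, a SQL statement whose parameters are named after bean properties, can be sketched in a few lines. This is an illustration only: it assumes the `#property#` inline-parameter style of iBATIS 2.x mapped statements, but `MappedStatement` and `Product` are hypothetical and nothing here is the iBATIS API. The statement is turned into JDBC `?` placeholders plus an ordered list of property values read through the bean's getters.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: mimics #property# inline parameters; not the iBATIS API.
class MappedStatement {
    private static final Pattern PARAM = Pattern.compile("#(\\w+)#");
    final String sql;                                   // JDBC-ready SQL with ? placeholders
    final List<String> properties = new ArrayList<>();  // bean properties, in placeholder order

    MappedStatement(String mappedSql) {
        Matcher m = PARAM.matcher(mappedSql);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            properties.add(m.group(1));
            m.appendReplacement(sb, "?");
        }
        m.appendTail(sb);
        this.sql = sb.toString();
    }

    // Reads each property through its getter, ready for PreparedStatement.setObject(i + 1, value).
    List<Object> parameters(Object bean) {
        List<Object> values = new ArrayList<>();
        try {
            for (String p : properties) {
                String getter = "get" + Character.toUpperCase(p.charAt(0)) + p.substring(1);
                Method method = bean.getClass().getMethod(getter);
                values.add(method.invoke(bean));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Missing readable property on " + bean.getClass(), e);
        }
        return values;
    }
}

// Hypothetical JavaBean.
class Product {
    private final int id;
    private final String name;

    Product(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public String getName() { return name; }
}
```

The SQL stays plain, hand-written SQL; only the parameter binding is automated, which matches the "full power of real SQL" trade-off described above.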

End of Part II

Part I - Introduction to Frameworks

For all of my friends who are already familiar with developing Web-based applications in Java (J2EE) using JSP and servlets, and would like to start using Java frameworks, here is a three-part article that introduces the basics of modern Java frameworks.

The concept of a framework has been kicking around in software development for a long time in one form or another. In its simplest form, a framework is simply a body of tried and tested code that is reused in multiple software development projects. In general, a framework provides an implementation of the core, unvarying functions and includes mechanisms that allow developers to plug in various functions or to extend them.

Frameworks can be classified into three categories based on their scope, as follows:

1. System infrastructure frameworks - These frameworks simplify the development of portable and efficient system infrastructure such as operating system and communication frameworks, and frameworks for user interfaces and language processing tools. System infrastructure frameworks are primarily used internally within a software organization and are not sold to customers directly.

2. Middleware integration frameworks - These frameworks are commonly used to integrate distributed applications and components. Middleware integration frameworks are designed to enhance the ability of software developers to modularize, reuse, and extend their software infrastructure to work seamlessly in a distributed environment. There is a thriving market for Middleware integration frameworks, which are rapidly becoming commodities. Common examples include ORB frameworks, message-oriented middleware, and transactional databases.

3. Enterprise application frameworks - These frameworks address broad application domains (such as telecommunications, avionics, manufacturing, and financial engineering) and are the cornerstone of enterprise business activities. Relative to System infrastructure and Middleware integration frameworks, Enterprise frameworks are expensive to develop and/or purchase. However, Enterprise frameworks can provide a substantial return on investment since they support the development of end-user applications and products directly.

Regardless of their scope, frameworks can also be classified by the techniques used to extend them, which range along a continuum from whitebox frameworks to blackbox frameworks.

1. Whitebox frameworks rely heavily on OO language features like inheritance and dynamic binding to achieve extensibility. Existing functionality is reused and extended by (1) inheriting from framework base classes and (2) overriding pre-defined hook methods using patterns like Template Method. Whitebox frameworks require application developers to have intimate knowledge of the frameworks' internal structure. Although whitebox frameworks are widely used, they tend to produce systems that are tightly coupled to the specific details of the framework's inheritance hierarchies.

2. Blackbox frameworks support extensibility by defining interfaces for components that can be plugged into the framework via object composition. Existing functionality is reused by (1) defining components that conform to a particular interface and (2) integrating these components into the framework using patterns like Strategy and Functor. Blackbox frameworks are structured using object composition and delegation more than inheritance. As a result, blackbox frameworks are generally easier to use and extend than whitebox frameworks. However, blackbox frameworks are more difficult to develop since they require framework developers to define interfaces and hooks that anticipate a wider range of potential use-cases.
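The two extension styles are easiest to compare side by side. In this hypothetical sketch, `ReportGenerator` is a whitebox framework class: its final template method fixes the algorithm and applications subclass it to override hook methods, so `SalesReport` must know which hooks exist and when they run. `ReportEngine` is the blackbox equivalent: it is final, and applications extend it only by passing in objects that implement the `RowSource` and `RowFormatter` interfaces (here lambdas, playing the Strategy/Functor role). All names are illustrative.

```java
import java.util.List;

// ---- Whitebox style: extend by inheritance ----
abstract class ReportGenerator {
    // Template Method: final, so subclasses cannot change the overall algorithm.
    public final String generate() {
        StringBuilder out = new StringBuilder(header()).append('\n');
        for (String row : rows()) out.append(formatRow(row)).append('\n');
        return out.append(footer()).toString();
    }

    protected abstract String header();                           // required hook
    protected abstract List<String> rows();                       // required hook
    protected String formatRow(String row) { return "- " + row; } // optional hook with a default
    protected String footer() { return "end"; }                   // optional hook with a default
}

class SalesReport extends ReportGenerator {
    @Override protected String header() { return "Sales"; }
    @Override protected List<String> rows() { return List.of("north: 10", "south: 7"); }
    @Override protected String formatRow(String row) { return "* " + row.toUpperCase(); }
}

// ---- Blackbox style: extend by composition ----
interface RowSource { List<String> rows(); }

interface RowFormatter { String format(String row); }

final class ReportEngine { // final: it cannot be subclassed, only configured with components
    private final RowSource source;
    private final RowFormatter formatter;

    ReportEngine(RowSource source, RowFormatter formatter) {
        this.source = source;
        this.formatter = formatter;
    }

    String generate(String title) {
        StringBuilder out = new StringBuilder(title).append('\n');
        for (String row : source.rows()) out.append(formatter.format(row)).append('\n');
        return out.toString();
    }
}
```

Both produce the same report body, but the blackbox engine's extension points are fully described by two small interfaces, whereas the whitebox version exposes its internal call sequence to every subclass.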

Object-Oriented (OO) Application Frameworks

Object-oriented (OO) application frameworks are a promising technology for reifying proven software designs and implementations in order to reduce the cost and improve the quality of software. An OO application framework is a reusable, "semi-complete" application that can be specialized to produce custom applications. In contrast to earlier OO reuse techniques based on class libraries, frameworks are targeted at particular business units (such as data processing or cellular communications) and application domains (such as user interfaces or persistence).

The primary benefits of OO application frameworks stem from the modularity, reusability, extensibility, and inversion of control they provide to developers, as described below:

Modularity - Frameworks enhance modularity by encapsulating volatile implementation details behind stable interfaces. Framework modularity helps improve software quality by localizing the impact of design and implementation changes. This localization reduces the effort required to understand and maintain existing software.

Reusability - The stable interfaces provided by frameworks enhance reusability by defining generic components that can be reapplied to create new applications. Framework reusability leverages the domain knowledge and prior effort of experienced developers in order to avoid re-creating and re-validating common solutions to recurring application requirements and software design challenges. Reuse of framework components can yield substantial improvements in programmer productivity, as well as enhance the quality, performance, reliability and interoperability of software.

Extensibility - A framework enhances extensibility by providing explicit hook methods that allow applications to extend its stable interfaces. Hook methods systematically decouple the stable interfaces and behaviors of an application domain from the variations required by instantiations of an application in a particular context. Framework extensibility is essential to ensure timely customization of new application services and features.

Inversion of control - The run-time architecture of a framework is characterized by an "inversion of control." This architecture enables canonical application processing steps to be customized by event handler objects that are invoked via the framework's reactive dispatching mechanism. When events occur, the framework's dispatcher reacts by invoking hook methods on pre-registered handler objects, which perform application-specific processing on the events. Inversion of control allows the framework (rather than each application) to determine which set of application-specific methods to invoke in response to external events (such as window messages arriving from end-users or packets arriving on communication ports).
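Inversion of control can be seen in a few lines: the application only registers handlers, while the framework owns the event loop and decides which handler to call for each incoming event. The `Dispatcher` class and the two-element string event format below are hypothetical simplifications of the reactive dispatching described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical event dispatcher: the framework owns the loop and calls application handlers.
class Dispatcher {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    // Applications register hook objects up front...
    void register(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    // ...and the framework decides when to invoke them. Each event is {type, payload}.
    void run(Queue<String[]> events) {
        String[] event;
        while ((event = events.poll()) != null) {
            List<Consumer<String>> registered = handlers.getOrDefault(event[0], Collections.emptyList());
            for (Consumer<String> h : registered) h.accept(event[1]); // events with no handler are ignored
        }
    }
}
```

Note that the application code never calls its own handlers; it only says which handler belongs to which kind of event.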

Early object-oriented frameworks (such as MacApp and InterViews) originated in the domain of graphical user interfaces (GUIs). The Microsoft Foundation Classes (MFC) is a contemporary GUI framework that has become the de facto industry standard for creating graphical applications on PC platforms. Although MFC has limitations (such as lack of portability to non-PC platforms), its widespread adoption demonstrates the productivity benefits of reusing common frameworks to develop graphical business applications.

The next generation of OO application frameworks targeted complex business and application domains. At the heart of this effort were Object Request Broker (ORB) frameworks, which facilitate communication between local and remote objects. ORB frameworks eliminate many tedious, error-prone, and non-portable aspects of creating and managing distributed applications and reusable service components. This lets programmers develop and deploy complex applications rapidly and robustly, rather than wrestling endlessly with low-level infrastructure concerns. Widely used ORB frameworks include CORBA, DCOM, and Java RMI.

In server-side development, a number of core tasks crop up over and over again. Such tasks can be pulled into a core framework, built and tested once, and reused across multiple projects. Many frameworks emerged to exploit this opportunity and simplify the development of Web-based projects. As Web-based application servers and their applications expanded, so did the frameworks that supported them. Today there are many software frameworks in the enterprise development space, especially for the Java J2EE platform.

A good framework enhances the maintainability of software through API consistency, comprehensive documentation, and thorough testing. Some companies invest formally in frameworks, and their developers build up a library of components that they use often. Such practices reduce development time while improving the quality of delivered software, which means that developers can spend more time concentrating on the business-specific problem at hand rather than on the plumbing code behind it. There are also many mature frameworks available in the open source arena, and adopting such a stable framework is more effective than developing one from scratch.

End of Part I

Wednesday, April 12, 2006

Object Oriented Database Management Systems

In today's world, client-server applications that rely on a database on the server as a data store while servicing requests from multiple clients are commonplace. Most of these applications use a Relational Database Management System (RDBMS) as their data store while using an object oriented programming language for development. This causes a certain inefficiency, because objects must be mapped to tuples in the database and vice versa instead of the data being stored in a way that is consistent with the programming model. The "impedance mismatch" caused by having to map objects to tables and back has long been accepted as a necessary performance penalty. This article explores an alternative that avoids this penalty. The information was gleaned from the article "An Exploration Of Object Oriented Database Management Systems" by Dare Obasanjo.

Overview of OODBMS

An OODBMS is the result of combining object oriented programming principles with database management principles. It enforces object oriented programming concepts such as encapsulation, polymorphism and inheritance. It also enforces database management concepts: the ACID properties (Atomicity, Consistency, Isolation and Durability), which lead to system integrity; support for an ad hoc query language; and secondary storage management systems, which allow very large amounts of data to be managed.

The Object Oriented Database Manifesto specifically lists the following features as mandatory before a system can be called an OODBMS: Complex objects, Object identity, Encapsulation, Types and Classes, Class or Type Hierarchies, Overriding, overloading and late binding, Computational completeness, Extensibility, Persistence, Secondary storage management, Concurrency, Recovery and an Ad Hoc Query Facility. An OODBMS is thus a full-scale object oriented development environment as well as a database management system. Features that are common in the RDBMS world also exist in the OODBMS world, such as transactions, the ability to handle large amounts of data, indexes, deadlock detection, backup and restoration features, and data recovery mechanisms.

A primary feature of an OODBMS is that objects in the database are accessed transparently. Interacting with persistent objects is no different from interacting with in-memory objects. This is very different from using an RDBMS, because there is no need to interact via a query sub-language like SQL, nor is there a reason to use a Call Level Interface such as ODBC, ADO or JDBC. Database operations typically involve obtaining a database root from the OODBMS and traversing it to obtain objects to create, update or delete. The root is usually a data structure like a graph, vector, hash table, or set.
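Here is a Java sketch of the "database root" idea. ObjectDatabase, Contact and ContactService are hypothetical names and do not correspond to any real OODBMS API. The in-memory map only stands in for the persistent root, and commit() counts calls instead of writing to storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistent class: a plain Java object with no mapping code.
class Contact {
    String name;
    String city;

    Contact(String name, String city) {
        this.name = name;
        this.city = city;
    }
}

// Hypothetical stand-in for an OODBMS session. A real product would write
// reachable objects to secondary storage on commit. This sketch keeps
// everything in memory and only counts commits.
class ObjectDatabase {
    private final Map<String, Object> root = new HashMap<>();
    private int commits;

    // The database root: a hash table from which persistent objects are reached.
    Map<String, Object> root() {
        return root;
    }

    void commit() {
        commits++;
    }

    int commitCount() {
        return commits;
    }
}

class ContactService {
    // Update an object found by traversing the root: no SQL, no JDBC.
    static void relocate(ObjectDatabase db, String key, String newCity) {
        Contact c = (Contact) db.root().get(key);
        c.city = newCity; // an ordinary field assignment
        db.commit();
    }
}
```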

Comparisons of OODBMSs to RDBMSs


There are concepts in the relational database model that are similar to those in the object database model. A relation or table in a relational database can be considered analogous to a class in an object database. A tuple is similar to an instance of a class, but it has attributes and no behaviors. A column in a tuple is similar to a class attribute, except that a column can hold only primitive data types while a class attribute can hold data of any type. Finally, classes have methods that are computationally complete (meaning that general-purpose control and computational structures are provided). Relational databases typically do not have computationally complete programming capabilities, although some stored procedure languages come close.

Below is a list of advantages and disadvantages of using an OODBMS over an RDBMS with an object oriented programming language.

Advantages

Composite Objects and Relationships: Objects in an OODBMS can store an arbitrary number of atomic types as well as other objects. It is thus possible to have a large class which holds many medium-sized classes, which themselves hold many smaller classes, ad infinitum. In a relational database this has to be done either with one huge table containing lots of null fields or with a number of smaller, normalized tables linked via foreign keys. Having lots of smaller tables is still a problem, since a join has to be performed every time one wants to query data based on the "has-a" relationship between the entities. Also, for complex objects, an object models the real-world entity better than relational tuples do. Because an OODBMS is better suited to handling complex, interrelated data than an RDBMS, it can by some estimates outperform an RDBMS by a factor of ten to a thousand, depending on the complexity of the data being handled.
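The following sketch contrasts the two approaches using a hypothetical Customer/Order/LineItem model. The object version follows references directly. The relational version keeps the same data in two normalized row lists linked by a foreign key, so it needs a join to answer the same question. The join is hand-coded here as a nested loop. The class names and the SQL in the comment are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// ---- Object model: composite objects reached by following references ----
class LineItem {
    final String product;
    final int quantity;

    LineItem(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

class Order {
    final List<LineItem> items = new ArrayList<>();
}

class Customer {
    final String name;
    final List<Order> orders = new ArrayList<>();

    Customer(String name) {
        this.name = name;
    }

    // Navigate the "has-a" relationships directly; no join is involved.
    int totalQuantity() {
        int total = 0;
        for (Order order : orders) {
            for (LineItem item : order.items) {
                total += item.quantity;
            }
        }
        return total;
    }
}

// ---- Relational model: the same data as normalized rows ----
class OrderRow {
    final int id;
    final String customerName; // foreign key to a customers table

    OrderRow(int id, String customerName) {
        this.id = id;
        this.customerName = customerName;
    }
}

class LineItemRow {
    final int orderId; // foreign key to OrderRow.id
    final int quantity;

    LineItemRow(int orderId, int quantity) {
        this.orderId = orderId;
        this.quantity = quantity;
    }
}

class RelationalQueries {
    // Hand-coded nested-loop join, roughly equivalent to:
    //   SELECT SUM(li.quantity) FROM orders o
    //   JOIN line_items li ON li.order_id = o.id WHERE o.customer_name = ?
    static int totalQuantity(String customerName, List<OrderRow> orders,
                             List<LineItemRow> lineItems) {
        int total = 0;
        for (OrderRow order : orders) {
            if (!order.customerName.equals(customerName)) {
                continue;
            }
            for (LineItemRow item : lineItems) {
                if (item.orderId == order.id) {
                    total += item.quantity;
                }
            }
        }
        return total;
    }
}
```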

Class Hierarchy: Data in the real world usually has hierarchical characteristics. The ever-popular Employee example used in most RDBMS texts is easier to describe in an OODBMS than in an RDBMS. An Employee may or may not be a Manager. In an RDBMS this is usually represented with a type identifier field, or with another table that uses foreign keys to indicate the relationship between Managers and Employees. In an OODBMS, the Employee class is simply a parent class of the Manager class.
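Here is how the Employee/Manager example might look as a Java class hierarchy. The fields and the annualCost method are assumptions added for illustration. No type identifier column or linking table is needed. Code written against Employee also works unchanged for Managers through late binding.

```java
import java.util.ArrayList;
import java.util.List;

// Parent class: every Manager is also an Employee.
class Employee {
    final String name;
    final long salary;

    Employee(String name, long salary) {
        this.name = name;
        this.salary = salary;
    }

    long annualCost() {
        return salary;
    }
}

// Subclass instead of a type-code column or a separate linking table.
class Manager extends Employee {
    final List<Employee> reports = new ArrayList<>();
    final long bonus;

    Manager(String name, long salary, long bonus) {
        super(name, salary);
        this.bonus = bonus;
    }

    @Override
    long annualCost() {
        return salary + bonus;
    }
}

class Payroll {
    // Late binding picks the right annualCost() for each object.
    static long total(List<? extends Employee> staff) {
        long sum = 0;
        for (Employee e : staff) {
            sum += e.annualCost();
        }
        return sum;
    }
}
```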

Circumventing the Need for a Query Language: Unlike an RDBMS, an OODBMS does not require a query language for accessing data, since interaction with the database is done by transparently accessing objects. It is still possible to use queries in an OODBMS, however.

No Impedance Mismatch: In a typical application that uses an object oriented programming language and an RDBMS, a significant amount of time is usually spent mapping tables to objects and back. Various problems can also occur when the atomic types in the database do not map cleanly to the atomic types in the programming language, and vice versa. This "impedance mismatch" is completely avoided when using an OODBMS.
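For contrast, below is the kind of hand-written mapping layer that the impedance mismatch forces on an RDBMS-backed application. The sketch makes several assumptions. A Map<String, Object> stands in for a JDBC row. The column names are hypothetical, and so is the 'Y'/'N' encoding of a boolean, which illustrates an atomic type that does not map cleanly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain object.
class Account {
    final long id;
    final String owner;
    final boolean active;

    Account(long id, String owner, boolean active) {
        this.id = id;
        this.owner = owner;
        this.active = active;
    }
}

// Hand-written mapping between rows and objects. A Map stands in for a JDBC
// row; the column names are illustrative assumptions.
class AccountMapper {
    static Account fromRow(Map<String, Object> row) {
        long id = ((Number) row.get("ACCOUNT_ID")).longValue();
        String owner = (String) row.get("OWNER_NAME");
        // Type mismatch: this hypothetical schema stores a boolean as 'Y'/'N'.
        boolean active = "Y".equals(row.get("ACTIVE_FLAG"));
        return new Account(id, owner, active);
    }

    static Map<String, Object> toRow(Account a) {
        Map<String, Object> row = new HashMap<>();
        row.put("ACCOUNT_ID", a.id);
        row.put("OWNER_NAME", a.owner);
        row.put("ACTIVE_FLAG", a.active ? "Y" : "N");
        return row;
    }
}
```

With an OODBMS, the Account object would be stored as-is, so neither method would be needed.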


No Primary Keys: The user of an RDBMS has to worry about uniquely identifying tuples by their values and making sure that no two tuples have the same primary key values to avoid error conditions. In an OODBMS, the unique identification of objects is done behind the scenes via OIDs and is completely invisible to the user. Thus there is no limitation on the values that can be stored in an object.
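Here is a rough sketch of how an object store can identify objects without user-visible primary keys. It assumes a hypothetical OidRegistry that assigns OIDs by object identity. java.util.IdentityHashMap compares keys by reference. As a result, two objects with identical field values still receive different OIDs, even when an equals method says they are equal.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical class whose instances may hold identical values.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Sketch of OID assignment: identity is tracked by reference, not by field
// values, so the user never has to supply a unique key.
class OidRegistry {
    private final Map<Object, Long> oids = new IdentityHashMap<>();
    private long next = 1;

    long oidOf(Object obj) {
        Long existing = oids.get(obj);
        if (existing != null) {
            return existing;
        }
        long oid = next++;
        oids.put(obj, oid);
        return oid;
    }

    int size() {
        return oids.size();
    }
}
```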

One Data Model: A data model should typically capture entities and their relationships, constraints, and the operations that change the state of the data in the system. With an RDBMS it is not possible to model the dynamic operations or rules that change the state of the data, because these are beyond the scope of the database. Applications that use an RDBMS therefore usually have an Entity Relationship diagram to model the static parts of the system and a separate model for the operations and behaviors of entities in the application. With an OODBMS there is no disconnect between the database model and the application model, because the entities are just other objects in the system. An entire application can thus be comprehensively modelled in one UML diagram.

Disadvantages

Schema Changes: In an RDBMS, modifying the database schema by creating, updating or deleting tables is typically independent of the actual application. In an OODBMS-based application, modifying the schema by creating, updating or deleting a persistent class typically means that changes have to be made to the other classes in the application that interact with instances of that class. As a result, schema changes in an OODBMS typically involve a system-wide recompile. Updating all the instance objects within the database can also take an extended period of time, depending on the size of the database.

Language Dependence: An OODBMS is typically tied to a specific language via a specific API. This means that data in an OODBMS is typically only accessible from a specific language using a specific API, which is typically not the case with an RDBMS.

Lack of Ad-Hoc Queries: In an RDBMS, the relational nature of the data allows one to construct ad hoc queries, in which new tables are created by joining existing tables and are then queried. It is currently not possible to duplicate the semantics of joining two tables by "joining" two classes, so there is a loss of flexibility with an OODBMS. Thus the queries that can be performed on the data in an OODBMS are highly dependent on the design of the system.

List of Object Oriented Database Management Systems

Proprietary
ObjectStore
O2
GemStone
Versant
Ontos
DB/Explorer ODBMS
Poet
Objectivity/DB
EyeDB

Open Source
Ozone
Zope
FramerD
XL2

Conclusion

Using an OODBMS while developing an application in an OO programming language brings many gains. Development time is saved because there is no need to maintain separate data models, and there is less code to write because there is no impedance mismatch. Both are very attractive. There is little reason to pick an RDBMS over an OODBMS for new application development unless there are legacy issues that have to be dealt with.
