LIMSzine

LAB DATA MANAGEMENT

The following column highlights the new Data Warehousing tools, and explains what a Data Warehouse is, how it is different from and interacts with a LIMS, and what it takes to build a Data Warehouse.


New Data Warehousing Tools Incorporate Lab Data


The use of laboratory data for organizational decision making will soon become common practice at many companies due to the latest crop of business intelligence tools that fall under the Data Warehousing banner. Data Warehousing tools take data from a variety of sources, clean it, integrate it, store it in a single repository, and allow senior finance, business analysts, and field sales staff to access and analyze the data to make strategic decisions.

What is a Data Warehouse?

A Data Warehouse is a physically separate store of data transformed from the application data found in operational environments. In almost every case, the data entering the Data Warehouse comes from an operational database. What makes a Data Warehouse different is that it is subject-oriented, integrated, time-variant, and nonvolatile. Data Warehouses should not, however, be confused with operational databases.

Whereas the laboratory information in the LIMS, or operational environment of the lab, consists of samples, tests, and results, the Data Warehouse is organized around subjects, such as customer, vendor, product, and activity. This difference is particularly apparent when examining the content of the data.

Another key difference is that data within the Data Warehouse is, without exception, integrated. This integration shows up in many different ways -- in consistent naming conventions, in consistent measurement of variables, in consistent encoding instructions, in consistent physical attributes of data, etc.

All data in the Data Warehouse is accurate as of some moment in time, not as of the moment of access, and may not be updated once recorded, thus it is time variant.

The final defining characteristic of a Data Warehouse is that it is nonvolatile, in other words, there are only two kinds of operations that occur: the initial loading of data and the access of data. Inserts, deletions, and changes -- normal activities in the operational environment -- do not occur in a Data Warehouse.

Because the source of nearly all Data Warehouse data is the operational environment, it is tempting to think that there is massive redundancy of data between the two environments. This is not the case, since only that data which is needed for organizational decision making and strategic support is placed into the Data Warehouse. In addition, data in the operational environment is very fresh, whereas data in the warehouse is much older and contains summary data that is never found in the operational environment.

Operational versus Strategic

The Data Warehouse takes the operational LIMS data and examines it against strategic criteria. This includes both project analysis in the form of studies, samples, and results, and product analysis by lot, batch, and results. Such in-depth data manipulation and examination have not typically been handled by the LIMS in any case, but by third-party software. Data Warehouse tools, however, go beyond the typical spreadsheet or database tools because they can handle complex queries, not just simple queries.

An operational database can process most simple queries. However, complex queries running against that operational database frequently present problems in terms of performance. When complex queries must be supported, organizations have a greater incentive to offload operational data to a Data Warehouse.

Building a Data Warehouse

Building a Data Warehouse involves extracting data from various operational databases and populating a specialized Data Warehouse database which users can then access without impacting the operational systems. An extensive effort is required to select, map and transform the data that goes into the warehouse, as well as a powerful front-end tool that allows users to easily retrieve and analyze the newly available information.

The first step is to access the appropriate operational data from the variety of legacy databases that exist across the corporation's diverse computing environments. Once accessed, extraction tools transform the data into a consistent, integrated form. This process involves cleaning, reconciling, de-normalizing, and summarizing data, and then loading the data into logical views that can be surfaced into a variety of analytical and reporting applications. Once this data is in the warehouse, business analysts can use business intelligence tools to exploit the data for effective decision-making.

Many organizations opt to set up a framework that provides an enterprise-wide solution using components such as extraction/transformation tools, a relational database, data access and reporting tools, OLAP EIS (Executive Information System) tools, the Internet, data mining tools, and development tools and utilities. When the Data Warehouse becomes too complex, there are middleware products which help to make the Data Warehouse easy to use and manage.

Harness Enterprise-Wide Data

Building powerful and flexible data warehousing solutions is a key element in harnessing the power of enterprise-wide data. Indeed, Data Warehouses represent the next step in the evolution of decision support tools and herald a move away from off-the-shelf database software toward client/server applications that can accommodate the specific requirements of each organization.

There are several trends driving and affecting Data Warehouse capabilities. Database size will naturally increase. More sophisticated data analysis tools -- such as OLAP, data visualization, and data mining -- will gain greater market acceptance and use. Internet web access to the Data Warehouse will become standard. Analysts predict a proliferation of Data Warehouses across the enterprise, effectively producing a data mart within the organization. In addition, there will be an increasing use of metadata and non-textual data.

There are a number of reasons why implementing Data Warehouse solutions makes good business sense. When there is a mixture of proprietary systems and legacy systems in an organization, such a solution can resolve access to disparate data. By organizing that data in a single repository, better forecasting and sales analysis can be performed. Knowledge of corporate strengths and deficiencies can also lead to cost savings and productivity gains. The lab will naturally be part of this information access revolution, particularly for businesses whose products are dependent upon laboratory data.


The above column on Lab Data Management was written by Helen Gillespie, Web Master for the LIMSource, and reprinted from Scientific Computing & Automation, January 1997

About the LIMSzine | Library | About the LIMSource | LIMSource Contents | Home