A generic Java framework for data cleaning and data validation.


Packages
org.gbif.datatester              Package that contains the framework for creating and running data tests.
org.gbif.datatester.exception    Package that contains all exceptions used by the framework and by the tests.
org.gbif.datatester.tests        Package that contains a set of generic data tests that can be useful in different situations.

 

A generic Java framework for data cleaning and data validation. The idea behind this project was originally conceived within the biodiversity informatics field, following the establishment of the first global networks that served primary data from biological collections. As the amount of shared data grew and its users came to include researchers and policy makers, data quality naturally gained importance. In this context, some networks started to develop tools and interfaces to help with data cleaning and data validation. The main idea of this project was to gather the knowledge from those first data cleaning tools and to produce a new framework that could serve as a common ground for implementing and running a large number of data tests.

The framework was originally developed as open source software by the Reference Center on Environmental Information (CRIA) with funding from the Global Biodiversity Information Facility (GBIF) and the Gordon and Betty Moore Foundation. Although it originated in the biodiversity informatics field, it is by no means bound or limited to this area. Its design pursued the following goals:

Two Java packages were created: one containing the framework itself, and another containing a set of generic tests that can be useful in different situations.
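
The split between a framework and a set of generic tests can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual API of org.gbif.datatester: every name in it (DataTestSketch, RecordTest, run, requiredField, numericRange) is hypothetical and chosen only for illustration, and records are assumed to be simple maps from field names to string values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names come from org.gbif.datatester.
final class DataTestSketch {

    /** "Framework" side: the contract that every data test implements. */
    interface RecordTest {
        String name();

        /** Returns null when the row passes, otherwise a short problem description. */
        String check(Map<String, String> row);
    }

    /** "Framework" side: runs every test against every row and collects the problems. */
    static List<String> run(List<RecordTest> tests, List<Map<String, String>> rows) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            for (RecordTest test : tests) {
                String problem = test.check(rows.get(i));
                if (problem != null) {
                    problems.add("record " + i + " [" + test.name() + "]: " + problem);
                }
            }
        }
        return problems;
    }

    /** "Generic tests" side: checks that a field is present and not blank. */
    static RecordTest requiredField(String field) {
        return new RecordTest() {
            public String name() { return "required:" + field; }

            public String check(Map<String, String> row) {
                String value = row.get(field);
                return (value == null || value.trim().isEmpty()) ? "missing " + field : null;
            }
        };
    }

    /** "Generic tests" side: checks that a field, when present, is a number in [min, max]. */
    static RecordTest numericRange(String field, double min, double max) {
        return new RecordTest() {
            public String name() { return "range:" + field; }

            public String check(Map<String, String> row) {
                String value = row.get(field);
                if (value == null) {
                    return null; // presence is the job of requiredField
                }
                try {
                    double number = Double.parseDouble(value);
                    return (number < min || number > max) ? field + " out of range: " + value : null;
                } catch (NumberFormatException e) {
                    return field + " is not a number: " + value;
                }
            }
        };
    }

    public static void main(String[] args) {
        List<RecordTest> tests = List.of(
                requiredField("scientificName"),
                numericRange("latitude", -90, 90));
        List<Map<String, String>> rows = List.of(
                Map.of("scientificName", "Puma concolor", "latitude", "-22.9"),
                Map.of("latitude", "123.4"));
        run(tests, rows).forEach(System.out::println);
    }
}
```

In this sketch the first row passes both tests, while the second is reported twice: once for the missing name and once for the out-of-range latitude. Keeping the runner independent of any particular test is what allows generic tests such as these to be reused across domains.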