SSA Data Quality checker

SSA Data Quality checker is a customizable solution that automatically reports on the quality of data using consistency rules that cover the following data quality dimensions:

  • Completeness
  • Correctness
  • Formatting
  • Ranging
  • Non-duplication
  • Language detection
  • Spelling
  • URL Accessibility
  • Image Analysis
  • Document File Analysis

SSA Data Quality checker can be customized either as a standalone tool or as a component integrated into an existing data processing workflow.

Customization may include modifying the existing consistency rules, creating new ones, and adding support for various data formats, data volumes, interfaces, APIs and transfer protocols.

Use Cases

  • After data entry
  • After data scraping
  • After data export
  • Before data import
  • Before and after deduplication
  • Before and after data cleansing
  • Before and after data merge
  • Before and after data enrichment

Consistency rules

Completeness

  • Total number of records
  • Total number of attributes
  • Percentage of non-empty values
  • Number/percentage of non-empty values for each attribute
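
The completeness figures above can be illustrated with a minimal sketch. It is not the product's implementation; the function names, the record layout (a list of dictionaries) and the definition of "empty" (None or a blank string) are assumptions made for the example.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    """Treat None and blank strings as empty (an assumption; adjust per dataset)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness_report(records: list[dict[str, Any]]) -> dict[str, Any]:
    # Collect every attribute name seen in any record, preserving first-seen order.
    attributes = list(dict.fromkeys(key for record in records for key in record))
    per_attribute = {}
    for name in attributes:
        filled = sum(1 for record in records if not is_empty(record.get(name)))
        per_attribute[name] = {
            "non_empty": filled,
            "percent": 100.0 * filled / len(records) if records else 0.0,
        }
    total_cells = len(records) * len(attributes)
    filled_total = sum(item["non_empty"] for item in per_attribute.values())
    return {
        "records": len(records),
        "attributes": len(attributes),
        "percent_non_empty": 100.0 * filled_total / total_cells if total_cells else 0.0,
        "per_attribute": per_attribute,
    }


# Hypothetical sample data.
sample = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "", "email": "bob@example.com"},
    {"name": "Cy", "email": None},
    {"name": "Di", "email": "di@example.com"},
]
report = completeness_report(sample)
print(report["records"], report["attributes"], report["percent_non_empty"])
```

An attribute that is missing from a record is counted as empty here, which is one possible convention.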

Correctness by data type

  • Numeric
  • Boolean
  • Date
  • String
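
One way to check correctness by data type is to try parsing each raw value as the expected type. The sketch below assumes values arrive as strings, that dates use the ISO format YYYY-MM-DD, and that booleans use the spellings listed in BOOLEAN_SPELLINGS; all names are hypothetical.

```python
from datetime import datetime

# Assumed set of accepted boolean spellings.
BOOLEAN_SPELLINGS = {"true", "false", "yes", "no", "1", "0"}


def is_numeric(text: str) -> bool:
    # float() also accepts "nan" and "inf"; filter those separately if unwanted.
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_boolean(text: str) -> bool:
    return text.strip().lower() in BOOLEAN_SPELLINGS


def is_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


CHECKS = {
    "numeric": is_numeric,
    "boolean": is_boolean,
    "date": is_date,
    "string": lambda text: True,  # every raw value is a valid string
}


def count_incorrect(values: list[str], expected_type: str) -> int:
    """Count values that cannot be interpreted as the expected type."""
    check = CHECKS[expected_type]
    return sum(1 for value in values if not check(value))


print(count_incorrect(["1", "2.5", "abc", "-3e2"], "numeric"))
print(count_incorrect(["2024-02-29", "2024-02-30"], "date"))
```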

Formatting

  • Validity of email address
  • Validity of URL
  • Validity of phone number
  • Validity of postal address
  • Validity of date and time
  • Validity of HTML
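
Two of the formatting checks above, email address and URL, can be sketched with the standard library. The email pattern is a deliberately simplified assumption and does not cover everything the email standards allow; the URL check accepts only http and https, which is also an assumption.

```python
import re
from urllib.parse import urlparse

# Simplified pattern: local part, "@", and a domain with at least one dot.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def looks_like_email(value: str) -> bool:
    return EMAIL_PATTERN.fullmatch(value) is not None


def looks_like_url(value: str) -> bool:
    try:
        parts = urlparse(value)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 brackets
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


print(looks_like_email("ann@example.com"), looks_like_email("ann@example"))
print(looks_like_url("https://example.com/page"), looks_like_url("example.com"))
```

Phone numbers, postal addresses and HTML generally need locale-aware or parser-based checks and are left out of this sketch.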

Numeric ranging

  • Number/percentage of zero values
  • Minimal, maximal and average values
  • Number of records with values in the intervals [minimal, minimal + DELTA), (maximal - DELTA, maximal] and (average - DELTA, average + DELTA)
  • Number of records within the 90th percentile
  • Frequency distribution
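
The numeric ranging figures can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, DELTA is passed as a parameter, and "within the 90th percentile" is read as "less than or equal to the 90th percentile value", which may differ from the product's definition. The frequency distribution is omitted.

```python
from statistics import mean, quantiles


def numeric_ranging(values: list[float], delta: float) -> dict[str, float]:
    low, high, avg = min(values), max(values), mean(values)
    # With n=10, quantiles() returns nine cut points; the last is the 90th percentile.
    p90 = quantiles(values, n=10, method="inclusive")[-1]
    return {
        "zeros": sum(1 for v in values if v == 0),
        "min": low,
        "max": high,
        "mean": avg,
        # Half-open and open intervals follow the notation used in the list above.
        "near_min": sum(1 for v in values if low <= v < low + delta),
        "near_max": sum(1 for v in values if high - delta < v <= high),
        "near_mean": sum(1 for v in values if avg - delta < v < avg + delta),
        "within_p90": sum(1 for v in values if v <= p90),
    }


# Hypothetical sample with one outlier.
stats = numeric_ranging([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], delta=2)
print(stats)
```

In this sample the single outlier pulls the average away from every actual value, so no record falls near the average; that kind of gap is what the interval counts are meant to expose.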

String ranging

  • Minimal, maximal and average length
  • Number of records whose length falls within the 90th percentile
  • Number/percentage of records that contain only letters, that contain digits, or that contain special symbols
  • Number/percentage of records that contain "http" or "https"
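
A minimal sketch of the string ranging counts, assuming "special symbols" means any character that is neither alphanumeric nor whitespace. The function name and that definition are assumptions for illustration.

```python
def string_ranging(values: list[str]) -> dict[str, float]:
    lengths = [len(v) for v in values]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "letters_only": sum(1 for v in values if v.isalpha()),
        "with_digits": sum(1 for v in values if any(c.isdigit() for c in v)),
        "with_special": sum(
            1 for v in values
            if any(not c.isalnum() and not c.isspace() for c in v)
        ),
        # A substring test for "http" also matches "https".
        "with_http": sum(1 for v in values if "http" in v.lower()),
    }


# Hypothetical sample values.
result = string_ranging(["Alice", "Bob42", "see https://example.com", "n/a"])
print(result)
```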

Date and Time ranging

  • Minimal, maximal and average values
  • Number of records with values in the intervals [minimal, minimal + DELTA), (maximal - DELTA, maximal] and (average - DELTA, average + DELTA)
  • Number of records within the 90th percentile
  • Frequency distribution
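
For dates and times, the average cannot be computed by summing the values directly. One possible approach, sketched below under the assumption that all values are naive datetime objects and DELTA is a timedelta, is to average the offsets from the earliest value. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def datetime_ranging(values: list[datetime], delta: timedelta) -> dict[str, object]:
    earliest, latest = min(values), max(values)
    # Sum the offsets from the earliest value, then divide to get the mean offset.
    total_offset = sum((v - earliest for v in values), timedelta())
    average = earliest + total_offset / len(values)
    return {
        "min": earliest,
        "max": latest,
        "average": average,
        "near_min": sum(1 for v in values if earliest <= v < earliest + delta),
        "near_max": sum(1 for v in values if latest - delta < v <= latest),
        "near_average": sum(1 for v in values if average - delta < v < average + delta),
    }


# Hypothetical sample dates.
dates = [datetime(2024, 1, d) for d in (1, 2, 3, 10)]
summary = datetime_ranging(dates, delta=timedelta(days=2))
print(summary["average"], summary["near_min"], summary["near_max"])
```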

Boolean ranging

  • Number/percentage of True/False/Non-boolean values
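
The True/False/Non-boolean split can be sketched as a simple classification. The accepted spellings in TRUE_VALUES and FALSE_VALUES are assumptions, as are the function names.

```python
from collections import Counter
from typing import Any

# Assumed spellings; extend to match the dataset.
TRUE_VALUES = {"true", "yes", "1"}
FALSE_VALUES = {"false", "no", "0"}


def classify(value: Any) -> str:
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return "true"
    if text in FALSE_VALUES:
        return "false"
    return "non_boolean"


def boolean_ranging(values: list[Any]) -> Counter:
    return Counter(classify(v) for v in values)


# Hypothetical mixed input.
counts = boolean_ranging([True, "false", "Yes", "maybe", None, 0])
print(counts)
```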

Non-duplication

  • Number/percentage of explicit duplicates
  • Number/percentage of implicit duplicates
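
The source does not define explicit and implicit duplicates. The sketch below assumes that explicit duplicates are records identical field by field, and that implicit duplicates are records that become identical only after normalization (here: lowercasing and collapsing whitespace). Both the interpretation and the names are assumptions.

```python
from collections import Counter

Record = tuple[str, ...]


def normalize(record: Record) -> Record:
    # Lowercase each field and collapse runs of whitespace to a single space.
    return tuple(" ".join(field.lower().split()) for field in record)


def count_duplicates(records: list[Record]) -> dict[str, int]:
    exact = Counter(records)
    explicit = sum(count - 1 for count in exact.values())
    normalized = Counter(normalize(record) for record in records)
    all_duplicates = sum(count - 1 for count in normalized.values())
    # Duplicates found only after normalization are counted as implicit.
    return {"explicit": explicit, "implicit": all_duplicates - explicit}


# Hypothetical sample records.
people = [
    ("Ann Lee", "NY"),
    ("Ann Lee", "NY"),
    ("ann  lee", "ny"),
    ("Bob", "LA"),
]
duplicates = count_duplicates(people)
print(duplicates)
```

Real-world implicit duplicate detection often relies on fuzzy matching rather than exact comparison after normalization; this sketch shows only the simplest case.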

Language detection

  • List of languages detected in the data

Spelling

  • Number of potential issues (English only)

URL Accessibility

  • Number of URLs
  • Number of inaccessible URLs
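
A URL accessibility check could be sketched with the standard library as below. It assumes that a URL counts as accessible when a HEAD request completes without an error status; some servers reject HEAD, so a production check might fall back to GET. The function names and the injectable probe parameter are assumptions that keep the counting logic testable without network access.

```python
import http.client
import urllib.request
from typing import Callable


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URL succeeds."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError; ValueError covers bad URLs.
        return False


def url_accessibility(
    urls: list[str], probe: Callable[[str], bool] = is_reachable
) -> dict[str, int]:
    inaccessible = [url for url in urls if not probe(url)]
    return {"urls": len(urls), "inaccessible": len(inaccessible)}


# Offline demonstration with a stand-in probe instead of real network calls.
demo = url_accessibility(
    ["https://a.example.com", "https://b.example.invalid"],
    probe=lambda url: url.endswith(".com"),
)
print(demo)
```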

Image Analysis

  • Number/percentage of images with file size larger than MAX_FILESIZE
  • Number/percentage of images with pixel dimensions outside the predefined values
  • Number/percentage of images with resolution (dpi) lower than the predefined value
  • Number/percentage of images with aspect ratio different from the predefined value
  • Number/percentage of images with image format other than predefined formats
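
The image rules can be sketched as checks over image metadata. The example assumes the metadata (file size, pixel dimensions, dpi, format) has already been extracted by an imaging library; the ImageInfo class, the parameter names and the aspect ratio tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    file_size: int  # bytes
    width: int      # pixels
    height: int     # pixels
    dpi: int
    fmt: str        # e.g. "PNG"


def image_issues(
    image: ImageInfo,
    *,
    max_filesize: int,
    allowed_sizes: set[tuple[int, int]],
    min_dpi: int,
    aspect_ratio: float,
    allowed_formats: set[str],
    tolerance: float = 0.01,
) -> list[str]:
    """Return the names of the rules that the image violates."""
    issues = []
    if image.file_size > max_filesize:
        issues.append("file_size")
    if (image.width, image.height) not in allowed_sizes:
        issues.append("pixel_size")
    if image.dpi < min_dpi:
        issues.append("resolution")
    if abs(image.width / image.height - aspect_ratio) > tolerance:
        issues.append("aspect_ratio")
    if image.fmt.upper() not in allowed_formats:
        issues.append("format")
    return issues


# Hypothetical image and thresholds.
found = image_issues(
    ImageInfo(file_size=2_000_000, width=1920, height=1080, dpi=72, fmt="png"),
    max_filesize=1_000_000,
    allowed_sizes={(1920, 1080), (1280, 720)},
    min_dpi=150,
    aspect_ratio=16 / 9,
    allowed_formats={"JPEG", "PNG"},
)
print(found)
```

Counting how many images return each issue name across a dataset gives the number/percentage figures listed above.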

Document File Analysis

  • Number/percentage of documents with file size larger than MAX_FILESIZE
  • Number/percentage of documents with file format other than predefined formats

Customizations

  • Standalone application
  • Integrated component
  • Consistency rules modification
  • New consistency rules
  • Data sources
  • Data formats
  • Transfer protocols
  • Large scale

Have a project that builds on SSA Data Quality checker?

Our team of experts is available for your custom projects. Please get in touch to get a quote.
