Data Cleaning
Once processed and organized, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, deduplication, and column segmentation.[4] Such data problems can also be identified through a variety of analytical techniques. For example, with financial information, the totals for particular variables can be compared against separately published numbers believed to be reliable.[5] Unusual amounts above or below pre-determined thresholds may also be reviewed. There are several types of data cleaning, and the appropriate one depends on the type of data. For quantitative data, outlier detection methods can be used to remove values that were likely entered incorrectly. For textual data, spellcheckers can be used to reduce the number of mistyped words, but it is harder to tell whether the words themselves are correct.[6]
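As an illustration only, three of the checks described above (deduplication, outlier detection, and comparison of a total against a published figure) might be sketched in Python as follows. The function names, the record layout, the 1.5 × IQR rule, and the 1% tolerance are all assumptions made for this example and do not come from the cited sources.

```python
from statistics import quantiles


def deduplicate(records, key_fields):
    """Keep the first record for each key; drop later duplicates.

    Keys are compared after trimming whitespace and lowercasing,
    which is an assumed normalization for this sketch.
    """
    seen = set()
    unique = []
    for rec in records:
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    This is one common outlier rule; other methods could be swapped in.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def total_matches(values, published_total, tolerance=0.01):
    """Check a column total against a separately published figure.

    The relative tolerance (1% by default) is a hypothetical threshold.
    """
    return abs(sum(values) - published_total) <= tolerance * abs(published_total)


if __name__ == "__main__":
    records = [
        {"name": "Ann Lee", "amount": 120},
        {"name": " ann lee ", "amount": 120},  # duplicate after normalization
        {"name": "Bob Roy", "amount": 95},
    ]
    cleaned = deduplicate(records, ["name"])
    print(len(cleaned), "unique records")

    amounts = [100, 102, 98, 101, 99, 103, 97, 1000]
    print("values to review:", flag_outliers(amounts))

    print("total agrees:", total_matches([r["amount"] for r in cleaned], 215))
```

Flagged values are returned for review and are not deleted automatically, because an unusual amount is not necessarily an incorrect one.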
What is data cleansing? (translated from Wikipedia)