How We Do It

Cleaning Your Data

Here is our typical process for cleaning your data. We follow these steps using industry-standard tools to deliver reliable results in a timely manner. We use the same set of tools for small projects and for cloud environments where we handle millions of rows of data, so our process stays consistent across the board.

1

Clean Data By Deleting Formats

In Excel it is common to apply different formatting options (cell color, font, borders, etc.) to make important data fields stand out and easy to recognize. However, excessive formatting can have the opposite effect: it makes the data harder to filter, organize, and work with.

2

Parsing Data from Text to Columns

Notice that the last column, column H, crams the Address, City, State, and Postcode into a single column, separated only by commas. This is a common issue that makes it hard to filter the data and use Excel functions and formulas.
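For readers who script this step outside Excel, the same split can be sketched in Python. This is a minimal illustration, not our production tooling: the function name split_address, the field names, and the sample address are all hypothetical, and it assumes every row has exactly four comma-separated parts.

```python
import csv

def split_address(combined):
    """Split 'Address, City, State, Postcode' text into four named fields."""
    # csv.reader copes with quoted parts that contain commas themselves;
    # skipinitialspace drops the space that follows each comma.
    parts = next(csv.reader([combined], skipinitialspace=True))
    parts = [p.strip() for p in parts]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated parts, got {len(parts)}")
    # Field names are assumptions made for this example.
    return dict(zip(("address", "city", "state", "postcode"), parts))

# Hypothetical sample value standing in for one cell of column H.
row = split_address("12 High Street, Springfield, NSW, 2000")
print(row)
```

Rows that do not split into four parts raise an error rather than being silently misaligned, so they can be reviewed by hand.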

3

Removing Extra Spaces And Non-Printable Characters

Some of the imported text may contain leading and trailing spaces. These unwanted spaces make the data harder to navigate and collect, and they make it look confusing and unprofessional when presented. Non-printing characters can also cause issues when sorting and using lookup formulas, and they make the text hard to read.
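As a rough scripted equivalent of combining Excel's TRIM and CLEAN functions, the sketch below removes control characters and extra spaces in Python. The helper name clean_text is hypothetical, and treating the non-breaking space as a regular space is an assumption of this example, not something Excel's TRIM does on its own.

```python
import re

def clean_text(value):
    """Remove non-printing characters and surplus spaces from a text value."""
    # Assumption: treat the non-breaking space (common in web exports) as a space.
    value = value.replace("\u00a0", " ")
    # Drop ASCII control characters (codes 0-31) and DEL (code 127).
    value = re.sub(r"[\x00-\x1f\x7f]", "", value)
    # Collapse runs of spaces to one space, then trim both ends.
    return re.sub(r" {2,}", " ", value).strip()

print(repr(clean_text("  John\u00a0 Smith\x07  ")))
```

Because control characters are deleted rather than replaced, words separated only by a tab or line break would be joined; swap the empty replacement for a space if that matters for your data.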

4

Handle Null Values

If data was mistakenly left out or incorrectly entered into a system, certain data elements will be missing or blank, as in the example data from the retail clothing company, where some customers did not enter their phone number. These blank cells create issues when using functions and formulas. To find them in Excel, use the Go To Special function (under Find & Select in the Home tab's Editing group).
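The same idea can be sketched in Python: locate blank values in one column and fill them with a visible placeholder. The function fill_blanks, the placeholder text, and the sample customer rows are hypothetical; whether to fill, flag, or drop blanks depends on the project.

```python
def fill_blanks(rows, column, placeholder="Not provided"):
    """Return copies of rows with empty or missing values in column replaced."""
    cleaned = []
    for row in rows:
        value = row.get(column)
        # A value counts as blank if it is None or only whitespace.
        is_blank = value is None or str(value).strip() == ""
        cleaned.append({**row, column: placeholder if is_blank else value})
    return cleaned

# Hypothetical sample rows; some customers left the phone number empty.
customers = [
    {"name": "Ana Lee", "phone": "555-0101"},
    {"name": "Ben Ortiz", "phone": ""},
    {"name": "Cy Ng", "phone": None},
]
filled = fill_blanks(customers, "phone")
print(filled)
```

The original rows are left unchanged, so the raw import can still be compared against the cleaned copy.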

5

Convert Numbers Stored As Text into Numbers

When data is imported from outside sources, numbers are often stored as text values. These must be cleaned to avoid calculation issues. For example, the SUM function in Excel ignores text values in the cells it references, so total balances will be incorrect because not every number is included. Common fixes in Excel include the VALUE function and the Convert to Number option on the cell's error indicator.
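A minimal Python sketch of the same conversion is shown below. The helper to_number and the sample balances are hypothetical, and the code assumes a comma is a thousands separator, which is not true for every locale.

```python
def to_number(value):
    """Convert a text value such as ' 1,250.50 ' to a float, or None if not numeric."""
    if isinstance(value, (int, float)):
        return float(value)
    # Assumption: commas are thousands separators, not decimal marks.
    text = str(value).strip().replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical imported column mixing text and real numbers.
balances = ["100", " 1,250.50 ", 40, "n/a"]
numbers = [to_number(b) for b in balances]
# Values that could not be converted are skipped here; in practice they
# would usually be reported for manual review.
total = sum(n for n in numbers if n is not None)
print(numbers, total)
```

Returning None for unconvertible values keeps bad entries visible instead of silently counting them as zero.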

6

Using Find And Replace to Update Data

Find and Replace is an essential tool for data cleaning in Excel. To standardize and clean data, you can find and replace blank cells with text, zeros with numerical values, and input errors with correct values. Find and Replace can also be used to update cell references in formulas and functions.

To illustrate data cleaning with Excel's Find and Replace, notice that the city names contain inconsistent customer input. The city names North West and South East were entered as “nw” and “se” respectively. A few others were entered entirely in lower case.
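For larger files, this kind of replacement can be scripted. The Python sketch below expands the two abbreviations named above and normalizes capitalization; the function standardize_city and the REPLACEMENTS table are hypothetical, and matching is done on the whole cell value (similar in spirit to Excel's "Match entire cell contents" option) so that "se" inside a longer name is left alone.

```python
# Hypothetical lookup table built from the abbreviations in the example data.
REPLACEMENTS = {"nw": "North West", "se": "South East"}

def standardize_city(value):
    """Expand known abbreviations and normalize capitalization of a city name."""
    trimmed = value.strip()
    key = trimmed.lower()
    # Only replace when the entire value is a known abbreviation.
    if key in REPLACEMENTS:
        return REPLACEMENTS[key]
    # Otherwise capitalize the first letter of each word.
    return trimmed.title()

for raw in ["nw", "SE ", "north west", "South East"]:
    print(repr(raw), "->", standardize_city(raw))
```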

7

Removing Duplicates

Duplicates in data can arise for various reasons: import and export issues, customer input errors, inadequate data quality controls, etc. Data collected and stored in different sources and then exported to Excel is often not 100% accurate, because moving data between two systems can create duplicates.

The most common cause of duplicates is human error: customers mistakenly use the wrong input field, data access is not secured so any team member can make changes, or management fails to implement consistent quality controls or checks.

Depending on your data, duplicates can be found and removed based on a single column or on a combination of columns. In the retail company's data example, duplicates are identified based on the customer's full name and order number.
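The multi-column check described above can be sketched in Python as follows. The function remove_duplicates, the column names full_name and order_number, and the sample orders are hypothetical; ignoring case and surrounding spaces when comparing is an assumption of this example.

```python
def remove_duplicates(rows, keys=("full_name", "order_number")):
    """Keep the first row for each combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        # Assumption: compare keys ignoring case and surrounding spaces.
        signature = tuple(str(row[k]).strip().lower() for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique

# Hypothetical sample orders; the second row repeats the first.
orders = [
    {"full_name": "Ana Lee", "order_number": "A-100", "total": 25.0},
    {"full_name": "ana lee ", "order_number": "A-100", "total": 25.0},
    {"full_name": "Ana Lee", "order_number": "A-101", "total": 40.0},
]
deduped = remove_duplicates(orders)
print(deduped)
```

The same customer with a different order number is kept, because both key columns must match for a row to count as a duplicate.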

8

Fixing Errors

Excel has many useful and versatile tools, but if they are not used properly, various errors will appear. In the retail clothing data example, notice that the #N/A error appears in Column E, Order Discount.

9

Spell Check

Yes, Excel has spell-checking abilities! Nothing diminishes the integrity of a financial model, report, or presentation more than misspelled words and typos. To access the spelling check tool, go to the Review tab and select Spelling from the Proofing group. Alternatively, the quickest way is to press the F7 key.

10

Changing Text Values to Proper Case

Users do not always follow capitalization rules when entering text, so it is common for names, addresses, titles, etc. to have inconsistent formats. Depending on your preference, you can convert text to upper case, lower case, or proper case. Three Excel functions make this easy: UPPER, LOWER, and PROPER.
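For comparison, Python's built-in string methods upper, lower, and title behave much like UPPER, LOWER, and PROPER. The wrapper change_case and the sample names below are hypothetical and only illustrate the idea.

```python
def change_case(value, mode="proper"):
    """Convert text to upper, lower, or proper (title) case."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "proper":
        # str.title capitalizes every letter that follows a non-letter.
        return value.title()
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical sample names with inconsistent capitalization.
for name in ["jOHN smith", "MARY-ANN o'neil"]:
    print(change_case(name, "upper"), "|", change_case(name, "lower"), "|", change_case(name))
```

Note that str.title also capitalizes the letter after an apostrophe, so a value like "they're" would need a manual check after conversion.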

Batch Data Cleaning Pipeline in Cloud

When we have to clean millions of rows of data on a daily basis, we automate the cleaning process by building an automated data pipeline. We use Azure Data Factory, Azure Databricks, and Azure Logic Apps to automate both the data cleaning and the communication with the customer in a cost-effective and timely manner.