Utilizing tools to help clean data

Posted by Shannon Bishop on Dec 6, 2018 12:36 pm

You mention in your article using tools to help clean data. Are there any pitfalls to using certain tools? If so, how can one remedy or prevent them?

Re: Utilizing tools to help clean data

Posted by Gary Netherton on Dec 6, 2018 12:44 pm

There can be pitfalls to using any tool.  The biggest pitfall I've encountered is not understanding how the tool works.

Whether you are using regular expressions in Excel through Visual Basic for Applications (VBA) or tools from the tidyverse in the R programming language, if you do not understand how the tools work (what inputs they expect, in what format, and so on), you can easily make a mistake. For example, the LEFT() function in an Excel spreadsheet may work fine as long as all of the target data is the same length. If, on the other hand, you are expecting only 6-digit part numbers and a few are 7 digits, a formula such as LEFT(A1, 6) silently drops the seventh digit, and you may misinterpret your results because the extracted values are not accurate.

I currently use model numbers that are 3 or 4 digits long, and I am getting ready to add some new models that will increase the length to 5 for some products. If I do not account for that in my searches and analyses, it could mean the difference between evaluating an old-style product and the new release.
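To make the LEFT() pitfall concrete, here is a minimal Python sketch. It is only an illustration, not the method from the article: the `left` helper imitates Excel's LEFT(), and the record strings, the `-bracket` suffix, and the `leading_digits` helper are made up for the example.

```python
import re


def left(text: str, n: int) -> str:
    """Imitate Excel's LEFT(): return the first n characters of text."""
    return text[:n]


def leading_digits(text: str) -> str:
    """Return the whole run of digits at the start of text, whatever its length."""
    match = re.match(r"\d+", text)
    return match.group(0) if match else ""


# Hypothetical records: a part number followed by a description.
records = ["123456-bracket", "1234567-bracket"]

for record in records:
    fixed = left(record, 6)           # assumes every part number is 6 digits
    flexible = leading_digits(record)  # makes no assumption about length
    flag = "" if fixed == flexible else "  <-- truncated"
    print(f"{record}: fixed={fixed} flexible={flexible}{flag}")
```

The fixed-width version returns 123456 for both records, so the 7-digit part looks identical to the 6-digit one. Comparing the two results, or simply checking the length of each value before extracting, is one way to catch this before it reaches an analysis.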