Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

- Stackoverflow.com Wiki
3 articles, 12 books. Go to books ↓

Data science continues to generate excitement and yet real-world results can often disappoint business stakeholders. How can we mitigate risk and ensure results match expectations?


Data scientists spend a large amount of their time cleaning datasets and getting them down to a form with which they can work. In fact, a lot of data scientists argue that the initial steps of obtaining and cleaning data constitute 80% of the job.


We think we’re doing the smart thing by choosing to save what we’re saving, but we can’t be sure. In 2010, the CERN Data Centre passed an enormous data milestone: 10 Petabytes of data. By the end of 2013, they had passed 100 Petabytes of data; in 2017, they passed the 200 Petabyte milestone. Yet for all of it, we know that we’ve thrown away — or failed to record — about 30,000 times that amount.