Originally written in 2017.

On data literacy


tech

Over the next five years, certain skills will become essential irrespective of one's role. One of them is the ability to work with data intuitively. Being able to load data into an appropriate tool (Excel, RStudio, Jupyter, Wolfram Alpha Pro, Tableau, or Power BI) and then work with it to develop and verify hypotheses about causality, patterns, and clusters will be a crucial determinant of one's responsiveness in the face of data.
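To make that concrete, here is a minimal sketch of the loop described above, using only Python's standard library: load some tabular data, then check a quick hypothesis against it. The data is made up for illustration; in practice you would read a real file exported from one of the tools mentioned.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical CSV export standing in for a real file on disk.
raw = """date,category,amount
2017-01-02,grocery,42.5
2017-01-03,transport,12.0
2017-01-04,grocery,38.0
2017-01-05,transport,15.5
"""

# Hypothesis: the average grocery transaction is larger than the
# average transport transaction.
by_category = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    by_category[row["category"]].append(float(row["amount"]))

averages = {cat: statistics.mean(vals) for cat, vals in by_category.items()}
print(averages)  # {'grocery': 40.25, 'transport': 13.75}
```

The same three steps (load, group, compare) carry over directly to PivotTables in Excel or a `group_by` in R; only the syntax changes.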

There are innumerable learning resources for the inquisitive mind today, from Khan Academy to Coursera, Udemy, Udacity, Channel 9, and iTunes U, to the brilliant free videos uploaded by Stanford, Harvard, and other top educational institutions.

We are at a crucial juncture in mankind's technological evolution. As learning and cognitive systems become more nimble and skilled, staying in the driver's seat requires us to rely on the single skill that has distinguished us from all other species: adaptability. Learning is a never-ending pursuit; one can never stop, and one must never.

Start with the tool most of us claim innate familiarity with: the venerable Excel. Work through PivotTables, fetching data via macros, Scenarios, Solver, and What-If analyses. Then start using free tools such as Power BI and R with RStudio.

Data abounds all around us; it takes a curious mind to unlock it. What is the most commonly used word across all of one's sent emails? What time of day has one typically taken one's best pictures on one's cell phone (rate photo quality and corroborate with EXIF data)? What time of day has one sent the most chat messages? What is the typical duration of one's lunch hour (this can be determined from the login/logoff events around lunchtime in the Windows Event Log)? How many check-ins did one commit while the International Space Station was directly overhead? All of these will give a feel for semi-structured data: data that is not necessarily normalized and stored in a traditionally queryable data source.
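The first of those questions is a good starting exercise, since it needs nothing beyond the standard library. Below is a sketch that tallies word frequencies across a few email bodies; the snippets are invented stand-ins, as in practice you would first export your sent mail from your mail client.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for the bodies of one's sent emails.
sent_emails = [
    "Please find the report attached. The report covers last week.",
    "Thanks for the update. Please review the attached notes.",
    "The meeting is moved to Thursday. Please confirm.",
]

words = Counter()
for body in sent_emails:
    # Lowercase, then keep only runs of letters (and apostrophes).
    words.update(re.findall(r"[a-z']+", body.lower()))

print(words.most_common(3))
```

Unsurprisingly, function words like "the" dominate; filtering them out with a stop-word list is a natural next step, and the same Counter approach works just as well on chat logs.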

As you start working with data, you will find yourself searching for the best way to represent it, more for making sense of it while looking for patterns than for presentation, and that is where reading the stalwarts of the field comes in handy: Edward Tufte, Donald Norman, Nancy Duarte, and Jakob Nielsen. I will leave you with a very interesting read on how a rogue train was caught through data analysis: "How we caught the Circle Line rogue train with data."