In a previous post, I outlined the dimensions of data science and provided a checklist of the skills needed in each of the four areas. To help you navigate these dimensions and create your own learning journey, I’ve compiled a list of my favorite books, articles, and classes for each area, along with some general tool recommendations.
The mindset of a data scientist includes curiosity, tenacity, scientific thinking, problem-solving, and creativity. You don’t acquire these skills by taking a class and checking it off your list; developing them is a continuous journey of improvement. Below is a list of some of my favorite books that have helped me develop this mindset.
Data wrangling is an area where practice makes perfect. Personally, I feel that SQL is foundational here, not only because you need to know how to extract data from a database, but also because learning the language teaches you about data structures and relational models. Once you feel comfortable extracting data, you can begin reshaping it for your machine learning models with feature engineering. You will work with many different database systems over time, so don’t worry about learning all of them. Once you have the basics, you will be able to move between systems easily.
Class: SQL for Data Science
Class: Getting and Cleaning Data
Article: Feature Engineering
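To make the extract-then-reshape idea concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the `orders` table, its columns, and the values are made up, and an in-memory SQLite database stands in for whatever system you actually query. It first uses SQL to pull one summary row per customer, then derives a couple of simple features from those rows.

```python
import sqlite3

# Hypothetical in-memory table; names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 20.0, "2018-01-05"),
        (1, 35.0, "2018-02-10"),
        (2, 15.0, "2018-01-20"),
        (2, 5.0, "2018-03-02"),
        (2, 40.0, "2018-03-15"),
    ],
)

# Extract: aggregate the raw orders into one row per customer with SQL.
rows = conn.execute(
    """
    SELECT customer_id,
           COUNT(*)        AS n_orders,
           SUM(amount)     AS total_spend,
           MAX(order_date) AS last_order
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
conn.close()

# Feature engineering: derive new columns that a model could use.
features = [
    {
        "customer_id": cid,
        "n_orders": n_orders,
        "avg_order_value": total_spend / n_orders,
        # ISO dates are YYYY-MM-DD, so characters 5-6 hold the month.
        "last_order_month": int(last_order[5:7]),
    }
    for cid, n_orders, total_spend, last_order in rows
]

for row in features:
    print(row)
```

The same `SELECT ... GROUP BY` pattern carries over to most relational databases, which is part of why the basics transfer so well between systems.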
I really wish there were a class I could recommend for this area, but unfortunately, I haven’t come across one yet. Below are a few of my favorite data visualization books, but they only scratch the surface of this area. What has helped me improve most is getting as much feedback as possible from others on how they interpret the information I am presenting. Each time I receive feedback, I refine my delivery and presentation and get a little better along the way. There is a really great HBR article about the benefits of talking to people as a data scientist. Definitely worth a read.
This is another section where I feel the open source resources are lacking. The majority of classes I find teach data science algorithms only in the context of a single language such as R or Python, but in reality, there is a plethora of tools you can use to build a model. The key is to understand the range of options available when selecting a model for your problem, and then to choose whatever tool you like to carry out the model’s operations. Andrew Ng’s Machine Learning class is a staple in this area and a great place to start for anyone wanting an overview of the algorithms available for machine learning. I’m also in the process of developing a predictive analytics class. If you are interested, you can sign up here to be notified once the class is released.
Class: Machine Learning
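As a small illustration of choosing a model before choosing a tool, here is a hedged sketch that compares two candidate models on held-out data using plain Python. The data points are made up, and the two candidates (a predict-the-mean baseline and a one-variable least-squares line) are just assumptions picked for brevity. The same comparison could be carried out in R, KNIME, or Excel.

```python
# Made-up data for illustration: x is an input, y is the value to predict.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
test_x = [5.0, 6.0]
test_y = [10.1, 11.9]


def mean(values):
    return sum(values) / len(values)


def mean_absolute_error(actual, predicted):
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


# Candidate 1: a baseline that always predicts the training mean.
baseline_value = mean(train_y)
baseline_pred = [baseline_value for _ in test_x]

# Candidate 2: a straight line fitted by ordinary least squares.
x_bar, y_bar = mean(train_x), mean(train_y)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y)) / sum(
    (x - x_bar) ** 2 for x in train_x
)
intercept = y_bar - slope * x_bar
linear_pred = [intercept + slope * x for x in test_x]

# Compare both candidates on data neither of them saw during fitting.
baseline_mae = mean_absolute_error(test_y, baseline_pred)
linear_mae = mean_absolute_error(test_y, linear_pred)

print(f"baseline MAE: {baseline_mae:.2f}")
print(f"linear MAE:   {linear_mae:.2f}")
```

On this toy data the fitted line has the lower held-out error, so it would be the candidate to carry forward. The point is the comparison itself, not the specific tool used to run it.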
General Purpose Tools:
I tried to keep the resources mentioned above tool-agnostic; however, there are a few general purpose tools you can use across almost all of the four areas. In my opinion, Python is a really great language whether you have some programming experience or it’s your first language. Python allows you to easily wrangle data, visualize it, and model it. For Python resources, I would check out the class Introduction to Data Science in Python or Python for Data Science and Machine Learning Bootcamp. R tends to be heavily used in academia and is another great general purpose language for data science. If you would like a specialized data science course in R, I would check out Data Science and Machine Learning Bootcamp with R on Udemy.
If you don’t want to program and would prefer a GUI, I’m really loving KNIME right now. It has a ton of plug-ins, including ones for Python, R, and the Keras library for deep learning. Plus, the nodes and workflows make it super easy to replicate your work and test new models.
Last but not least, there is always Excel. Although Excel gets a bad rap, I don’t think it’s a bad place to start if you are already comfortable with the tool and work with smaller data sets. One of my favorite data science books is Data Smart by John Foreman. Not only does he do a great job of explaining data science algorithms and providing real business use-case examples, but he also shows you how to do all of it in Excel, so definitely check it out if Excel is your jam.
Curated Data Science Classes
If you are looking for something a little more in depth or are starting from scratch, I would look into specializations or nanodegrees, which are series of related classes. Both DataCamp and Udacity do a great job of curating everything from data analytics to data science and AI in the language of your choice.
Whatever classes you take or books you read, nothing can replace the benefit that comes from working on a real problem. It lets you put all the pieces of your learning together and do what data scientists are supposed to do: find insights and automate things!
My advice is to first assess your skill level on the four dimensions, and then work through a problem and see where you might have gaps. In my next post, I will review the data science pipeline and how to get started working on problems.
I’m always looking for new resources, so please comment and share some of your favorite data science classes and books. I’d love to check them out!