Chapter 3 Technical Path

3.1 Introduction

3.1.1 Opening Thoughts

There are many ways to learn technical material. Paid courses, YouTube videos, blog posts, and books can all be effective, depending on your learning style. The content in the following sections aims for a healthy balance of all of these content types.

By default, all of the recommended learning content is free. Some additional resources that may require payment are also mentioned. These serve as extra avenues to strengthen the concepts and techniques learned in the initial free content.

Machine learning practitioners tend to fall into two programming-language camps: those who use R and those who use Python. Both have their strengths and weaknesses. All learning content will have an equal amount of resources for both R and Python.

3.1.2 R or Python?

Short answer: eventually you will have to learn both to become a successful data practitioner, but if you can only pick one, choose Python.

3.1.3 Getting the Most out of Learning

Deliberate practice is the best way to make learning stick and to rapidly develop your skills.

Whenever you learn something new in the data and AI world, it’s usually best to apply it immediately to a real-world project within your job or company. Practicing on a real-world problem reinforces the new knowledge in your long-term memory while also driving impact in your job by solving real problems. What a bonus! Be careful about working only on “toy data sets”: public data that has been beaten to death by hundreds of blogs and courses. Real-world data is messy and unpredictable, so working on problems related to your current job or company gets you comfortable with that uncertainty even faster.

Don’t feel bad about looking things up on Bing or Google. Almost every technical person who works with computers today looks things up online every day. Software syntax takes time to learn, and even some of the best engineers don’t remember all the ins and outs of a language. When in doubt, look it up online! Sites like Stack Overflow will quickly become your best friend as you work through issues in your code.

3.2 Installing Software

Getting started with the right developer environment can save tons of headaches down the road. While there are many options for which Integrated Development Environment (IDE) to use, the ones below are quickly becoming the standard for each language.

3.3 Data Analysis and Manipulation

Learning how to manipulate data outside of existing tools like Excel or Power BI quickly gives you data superpowers you never thought possible. Breaking out of the four walls of Excel and into the wider data universe with languages like Python and R unlocks much more potential for impact in whatever job you do. Even if you don’t plan to build your own machine learning models, knowing the basics of data manipulation is an important skill. It also builds the data foundation that machine learning rests on, should you ever want to come back and start building models.
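To make “data manipulation” concrete, here is a minimal stdlib-only sketch of the filter, group-by, and sum pattern that an Excel pivot table performs. The CSV contents, column names, and the `revenue_by_region` helper are hypothetical; in practice you would likely reach for pandas (`groupby`) in Python or dplyr (`group_by()` plus `summarise()`) in R.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; in practice this would come from a file or database query.
RAW_CSV = """region,product,revenue
East,Loans,120.0
West,Loans,80.0
East,Cards,50.0
West,Cards,70.0
East,Loans,30.0
"""

def revenue_by_region(raw_csv, min_revenue=0.0):
    """Filter rows, then group by region and sum revenue (like a pivot table)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        revenue = float(row["revenue"])
        if revenue >= min_revenue:             # filter step
            totals[row["region"]] += revenue   # group-by + sum step
    return dict(sorted(totals.items()))

print(revenue_by_region(RAW_CSV))  # → {'East': 200.0, 'West': 150.0}
```

The same few steps (load, filter, group, aggregate) show up in almost every analysis, whichever library you end up using.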

3.4 Version Control

If you plan to work with others on any project that contains code, knowing version control, and specifically git, is a must. Get up to speed with how to use git and its most famous hosting service, GitHub. This skill opens up new opportunities to contribute to open source projects and even build your own open source software. It’s also required to work on any technical team that collaborates on projects.

3.5 Machine Learning Basics

Let’s get our feet wet with the introductory concepts of machine learning. Learn more of the terminology, build a few models, and start to understand how the data science life cycle takes shape. This section is by no means a comprehensive view of machine learning today, but it’s a good starting point.

Future sections will cover most of these topics again in more depth. Some repetition of terms and concepts helps reinforce the knowledge and shows how there are always different angles to attack data problems with machine learning.
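One life-cycle habit worth building from your very first models is evaluating on data the model has not seen. The sketch below is stdlib-only and uses hypothetical data: it holds out 20% of rows as a test set and scores a naive “always predict the training mean” baseline with mean absolute error. The helper names are illustrative; scikit-learn’s `train_test_split` in `sklearn.model_selection` does the splitting job in practice.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical (feature, target) pairs.
data = [(x, 3 * x + 5) for x in range(10)]
train, test = train_test_split(data)

# A naive baseline "model": always predict the average target seen in training.
baseline = sum(y for _, y in train) / len(train)
mae = mean_absolute_error([y for _, y in test], [baseline] * len(test))
print(f"train={len(train)} test={len(test)} baseline MAE={mae:.2f}")
```

Any real model you build later should beat a simple baseline like this on the held-out rows; if it doesn’t, that is a useful signal in itself.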

3.6 Regression

Regression deals with predicting numerical quantities. It will quickly become your bread and butter for leveraging machine learning in finance. Understanding how to use software packages to train models and how each model works are both crucial to leveraging regression techniques to the fullest. Most of the resources here deal with examples of regression in action. Take time to soak in how these tutorials and experts approach a regression problem, how they structure their code, and the way they communicate the outputs.
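As a first look at what a regression package does under the hood, here is a minimal sketch of simple linear regression fitted with ordinary least squares, using only the standard library. The marketing-spend data is hypothetical, and one feature is assumed for readability; in practice you would use something like scikit-learn’s `LinearRegression` in Python or `lm()` in R.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x   # the fitted line passes through the means
    return intercept, slope

# Hypothetical data: monthly marketing spend (x) vs. revenue (y), both in $k.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [12.0, 14.0, 16.0, 18.0, 20.0]
intercept, slope = fit_simple_linear_regression(spend, revenue)
print(intercept, slope)          # → 10.0 2.0
print(intercept + slope * 6.0)   # prediction for spend = 6 → 22.0
```

The slope reads directly as “each extra $1k of spend is associated with $2k of revenue” in this toy data, which is the kind of output you will need to communicate to business partners.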

3.7 Time Series

Time series forecasting is a subdomain of regression in which we try to forecast a numerical quantity over time. Prediction over time is a separate world in machine learning, with deep roots in more classic statistical methods.

While most regression models can be turned into time series models by incorporating various date-based features, there are also traditional statistical models that have been used specifically for time series forecasting for decades. An interesting aspect of time series forecasting is that it can use multivariate as well as univariate data. For example, you could forecast sales revenue using only previous historical values of sales revenue (univariate), or use external regressor information like country holidays and population size to help forecast (multivariate). Knowing both types of models is a key part of being an expert time series practitioner.
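Here is a minimal sketch of how a time series becomes a regression problem by adding date-based features. Lagged values and the month number give a univariate setup; passing a set of holiday dates adds an external regressor, making it multivariate. The `make_features` helper, the sales numbers, and the holiday date are all hypothetical.

```python
from datetime import date

def make_features(dates, values, lags=2, holidays=None):
    """Turn a time series into (features, target) rows for a regression model.

    Univariate: features are the previous `lags` values plus the month number.
    Multivariate: if a set of holiday dates is passed, add a 0/1 holiday flag
    as an external regressor.
    """
    rows = []
    for t in range(lags, len(values)):
        features = [values[t - k] for k in range(1, lags + 1)]  # y[t-1], y[t-2], ...
        features.append(dates[t].month)                         # date-based feature
        if holidays is not None:
            features.append(1 if dates[t] in holidays else 0)   # external regressor
        rows.append((features, values[t]))
    return rows

# Hypothetical monthly sales revenue.
months = [date(2023, m, 1) for m in range(1, 6)]
sales = [100, 110, 120, 130, 140]
print(make_features(months, sales)[0])                               # → ([110, 100, 3], 120)
print(make_features(months, sales, holidays={date(2023, 4, 1)})[1])  # → ([120, 110, 4, 1], 130)
```

Each row can then be fed to any regression model. Note that the first `lags` periods are dropped because they have no complete history.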

3.7.4 How Various Models Work

3.8 Classification

Classification models try to predict the outcome of an event, for example whether a credit card transaction is fraudulent or whether a self-driving car sees a stop sign next to the road. Usually the predicted outcome is a binary yes or no, often accompanied by a probability score between 0 and 1, where 1 means a 100% probability of the event occurring. Classification models can also predict an outcome across multiple categories or buckets, like whether a picture of a fruit shows an apple, pear, orange, etc.
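To show where the probability score comes from, here is a minimal sketch of logistic regression, a common first model for binary classification, trained with batch gradient descent on a single feature. The “risk score” data, learning rate, epoch count, and 0.5 decision threshold are illustrative assumptions; in practice you would use something like scikit-learn’s `LogisticRegression` or R’s `glm(..., family = binomial)`.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return (probability, yes/no label) for one observation."""
    p = sigmoid(w * x + b)
    return p, p >= threshold

# Hypothetical data: one risk score per transaction, 1 = fraud, 0 = legitimate.
scores = [0.1, 0.3, 0.5, 2.0, 2.5, 3.0]
labels = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(scores, labels)
print(predict(w, b, 0.2), predict(w, b, 2.8))
```

The model outputs a probability, and the yes/no answer comes from comparing it to a threshold. Moving that threshold trades off false alarms against missed fraud, which is often a business decision rather than a modeling one.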

Classification models are some of the most widely used machine learning models across industries today. Within finance there are many important applications, ranging from compliance to risk management.

3.9 Unsupervised Learning

Unsupervised learning is an evolving field of machine learning, and some say it is the future of AI in general. Supervised learning (regression and classification) learns from existing data with known outcomes; unsupervised learning instead tries to discover structure in a data set on its own, without needing to know the answer ahead of time. This can be a game changer in finance, for example when segmenting customers into groups based on their purchasing behavior or finding anomalies to flag for potential fraud or corruption.
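Customer segmentation is often a first use of clustering. Here is a minimal sketch of k-means (Lloyd’s algorithm) on 2-D points using only the standard library. Note that no labels are involved: the algorithm alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The customer data is hypothetical, and the starting centroids are passed in explicitly so the result is reproducible; libraries such as scikit-learn’s `KMeans` pick starting points more carefully (k-means++ by default).

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's k-means on 2-D points, starting from explicitly chosen centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (annual spend in $k, purchases per month).
customers = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, segments = kmeans(customers, initial_centroids=[customers[0], customers[3]])
print(centroids)
```

The two resulting centroids describe a “low spend” and a “high spend” segment. Points that sit far from every centroid are one simple starting point for anomaly flagging.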

3.9.3 How Various Models Work

3.10 Natural Language Processing

Natural language processing (NLP) is all about extracting insight from unstructured data in the form of text. Our world is drowning in openly available text from Twitter, blogs, and countless documents like PDFs that could be useful within our jobs in finance. Knowing how to extract insights out of a pile of documents is a super power worth learning about!
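A classic first step in NLP is turning raw text into word counts, often called a bag-of-words representation. Here is a minimal stdlib-only sketch: lowercase, tokenize on letters, drop a few stop words, and count. The stop-word list and the two example sentences are hypothetical and tiny on purpose; scikit-learn’s `CountVectorizer` does this at scale.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real projects use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def term_frequencies(documents):
    """Lowercase, split into alphabetic tokens, drop stop words, and count terms."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

docs = [
    "Revenue grew in the third quarter.",
    "Operating costs rose, but revenue growth offset the costs.",
]
print(term_frequencies(docs).most_common(2))  # → [('revenue', 2), ('costs', 2)]
```

Even simple counts like these can surface what a pile of documents is mostly about. Note that “grew” and “growth” are counted separately here, which is why real pipelines often add stemming or lemmatization.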

3.11 Deep Learning

The most rapidly evolving area of AI is deep learning, which uses a modeling architecture called neural networks. Many of the most exciting advancements in AI over the last decade have come from training neural networks on huge data sets. Deep learning has the potential to totally change how we build predictions across all types of machine learning.
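To demystify what a neural network computes, here is a minimal sketch of a forward pass through a tiny fully connected network: 2 inputs, a hidden layer of 3 ReLU units, and 1 sigmoid output. The weights are hypothetical and hand-picked. In a real project they are learned by backpropagation in a framework such as PyTorch or TensorFlow/Keras, which also handles the matrix math far more efficiently than plain lists.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), written with plain lists."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, params):
    hidden = dense(x, params["W1"], params["b1"], relu)         # hidden layer
    return dense(hidden, params["W2"], params["b2"], sigmoid)   # output probability

# Hypothetical weights; in practice they are learned from data.
params = {
    "W1": [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]],  # 3 hidden units, 2 inputs each
    "b1": [0.0, -0.5, 0.1],
    "W2": [[1.0, 2.0, -1.0]],                       # 1 output unit, 3 hidden inputs
    "b2": [-1.0],
}
print(forward([1.0, 2.0], params))  # a one-element list, about 0.98
```

“Deep” networks stack many such layers; the building block stays the same weighted sum followed by a nonlinearity.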

3.12 Model Interpretability

You may often be asked to explain how a particular machine learning model came up with its prediction. Knowing how to use various interpretability frameworks helps decode the black box of these models, which drives better adoption by non-technical business partners and gives you a better understanding of which features have the most impact in your model.
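One model-agnostic interpretability technique is permutation feature importance. Shuffle one feature’s values, re-score the model, and see how much the error grows. Features the model relies on hurt a lot when shuffled; ignored features don’t hurt at all. Here is a minimal stdlib-only sketch with a hypothetical “fitted model” that only uses its first feature. scikit-learn ships a full version as `sklearn.inspection.permutation_importance`.

```python
import random

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)   # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(y, [model(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model that uses feature 0 and ignores feature 1.
def model(row):
    return 3.0 * row[0]

X = [[float(i), float(i % 3)] for i in range(20)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))
```

Because the technique only calls the model’s prediction function, it works the same way for a linear model, a tree ensemble, or a neural network. That makes it a handy first explanation to show business partners.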

3.13 AI Ethics and Fairness

With great power comes great responsibility. As machine learning becomes more ingrained in our society, the ethical consequences of poorly deployed models will only increase. Make sure you are building models that help enrich a diverse and inclusive future by checking out the resources below.

3.14 Web Apps

Building user interfaces that bring machine learning models directly to end users, so they can consume them without writing code, can be a total game changer for your business partners. Thanks to some amazing packages within the data science community, you don’t have to be a web developer to build applications that your users will love. Check them out below.

3.15 Production on Azure

One of the harder aspects of machine learning is getting your work into a production environment where it can run at scale. This involves deploying models to run in a cloud like Microsoft Azure.

3.15.2 General Data Analytics

3.15.4 Additional Resources

3.16 Life as a Data Scientist

Ready to commit to data science as a career? Check out the content below, which features interviews with working data scientists and best practices for being a great data practitioner.

3.16.1 Build Models and Build Community

To-DO