Using CART, Random Forest, Logistic Regression and LDA to solve a classification problem

This article describes my project on Credit Card Fraud Detection. If you are interested in the code, it can be found here.

The COVID-19 pandemic has caused a drastic decline in cash usage as everyday activity moves online. This has given rise to an unprecedented surge in contactless payments. The significant increase in credit card transactions, both online and in person, has resulted in more fraudulent transactions. Fraud methods are becoming more sophisticated and harder for traditional fraud detection software to identify. …

A guide to Simple Linear Regression

Though straightforward comparative tests of individual statistics are useful in their own right, you’ll often want to learn more from your data.

In this story, you’ll look at linear regression models: a suite of methods used to evaluate precisely how variables relate to each other.

Regression analysis

Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’…
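
As a rough illustration of what "estimating the relationship" between one predictor and one outcome can look like, here is a minimal Python sketch of an ordinary least-squares fit for a single predictor. The function name `fit_simple_linear_regression` and the `hours`/`score` data are made up for this example and do not come from the article.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two points")

    x_bar, y_bar = mean(x), mean(y)

    # Sum of squared deviations of the predictor from its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")

    # Sum of cross-products of deviations of predictor and outcome.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, constructed so that score = 1 + 2 * hours exactly.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(hours, score)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The slope estimates how much the outcome changes for a one-unit change in the predictor, and the intercept is the predicted outcome when the predictor is zero.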

Create elegant data visualizations using the Grammar of Graphics

Creating visualizations, or graphical representations of the data at hand, is a key step in communicating information and findings to people from a non-technical background.

In this story, you will learn to use the ggplot2 library in R to declaratively make beautiful plots or charts of your data.

What is Data Visualization?

According to Wikipedia, “Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers of the images.”

What is ggplot2?

Feelings are complicated, but sentiment analysis need not be. Words have always been important when it comes to communicating concepts and emotions. Given the short attention span with which we now consume words on social media platforms, the choice of what words to use has become even more pressing.

Recently, I happened to come across “Text Mining with R” by Julia Silge and David Robinson while exploring the datasets and packages available in R, and I was immediately drawn to the book. I had no prior knowledge of text mining or sentiment analysis, so I decided to read it.

A recent study of data scientists on Twitter found that they spend 80% of their time cleaning data rather than mining or modelling it, and that 59% of them found it the least enjoyable part of their work. This very fact led me to think about ways to overcome this obstacle and make data preparation more enjoyable. If you are struggling with the same problem, I welcome you to the world of the Tidyverse!

The tidyverse is an opinionated collection of R packages designed for data science. …

Have you ever wondered how a tier 1 business such as Domino’s Pizza has been able to keep operating during this pandemic while other businesses had to press pause? This question points straight to the organization’s Supply Chain Management System.

“Supply Chain Management is the management of goods and services. This involves movement and storage of raw materials, work-in-progress inventory, and finished goods from the origin point to the consumption point.”

COVID-19 needs no introduction, as it is now known worldwide. Here we discuss the impact of this pandemic on Supply Chain…

