“If you torture the data long enough, it will confess.”

― Ronald H. Coase, Essays on Economics and Economists

Every dataset holds a lot of hidden information. This information has to be investigated to uncover the underlying patterns. These patterns can help in making decisions about the procedure, removing ambiguity, and gaining key business insights. Exploratory data analysis was introduced to answer these questions.

Exploratory data analysis is all about getting an overall understanding of the data. It is mainly done to explore its properties and patterns and to visualize them. …

“You can have data without information, but you cannot have information without data.” — Daniel Keys Moran

According to Wikipedia, data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Pandas is one of the most commonly used libraries in the field of data science and analytics.

Pandas is one of those packages that make analysing data much easier. Pandas is an open source library for data analysis in Python. It was developed by Wes McKinney in 2008. …
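As a minimal sketch of what this looks like in practice, the snippet below builds a small DataFrame and summarizes it. The column names and values are hypothetical and invented purely for illustration; they do not come from the original article.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", "Delhi"],
        "sales": [120, 200, 150, 180],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) for one column.
summary = df["sales"].describe()

# Average sales per city using a group-by aggregation.
mean_by_city = df.groupby("city")["sales"].mean()

print(summary)
print(mean_by_city)
```

`describe()` and `groupby()` are usually among the first calls made when exploring a new table.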

Every dataset tells us a story, and this story is depicted using data visualizations. - Saurav Anand

Plotly is an open source library that can be used for data visualisation. It is built on plotly.js, which in turn is built on d3.js. It is a high-level, declarative charting library. plotly.js ships with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more. Plotly graphs can be viewed in Jupyter notebooks, standalone HTML files, or hosted online using Chart Studio Cloud. Plotly itself is a graphics company with several products and open-source tools.

**…**

The only time a lazy man succeeds is when he tries to do nothing. - Evan Esar

In supervised machine learning, the data are labelled: we know the target, or output variable, that needs to be predicted. This output variable is continuous (numeric) for a regression problem and categorical for a classification problem. The most common steps performed during model building are:

- Training the model, which means fitting the algorithm on the training data.
- Testing the model and predicting the output values.
- Finding the accuracy of the model.
- Hyperparameter tuning to…
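The steps above can be sketched with scikit-learn. This is only one possible illustration: the Iris dataset, the k-nearest-neighbours classifier, and the grid of `n_neighbors` values are assumptions made for the example, not choices taken from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labelled data: X holds the features, y the known target classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 1. Training: fit the algorithm on the training data.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 2. Testing: predict the output values for unseen data.
y_pred = model.predict(X_test)

# 3. Accuracy: compare predictions with the true labels.
accuracy = accuracy_score(y_test, y_pred)

# 4. Hyperparameter tuning: search over candidate values with cross-validation.
search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5
)
search.fit(X_train, y_train)
best_k = search.best_params_["n_neighbors"]

print(f"accuracy={accuracy:.3f}, best n_neighbors={best_k}")
```

Grid search is used here simply because it is easy to read; randomized or Bayesian search would fit the same step.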

Data scientists spend a large share of their time on data cleaning and data exploration. A detailed EDA (exploratory data analysis) is an important part of the data science life cycle. In 2020, a lot of automatic EDA libraries were developed to save data scientists time. Some of the most commonly used automatic EDA libraries are listed in the following blog.

It is almost certain that in the coming years we are going to see a lot of tools developed to make the process of EDA more automatic and…

In good information visualizations, there are no rules, no guidelines, no templates, no standard technologies, no style books … You must simply do whatever it takes. - Edward Tufte

Seaborn provides a high-level interface to Matplotlib, a powerful but sometimes unwieldy Python visualization library. Seaborn’s official website states:

If matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy too.

**Features of Seaborn:**

- Using default themes that are aesthetically pleasing.
- Setting custom color palettes.
- Making attractive statistical plots.
- Easily and flexibly displaying distributions.
- …

Matplotlib is a Python library for visualizing data by creating different graphs and charts. Some key points about it are listed below:

- Matplotlib is a 2-D plotting library that helps in visualizing figures.
- It took inspiration from the MATLAB programming language and provides a similar MATLAB-like interface for graphics.
- It integrates well with pandas, which is used for data manipulation.
- It is a robust, free and easy library for data visualization.
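As a small illustration of these points, and assuming matplotlib and pandas are already available, the sketch below plots two columns of a DataFrame. The month and revenue figures are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [10, 14, 9, 17]})

# Create a figure with a single set of axes and draw a line chart on it.
fig, ax = plt.subplots()
ax.plot(df["month"], df["revenue"], marker="o")
ax.set_xlabel("month")
ax.set_ylabel("revenue")
ax.set_title("Revenue by month")
```

`fig.savefig(...)` would save the chart to disk, and `plt.show()` would open it in a window when an interactive backend is in use.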

The first step is to install matplotlib. If you are using Anaconda, it is already installed.

If matplotlib is not already installed, you…

“One of the holy grails of machine learning is to automate more and more of the feature engineering process.” ― Pedro Domingos

AutoML refers to automated machine learning. It describes how the end-to-end process of machine learning can be automated at the organizational and educational level. A machine learning workflow basically includes the following steps:

- Reading and merging the data and making it ready to use.
- Data preprocessing, which refers to data cleaning and data wrangling.
- Optimization, where feature and model selection is done.
- Applying the model in the application to predict accurate values.
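To make these steps concrete, here is a hand-written sketch using scikit-learn. It is not an AutoML library; it only shows the kind of search such tools automate. The dataset, the candidate models, and the `k` values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Step 1: read the data and make it ready to use.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 2: preprocessing, followed by placeholders for feature and model selection.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# Step 3: optimization over the number of features and the model itself.
param_grid = {
    "select__k": [5, 10, 20],
    "model": [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0),
    ],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# Step 4: apply the best pipeline to new data.
predictions = search.predict(X_test)
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

An AutoML tool would typically explore a far larger space of preprocessors, models, and hyperparameters than this small grid.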

Initially all these…

We know that supervised learning algorithms in machine learning are divided into two groups: regression and classification. When we talk about classification problems, logistic regression is the first algorithm that comes to mind. Logistic regression is primarily used to model a binary (0, 1) variable based on one or more other variables, called predictors. The binary variable being modeled is generally referred to as the response variable, or the dependent variable. So the output is discrete in nature. Logistic regression is also called logit regression.
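The link between the name "logit" and the binary output can be sketched in a few lines of plain Python. The intercept and slope below are hypothetical coefficients picked for illustration, not values fitted to any data.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability; the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for a single predictor x.
intercept = -4.0
slope = 1.0


def predict_proba(x: float) -> float:
    """Probability that the response variable equals 1 for predictor value x."""
    return sigmoid(intercept + slope * x)


def predict_label(x: float, threshold: float = 0.5) -> int:
    """Turn the probability into a discrete 0/1 output."""
    return 1 if predict_proba(x) >= threshold else 0


print(predict_proba(4.0), predict_label(2.0), predict_label(6.0))
```

The model is linear in the log-odds, and the threshold is what makes the final output discrete.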

Logistic regression is a statistical method for analyzing a dataset in…

Database is the information you lose when your memory crashes. - Dave Barry

According to Wikipedia, a dataset (or data set) is a collection of data. In the open data discipline, the dataset is the unit used to measure the information released in a public open data repository. The most common formats for datasets found online are CSV files and spreadsheets, where the data is organized in tabular form. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable…
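A tiny example of a tabular dataset in CSV form, loaded with pandas, might look like the sketch below. The names, ages, and cities are hypothetical, and the CSV text is kept in memory so that no file path has to be assumed.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for illustration only.
csv_text = """name,age,city
Asha,31,Pune
Ravi,27,Delhi
Meera,35,Mumbai
"""

# Each CSV column becomes a DataFrame column, i.e. one variable.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # inferred type of each variable
```

For a file on disk, `pd.read_csv` accepts a path in place of the in-memory buffer.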

Machine Learning | Data Science | Artificial Intelligence Enthusiast | https://www.linkedin.com/in/saurav-anand-92229584/ | https://www.kaggle.com/saurav9786