“If you torture the data long enough, it will confess.”
― Ronald H. Coase, Essays on Economics and Economists
Every dataset holds a lot of hidden information. This information has to be investigated to uncover hidden patterns. These patterns can help in making decisions about the procedure, removing ambiguity, and gaining key business insights. Exploratory data analysis was introduced to answer these questions.
Exploratory data analysis is all about getting an overall understanding of the data. It is mainly done to examine its properties and patterns, and to visualize them. …
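As a rough illustration of what a first look at a dataset's properties can involve, here is a minimal sketch using pandas. The column names and values are hypothetical, invented only for this example, and the handful of checks shown is one possible starting point rather than a complete EDA.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "age": [23, 35, 31, None, 52],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
    "spend": [120.5, 340.0, 210.25, 95.0, 410.75],
})

shape = df.shape                          # (number of rows, number of columns)
missing = df.isna().sum()                 # count of missing values per column
summary = df.describe()                   # summary statistics for numeric columns
city_counts = df["city"].value_counts()   # frequency of each category

print(shape)
print(missing)
print(summary)
print(city_counts)
```

Checks like these usually come before any plotting: they show how large the data is, where values are missing, and how the variables are distributed.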
“You can have data without information, but you cannot have information without data.” — Daniel Keys Moran
According to Wikipedia, data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Pandas is one of the most commonly used libraries in the field of data science and analytics.
Pandas is one of those packages that make analysing data much easier. It is an open-source library for data analysis in Python, developed by Wes McKinney in 2008. …
Every dataset tells us a story, and this story is depicted using data visualizations. ― Saurav Anand
Plotly is an open-source library that can be used for data visualisation. It is built on plotly.js, which in turn is built on d3.js. plotly.js is a high-level, declarative charting library that ships with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts, and more. Plotly graphs can be viewed in Jupyter notebooks, saved as standalone HTML files, or hosted online using Chart Studio Cloud. Plotly itself is a graphics company with several products and open-source tools.
…
The only time a lazy man succeeds is when he tries to do nothing :- Evan Esar
In supervised machine learning, we know that the data are labelled. We know our target, the output variable that needs to be predicted. This output variable may be continuous (numeric) or categorical, for regression or classification problems respectively. The most common steps performed during model building are:
Data scientists spend most of their time on data cleaning and data exploration. A detailed EDA (exploratory data analysis) is an important part of the data science life cycle. In 2020, many automatic EDA libraries were developed to save data scientists time. Some of the most commonly used automatic EDA libraries are listed in the following blog.
It is very likely that in the coming years we are going to see many more tools developed to make the process of EDA more automatic and…
In good information visualizations, there are no rules, no guidelines, no templates, no standard technologies, no style books … You must simply do whatever it takes. ― Edward Tufte
Seaborn provides a high-level interface to Matplotlib, a powerful but sometimes unwieldy Python visualization library. On Seaborn’s official website, they state:
If matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy too.
Features of Seaborn:
Matplotlib is a Python library for visualizing data by creating different graphs and charts. Some of the key points related to it are mentioned below:
The first step is to install Matplotlib. If you are using Anaconda, it is already installed.
If matplotlib is not already installed, you…
“One of the holy grails of machine learning is to automate more and more of the feature engineering process.” ― Pedro Domingos
AutoML refers to automated machine learning. It describes how the end-to-end process of machine learning can be automated at the organizational and educational level. Building a machine learning model basically involves the following steps:
Initially all these…
We know that supervised learning algorithms in machine learning are divided into two groups: regression and classification. When we talk about classification problems, logistic regression is the first algorithm that comes to mind. Logistic regression is primarily used to model a binary (0, 1) variable based on one or more other variables, called predictors. The binary variable being modeled is generally referred to as the response variable, or the dependent variable. So the output is discrete in nature. Logistic regression is also called logit regression.
Logistic regression is a statistical method for analyzing a dataset in…
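To make the idea of a binary (0, 1) response concrete, here is a minimal sketch of how a logistic regression model turns predictors into a discrete label. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration; in practice they would be learned from labelled data, which this sketch does not do.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Estimated probability that the response variable equals 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete 0/1 prediction."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical coefficients for a single predictor (not fitted to any data).
weights = [1.0]
bias = -1.0

print(predict_proba([2.0], weights, bias))
print(predict_label([2.0], weights, bias))
print(predict_label([0.0], weights, bias))
```

The linear combination of predictors is the log-odds (the logit) of the positive class, which is where the name logit regression comes from.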
Database is the information you lose when your memory crashes ― Dave Barry
According to Wikipedia, a dataset (or data set) is a collection of data. In the open data discipline, the dataset is the unit used to measure the information released in a public open data repository. The most common formats for datasets found online are CSV files and spreadsheets, where the data is organized in tabular form. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable…
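As a small illustration of tabular data in CSV form, the sketch below reads a tiny in-memory CSV with Python's standard `csv` module. The content of `raw` is hypothetical and stands in for a file downloaded from a repository; each column corresponds to one variable.

```python
import csv
import io

# Hypothetical CSV content, invented for illustration.
raw = "name,age,city\nAsha,29,Pune\nRavi,41,Delhi\n"

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)            # one dictionary per row of the table
columns = reader.fieldnames    # column names taken from the header line

print(columns)
for row in rows:
    print(row["name"], row["age"], row["city"])
```

Note that `csv` reads every value as a string, so numeric columns such as `age` need to be converted before any calculation.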
Machine Learning | Data Science | Artificial Intelligence Enthusiast | https://www.linkedin.com/in/saurav-anand-92229584/ | https://www.kaggle.com/saurav9786