Python data science, analysis, tools
We're the data generation
Most posts on this blog will deal with data science and digital innovation. It is an exciting field, and it is more than just "the sexiest job of the 21st century".
Health, sport, finance, the web, agriculture: humans have always analyzed data. Only the tools and statistical models have evolved. Let’s explore that!
Ready to get into it?
Let me start with a first big-picture view of a data scientist's work. This overview will help you avoid getting lost in this vast, wonderful ocean.
Python, R, SQL
I chose to focus on Python because, for me, it is much more intuitive than R. You may also find some SQL resources.
Machine learning is divided into 4 main types of learning: supervised, unsupervised, reinforcement and semi-supervised.
- Supervised: regression or classification algorithms
- Unsupervised: clustering, association, autoencoders, anomaly detection
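To make the difference between the first two types concrete, here is a minimal sketch in plain Python, with no library. The function names `fit_line` and `two_means` are my own, chosen for illustration; in practice you would probably reach for a library such as scikit-learn. The first function learns from labeled pairs (supervised regression), the second groups values without any labels (unsupervised clustering).

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised (regression): learn slope and intercept from labeled pairs.

    Ordinary least squares for one feature. Assumes xs are not all equal.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def two_means(values, iterations=10):
    """Unsupervised (clustering): split 1-D values into two groups, no labels.

    A tiny k-means with k=2. Assumes at least two distinct values.
    """
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        # Assign each value to its nearest center, then move the centers.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return sorted(group_low), sorted(group_high)


# Supervised: the target ys is known for each x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Unsupervised: only raw values, the algorithm finds the groups itself.
small, large = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.0])
```

The key point is the input: `fit_line` needs the answers (`ys`) to learn from, while `two_means` only receives the raw values.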
You will also have to deal with topics such as:
- Regularization: Ridge, Lasso, Elastic Net
- Cross-validation: K-fold
- Evaluation metrics: ROC, CAP, AUC, accuracy, sensitivity (recall), specificity (selectivity), precision
- Dimensionality reduction: Principal Component Analysis, Linear Discriminant Analysis, Kernel PCA
- Boosting: XGBoost
- Scaling: standardization, normalization
- Encoding: TargetEncoder
- Problems: overfitting, dirty and missing data, dummy variable trap, outliers, imbalanced datasets
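A few of the items above fit in a handful of lines, so here is a rough sketch of scaling, K-fold splitting and some evaluation metrics in plain Python. The function names are hypothetical, picked for this example. In a real project you would more likely use scikit-learn's `StandardScaler`, `MinMaxScaler`, `KFold` and `sklearn.metrics`.

```python
from statistics import mean, pstdev


def standardize(values):
    """Standardization: rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # assumes sigma != 0
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max normalization: rescale to the [0, 1] range."""
    lowest, highest = min(values), max(values)  # assumes lowest != highest
    return [(v - lowest) / (highest - lowest) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def binary_metrics(y_true, y_pred):
    """Metrics from the confusion matrix of a binary classifier (labels 0/1).

    Assumes both classes appear in y_true and at least one positive
    prediction exists, so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # also called sensitivity
        "specificity": tn / (tn + fp),  # also called selectivity
    }


scaled = standardize([2, 4, 4, 4, 5, 5, 7, 9])
folds = list(k_fold_indices(5, 2))
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

Note that the split here is sequential. With real data you would usually shuffle before splitting, and fit the scaler on the training fold only, to avoid leaking information from the test fold.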
With Python, you can easily use Matplotlib and Seaborn.
If you need more options and interactive visualizations, I invite you to check what is possible with Bokeh and Dash.
You can also use Plotly, or Tableau, which is a very interesting piece of software for data visualization.
And for R, there is ggplot2.
Coursera, Udacity, Udemy, edX... There are a lot of free and inexpensive courses (less than $20 if you wait for offers).
Choose the most recent ones, and make sure they use Python 3. Data science is full of novelties, and tools evolve.
Here’s the article with Python courses for data science.