Build Cancer Cell Classification using Python — scikit-learn

Dhaval Thakur
Published in Towards Dev
4 min read · Dec 16, 2021


I am back with another data science pet project. It is suitable for anyone who wants to learn about classification algorithms, or who is new to data science and wants to practice on small projects before tackling larger, industry-level ones.

In this sample project, we will classify cancer cells as 'malignant' or 'benign' based on their features. We will use scikit-learn to approach this as a machine learning problem with the Gaussian Naive Bayes algorithm!

Machine Learning Model creation (Image Credits: Data Camp)

Note that you need the scikit-learn Python package installed on your system to complete this project.

If you don't have scikit-learn, you can install it by running this command in the terminal:

pip install scikit-learn

Implementation of Cancer Cell classification machine learning algorithm

1. First, import the scikit-learn Python module along with the dataset loader.

import sklearn
from sklearn.datasets import load_breast_cancer

2. Now let's load the dataset and store it in a variable.

data = load_breast_cancer()
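
To get a feel for what load_breast_cancer() returns, a quick inspection can help. The following is a minimal sketch, not part of the original walkthrough; it only prints the available keys and the array shapes (the dataset has 569 samples with 30 features each).

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset; the result is a dictionary-like Bunch object
data = load_breast_cancer()

# List the keys available on the returned object
print(sorted(data.keys()))

# One row per sample, one column per feature
print(data['data'].shape)

# One integer label per sample
print(data['target'].shape)
```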

3. Create the feature set and labels. In machine learning, we first train the model on one portion of the data and then measure its accuracy on a separate test portion. To prepare for that, we pull the labels and features out of the dataset.

label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

If we print the label_names variable, its contents are:

['malignant' 'benign']
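
The integer values in labels index into label_names, so 0 stands for 'malignant' and 1 for 'benign'. As a small sketch (the counts variable and the use of NumPy's bincount are illustrative choices, not part of the original steps), the number of samples per class can be checked like this:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']

# Count how many samples carry each integer label;
# position i in counts corresponds to label_names[i]
counts = np.bincount(labels)

for index, name in enumerate(label_names):
    print(index, name, counts[index])
```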

Printing the feature_names variable shows:

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error'…
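
The introduction names Gaussian Naive Bayes, and step 3 describes training on one part of the data and testing on another. A minimal sketch of how that could look is below. The split ratio (test_size=0.33), the random_state value, and the variable names are assumptions chosen for illustration, not values taken from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
features = data['data']
labels = data['target']

# Hold out part of the data for testing.
# test_size and random_state are assumed values for this sketch.
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42
)

# Fit a Gaussian Naive Bayes classifier on the training portion
model = GaussianNB()
model.fit(train_features, train_labels)

# Predict labels for the held-out samples and measure accuracy
predictions = model.predict(test_features)
accuracy = accuracy_score(test_labels, predictions)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing random_state makes the split reproducible, so the reported accuracy stays the same from run to run; a different seed or split ratio will give a somewhat different number.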


Data Enthusiast, Geek, part-time blogger. Every week, one new Data Science/Product Management story 🖥 I also write on Python, scripting & blockchain