How To Randomly Sample Data Points (Uniform Distribution)

Pseudo Random Number Generators

Suman Prasad
2 min read · Jun 5, 2022

Earlier, we saw that languages such as C and Java can generate random numbers.

In C, the standard library header stdlib.h provides a function called rand(), which generates uniformly distributed pseudo-random integers between 0 and RAND_MAX.

Implementation In Python

Basically, random.random() returns a value between 0 and 1 (0 included, 1 excluded), picked uniformly at random.
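As a minimal sketch of what this looks like, assuming Python's standard library random module (the seed value 42 is arbitrary and only there so that reruns print the same numbers):

```python
import random

random.seed(42)  # arbitrary seed, assumed here only for reproducibility

# random.random() returns a float in the half-open interval [0.0, 1.0)
values = [random.random() for _ in range(5)]
for v in values:
    print(v)
```

Every printed value falls in the range from 0 up to, but not including, 1.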

These random numbers are uniformly distributed.

We can also generate random numbers that are non-uniformly distributed, but the distribution is rarely called out explicitly: unless stated otherwise, a random number generator is usually assumed to be uniform.
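To make the contrast concrete, here is a small sketch that draws from a uniform generator and from a non-uniform one. It assumes the standard library functions random.random() and random.gauss(); the normal distribution with mean 0 and standard deviation 1 is just one illustrative choice of non-uniform distribution, not something the article prescribes.

```python
import random

random.seed(7)  # arbitrary seed, for reproducibility only

# Uniform draws: every value in [0.0, 1.0) is equally likely
uniform_draws = [random.random() for _ in range(10_000)]

# Non-uniform draws: values near 0 are far more likely than values far from 0
normal_draws = [random.gauss(0.0, 1.0) for _ in range(10_000)]

print("uniform range:", min(uniform_draws), max(uniform_draws))
print("normal range: ", min(normal_draws), max(normal_draws))
```

The uniform draws stay inside [0, 1), while the normal draws spread over negative and positive values and cluster around 0.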

If we plot these values, every value lies between 0 and 1, and every sub-range of equal width is about equally likely.
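A rough way to check this without a plotting library is to count how many draws fall into equal-width bins. The sketch below assumes 100,000 draws and 10 bins; both numbers are arbitrary choices for illustration.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

n_draws = 100_000
n_bins = 10
counts = [0] * n_bins

for _ in range(n_draws):
    u = random.random()
    # Map u in [0.0, 1.0) to a bin index 0..9; min() guards the top edge
    counts[min(int(u * n_bins), n_bins - 1)] += 1

# Print the fraction of draws that landed in each bin
for b, c in enumerate(counts):
    print(f"[{b / n_bins:.1f}, {(b + 1) / n_bins:.1f}): {c / n_draws:.3f}")
```

If the generator is uniform, each bin should hold roughly one tenth of the draws.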

Problem: Let’s say I have a dataset of n data points and I want to sample m points from it uniformly.

What does uniform sampling mean?

It means that each point in my initial dataset

D = n data points

should have an equal chance of belonging to my new dataset D’.

Suppose n = 150, that is, D has 150 points.

When I sample 30 points from these, each point should have an equal chance of belonging to my new dataset D’.
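One way to draw exactly 30 of the 150 points with every point equally likely is random.sample() from the standard library, which samples without replacement. This is a sketch of an alternative technique, not necessarily the approach this article builds up to; the data points are represented only by their indices 0 to 149.

```python
import random

random.seed(3)  # arbitrary seed, for reproducibility only

n = 150  # size of the original dataset D
m = 30   # size of the sampled dataset D'

# Pick m distinct indices out of range(n); every index is equally likely
sampled_indices = random.sample(range(n), m)

print(len(sampled_indices))          # always exactly m
print(sorted(sampled_indices)[:5])   # a few of the chosen indices
```

Because sampling is without replacement, no point appears twice in D’.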

Example: Let’s look at the IRIS dataset

Here n = 150 points (the IRIS dataset), and each point is 4-dimensional: petal length, petal width, sepal length, sepal width.

Now, imagine I want to sample 30 points randomly.

Let’s understand what’s happening.

D = {x1, x2, …, x150}

Sampling the dataset gives

D’ = {x1', x2', …, x30'}

I have 150 points and I want to generate a dataset with 30 points, so the probability of each point belonging to my dataset D’ is

30/150 = 0.2

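So random.random() generates a number between 0 and 1 for each data point, and the if condition checks whether that number is below 0.2.

sampled_data.append(d[i,:]) → this appends the data point to the final list; data points whose random number is greater than 0.2 are skipped. Because each point is kept or skipped independently, D’ contains 30 points on average rather than exactly 30.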
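The loop described above can be sketched as follows. This is a reconstruction under assumptions, not the author's exact code: the names n, m and p are introduced here for illustration, and d is a placeholder list of 150 rows with 4 made-up values each, standing in for the real IRIS feature matrix so that the sketch runs with the standard library alone.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only

n, m = 150, 30

# Placeholder for the 150 x 4 IRIS feature matrix (values are made up,
# not real IRIS measurements)
d = [[random.uniform(0.0, 8.0) for _ in range(4)] for _ in range(n)]

p = m / n  # probability of keeping each point: 30 / 150 = 0.2

sampled_data = []
for i in range(n):
    # Draw a uniform number in [0.0, 1.0); keep row i if it is at most p
    if random.random() <= p:
        sampled_data.append(d[i])  # with a NumPy array this would be d[i, :]

# The count varies from run to run: close to 30 on average, not always 30
print(len(sampled_data))
```

Each row is kept with the same probability 0.2, so the sample is uniform, but its size is random. If exactly 30 points are required, a sampler that draws without replacement is the more direct choice.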


Written by Suman Prasad

Masters In Data Science @Central University Of Rajasthan. #datascientistenthusiast
