Commit 8373a5a5 authored by Rémy Huet

Clear

parent ddb7750f
%% Cell type:markdown id: tags:
# AOS 1
## Band reduction in multispectral images
Authors: Mathilde Rineau, Rémy Huet
A multispectral image is an image that has several components. For example, a color image
has 3 components (red, green and blue), and each pixel can be viewed as a vector in $\mathbb{R}^3$. More
generally, a multispectral image of size N×M with P spectral bands can be stored as an
N×M×P array. There are N×M pixels living in $\mathbb{R}^P$.
When the number of spectral bands P is too large, it is desirable to reduce that
number, ultimately to 3 for viewing purposes. This process is called band reduction.
The aim of this work is to propose a method that uses PCA to perform a band reduction to 3 bands, and to apply it to a multispectral image.
%% Cell type:code id: tags:
``` python
import numpy as np
from scipy.io import loadmat
# First load the image from the MATLAB data file
image = loadmat('PaviaU.mat')
```
%% Cell type:markdown id: tags:
Inspecting the `image` variable shows its type and contents. It is a Python dictionary containing:
- A header as a string
- A version string
- Some "globals" (empty)
- The image itself under `paviaU` as an array.
We only care about the image itself, so we store it in an `image_data` variable.
%% Cell type:code id: tags:
``` python
image_data = image['paviaU']
print(type(image_data))
print(image_data.shape)
```
%%%% Output: stream
<class 'numpy.ndarray'>
(610, 340, 103)
%% Cell type:markdown id: tags:
The image is of type `ndarray`, a multidimensional array from `numpy`.
The shape of the array is 610x340x103.
This means that the image is composed of 610x340 "pixels", each composed of 103 bands.
First, we reshape the image as a two-dimensional array: we keep the 103 bands but we "merge" the rows and columns into a single axis, so that each row of the new array is one pixel.
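As a small illustration of this reshaping, here is a sketch on a toy array (the sizes 2x3x4 and the names `toy`, `flat` and `restored` are made up for the example, not taken from the PaviaU data). It checks that pixels are laid out in row-major order and that the operation can be undone.

```python
import numpy as np

# Toy "image": 2 rows, 3 columns, 4 bands (sizes chosen only for illustration)
toy = np.arange(2 * 3 * 4).reshape((2, 3, 4))

# Merge rows and columns: one row per pixel, one column per band
flat = toy.reshape((-1, toy.shape[-1]))
print(flat.shape)  # (6, 4)

# With row-major order, pixel (row=1, col=2) lands at index 1 * 3 + 2 = 5
assert np.array_equal(flat[5], toy[1, 2])

# The reshape loses nothing: the original layout can be recovered
restored = flat.reshape(toy.shape)
assert np.array_equal(restored, toy)
```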
%% Cell type:code id: tags:
``` python
# X is our data reshaped as a vector of samples (each sample containing 103 bands)
X = image_data.reshape((-1, 103))
```
%% Cell type:markdown id: tags:
PCA is sensitive to the scale of the variables, so the data needs to be rescaled first.
We will rescale each band to the range 0 to 1 using `MinMaxScaler` from `scikit-learn`.
We will perform our analysis on these values between 0 and 1.
Later, we will rescale the PCA output for display purposes.
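To make the rescaling concrete, the sketch below compares `MinMaxScaler` (with its default settings) against the formula `(x - min) / (max - min)` applied to each column separately. The array `X_toy` is random data standing in for the real pixels, an assumption made only for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for X: 50 "pixels" with 5 "bands"
X_toy = rng.uniform(0, 8000, size=(50, 5))

# Manual min-max scaling, computed independently for each column (band)
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

# Same operation with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X_toy)

assert np.allclose(X_manual, X_sklearn)
```

Because the scaling is done per column, every band ends up spanning the full 0 to 1 range, whatever its original range was.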
%% Cell type:code id: tags:
``` python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
```
%% Cell type:markdown id: tags:
Now, we can apply PCA to our data.
For this purpose, we will use the `PCA` class of `scikit-learn`. It centers the data itself, so we don't need to do it ourselves.
We instantiate the object with a number of components equal to 3 (we need 3 bands).
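The following sketch illustrates the centering step on synthetic data (`X_toy` is a made-up array, not the image). It centers the data by hand, projects it onto the first 3 right singular vectors, and checks that the result matches the output of `PCA` up to the sign of each component. The `svd_solver="full"` argument is set here only to make the comparison deterministic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic correlated data with a non-zero mean: 200 samples, 6 features
X_toy = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6)) + 10.0

pca_toy = PCA(n_components=3, svd_solver="full")
scores = pca_toy.fit_transform(X_toy)

# Manual version: center, compute the SVD, project on the first 3 axes
X_centered = X_toy - X_toy.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores_manual = X_centered @ Vt[:3].T

# The mean removed by PCA is the column-wise mean of the data
assert np.allclose(pca_toy.mean_, X_toy.mean(axis=0))
# The projections agree up to the sign of each component
assert np.allclose(np.abs(scores), np.abs(scores_manual))
```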
%% Cell type:code id: tags:
``` python
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
pca.fit(X_scaled)
output_data = pca.transform(X_scaled)
print(output_data.shape)
```
%%%% Output: stream
(207400, 3)
%% Cell type:markdown id: tags:
The shape of our data is 207400x3.
Using PCA, we kept the 207400 "pixels" but reduced the number of bands to 3.
The method `imshow` from `pyplot` accepts float data from 0 to 1 or integer data from 0 to 255.
For simplicity, we will only rescale the PCA output from 0 to 1 and pass it to `imshow`.
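For reference, the two representations accepted by `imshow` can be related as in the sketch below. It uses a small random array (`scores`, a hypothetical stand-in for the PCA output) and plain `numpy` instead of `MinMaxScaler`, to show how float values in 0 to 1 map to 8-bit integers in 0 to 255.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the PCA output: 4x5 "pixels", 3 components
scores = rng.normal(size=(4 * 5, 3))

# Rescale each component independently to [0, 1]
lo = scores.min(axis=0)
hi = scores.max(axis=0)
rgb_float = ((scores - lo) / (hi - lo)).reshape((4, 5, 3))

# Equivalent 8-bit representation in [0, 255]
rgb_uint8 = np.round(rgb_float * 255).astype(np.uint8)

print(rgb_float.dtype, rgb_uint8.dtype)
```

Either array can be passed to `imshow`; the float version avoids the rounding step.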
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
# Re-fit the scaler for the output data
scaler.fit(output_data)
output_scaled = scaler.transform(output_data)
# Reshape the output to display it as an image
output_image = output_scaled.reshape((610, 340, 3))
plt.imshow(output_image)
plt.show()
```
%% Cell type:markdown id: tags:
## Questions about the solution
As requested, we have performed a dimension reduction to 3 principal components, which is a very significant reduction considering that the original dimension was 103. We could have chosen the reduced dimension differently, for instance by fixing a percentage of explained variance instead of a number of principal components, or by reading it from the associated scree plot.
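As a sketch of the explained-variance alternative: `PCA` accepts a float between 0 and 1 as `n_components` when `svd_solver="full"`, and then keeps the smallest number of components whose cumulative explained variance ratio exceeds that value. The data below is synthetic (3 latent factors plus small noise) and the 0.95 threshold is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 samples, 20 features driven by 3 latent factors
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X_toy = latent @ mixing + 0.05 * rng.normal(size=(500, 20))

# Keep as many components as needed to explain 95% of the variance
pca_var = PCA(n_components=0.95, svd_solver="full")
pca_var.fit(X_toy)

print(pca_var.n_components_)
print(pca_var.explained_variance_ratio_.sum())
```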
## Limits
By reducing the dimension from 103 to 3, we have lost information, and the reduced image alone does not tell us what this information was or how significant it was. However, this reduction makes the data much easier to visualize: band reduction is expected to give similar colors to similar objects.
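One possible way to put a number on the lost information is to reconstruct the data from the retained components and measure the error. The sketch below does this on synthetic data (`X_toy` is made up for the example). It checks that the relative squared reconstruction error is the complement of the explained variance ratio retained by the 3 components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 300 samples, 10 features with decreasing spread
X_toy = rng.normal(size=(300, 10)) * np.linspace(3.0, 0.1, 10)

pca3 = PCA(n_components=3, svd_solver="full").fit(X_toy)

# Project to 3 components, then map back to the original 10 features
X_back = pca3.inverse_transform(pca3.transform(X_toy))

# Fraction of the variance kept by the 3 components
retained = pca3.explained_variance_ratio_.sum()

# Squared reconstruction error relative to the total (centered) variation
X_centered = X_toy - X_toy.mean(axis=0)
rel_error = np.sum((X_toy - X_back) ** 2) / np.sum(X_centered ** 2)

# What is not retained is exactly what the reconstruction misses
assert np.isclose(retained + rel_error, 1.0)
```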
%% Cell type:markdown id: tags:
## Exploring variants of the methods
As said before, we could have chosen to use the explained variance to determine how many principal components have to be retained.
Below are two plots. The first one is a scree plot, which shows the variance explained by each principal component (from 1 to 103).
The second one represents the cumulative explained variance ratio as a function of the number of principal components.
We observe on the first plot that only the first 3 principal components are really significant, which is consistent with the second plot, where the cumulative explained variance grows very quickly towards 1.
Consequently, selecting the number of components from the explained variance would have given a similar result.
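Reading the number of components off the cumulative curve can also be done programmatically. The sketch below uses made-up explained-variance ratios and an arbitrary 0.90 threshold, neither of which comes from the PaviaU results.

```python
import numpy as np

# Hypothetical explained-variance ratios, made up for illustration
ratios = np.array([0.60, 0.25, 0.10, 0.03, 0.02])
cumulative = np.cumsum(ratios)

threshold = 0.90
# Index of the first cumulative value reaching the threshold,
# plus 1 to turn it into a number of components
n_keep = int(np.argmax(cumulative >= threshold)) + 1
print(n_keep)  # 3
```

With a fitted `PCA` object, `ratios` would be replaced by its `explained_variance_ratio_` attribute.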
%% Cell type:code id: tags:
``` python
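from sklearn.decomposition import PCA
# Fit a PCA keeping all 103 components to inspect their variances
pca = PCA()
pca.fit(X_scaled)
components = range(1, X.shape[1] + 1)
# Scree plot: variance explained by each principal component
plt.figure()
plt.bar(components, pca.explained_variance_)
plt.xlabel('Principal component')
plt.ylabel('Explained variance')
plt.show()
# Cumulative explained variance ratio
plt.figure()
plt.plot(components, np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.show()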
```
......