This repository contains a PyTorch implementation of a quantized auto-encoder for image compression, along with scripts for training, testing, and evaluating compression performance.
- Python 3.6 or higher
- PyTorch
- Torchvision
- scikit-learn
- scikit-image
- Pillow
- tqdm
- NumPy
- OpenCV
- arithmetic-compressor
Install the required packages using:

```
pip install -r requirements.txt
```
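The repository's `requirements.txt` is not reproduced here; assuming it simply pins the packages listed above, a minimal version would look like this (note that OpenCV installs as `opencv-python` on PyPI):

```
torch
torchvision
scikit-learn
scikit-image
Pillow
tqdm
numpy
opencv-python
arithmetic-compressor
```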
- Prepare your dataset by organizing the images into separate folders for training and validation.
- Update the paths in the `train.py` script to point to your dataset directories.
- Adjust the hyperparameters (e.g., batch size, learning rate, quantization bits) as needed.
- Run the training script:

```
python train.py
```
The script will train the auto-encoder model, saving the weights to the `weights3/` directory every 5 epochs. The training and validation losses will be written to `losses.txt`.
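As a rough illustration of the edits involved, the path and hyperparameter section of `train.py` might look like the sketch below. All names here (`TRAIN_DIR`, `BATCH_SIZE`, `B`, the use of `ImageFolder`) are assumptions for illustration, not the script's actual code:

```python
import torch
from torchvision import datasets, transforms

# Hypothetical configuration block; adjust to your setup.
TRAIN_DIR = "data/train"   # training image directory
VAL_DIR = "data/val"       # validation image directory
BATCH_SIZE = 16
LEARNING_RATE = 1e-4
B = 4                      # quantization bits

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# ImageFolder expects at least one class subdirectory inside each root.
train_set = datasets.ImageFolder(TRAIN_DIR, transform=transform)
val_set = datasets.ImageFolder(VAL_DIR, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
```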
- Place the images you want to compress in the `images` folder.
- Run the `test.py` script:

```
python test.py
```
This script will:

- Load the pre-trained auto-encoder model
- Compress each image in the `images` folder using the auto-encoder and arithmetic coding (sketched below)
- Save the compressed bitstreams in the `compressed_folder` directory
- Decompress the bitstreams and save the reconstructed images in the `decompressed_folder` directory
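To make the entropy-coding step concrete, here is a minimal sketch of how quantized latent symbols could be coded with the `arithmetic-compressor` package. The `AECompressor`/`StaticModel` interface follows that package's documented usage, but treat it as an assumption; the encoder output is faked with random data, and `test.py` itself may be organized differently:

```python
import torch
from collections import Counter
from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import StaticModel

B = 4                           # assumed quantization bit depth
latent = torch.rand(32768)      # stand-in for the flattened 512 x 8 x 8 encoder output

# Quantize the latent to integer symbols in [0, 2**B].
symbols = (torch.clamp(latent, 0, 1) * 2 ** B + 0.5).int().tolist()

# Build a static probability model from the symbol histogram.
counts = Counter(symbols)
probabilities = {s: c / len(symbols) for s, c in counts.items()}

coder = AECompressor(StaticModel(probabilities))
bitstream = coder.compress(symbols)                    # entropy-coded bits
recovered = coder.decompress(bitstream, len(symbols))  # lossless round trip

# Dequantize for the decoder network.
latent_hat = torch.tensor(recovered, dtype=torch.float32) / 2 ** B
```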
- Place your original images in the `images` folder.
- Place the output images in the `decompressed_folder` folder.
- Place the trained weights in the `weights` folder.
- Run the `metrics.py` script:

```
python metrics.py
```
This script will:

- Loop through all image files in the `images` and `decompressed_folder` folders
- Calculate the Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) for each pair of original and reconstructed images (see the sketch below)
- Print the SSIM and PSNR values for each image
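A minimal version of this loop, using scikit-image's metric functions and assuming original and reconstructed files share the same names, might look like:

```python
import os
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ORIG_DIR = "images"
RECON_DIR = "decompressed_folder"

for name in sorted(os.listdir(ORIG_DIR)):
    recon_path = os.path.join(RECON_DIR, name)  # assumes matching file names
    if not os.path.isfile(recon_path):
        continue

    original = cv2.imread(os.path.join(ORIG_DIR, name))
    reconstructed = cv2.imread(recon_path)

    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=255)
    ssim = structural_similarity(original, reconstructed, channel_axis=2, data_range=255)
    print(f"{name}: SSIM = {ssim:.4f}, PSNR = {psnr:.2f} dB")
```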
The auto-encoder model used in this implementation consists of:
- Encoder: A pre-trained ResNet-101 encoder
- Decoder: A custom ResNet decoder
- Quantization: The latent space is quantized to reduce the bitrate
The model was trained using a combination of mean squared error and VGG perceptual loss. The encoder outputs feature maps with 512 channels and 8 × 8 spatial dimensions; these are flattened into a vector of size 32768 (512 × 8 × 8). The vector is then quantized with B bits, i.e., 2^B quantization levels.
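The exact loss weighting is not documented here; one common way to combine the two terms, sketched with torchvision's VGG-16 features and a hypothetical weight `alpha`, is:

```python
import torch.nn as nn
from torchvision.models import vgg16

class VGGPerceptualLoss(nn.Module):
    """Feature-space MSE over early VGG-16 layers (a common formulation)."""
    def __init__(self, num_layers=16):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:num_layers].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # keep VGG frozen

    def forward(self, x, y):
        return nn.functional.mse_loss(self.features(x), self.features(y))

mse = nn.MSELoss()
perceptual = VGGPerceptualLoss()
alpha = 0.1  # hypothetical weight between pixel and perceptual terms

def reconstruction_loss(x_hat, x):
    return mse(x_hat, x) + alpha * perceptual(x_hat, x)
```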
During training, quantization is simulated by adding noise to the latent vector. The noise is sampled from N(-0.5, 0.5) and scaled by the quantization step 2^-B:

```python
scale = 2 ** -B                                # quantization step for B bits
noise = (torch.randn(n) * 0.5 - 0.5) * scale   # Gaussian noise with mean -0.5, std 0.5
```
At inference time, the vector is clamped to [0, 1] with `torch.clamp` and mapped to 2^B quantization levels:

```python
quantized = torch.clamp(vector, 0, 1) * 2 ** B + 0.5  # scale [0, 1] to [0, 2^B]; +0.5 makes truncation round to nearest
quantized = quantized.int()                           # truncate to integer symbols
```
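Since the symbols are just scaled, rounded latent values, the decoder recovers an approximation by dividing by 2^B. A self-contained round trip (with an assumed B = 4) illustrates the quantization error bound:

```python
import torch

B = 4                        # assumed bit depth; the repository's value may differ
vector = torch.rand(32768)   # stand-in for the flattened latent

# Quantize exactly as above: clamp, scale to 2**B levels, round via truncation.
quantized = (torch.clamp(vector, 0, 1) * 2 ** B + 0.5).int()

# Dequantize: map integer symbols back to [0, 1] for the decoder.
dequantized = quantized.float() / 2 ** B

# Round-to-nearest error is bounded by half a quantization step, 2**-(B + 1).
print((dequantized - vector).abs().max())  # ~0.031 for B = 4
```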
Results and a detailed report are available in the report PDF.
This implementation is inspired by the following papers:
- Alexandre, D., Chang, C.-P., Peng, W.-H., & Hang, H.-M. (2019). An Autoencoder-based Learned Image Compressor: Description of Challenge Proposal by NCTU. https://doi.org/10.48550/arXiv.1902.07385
- Wang, B., & Lo, K.-T. (2024). Autoencoder-based joint image compression and encryption. Journal of Information Security and Applications, 80, 103680. https://doi.org/10.1016/j.jisa.2023.103680
- Sougata Moi (M23MAC008)
- Mitesh Kumar (M23MAC004)
- Niraj Singha (M23MAC005)
- Ratnesh Kumar Tiwari (M23MAC011)
This project is licensed under the MIT License.