🧬 Generative Molecular Structures using GANs
🎯 Project Summary
This project builds a custom Generative Adversarial Network (GAN) that generates realistic visualizations of molecular structures. The model is trained on images derived from .cif files sourced from the RCSB Protein Data Bank. The goal was to prototype a system that can learn structural patterns from real-world molecular data, even with a limited dataset.
🧠 Key Functional Requirements
- Realistic Molecular Visualizations (RMSV): Generate high-resolution molecular structure images based on scientific data.
- Input/Output: Take images rendered from .cif files as training input and produce PNG renderings of synthetic molecule visuals as output.
- Small Data Optimization: Maintain structure learning capability with just ~270 training samples.
- Pipeline Automation: Convert .cif → .html (via py3Dmol) → .png (via Selenium headless rendering).
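The conversion chain in the last bullet could look roughly like the sketch below. It is not the project's actual script: the function names (`html_path_for`, `render_cif_to_html`, `capture_html_to_png`), the cartoon style, the 512×512 viewport, and the fixed wait are assumptions for illustration. It also assumes a py3Dmol release that provides `write_html`, and that headless Chrome can render WebGL on the machine in use.

```python
from pathlib import Path


def html_path_for(cif_path, out_dir):
    """Map e.g. structures/1abc.cif -> <out_dir>/1abc.html (hypothetical naming)."""
    return Path(out_dir) / (Path(cif_path).stem + ".html")


def render_cif_to_html(cif_path, out_dir, size=512):
    """Embed a .cif structure in a standalone 3Dmol.js page using py3Dmol."""
    import py3Dmol  # imported lazily so the path helper works without it

    view = py3Dmol.view(width=size, height=size)
    view.addModel(Path(cif_path).read_text(), "cif")
    view.setStyle({"cartoon": {"color": "spectrum"}})  # assumed style
    view.zoomTo()
    html_path = html_path_for(cif_path, out_dir)
    view.write_html(str(html_path))  # assumes write_html exists in this version
    return html_path


def capture_html_to_png(html_path, png_path, size=512, wait_s=3.0):
    """Open the page in headless Chrome and save a screenshot as PNG."""
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument(f"--window-size={size},{size}")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(Path(html_path).resolve().as_uri())
        time.sleep(wait_s)  # crude wait for the viewer to finish drawing
        driver.save_screenshot(str(png_path))
    finally:
        driver.quit()  # always release the browser process
```

A fixed sleep is the simplest way to let the viewer draw before the screenshot. Polling the page for a rendered canvas would be more robust, but it depends on details of the generated HTML.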
📥 Dataset Source
Data was sourced from the RCSB Protein Data Bank, focusing on publicly available .cif molecular structure files. The data pipeline involved:
- Downloading .cif files
- Rendering .html 3D molecular structures using py3Dmol
- Capturing output as .png using Selenium headless Chrome driver
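The download step can be sketched with the standard library alone. The README does not say which endpoint or which entries were used, so the `files.rcsb.org/download` URL pattern, the helper names (`cif_url`, `download_cif`), and the `cif` output folder are assumptions here.

```python
import urllib.request
from pathlib import Path

# Assumed endpoint: RCSB's public file download service.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def cif_url(pdb_id):
    """Build the download URL for one entry ID (IDs are upper-cased)."""
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id.strip().upper())


def download_cif(pdb_id, out_dir="cif"):
    """Fetch one .cif file, skipping entries that are already on disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{pdb_id.strip().upper()}.cif"
    if not target.exists():
        urllib.request.urlretrieve(cif_url(pdb_id), str(target))
    return target
```

Calling `download_cif("1crn")` would save `cif/1CRN.cif`. Skipping existing files keeps reruns cheap when the entry list grows.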
🛠️ GAN Architecture
Discriminator
import tensorflow as tf
from tensorflow.keras import layers

def make_discriminator_model():
    # Four stride-2 convolutions halve the 128x128 input down to 8x8.
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=2, padding='same', input_shape=(128, 128, 3)),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(256, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(512, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        # Sigmoid output: probability that the input image is real.
        layers.Dense(1, activation='sigmoid')
    ])
    return model
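As a sanity check on the discriminator above, the feature-map sizes can be traced without TensorFlow. With `padding='same'`, a strided convolution yields `ceil(size / stride)` along each spatial axis. The helper names below are illustrative and not part of the project.

```python
import math


def conv_same_out(size, stride):
    """Spatial size after a Conv2D with padding='same'."""
    return math.ceil(size / stride)


def trace_discriminator(size=128, filters=(64, 128, 256, 512), stride=2):
    """Return the feature-map shape after each conv and the flattened length."""
    shapes = []
    for f in filters:
        size = conv_same_out(size, stride)
        shapes.append((size, size, f))
    flat = size * size * filters[-1]
    return shapes, flat


shapes, flat = trace_discriminator()
print(shapes)  # [(64, 64, 64), (32, 32, 128), (16, 16, 256), (8, 8, 512)]
print(flat)    # 32768
```

The final `Dense(1)` layer therefore sees 32,768 features per image.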
Generator
def make_generator_model():
    # Project the 100-dim noise vector to an 8x8x256 feature map,
    # then upsample with four stride-2 transposed convolutions to 128x128x3.
    model = tf.keras.Sequential([
        layers.Dense(8 * 8 * 256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((8, 8, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(32, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # tanh keeps output pixel values in [-1, 1].
        layers.Conv2DTranspose(3, (5, 5), strides=2, padding='same', use_bias=False, activation='tanh')
    ])
    return model
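The generator's output size can be checked the same way: with `padding='same'`, each stride-2 `Conv2DTranspose` doubles height and width. Because the last layer uses `tanh`, generated pixels fall in [-1, 1]. The README does not describe how the training PNGs were scaled, so the two scaling helpers below are an assumption about one common way to match that range, and all names are illustrative.

```python
def trace_generator(start=(8, 8, 256), filters=(128, 64, 32, 3), stride=2):
    """Return the feature-map shape after the reshape and each upsampling layer."""
    h, w, _ = start
    shapes = [start]
    for f in filters:
        h, w = h * stride, w * stride
        shapes.append((h, w, f))
    return shapes


def to_tanh_range(pixel):
    """Map an 8-bit pixel value (0..255) to [-1, 1] (assumed preprocessing)."""
    return pixel / 127.5 - 1.0


def to_uint8(value):
    """Map a tanh output in [-1, 1] back to 0..255 for saving as PNG."""
    return int(round((value + 1.0) * 127.5))


print(trace_generator()[-1])  # (128, 128, 3)
```

The final shape matches the discriminator's `input_shape=(128, 128, 3)`, so generated images can be passed to it directly.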