6

Generative Molecular Structures with GANs

A custom GAN for generating molecular images from RCSB .cif files using a custom data pipeline and deep convolutional architecture.

🧬 Generative Molecular Structures using GANs


🎯 Project Summary

This project focuses on building a custom Generative Adversarial Network (GAN) capable of generating realistic visualizations of molecular structures. The model is trained on images derived from .cif files sourced from the RCSB Protein Data Bank. The goal was to prototype a system capable of learning structural patterns from real-world molecular data—even under limited dataset constraints.

dist

🧠 Key Functional Requirements

  • Realistic Molecular Visualizations (RMSV): Generate high-resolution molecular structure images based on scientific data.
  • Input/Output: Accept processed .cif file images as input → Output PNG renderings of synthetic molecule visuals.
  • Small Data Optimization: Maintain structure learning capability with just ~270 training samples.
  • Pipeline Automation: Convert .cif.html (via Py3Dmol) → .png (via Selenium headless rendering).

📥 Dataset Source

Data was sourced from the RCSB Protein Data Bank, focusing on publicly available .cif molecular structure files. The data pipeline involved:

  • Downloading .cif files
  • Rendering .html 3D molecular structures using py3Dmol
  • Capturing output as .png using Selenium headless Chrome driver

🛠️ GAN Architecture

Discriminator

def make_discriminator_model():
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=2, padding='same', input_shape=(128, 128, 3)),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
 
        layers.Conv2D(128, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
 
        layers.Conv2D(256, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
 
        layers.Conv2D(512, (5, 5), strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
 
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
 
def make_generator_model():
    model = tf.keras.Sequential([
        layers.Dense(8 * 8 * 256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
 
        layers.Reshape((8, 8, 256)),
 
        layers.Conv2DTranspose(128, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
 
        layers.Conv2DTranspose(64, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
 
        layers.Conv2DTranspose(32, (5, 5), strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
 
        layers.Conv2DTranspose(3, (5, 5), strides=2, padding='same', use_bias=False, activation='tanh')
    ])
    return model