Plainify - Use AI to Rewrite Complex Documents for the Lay-Person

Transforming complex documents into readable content using AI - a Python tool that makes technical papers accessible to everyone.

Posted Nov 4, 2024 Updated Nov 4, 2024

By Joey

2 min read

Have you ever opened a medical paper or technical document and felt like you were reading an alien language? I certainly have. After spending hours decoding a particularly dense medical research paper, I thought there had to be a better way. That’s how Plainify was born - a tool that takes complex documents and makes them genuinely readable, without losing their original meaning.

What makes Plainify different from just throwing a PDF at ChatGPT is its preservation of document structure and systematic approach to simplification. Instead of getting a summarized blob of text, you get a properly formatted document where technical terms are explained right where they appear, making it easy to learn as you read.

What It Does

It re-writes the entire paper in easy to understand/lay-person lingo and writes it out in markdown.

At its core, Plainify does something seemingly simple but surprisingly complex: it takes a PDF document (like a medical paper or technical specification) and turns it into a Markdown file where every technical term is explained in plain English. It’s like having a knowledgeable friend reading alongside you, explaining things as you go.

For example, when it encounters a sentence like:

The patient presented with acute myocardial infarction and severe dyspnea.

It transforms it into:

The patient came in with a heart attack (when blood flow to the heart is blocked) 
and severe difficulty breathing (trouble getting enough air into the lungs).

The Technical Bits

If you’re interested in using Plainify, here’s what you need to know:

Quick Start

  
  # Clone and set up
  git clone https://github.com/jmcdice/plainify.git
  cd plainify
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt

  # Run it
  mkdir outputs
  ./plainify complex_paper.pdf outputs/simplified_paper.md

Key Features

Converts PDFs to Markdown with preserved formatting
Explains technical terms in parentheses
Handles large documents by splitting them intelligently
Processes chunks concurrently for better performance
Includes comprehensive logging for when things go wrong

Requirements

Python 3.8+
OpenAI API key
A few Python packages (all in requirements.txt)
Tesseract OCR (optional, for scanned documents)

Under the Hood

The magic happens in four stages:

The PDF is converted to Markdown while preserving its structure
The content is split into chunks that preserve context
Each chunk is processed by GPT-3.5-turbo to identify and explain complex terms
Everything is reassembled into a clean, readable document

Limitations and Future Plans

Currently, Plainify works best with text-based PDFs and may struggle with heavily formatted documents or complex tables. I’m working on improving these areas, along with adding support for:

More document formats
Custom terminology dictionaries
Batch processing
A simple web interface

Try It Yourself

The code is available on GitHub under the MIT license. If you run into any issues or have suggestions for improvements, feel free to open an issue or submit a PR.

Contributing

Interested in helping out? The project could use help with:

Supporting more document formats
Improving OCR integration
Adding new features
Testing with different types of documents

Check out the GitHub repository for more details on contributing.

Got complex documents you need to simplify? Give Plainify a try and let me know how it works for you.

AI Projects, Document Processing

ai openai pdf markdown python document-processing

This post is licensed under CC BY 4.0 by the author.