Plainify - Use AI to Rewrite Complex Documents for the Lay-Person
Transforming complex documents into readable content using AI - a Python tool that makes technical papers accessible to everyone.
Have you ever opened a medical paper or technical document and felt like you were reading an alien language? I certainly have. After spending hours decoding a particularly dense medical research paper, I thought there had to be a better way. That’s how Plainify was born - a tool that takes complex documents and makes them genuinely readable, without losing their original meaning.
What makes Plainify different from just throwing a PDF at ChatGPT is its preservation of document structure and systematic approach to simplification. Instead of getting a summarized blob of text, you get a properly formatted document where technical terms are explained right where they appear, making it easy to learn as you read.
What It Does
It re-writes the entire paper in easy to understand/lay-person lingo and writes it out in markdown.
At its core, Plainify does something seemingly simple but surprisingly complex: it takes a PDF document (like a medical paper or technical specification) and turns it into a Markdown file where every technical term is explained in plain English. It’s like having a knowledgeable friend reading alongside you, explaining things as you go.
For example, when it encounters a sentence like:
1
The patient presented with acute myocardial infarction and severe dyspnea.
It transforms it into:
1
2
The patient came in with a heart attack (when blood flow to the heart is blocked)
and severe difficulty breathing (trouble getting enough air into the lungs).
The Technical Bits
If you’re interested in using Plainify, here’s what you need to know:
Quick Start
1
2
3
4
5
6
7
8
9
10
# Clone and set up
git clone https://github.com/jmcdice/plainify.git
cd plainify
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run it
mkdir outputs
./plainify complex_paper.pdf outputs/simplified_paper.md
Key Features
- Converts PDFs to Markdown with preserved formatting
- Explains technical terms in parentheses
- Handles large documents by splitting them intelligently
- Processes chunks concurrently for better performance
- Includes comprehensive logging for when things go wrong
Requirements
- Python 3.8+
- OpenAI API key
- A few Python packages (all in requirements.txt)
- Tesseract OCR (optional, for scanned documents)
Under the Hood
The magic happens in four stages:
- The PDF is converted to Markdown while preserving its structure
- The content is split into chunks that preserve context
- Each chunk is processed by GPT-3.5-turbo to identify and explain complex terms
- Everything is reassembled into a clean, readable document
Limitations and Future Plans
Currently, Plainify works best with text-based PDFs and may struggle with heavily formatted documents or complex tables. I’m working on improving these areas, along with adding support for:
- More document formats
- Custom terminology dictionaries
- Batch processing
- A simple web interface
Try It Yourself
The code is available on GitHub under the MIT license. If you run into any issues or have suggestions for improvements, feel free to open an issue or submit a PR.
Contributing
Interested in helping out? The project could use help with:
- Supporting more document formats
- Improving OCR integration
- Adding new features
- Testing with different types of documents
Check out the GitHub repository for more details on contributing.
Got complex documents you need to simplify? Give Plainify a try and let me know how it works for you.