FenScribe

Smart PDF Layout Optimizer ✂️

Reduce printing costs by automatically detecting and removing blank space in PDF documents and intelligently rearranging the remaining content.

⬇️ Get Started

🖼️ Samples

[Image: FenScribe Graphical User Interface]

[Image: Example of PDF Optimization]

💻 Installation & Dependencies

1. First, download or clone the repository:

  • Option A (Git): Clone the repository using Git:
    git clone https://github.com/ordylan/FenScribe.git
    cd FenScribe
  • Option B (Download ZIP): Download the ZIP file from the GitHub page: https://github.com/ordylan/FenScribe. Then, unzip the file and navigate into the extracted directory in your terminal.

2. Install the required Python libraries using pip:

pip install PyMuPDF Pillow python-docx tkinterdnd2
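
To confirm the install worked before launching the GUI, a small check along the following lines may help. It is only a sketch and not part of FenScribe: the function name find_missing is hypothetical. It relies on the fact that import names differ from pip package names (PyMuPDF is imported as fitz, Pillow as PIL, python-docx as docx).

```python
import importlib.util

# Map pip package names to the module names they are imported under.
REQUIRED_MODULES = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "python-docx": "docx",
    "tkinterdnd2": "tkinterdnd2",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All FenScribe dependencies are importable.")
```

Save it under any file name in the FenScribe directory and run it with python; it only looks the modules up and does not import them.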

🪄 Usage

  1. Navigate to the directory where you downloaded/cloned FenScribe.
  2. Run gui.pyw to launch the graphical interface (a CLI version is currently unavailable): double-click the file, or run python gui.pyw in your terminal.
  3. After processing, the output Word document (output_*.docx) will be saved in the __Output/ directory.
  4. Use the provided Office macro (图片溢出缩小.bas, roughly "shrink overflowing images") to resize overflowing images in the generated Word document.
    • Open the Word document (output_*.docx).
    • Open the VBA editor (Alt + F11).
    • Import the 图片溢出缩小.bas file (File > Import File...).
    • Run the macro 双栏图片溢出自动调() (Run > Run Sub/UserForm or press F5).
    • The macro automatically detects the first column width in Word and scales oversized images to fit.
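
The scaling step the macro performs can be illustrated with a short Python sketch. This is not the macro's VBA code, which is not reproduced here; the function name fit_to_column is hypothetical, and the sketch assumes the aspect ratio is preserved when an image is shrunk to the column width.

```python
def fit_to_column(width, height, column_width):
    """Return (width, height) scaled down to fit column_width.

    Images that already fit are returned unchanged. Units are arbitrary
    (points, pixels, EMU) as long as all three values use the same one.
    """
    if width <= 0 or height <= 0 or column_width <= 0:
        raise ValueError("dimensions must be positive")
    if width <= column_width:
        return width, height
    # Shrink both dimensions by the same factor to keep the aspect ratio.
    scale = column_width / width
    return column_width, height * scale


if __name__ == "__main__":
    # A 400 x 200 image in a 200-wide column is halved in both dimensions.
    print(fit_to_column(400, 200, 200))
    # An image narrower than the column is left alone.
    print(fit_to_column(100, 50, 200))
```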

⚙️ Configuration Parameters

These parameters can be adjusted in the GUI; a configuration file may be supported in a future version:

| Parameter | Description |
|---|---|
| threshold | Brightness threshold for blank line detection (0-255). Converts RGB to a grayscale average; rows ≥ threshold are considered blank. Higher values detect lighter grays as blank. |
| dpi | Image resolution (dots per inch) used when converting PDF pages to images for analysis. Higher DPI increases precision but also processing time and memory usage. |
| min_height | Content validity filter (in pixels). Only preserves content blocks with height ≥ this value. Helps filter out small noise or artifacts. |
| blank_height | Paragraph separation baseline (in pixels). Content is split into separate paragraphs when consecutive blank lines reach this height. Defines the minimum vertical gap considered a paragraph break. |
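
How the four parameters interact can be sketched in plain Python. This is an illustration, not FenScribe's actual implementation: the function names and default values are hypothetical, and it assumes a row counts as blank only when every pixel's averaged RGB brightness is ≥ threshold (the real tool may aggregate a row differently). The only fixed fact used is that a PDF point is 1/72 inch, which is why dpi sets the pixel size of a rendered page.

```python
def points_to_pixels(points, dpi):
    """Convert a length in PDF points (1/72 inch) to pixels at the given DPI."""
    return round(points * dpi / 72)


def is_blank_row(row, threshold):
    """Assumption: a row is blank when every pixel's RGB average >= threshold."""
    return all((r + g + b) / 3 >= threshold for r, g, b in row)


def find_content_blocks(rows, threshold=250, min_height=3, blank_height=5):
    """Return (start, end) row ranges, end exclusive, of content blocks.

    rows is a list of pixel rows, each a list of (r, g, b) tuples.
    """
    blocks = []
    start = None         # first row of the block being built
    last_content = None  # most recent non-blank row in that block
    for y, row in enumerate(rows):
        if not is_blank_row(row, threshold):
            if start is None:
                start = y
            last_content = y
        elif start is not None and y - last_content >= blank_height:
            # The run of blank rows reached blank_height: close the block.
            blocks.append((start, last_content + 1))
            start = None
    if start is not None:
        blocks.append((start, last_content + 1))
    # Drop blocks shorter than min_height (treated as noise).
    return [(s, e) for s, e in blocks if e - s >= min_height]


if __name__ == "__main__":
    white, dark = [(255, 255, 255)] * 4, [(0, 0, 0)] * 4
    page = [white] * 2 + [dark] * 4 + [white] * 2 + [dark] * 3 + [white] * 5 + [dark] + [white] * 2
    print(find_content_blocks(page, threshold=250, min_height=2, blank_height=3))
```

In the sample page, the two-row gap is shorter than blank_height, so the two dark runs stay in one block, while the single dark row near the end is discarded because it is shorter than min_height.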

⚠️ Important Notes

📜 License

This project is licensed under the MIT License.

Development Notes

This third-generation version of FenScribe features:

  • A Graphical User Interface (GUI) implementation for more user-friendly operation.
  • Partial utilization of AI-assisted development tools during its creation.
  • Continuous optimization and refinement through multiple development iterations.
  • Focus on converting PDF content to Word format (.docx) for easier editing and manipulation post-processing.