
Extract — Data Types

Engenerate extracts text, tables, figures, plots, and equations—keeping each type structured and accessible for downstream use in chat, datasets, and reports.

Text

Narrative text is extracted from every supported document type and layout, including mixed-layout pages, multi-column formats, and scanned or image-based PDFs. Engenerate uses optical character recognition (OCR) to recover text from scanned engineering reports, field documentation, and legacy paper records, turning them into editable, searchable content within your project.

This is especially valuable for technical reports, specifications, and reference documents where the prose needs to be reviewable and reusable alongside structured data, whether those documents were born digital or digitized from physical originals.

Math & Equations

Mathematical content is extracted and stored as LaTeX, a widely used standard for mathematical notation, so the original notation is preserved. The built-in equation editor lets you view, correct, and update equations directly in the content editor without needing to know LaTeX syntax.

Tables

Tables are extracted into structured content that can be reviewed independently of the source page layout. Large tabular datasets are stored in a structured format that makes them accessible for downstream use in datasets, chat context, and export.
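As an illustration of what "structured content" can mean for a table, here is a minimal sketch. It assumes the table has already been recovered as a header row plus data rows of cell strings. The helpers `table_to_records` and `records_to_csv`, and the sample values, are hypothetical and for illustration only. They do not describe Engenerate's actual storage format or export API.

```python
import csv
import io

# Hypothetical helpers for illustration; not Engenerate's storage format or API.


def table_to_records(header, rows):
    """Pair each extracted row with the header, giving one dict per row."""
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(header)}")
    return [dict(zip(header, row)) for row in rows]


def records_to_csv(records, header):
    """Serialize records to CSV text, independent of the source page layout."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


header = ["Specimen", "Yield strength (MPa)"]
rows = [["A1", "250"], ["A2", "262"]]  # made-up sample values
records = table_to_records(header, rows)
print(records[1]["Yield strength (MPa)"])  # 262
print(records_to_csv(records, header))
```

Keying each cell by its column header means a value can be looked up, validated, or exported without reference to where it sat on the original page.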

Figures

Figures are extracted and stored as images with associated metadata and the surrounding narrative context. Where applicable, figures are linked to extracted or validated datasets so that visual content and structured data remain connected.

Plots

Engenerate automatically detects chart and plot figures during processing and provides a first-pass extraction. For each detected plot:

  • The image is straightened and corrected for page skew
  • A first-pass extraction of visible data points from colored datasets is performed
  • Axis definitions are detected and an overlay is applied for review
  • Axes, datasets, and labels receive initial automatic attribution

The first-pass result is a starting point, not a final answer. The Plot Digitizer is where you refine and confirm those results.
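The color-based point extraction and axis mapping in the first-pass steps can be illustrated with a minimal Python sketch. It assumes the plot image has already been deskewed, that each dataset is drawn in a single distinguishable color, and that two labeled reference ticks have been located on each axis. The names `AxisCalibration`, `points_of_color`, and `digitize` are hypothetical and do not describe Engenerate's internal implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical illustration only; not Engenerate's API or internal algorithm.


@dataclass
class AxisCalibration:
    """Map a pixel position on one axis to a data value using two reference ticks."""
    p1: float  # pixel position of the first reference tick
    v1: float  # data value printed at the first tick
    p2: float  # pixel position of the second reference tick
    v2: float  # data value printed at the second tick
    log: bool = False  # True for a logarithmic axis (v1 and v2 must be > 0)

    def to_data(self, p: float) -> float:
        # Fraction of the way from tick 1 to tick 2. This works for either pixel
        # direction, so image rows that grow downward need no special handling.
        t = (p - self.p1) / (self.p2 - self.p1)
        if self.log:
            lo, hi = math.log10(self.v1), math.log10(self.v2)
            return 10 ** (lo + t * (hi - lo))
        return self.v1 + t * (self.v2 - self.v1)


def points_of_color(image, target, tol=30.0):
    """Return (x, y) pixel positions whose RGB color is within `tol` of `target`.

    `image` is a row-major list of rows, each a list of (r, g, b) tuples.
    """
    hits = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if math.dist(pixel, target) <= tol:  # Euclidean distance in RGB space
                hits.append((x, y))
    return hits


def digitize(image, target, x_axis, y_axis, tol=30.0):
    """Convert matching pixels to (x, y) data values, sorted by x."""
    pts = [(x_axis.to_data(px), y_axis.to_data(py))
           for px, py in points_of_color(image, target, tol)]
    return sorted(pts)


# Toy 3x3 "plot": white background with one reddish pixel at column 2, row 1.
image = [[(255, 255, 255)] * 3 for _ in range(3)]
image[1][2] = (250, 10, 10)
x_axis = AxisCalibration(p1=0, v1=0.0, p2=2, v2=10.0)   # column 0 -> 0, column 2 -> 10
y_axis = AxisCalibration(p1=2, v1=0.0, p2=0, v2=100.0)  # row 2 -> 0, row 0 -> 100
print(digitize(image, (255, 0, 0), x_axis, y_axis))  # [(10.0, 50.0)]
```

Interpolating between two reference ticks per axis is a common approach in manual plot digitizing. A real pipeline would also have to cope with anti-aliased edges, overlapping series, and markers that span many pixels, which is one reason a first-pass result benefits from review.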