Skip to content

Latest commit

 

History

History
73 lines (60 loc) · 1.73 KB

File metadata and controls

73 lines (60 loc) · 1.73 KB

Defuddle Go Examples

Simple examples demonstrating core features of the Defuddle Go content extraction library.

Available Examples

📁 basic/

Basic Content Extraction

  • Simple HTML parsing and content extraction
  • Metadata extraction and processing statistics
  • Perfect starting point for beginners

Element Processing

  • ARIA role conversion and code block processing
  • Math formula handling and heading standardization
  • Debug information and processing steps

HTML to Markdown Conversion

  • Convert HTML content to clean Markdown format
  • Text formatting, code blocks, and lists
  • Format comparison and compression analysis

Site-Specific Extractors

  • Automatic extractor selection by URL pattern
  • Reddit content extraction example
  • Specialized processing for different sites

Custom Extractor Development

  • Create custom extractors for specific sites
  • Pattern registration and BaseExtractor interface
  • Site-specific extraction logic implementation

Quick Start

# Run any example
go run examples/basic/main.go
go run examples/advanced/main.go
go run examples/extractors/main.go
go run examples/markdown/main.go
go run examples/custom_extractor/main.go

Common Configurations

Basic Extraction

options := &defuddle.Options{
    Debug: true,
}

Advanced Processing

options := &defuddle.Options{
    ProcessCode:      true,
    ProcessMath:      true,
    ProcessRoles:     true,
    ProcessHeadings:  true,
    Debug:            true,
}

Markdown Conversion

options := &defuddle.Options{
    Markdown: true,
}