You currently extract emails from Facebook pages manually by:
- Opening ~50 Facebook page URLs at once using a multi-link opener browser extension
- Running an email-hunter browser extension that scans visible page content
- Saving discovered emails manually
This workflow works but is:
- Slow due to browser overhead and manual steps
- Resource-heavy (many tabs, UI rendering)
- Dependent on multiple extensions
You want a free, lightweight, fast, and smooth software solution for personal use only that automates this process.
Assumptions (please confirm):
- You only want to extract publicly visible emails (About section, page description, posts)
- You do not want to bypass Facebook authentication, paywalls, or privacy controls
- You are okay running a local script/app on your PC (not a cloud SaaS)
Core requirements:
- Accept a list of 80+ Facebook Page URLs as input
- Automatically load each page and scan for email addresses
- Extract and save emails to a local file (CSV / TXT)
- Be built on free, open-source tools
- Run faster than opening pages manually in a browser
Additional features:
- Headless operation (no visible browser UI)
- Rate limiting to avoid Facebook temporary blocks
- Resume capability if interrupted
- Deduplication of emails
- Basic logging (URL → email found / not found)
Out of scope:
- Hacking private data
- Scraping personal profiles (non-pages)
- Commercial-scale scraping
The software will be a local Windows-based Python application that uses a headless browser to load Facebook Pages and extract publicly visible email addresses.
It mimics what you do manually, but without UI rendering or extensions.
Input URLs (.txt)
↓
Headless Browser (Playwright)
↓
Page Content (About + Visible Text)
↓
Email Extraction (Regex)
↓
Results (CSV / TXT)
- Language: Python 3.11+
- Browser Automation: Playwright (Chromium)
- Parsing: Built-in text extraction (no DOM hacking)
- Email Detection: Regex (a pragmatic pattern, not a full RFC 5322 validator)
- Output: CSV file
- Packaging (optional): PyInstaller → .exe
Facebook Pages:
- Load content dynamically (JavaScript)
- Hide emails in About sections
Playwright:
- Executes JavaScript like a real browser
- Typically faster than Selenium
- Reportedly less detectable than Selenium, though automated traffic can still be flagged
- No visible tabs
- Load Facebook Page URL
- Wait for page network to become idle
- Extract all visible text
- Run regex pattern:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- Normalize emails (lowercase)
- Deduplicate
- Store result mapped to URL
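The regex, normalization, and deduplication steps above can be sketched as a single helper. This is a minimal sketch, not a definitive implementation: the function name `extract_emails` and the sample URL are hypothetical, and the pattern is the one listed above, which is pragmatic and can still produce false positives.

```python
import re

# Pattern taken from the step list above. It is a pragmatic matcher,
# not a complete RFC 5322 validator.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased emails in order of first appearance."""
    seen: dict[str, None] = {}  # dict keys keep insertion order
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


# Hypothetical usage: map each URL to the emails found in its visible text.
page_text = "Email us: Info@Example.com or info@example.com, sales@shop.co.uk."
results = {"https://www.facebook.com/example-page": extract_emails(page_text)}
```

Lowercasing before deduplicating means `Info@Example.com` and `info@example.com` count as one address.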
To avoid Facebook blocking:
- Max 3–5 pages in parallel
- 2–4 seconds delay between batches
- Randomized wait (human-like)
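One way to express these limits is with plain `asyncio` batching. The sketch below assumes a hypothetical `worker(url)` coroutine that does the actual page loading; the function name `process_in_batches` and its default values (4 pages per batch, 2 to 4 seconds between batches) are illustrative choices within the ranges above, not tuned settings.

```python
import asyncio
import random


async def process_in_batches(urls, worker, batch_size=4, min_delay=2.0, max_delay=4.0):
    """Run worker(url) on at most batch_size URLs at once,
    sleeping a random interval between batches."""
    results = {}
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # return_exceptions=True stores a failure as that URL's value
        # instead of aborting the whole batch.
        outcomes = await asyncio.gather(
            *(worker(url) for url in batch), return_exceptions=True
        )
        results.update(zip(batch, outcomes))
        if start + batch_size < len(urls):
            # Randomized pause so requests are not evenly spaced.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
    return results
```

Because failures are returned as values, the caller can check `isinstance(value, Exception)` per URL and log or retry those entries.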
- Download Python 3.11+ from python.org
- Enable "Add Python to PATH" in the installer
Verify:
python --version
pip install playwright pandas
playwright install chromium
facebook_email_extractor/
├── input_urls.txt
├── extractor.py
├── results.csv
└── logs.txt
- Read URLs from input_urls.txt
- Launch Playwright in headless mode
- Process URLs in async batches
- Extract text using page.inner_text('body')
- Apply regex
- Save results
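A minimal sketch of how extractor.py might wire these steps together with Playwright's async API is shown here. The names `emails_from_page` and `run` and the 30-second timeout are assumptions for illustration. For brevity it visits URLs one at a time rather than in batches, and it does not handle login prompts or other Facebook-specific behavior.

```python
import asyncio
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


async def emails_from_page(page, url: str, timeout_ms: int = 30_000) -> list[str]:
    """Load url in an open page and return unique lowercased emails, sorted."""
    # Wait until network activity settles so dynamic content has loaded.
    await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
    text = await page.inner_text("body")
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


async def run(urls: list[str]) -> dict[str, list[str]]:
    """Visit each URL in one headless Chromium page and collect emails."""
    # Imported lazily so the helper above can be used without Playwright.
    from playwright.async_api import async_playwright

    results: dict[str, list[str]] = {}
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        for url in urls:
            try:
                results[url] = await emails_from_page(page, url)
            except Exception as exc:  # e.g. timeouts or navigation errors
                print(f"{url}: failed ({exc})")
                results[url] = []
        await browser.close()
    return results
```

To try it, read the non-empty lines of input_urls.txt into a list and call `asyncio.run(run(urls))`.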
results.csv
| Facebook URL | Email Found |
|---|---|
| fb.com/QCA… | qcaelectric@mchsi.com |
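Writing results.csv incrementally also gives a simple way to resume after an interruption: any URL already present in the file can be skipped on the next run. The sketch below uses only the standard library. The helper names `load_done` and `append_result` are hypothetical, and writing one row per email (with an empty email cell when nothing was found) is an assumed layout based on the sample table above.

```python
import csv
from pathlib import Path

FIELDS = ["Facebook URL", "Email Found"]  # column names from the sample table


def load_done(csv_path: str) -> set[str]:
    """Return the set of URLs already recorded in the results file."""
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as fh:
        return {row["Facebook URL"] for row in csv.DictReader(fh)}


def append_result(csv_path: str, url: str, emails: list[str]) -> None:
    """Append one row per email; write an empty email cell if none found."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(FIELDS)  # header only for a fresh file
        for email in emails or [""]:
            writer.writerow([url, email])
```

On startup, the script could filter its input with `[u for u in urls if u not in load_done("results.csv")]` so that only unprocessed pages are loaded.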
pip install pyinstaller
pyinstaller --onefile extractor.py
Result:
dist/extractor.exe
Double-click to run.
- Environment setup complete
- URLs loading correctly
- Email extraction verified on 10 pages
- CSV output validated
- EXE build completed
- Success rate: % pages with emails found
- Speed: URLs/minute
- Accuracy: Emails match Facebook About section
- Stability: No crashes on 80+ URLs
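The first two metrics can be computed directly from a run's results. This is a small sketch under the assumption that results are held as a dict mapping each URL to its list of found emails; the function name `summarize` and the key names are hypothetical.

```python
def summarize(results: dict[str, list[str]], elapsed_seconds: float) -> dict[str, float]:
    """Compute success rate (% of pages with an email) and URLs per minute."""
    total = len(results)
    with_email = sum(1 for emails in results.values() if emails)
    return {
        "success_rate_pct": 100.0 * with_email / total if total else 0.0,
        "urls_per_minute": 60.0 * total / elapsed_seconds if elapsed_seconds > 0 else 0.0,
    }
```

Accuracy and stability still need a manual check, for example by comparing a sample of extracted emails against each page's About section.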
Please contact me at https://sammuti.com 🙂 (TBD)