Nexus News Agent is an AI-powered project that provides a suite of tools for searching, crawling, and summarizing news articles from the web. It features a powerful AI engine to process text and a server component to generate images with the summarized content.
- Search: Find news articles on any topic using Google Custom Search.
- Crawl: Fetch and parse the content of articles from their URLs.
- Summarize: Generate concise, one-line summaries of articles using the Gemini API.
- Image Generation: Create images with the title, summary, and source of an article overlaid on a template.
- Modular Architecture: The project is divided into an `ai_engine` for core processing and a `server` for API and image generation.
```
PGAI/
├── .gitignore
├── .python-version
├── pyproject.toml
├── README.md
├── requirements.txt
├── uv.lock
├── ai_engine/
│   ├── app.py              # Main entry point for the AI engine
│   ├── tools_crawl.py      # Web crawling utilities
│   ├── tools_search.py     # Search utilities
│   └── tools_summarize.py  # Summarization logic
└── server/
    ├── main.py             # FastAPI server
    ├── workflow.py         # Image generation workflow
    ├── assets/
    │   ├── base.png
    │   ├── glyphnames.json
    │   └── JetBrainsMonoNerdFontMono-BoldItalic.ttf
    └── output/
        └── processed_image.png
```
- Python 3.11 or higher
- An environment manager like `venv` or `uv`
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd PGAI
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies, using either `pip` with `requirements.txt` or `uv` with `pyproject.toml`:

   - Using `pip`:

     ```bash
     pip install -r requirements.txt
     ```

   - Using `uv`:

     ```bash
     uv sync
     ```
The project requires API keys for Google Custom Search and the Gemini API.

1. Create a `.env` file in the root of the `PGAI` directory.
2. Add your API keys to the `.env` file:

   ```env
   GOOGLE_CSE_KEY="your_google_custom_search_engine_key"
   GOOGLE_CSE_CX="your_google_custom_search_engine_cx"
   GOOGLE_API_KEY="your_gemini_api_key"
   ```
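For reference, a minimal sketch of how these variables could be loaded at startup, assuming `python-dotenv` is available (it is not named in the dependency summary below, so the project may load them differently):

```python
# Minimal sketch, assuming python-dotenv; the project may read these differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads PGAI/.env into the process environment

GOOGLE_CSE_KEY = os.environ["GOOGLE_CSE_KEY"]  # Custom Search API key
GOOGLE_CSE_CX = os.environ["GOOGLE_CSE_CX"]    # Programmable Search Engine ID
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]  # Gemini API key
```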
The AI engine is a command-line tool that takes a topic as input, searches for relevant articles, crawls them, and generates summaries.
1. Run the AI engine:

   ```bash
   python -m ai_engine.app
   ```

2. Enter a topic when prompted.

3. The summaries will be saved in a JSON file in the root of the `PGAI` directory.
The server is a FastAPI application that can generate an image with the summarized content of an article.
1. Start the server:

   ```bash
   uvicorn server.main:app --reload
   ```

2. Send a POST request to the `/content` endpoint with the following JSON payload:

   ```json
   {
     "title": "Your Title",
     "content": "Your one-line summary.",
     "reference": "your.source.com"
   }
   ```

3. The generated image will be saved as `processed_image.png` in the `server/output` directory.
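For example, you can exercise the endpoint with Python's `requests` library (already a project dependency), assuming the server is running locally on uvicorn's default port 8000:

```python
# Sends a sample payload to the /content endpoint; assumes the server
# is running locally at uvicorn's default http://127.0.0.1:8000.
import requests

payload = {
    "title": "Your Title",
    "content": "Your one-line summary.",
    "reference": "your.source.com",
}
response = requests.post("http://127.0.0.1:8000/content", json=payload, timeout=30)
response.raise_for_status()
print(response.status_code)  # 200 on success; the image lands in server/output/
```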
The main dependencies are listed in `pyproject.toml` and `requirements.txt`. They include:
- FastAPI: For the web server.
- LangChain & LangGraph: For building the AI workflow.
- Pillow & OpenCV: For image manipulation.
- Beautiful Soup & Requests: For web crawling.
- Pydantic: For data validation.
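To illustrate what the image workflow does with these libraries, here is a sketch of a Pillow text overlay on the bundled template; the coordinates, font sizes, and colors are assumptions, not the values used in `server/workflow.py`:

```python
# Illustrative Pillow overlay using the repo's bundled assets; layout
# values are assumptions and will differ from server/workflow.py.
from PIL import Image, ImageDraw, ImageFont

def render_card(title: str, content: str, reference: str) -> None:
    img = Image.open("server/assets/base.png").convert("RGB")
    font_path = "server/assets/JetBrainsMonoNerdFontMono-BoldItalic.ttf"
    title_font = ImageFont.truetype(font_path, 48)
    body_font = ImageFont.truetype(font_path, 28)

    draw = ImageDraw.Draw(img)
    draw.text((60, 80), title, font=title_font, fill="white")     # headline
    draw.text((60, 180), content, font=body_font, fill="white")   # summary
    draw.text((60, 320), reference, font=body_font, fill="gray")  # source

    img.save("server/output/processed_image.png")

render_card("Your Title", "Your one-line summary.", "your.source.com")
```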
Contributions are welcome! Please feel free to submit a pull request.
- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
A CLI tool to automatically search, crawl, and summarize news and articles on any given topic, creating a concise brief from multiple sources.
- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Quickstart
- Usage
- Output Example
- Troubleshooting
- FAQ
- Limitations
- Contributing
- License
News Agent Nexus is a Python-based command-line tool that acts as an autonomous research agent. You provide a topic, and it performs a web search to find relevant seed articles, crawls those pages and linked articles, and uses a large language model to generate structured summaries. The final output is a JSON file containing a list of concise, easy-to-read summaries with source references.
- Topic-based Search: Initiates research from a simple user-provided topic.
- Web Crawling: Fetches content from seed URLs and discovers related articles.
- AI-Powered Summarization: Uses Google's Gemini models to generate structured, concise summaries.
- Parallel Processing: Summarizes multiple articles concurrently for faster results.
- Structured Output: Saves results in a clean, timestamped JSON file for easy use in other applications.
- Configurable: API keys and search engine details are managed via environment variables.
The agent operates in a three-stage pipeline orchestrated by `app.py`:

1. Search: The user's topic is fed to the `tools_search` module, which uses the Google Custom Search API to find a list of initial "seed" URLs.
2. Crawl: The `tools_crawl` module fetches the HTML content for each seed URL. It also identifies and crawls other promising article-like links on those pages to broaden the content base.
3. Summarize: The extracted text from the crawled pages is passed to the `tools_summarize` module, which uses a large language model (Gemini) to generate a short, structured summary for each article in parallel.
The final summaries are collected and written to a single JSON file.
```mermaid
graph TD
    A[User Topic] --> B{app.py};
    B --> C["1. Search Seeds<br>(tools_search.py)"];
    C -- Google CSE API --> D[Seed URLs];
    D --> E["2. Crawl Pages<br>(tools_crawl.py)"];
    E -- HTTP Requests --> F[Page Content];
    F --> G["3. Summarize Articles<br>(tools_summarize.py)"];
    G -- Gemini API --> H[Structured Summaries];
    H --> I{output.json};
```
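In code, the orchestration looks roughly like the sketch below. The helper names (`search_seed_urls`, `crawl_pages`, `summarize_article`) are hypothetical stand-ins for the actual functions in the three modules:

```python
# Hypothetical sketch of the pipeline in app.py; helper names are
# illustrative stand-ins for the real functions in the three modules.
import json
import time
from concurrent.futures import ThreadPoolExecutor

from tools_search import search_seed_urls      # assumed: topic -> seed URLs
from tools_crawl import crawl_pages            # assumed: URLs -> page texts
from tools_summarize import summarize_article  # assumed: text -> summary dict

def run(topic: str) -> str:
    seeds = search_seed_urls(topic)  # 1. Google CSE finds seed URLs
    pages = crawl_pages(seeds)       # 2. fetch seeds plus discovered links
    # 3. Summarize in parallel, one Gemini call per page
    with ThreadPoolExecutor(max_workers=5) as pool:
        summaries = list(pool.map(summarize_article, pages))
    out_path = f"summaries_{int(time.time())}.json"
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(summaries, f, ensure_ascii=False, indent=2)
    return out_path

if __name__ == "__main__":
    print(run(input("Enter your topic: ")))
```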
- Python 3.8+
- Access to Google Cloud Platform for:
- Google Custom Search API
- Google AI (Gemini) API
Follow these steps to set up the project locally.
1. Clone the repository:

   ```bash
   git clone <YOUR_REPOSITORY_URL>
   cd News-Agent-Nexus
   ```

2. Create and activate a Python virtual environment:

   - macOS / Linux (bash)

     ```bash
     python3 -m venv .venv
     source .venv/bin/activate
     ```

   - Windows (Command Prompt)

     ```bat
     python -m venv .venv
     .venv\Scripts\activate.bat
     ```

   - Windows (PowerShell)

     ```powershell
     python -m venv .venv
     .venv\Scripts\Activate.ps1
     ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

The tool requires three environment variables for Google's APIs. Create a file named `.env` in the root of the project directory and add the following, replacing the placeholder values with your actual credentials.
```env
# .env

# For AI-powered summarization via Google AI Studio or GCP
GOOGLE_API_KEY="AIzaSy..."

# For the initial web search via Google Custom Search Engine API
GOOGLE_CSE_KEY="AIzaSy..."
GOOGLE_CSE_CX="your_custom_search_engine_id"
```

| Variable | Purpose | Example Value |
|---|---|---|
| `GOOGLE_API_KEY` | API key for the Google Gemini model used in summarization. | `AIzaSy...` |
| `GOOGLE_CSE_KEY` | API key for the Google Custom Search Engine API. | `AIzaSy...` |
| `GOOGLE_CSE_CX` | The unique ID for your Programmable Search Engine instance. | `a1b2c3d4e5f67890` |
The `.env` file is ignored by Git, so your keys will not be committed.
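As a sketch, a startup check along these lines would produce the errors described under Troubleshooting (the project's actual check may differ):

```python
# Illustrative startup validation; mirrors the RuntimeErrors listed in
# Troubleshooting, though the project's actual check may differ.
import os

def require_credentials() -> None:
    if not (os.getenv("GOOGLE_CSE_KEY") and os.getenv("GOOGLE_CSE_CX")):
        raise RuntimeError("Google Custom Search credentials missing...")
    if not os.getenv("GOOGLE_API_KEY"):
        raise RuntimeError("GOOGLE_API_KEY is missing...")
```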
Once you have completed the installation and configuration steps, you can run the agent immediately.
1. Activate your virtual environment (if you haven't already).

2. Run the application:

   ```bash
   python app.py
   ```

3. Enter a topic when prompted:

   ```text
   Enter your topic (e.g. 'latest AI safety blog posts'): developments in solid-state batteries
   ```

The script will then execute all stages and save the results to a `summaries_{timestamp}.json` file.
To run the agent, simply execute the main application script:

```bash
python app.py
```

The application will prompt you to enter a topic. After you provide the topic and press Enter, it will begin the search, crawl, and summarization process, printing its progress to the console.
```text
Enter your topic (e.g. 'latest AI safety blog posts'): latest news on quantum computing hardware
[1/3] Searching…
Using 8 seed URLs.
[2/3] Crawling (seed + top links)…
Unique pages: 21
[3/3] Summarizing (parallel)…
Summaries written: 5
Wrote summaries_1756808756.json with 5 summaries.
Timing breakdown (seconds):
  search     1.12
  crawl      8.45
  summarize  15.20
  TOTAL      24.77
```
The output is a JSON array of summary objects, saved to a file like `summaries_1756808756.json`. Each object contains a title, the summarized content, and a reference URL.
```json
[
  {
    "title": "Quantum Leap",
    "content": "Researchers at XYZ University have developed a new qubit stabilization technique, potentially extending coherence times by over 200%.",
    "reference": "https://example.com/news/quant"
  },
  {
    "title": "Scaling Up",
    "content": "A major tech firm announced a 1,000-qubit processor, a significant milestone in building fault-tolerant quantum computers. Details remain sparse.",
    "reference": "https://tech-journal.com/artic"
  }
]
```

- `RuntimeError: Google Custom Search credentials missing...`: This error means the `GOOGLE_CSE_KEY` or `GOOGLE_CSE_CX` environment variables are not set. Ensure your `.env` file is correct and in the project root.
- `RuntimeError: GOOGLE_API_KEY is missing...`: This error means the `GOOGLE_API_KEY` for the Gemini model is not set. Check your `.env` file.
- Slow Performance: The crawling and summarization steps depend on network speed and API response times. The script runs summarization in parallel to mitigate this, but it can still take time.
- No Summaries Generated: This can happen if the initial search yields no results, the web pages cannot be crawled, or the content is too sparse to summarize. Check the console output for errors.
Q: Can I use a different search engine or language model?
A: Currently, the tool is hardcoded to use Google Custom Search and Google Gemini. Replacing these would require modifying `tools_search.py` and `tools_summarize.py`, respectively.
Q: How many articles does it summarize?
A: The script is currently configured to summarize the top 5 most article-like pages it finds to keep the process quick and focused. This can be changed in `app.py`.
Q: Why are the summaries so short?
A: The summary length (title, content) is constrained in the prompt sent to the language model to produce very brief, tweet-sized outputs. You can adjust the `max_length` constraints in the `ArticleSummary` Pydantic model in `tools_summarize.py`.
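For orientation, the model presumably looks something like the sketch below; the field names match the JSON output example above, but the exact `max_length` values are assumptions:

```python
# Hypothetical shape of ArticleSummary in tools_summarize.py; the field
# names match the JSON output, but the length limits are assumptions.
from pydantic import BaseModel, Field

class ArticleSummary(BaseModel):
    title: str = Field(..., max_length=80)     # short headline
    content: str = Field(..., max_length=280)  # tweet-sized summary body
    reference: str                             # source URL
```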
- API Costs: This tool makes calls to paid Google Cloud APIs. Be mindful of the potential costs, especially if running it frequently or on many topics.
- Crawl Quality: The web crawler uses simple heuristics to find articles and may miss content or fail on sites with heavy JavaScript.
- Summarization Accuracy: Summaries are generated by an AI and may contain inaccuracies or misinterpret the source material. Always consult the reference link for critical information.
TODO: Please add contribution guidelines, such as how to submit pull requests, coding standards, and testing procedures.
TODO: A license has not yet been specified for this project. Please choose an open-source license and add a LICENSE file.