
A Prudence-based web services API for the Goose HTML content extraction library


mdorn/proose


Proose is a web services wrapper around the Goose HTML content extraction library.

Proose also offers limited translation support (5,000 characters maximum) through the unofficial Google Translate Java API.

Proose is based on Prudence, the RESTful web platform for the JVM. It grew out of the need for a server-side implementation of Readability.js. Goose seems to be the best such implementation in any language; Proose exposes it through a web services API written primarily in a few lines of server-side JavaScript running on top of Prudence.

To use it, you need the JavaScript-enabled edition of Prudence (v1.1). Install the proose source in your instance's applications directory, then install or link the bundled jar dependencies (found in the repo's libraries directory) into the instance's libraries directory. These are the dependencies:

Once it is running, send it an HTTP POST whose body is a JSON object naming the URI to extract; Proose responds with a JSON representation of that page's title and main text:

curl -i -H "Accept: application/json" -X POST -d '{"uri": "http://threecrickets.com/prudence/rest/"}' http://localhost:8080/proose/page/

{
    "title": "Prudence: Scalable REST/JVM Web Development Platform",
    "text": "There's a lot of buzz about REST, but also a lot confusion about what it is and what it's good for. This essay attempts to convey REST's simple essence. Let's start, then, not at REST, but at an attempt to create a new architecture for building scalable applications. Our goals are for it to be minimal, straightforward, and still have enough features to be productive. We want to learn some lessons from the failures of other, more elaborate and complete architectures. ..."
}
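The same request can be issued from JavaScript. This is a minimal sketch, assuming a runtime with a global fetch (for example Node.js 18 or later) and the same http://localhost:8080/proose/page/ endpoint as the curl example. The helper names buildProoseRequest and extractPage are hypothetical and not part of Proose. The request is built by a pure function so it can be checked without a running server.

```javascript
// Hypothetical client helpers for the Proose endpoint used in the curl example.
// Assumes a runtime with a global fetch (e.g. Node.js 18+).
const PROOSE_ENDPOINT = "http://localhost:8080/proose/page/";

// Build the URL and fetch options for an extraction request.
// Pure function: performs no network I/O.
function buildProoseRequest(uri, endpoint = PROOSE_ENDPOINT) {
  if (typeof uri !== "string" || uri.length === 0) {
    throw new TypeError("uri must be a non-empty string");
  }
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: {
        Accept: "application/json",
        // curl -d sends this Content-Type by default; mirrored here because
        // it is an assumption whether Proose inspects the request media type.
        "Content-Type": "application/x-www-form-urlencoded",
      },
      // Same body shape as the curl example: {"uri": "..."}
      body: JSON.stringify({ uri }),
    },
  };
}

// Send the request and resolve to the parsed {title, text} object.
async function extractPage(uri, endpoint = PROOSE_ENDPOINT) {
  const { url, init } = buildProoseRequest(uri, endpoint);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Proose returned HTTP ${response.status}`);
  }
  return response.json();
}
```

With a Proose instance running locally, extractPage("http://threecrickets.com/prudence/rest/").then((page) => console.log(page.title)) should log the extracted title.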

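To have the extracted content translated, add source_language and target_language fields to the request body:

curl -i -H "Accept: application/json" -X POST -d '{"uri": "http://threecrickets.com/prudence/legal/", "source_language": "en", "target_language": "fr"}' http://localhost:8080/proose/page/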

{
    "title": "Licence et les marques - Prudence: REST Scalable / Plate-forme de développement Web JVM - Trois grillons",
    "text": "Prudence vous est fourni sous la licence GNU Lesser General Public License version 3.0.\n\nEn outre, nous voulons mentionner expressément que les grillons Trois LLC, le titulaire du droit d'auteur que de tout le code source, n'a pas l'intention de libérer les futures versions du projet Prudence open source sous plusieurs licences restrictives, telles que la GPL. Si nous changeons la licence de nouveau dans l'avenir, il ne pouvait être pour une licence moins restrictive (comme Apache Public License).\n\nNotez que cet accord ne couvre pas les bibliothèques redistribués tiers. Les bibliothèques vous sont fournis à des fins de commodité, mais restent sous leurs licences respectives, qui sont reproduites dans les «licences / *" fichiers. Pour les demandes d'autorisation spéciales, s'il vous plaît contacter Trois grillons LLC."
}
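A translation request differs from a plain extraction request only in its body. As a sketch, assuming the field names from the curl example (uri, source_language, target_language) and treating the language codes as opaque strings (whether Proose validates them is not stated here), a hypothetical buildTranslateBody helper could build that body:

```javascript
// Hypothetical helper: builds the JSON body for a Proose request with translation.
// Field names follow the curl example; language codes ("en", "fr", ...) are
// passed through as-is, without any validation against a list of languages.
function buildTranslateBody(uri, sourceLanguage, targetLanguage) {
  const fields = [
    ["uri", uri],
    ["sourceLanguage", sourceLanguage],
    ["targetLanguage", targetLanguage],
  ];
  // Reject missing or empty values before anything is sent to the server.
  for (const [name, value] of fields) {
    if (typeof value !== "string" || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return JSON.stringify({
    uri,
    source_language: sourceLanguage,
    target_language: targetLanguage,
  });
}
```

The returned string can be sent as the body of the same POST request shown in the curl example; keep in mind that translation is limited to 5,000 characters.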
