Switch between RSS content and Web page content (store both) #1999
reader/processor/processor.go
Outdated
@@ -80,14 +80,15 @@ func ProcessFeedEntries(store *storage.Storage, feed *model.Feed, user *model.Us
logger.Error(`[Processor] Unable to crawl this entry: %q => %v`, entry.URL, scraperErr)
} else if content != "" {
// We replace the entry content only if the scraper doesn't return any error.
entry.Content = content
// TODO: document change
entry.WebContent = content
}
}

rewrite.Rewriter(url, entry, feed.RewriteRules)
This is problematic. I didn't look at how the rewriter gets its data - I'll add a commit which fixes this.
The rewriter now operates on web_content. As far as I can see, the rules do not operate on RSS content, only on the original web page content.
739e6e2 to 93948d7
I merged the commits and rebased onto main.
Looks like the unit tests are failing after your rebase.
Yeah, I'll need to change some of those tests I think. The
77b36eb to d00bcfc
f397706 to 34f4898
4664e37 to 628dc2a
628dc2a to 29ee5ec
Hi @fguillot, I've updated my pull request. Do you mind taking a look at it?
Why should the rewrite rules apply only to WebContent? They should apply to both Content and WebContent, especially for feeds that provide full HTML content by default.
Any progress on this?
This to me seems to be changing the way the current v1/fetch-content endpoint works. The expectation is that the returned Surely, such a change should come under a bumped version prefix - otherwise, what's the point of the v1 prefix at all? Also, I don't see why fetch-content needs to be changed at all, since its purpose is to return the web content. An easy solution would be to switch Clients are generally okay with additional API information, not so much with removed or changed information.
Please let me know what you think of this change.
It solves an issue that has irked me when using the interface: the original RSS content is overwritten with the webpage content, which is sometimes not as well formatted as the original RSS content, or is empty/junk. It allows people to hit the Download button just to see what the page looks like when it is downloaded, but still return to the RSS page to continue reading.
The feature works like this:
Whenever an entry is loaded, if there is downloaded content available, that will be displayed and the Download button becomes Show RSS Content.
Otherwise the page loads like normal, showing the Download button and the RSS page content.
If the Show RSS Content button is selected, the original RSS content is displayed and the button turns back into the regular Download button.
If this Download button is selected, the page is processed and saved like normal (re-downloading it even if it has been saved before).
The /fetch-content API method now returns both content and web_content as JSON.