Merge pull request #2 from tumb1er/master
Implementation with tests
Showing 12 changed files with 462 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,159 @@ | ||
Django Sitemap Generate | ||
======================= | ||
|
||
Background sitemap generation for Django | ||
Background sitemap generation for Django. | ||
|
||
[![Build Status](https://github.com/just-work/django-sitemap-generate/workflows/build/badge.svg?branch=master&event=push)](https://github.com/just-work/django-sitemap-generate/actions?query=event%3Apush+branch%3Amaster+workflow%3Abuild) | ||
[![codecov](https://codecov.io/gh/just-work/django-sitemap-generate/branch/master/graph/badge.svg)](https://codecov.io/gh/just-work/django-sitemap-generate) | ||
[![PyPI version](https://badge.fury.io/py/django-sitemap-generate.svg)](https://badge.fury.io/py/django-sitemap-generate) | ||
|
||
Use case | ||
-------- | ||
|
||
Almost every content site has a sitemap. Django provides an application serving | ||
sitemap views, and it's OK if your website is small. If you have complicated | ||
logic in sitemap generation or millions of items in the sitemap, you'll get | ||
massive load spikes when Google and other search engines arrive with thousands | ||
of their indexer bots. These bots request the same sitemap pages in parallel, | ||
and those requests can't be cached effectively because of the long interval | ||
between indexing runs and the low hit rate. | ||
|
||
The solution is to re-generate sitemap files periodically, once per day and not | ||
once per search engine indexer. These files could be served as static files | ||
which will not affect backend performance at all. | ||
|
||
Prerequisites | ||
------------- | ||
|
||
This project uses the index sitemap view and per-model sitemap views to generate | ||
sitemap xml files. To provide them you will need the following. | ||
|
||
1. Add `django.contrib.sitemaps` to installed apps | ||
```python | ||
INSTALLED_APPS.append('django.contrib.sitemaps') | ||
``` | ||
2. Configure at least one sitemap | ||
```python | ||
from django.contrib.sitemaps import Sitemap | ||
|
||
from testproject.testapp import models | ||
|
||
|
||
class VideoSitemap(Sitemap): | ||
    name = 'video' | ||
    changefreq = 'daily' | ||
    limit = 50000 | ||
|
||
    def items(self): | ||
        return models.Video.objects.order_by('id') | ||
``` | ||
|
||
Note that the `changefreq` parameter is only a hint for search engine indexers; | ||
it does not affect how often sitemap files are generated. | ||
|
||
3. Configure sitemap serving | ||
```python | ||
from django.contrib.sitemaps import views | ||
from django.urls import path | ||
|
||
from testproject.testapp.sitemaps import VideoSitemap, ArticleSitemap | ||
|
||
sitemaps = { | ||
    VideoSitemap.name: VideoSitemap, | ||
    ArticleSitemap.name: ArticleSitemap | ||
} | ||
|
||
urlpatterns = [ | ||
    path('sitemap.xml', views.index, {'sitemaps': sitemaps}, | ||
         name='sitemap-index'), | ||
    path('sitemap-<section>.xml', views.sitemap, {'sitemaps': sitemaps}, | ||
         name='django.contrib.sitemaps.views.sitemap'), | ||
] | ||
``` | ||
|
||
Now your website supports sitemap views. | ||
|
||
Installation | ||
------------ | ||
|
||
```shell script | ||
pip install django-sitemap-generate | ||
``` | ||
|
||
Working example is in `testproject.testapp`. | ||
|
||
1. Add `sitemap_generate` application to installed apps in django settings: | ||
```python | ||
INSTALLED_APPS.append('sitemap_generate') | ||
``` | ||
2. Add a reference to sitemap mapping to django settings: | ||
```python | ||
SITEMAP_MAPPING = 'testproject.testapp.urls.sitemaps' | ||
``` | ||
3. You may need to override the default sitemap index url name: | ||
```python | ||
SITEMAP_INDEX_URL = 'sitemap-index' | ||
``` | ||
4. You may also need to set up forwarded protocol handling in django settings: | ||
```python | ||
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https') | ||
``` | ||
5. Note that django paginates sitemaps with the `p` query parameter, but the | ||
corresponding sitemap files are named `sitemap-video.xml`, | ||
`sitemap-video2.xml` and so on. You'll need to configure some "rewrites". | ||
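
The naming rule from step 5 can be sketched as a small pure function. This is only an illustration and not part of the package: `sitemap_filename` and `filename_for_url` are hypothetical helpers that mirror the naming used by the generator code in this commit (`sitemap-{section}.xml` for the first page, `sitemap-{section}{page}.xml` for later pages). A web server rewrite has to apply the same mapping to the `p` query parameter.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sitemap_filename(section: str, page: int = 1) -> str:
    """Hypothetical helper: stored file name for one sitemap section page."""
    if page > 1:
        return f'sitemap-{section}{page}.xml'
    return f'sitemap-{section}.xml'


def filename_for_url(url: str) -> Optional[str]:
    """Map a request such as /sitemap-video.xml?p=2 to a stored file name."""
    parsed = urlparse(url)
    name = parsed.path.rsplit('/', 1)[-1]
    if not (name.startswith('sitemap-') and name.endswith('.xml')):
        return None  # not a per-section sitemap url
    section = name[len('sitemap-'):-len('.xml')]
    # A missing "p" parameter means the first page.
    page = int(parse_qs(parsed.query).get('p', ['1'])[0])
    return sitemap_filename(section, page)
```

For example, `filename_for_url('/sitemap-video.xml?p=2')` gives the name of the second stored page for the `video` section, while a url without `p` maps to the unnumbered file.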
|
||
Usage | ||
----- | ||
|
||
When you request a sitemap over http, django substitutes the website domain name | ||
from the request into the links in the sitemap xml. In the background there is no | ||
incoming request, so you'll need some environment variables. By default, links | ||
are generated for `localhost` over HTTPS. | ||
|
||
```shell script | ||
export \ | ||
SITEMAP_PROTO=https \ | ||
SITEMAP_HOST=github.com \ | ||
SITEMAP_PORT=443 | ||
|
||
# generate all sitemaps | ||
python manage.py generate_sitemap | ||
|
||
# generate sitemap for single model | ||
python manage.py generate_sitemap video | ||
``` | ||
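
As a rough sketch of how the three variables above could combine into the prefix of generated links: `sitemap_base_url` is a hypothetical helper, not something the package provides. It assumes the same fallbacks as the package defaults (`localhost`, `443`, `https`), that the port is left out when it is the standard one for the protocol, and that the domain is taken from the request rather than from the sites framework.

```python
import os
from typing import Mapping

# Standard ports that are normally omitted from URLs.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def sitemap_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: base URL implied by the SITEMAP_* variables."""
    proto = env.get('SITEMAP_PROTO', 'https')
    host = env.get('SITEMAP_HOST', 'localhost')
    port = int(env.get('SITEMAP_PORT', 443))
    if DEFAULT_PORTS.get(proto) == port:
        # Standard port for this protocol: keep the URL short.
        return f'{proto}://{host}'
    return f'{proto}://{host}:{port}'
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration before exporting it in a crontab or a container.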
|
||
You may run sitemap generation from crontab: | ||
|
||
``` | ||
0 0 * * * python manage.py generate_sitemap | ||
``` | ||
|
||
You may run sitemap generation from celery: | ||
|
||
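```python | ||
from sitemap_generate.generator import SitemapGenerator | ||
from testproject.testapp.sitemaps import VideoSitemap | ||
|
||
@celery.task | ||
def generate_sitemap(): | ||
    generator = SitemapGenerator(sitemaps={'video': VideoSitemap}) | ||
    generator.generate() | ||
``` | ||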
|
||
You will also need to serve the generated xml files as static responses, e.g. in nginx: | ||
|
||
``` | ||
location ~* /sitemaps/(?<fn>sitemap(-(article|video)))\.xml { | ||
    try_files /media/sitemaps/$fn$arg_p.xml @backend; | ||
} | ||
|
||
location /media/ { | ||
    alias /app/media/; | ||
} | ||
|
||
location @backend { | ||
    proxy_set_header Host $http_host; | ||
    proxy_set_header X-Real-IP $remote_addr; | ||
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; | ||
    proxy_set_header X-Forwarded-Proto $scheme; | ||
    set $app http://app:8000; | ||
    proxy_pass $app; | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
from os import getenv as e | ||
|
||
# Hostname used in sitemap links | ||
SITEMAP_HOST = e('SITEMAP_HOST', 'localhost') | ||
# Port used in sitemap links | ||
SITEMAP_PORT = int(e('SITEMAP_PORT', 443)) | ||
# Protocol used in sitemap links | ||
SITEMAP_PROTO = e('SITEMAP_PROTO', 'https') |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
import os | ||
from logging import getLogger | ||
from typing import Dict, Optional, Callable, Type | ||
from urllib.parse import ParseResult, urlparse | ||
|
||
from django.contrib.sitemaps import Sitemap | ||
from django.core.files.base import ContentFile | ||
from django.core.files.storage import default_storage, Storage | ||
from django.core.servers import basehttp | ||
from django.http import HttpResponse | ||
from django.urls import reverse | ||
|
||
from sitemap_generate import defaults | ||
|
||
|
||
class SitemapError(Exception): | ||
    """ Sitemap generation error.""" | ||
|
||
    def __init__(self, status_code, content): | ||
        super().__init__(status_code, content) | ||
        self.status_code = status_code | ||
        self.content = content | ||
|
||
|
||
StartResponseFunc = Callable[[str, dict], None] | ||
WSGIFunc = Callable[[dict, StartResponseFunc], HttpResponse] | ||
|
||
|
||
class ResponseRecorder: | ||
    """ Helper for fetching sitemaps over WSGI request.""" | ||
|
||
    def __init__(self, wsgi: WSGIFunc): | ||
        """ | ||
        :param wsgi: Django wsgi application | ||
        """ | ||
        self.wsgi = wsgi | ||
        self.status: Optional[str] = None | ||
|
||
    def record(self, url: str) -> bytes: | ||
        """ | ||
        Fetches an url over WSGI request and returns response content. | ||
        :param url: request url | ||
        :returns: response content | ||
        :raises SitemapError: if response status code is not 200. | ||
        """ | ||
        url: ParseResult = urlparse(url) | ||
|
||
        environ = { | ||
            'REQUEST_METHOD': 'GET', | ||
            'wsgi.input': '', | ||
            'SERVER_NAME': defaults.SITEMAP_HOST, | ||
            'SERVER_PORT': defaults.SITEMAP_PORT, | ||
            'PATH_INFO': url.path, | ||
            'QUERY_STRING': url.query, | ||
            'HTTP_X_FORWARDED_PROTO': defaults.SITEMAP_PROTO | ||
        } | ||
        response = self.wsgi(environ, self._start_response) | ||
        if self.status != "200 OK": | ||
            raise SitemapError(self.status, response.content) | ||
        return response.content | ||
|
||
    def _start_response(self, status, _): | ||
        """ WSGI headers callback func.""" | ||
        self.status = status | ||
|
||
|
||
class SitemapGenerator: | ||
    """ Sitemap XML files generator.""" | ||
|
||
    def __init__(self, | ||
                 media_path: str = 'sitemaps', | ||
                 storage: Storage = default_storage, | ||
                 index_url_name: str = 'sitemap-index', | ||
                 sitemaps: Optional[Dict[str, Type[Sitemap]]] = None): | ||
        """ | ||
        :param media_path: relative path on file storage | ||
        :param storage: file storage implementation used for sitemaps | ||
        :param index_url_name: name of view serving sitemap index xml file | ||
        :param sitemaps: mapping: sitemap name -> sitemap implementation | ||
        """ | ||
        cls = self.__class__ | ||
        self.logger = getLogger(f'{cls.__module__}.{cls.__name__}') | ||
        self.sitemap_root = media_path | ||
        self.storage = storage | ||
        self.index_url_name = index_url_name | ||
        self.sitemaps = sitemaps or {} | ||
        self.recorder = ResponseRecorder( | ||
            basehttp.get_internal_wsgi_application()) | ||
|
||
    def fetch_content(self, url: str) -> bytes: | ||
        """ Fetch sitemap xml content with wsgi request recorder.""" | ||
        self.logger.debug(f"Fetching {url}...") | ||
        return self.recorder.record(url) | ||
|
||
    def store_sitemap(self, filename: str, content: bytes): | ||
        """ Save sitemap content to file storage.""" | ||
        path = os.path.join(self.sitemap_root, filename) | ||
        if self.storage.exists(path): | ||
            self.storage.delete(path) | ||
        self.storage.save(path, ContentFile(content)) | ||
|
||
    def generate(self, sitemap=None): | ||
        """ Generate all sitemap files.""" | ||
        self.logger.debug("Start sitemap generation.") | ||
        url = reverse(self.index_url_name) | ||
|
||
        index_content = self.fetch_content(url) | ||
        self.store_sitemap('sitemap.xml', index_content) | ||
|
||
        for name, sitemap_class in self.sitemaps.items(): | ||
            if sitemap and sitemap != name: | ||
                continue | ||
            self.logger.debug("Generating sitemap for %s", name) | ||
            self.generate_pages(name, sitemap_class()) | ||
|
||
        self.logger.debug("Finish sitemap generation.") | ||
|
||
    def generate_pages(self, section: str, sitemap: Sitemap): | ||
        """ Generate sitemap section pages.""" | ||
        url = reverse('django.contrib.sitemaps.views.sitemap', | ||
                      kwargs={'section': section}) | ||
        for page in sitemap.paginator.page_range: | ||
            if page > 1: | ||
                page_url = f'{url}?p={page}' | ||
                filename = f'sitemap-{section}{page}.xml' | ||
            else: | ||
                page_url = url | ||
                filename = f'sitemap-{section}.xml' | ||
|
||
            page_content = self.fetch_content(page_url) | ||
            self.store_sitemap(filename, page_content) |
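
The `ResponseRecorder` in the file above fetches sitemap pages by calling the WSGI application directly instead of going over the network. A minimal sketch of that technique with no Django dependency follows. `demo_app` and `Recorder` are hypothetical stand-ins written for this illustration, and the response body is read by joining the WSGI iterable rather than through Django's `response.content`.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in WSGI app (hypothetical); a real project supplies its own."""
    if environ['PATH_INFO'] == '/sitemap.xml':
        start_response('200 OK', [('Content-Type', 'application/xml')])
        return [b'<sitemapindex/>']
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


class Recorder:
    """Calls a WSGI app in-process and remembers the status line."""

    def __init__(self, app: Callable):
        self.app = app
        self.status: Optional[str] = None

    def _start_response(self, status: str, headers: Headers, exc_info=None):
        # The app reports its status line through this callback.
        self.status = status

    def fetch(self, path: str, query: str = '') -> bytes:
        # Minimal environ for a GET request; no socket is opened.
        environ = {
            'REQUEST_METHOD': 'GET',
            'PATH_INFO': path,
            'QUERY_STRING': query,
            'SERVER_NAME': 'localhost',
            'SERVER_PORT': '443',
        }
        body = b''.join(self.app(environ, self._start_response))
        if self.status != '200 OK':
            raise RuntimeError(f'unexpected status {self.status}')
        return body
```

Calling `Recorder(demo_app).fetch('/sitemap.xml')` returns the body produced by the app, and any status other than `200 OK` raises an error, which is the same contract the generator relies on before storing a file.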
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
from django.conf import settings | ||
from django.core.management import BaseCommand | ||
from django.utils.module_loading import import_string | ||
|
||
from sitemap_generate.generator import SitemapGenerator | ||
|
||
|
||
class Command(BaseCommand): | ||
    help = "generate sitemap xml files" | ||
|
||
    sitemaps = import_string(settings.SITEMAP_MAPPING) | ||
|
||
    def add_arguments(self, parser): | ||
        super().add_arguments(parser) | ||
        parser.add_argument('sitemap', type=str, nargs='?') | ||
|
||
    def handle(self, *args, **options): | ||
        generator = SitemapGenerator(sitemaps=self.sitemaps) | ||
        generator.generate(options.get('sitemap')) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Generated by Django 3.0.5 on 2020-04-03 13:05 | ||
|
||
from django.db import migrations, models | ||
|
||
|
||
class Migration(migrations.Migration): | ||
|
||
    initial = True | ||
|
||
    dependencies = [ | ||
    ] | ||
|
||
    operations = [ | ||
        migrations.CreateModel( | ||
            name='Article', | ||
            fields=[ | ||
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')), | ||
            ], | ||
        ), | ||
        migrations.CreateModel( | ||
            name='Video', | ||
            fields=[ | ||
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')), | ||
            ], | ||
        ), | ||
    ] |