Commit cf4c3de

Port referer-parser to python.
1 parent 1a49dd2 commit cf4c3de

6 files changed, +114 -0 lines changed

MANIFEST.in

+1
@@ -0,0 +1 @@
include README.md

README.md

+59
@@ -0,0 +1,59 @@
# referer-parser Python library

This is the Python implementation of [referer-parser] [referer-parser], the library for extracting search marketing data from referer _(sic)_ URLs.

The implementation uses the shared 'database' of known search engine referers found in [`search.yml`] [search-yml].

## Installation

    pip install referer_parser

## Usage

Create a new instance of a Referer object by passing in the URL you want to parse:

```python
from referer_parser import Referer

referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'

r = Referer(referer_url)
```

The `r` variable now holds a Referer instance. The important attributes are:

```python
print(r.known) # True
print(r.referer) # 'Google'
print(r.search_parameter) # 'q'
print(r.search_term) # 'gateway oracle cards denise linn'
print(r.uri) # ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=gateway+oracle+cards+denise+linn&hl=en&client=safari', fragment='')
```

The `uri` attribute is an instance of ParseResult from the standard library's `urlparse` module.
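To make the attribute values above concrete, here is a minimal sketch of how the host and search term can be pulled out of the example URL with the standard library alone. It is an illustration, not the library's own code: it assumes Python 3, where the `urlparse` module lives at `urllib.parse`, and it hard-codes `q` as the search parameter instead of looking it up in `search.yml`.

```python
from urllib.parse import urlparse, parse_qsl

referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'

# urlparse splits the URL into scheme, netloc, path, query and so on.
uri = urlparse(referer_url)

# parse_qsl returns (name, value) pairs and decodes '+' as a space.
query = dict(parse_qsl(uri.query))

host = uri.netloc         # 'www.google.com'
search_term = query['q']  # 'q' is an assumption made for this example only
```

The library itself decides which query parameter carries the search term by consulting its table of known referers, so the hard-coded `q` here only stands in for that lookup.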

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

## Copyright and license

The referer-parser Python library is copyright 2012 Don Spaulding.

Licensed under the [Apache License, Version 2.0] [license] (the "License");
you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

[referer-parser]: https://github.com/snowplow/referer-parser
[search-yml]: https://github.com/snowplow/referer-parser/blob/master/search.yml

[license]: http://www.apache.org/licenses/LICENSE-2.0

build_json.py

+12
@@ -0,0 +1,12 @@
#!/usr/bin/env python
import json

import yaml

def build_json():
    searches = yaml.load(open('./data/search.yml'))
    with open('./data/search.json', 'w') as fp:
        json.dump(searches, fp)

if __name__ == "__main__":
    build_json()

data/search.json

+1
Large diffs are not rendered by default.

referer_parser.py

+27
@@ -0,0 +1,27 @@
import json
from urlparse import urlparse, parse_qsl


REFERERS = {}
for ref, config in json.load(open('./data/search.json')).iteritems():
    for domain in config['domains']:
        REFERERS[domain] = {
            'name': ref,
            'params': map(unicode.lower, config['parameters']),
        }


class Referer(object):
    def __init__(self, url):
        self.uri = urlparse(url)
        host = self.uri.netloc.split(':', 1)[0]
        self.known = False if host not in REFERERS else True
        self.referer = None
        self.search_parameter = ''
        self.search_term = ''
        if self.known:
            self.referer = REFERERS[host]['name']
            for param, val in parse_qsl(self.uri.query):
                if param.lower() in REFERERS[host]['params']:
                    self.search_parameter = param
                    self.search_term = val
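
The lookup that `referer_parser.py` performs can be sketched in isolation as follows. This is a hedged illustration rather than the committed code: it assumes Python 3 (`urllib.parse` instead of `urlparse`), and `SAMPLE_CONFIG`, `build_index` and `lookup` are hypothetical names, with `SAMPLE_CONFIG` standing in for `data/search.json`, whose contents are not rendered in this diff.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical stand-in for data/search.json (the real contents are not shown here).
SAMPLE_CONFIG = {
    'Google': {
        'domains': ['www.google.com', 'www.google.co.uk'],
        'parameters': ['q'],
    },
}


def build_index(config):
    """Flatten {name: {domains, parameters}} into {domain: {name, params}}."""
    index = {}
    for name, entry in config.items():
        # A list, not map(): on Python 3 map() yields a one-shot iterator.
        params = [p.lower() for p in entry['parameters']]
        for domain in entry['domains']:
            index[domain] = {'name': name, 'params': params}
    return index


def lookup(url, index):
    """Return (referer name, parameter, term), or (None, '', '') for unknown hosts."""
    uri = urlparse(url)
    host = uri.netloc.split(':', 1)[0]  # drop any port, as the diff does
    entry = index.get(host)
    if entry is None:
        return None, '', ''
    parameter, term = '', ''
    for param, val in parse_qsl(uri.query):
        if param.lower() in entry['params']:
            parameter, term = param, val  # no break, so the last match wins
    return entry['name'], parameter, term


INDEX = build_index(SAMPLE_CONFIG)
print(lookup('http://www.google.com/search?q=gateway+oracle+cards&hl=en', INDEX))
```

Keying the index by domain keeps each lookup to a single dictionary access, which mirrors the `REFERERS` table built at import time in the diff. The list comprehension matters on Python 3 because a `map` object would be exhausted after the first membership test.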

setup.py

+14
@@ -0,0 +1,14 @@
import os
from setuptools import setup, find_packages

setup_pth = os.path.dirname(__file__)
readme_pth = os.path.join(setup_pth, 'README.md')

setup(
    name='referer-parser',
    version="0.0.3",
    long_description=open(readme_pth).read(),
    packages=find_packages(),
    include_package_data=True,
    zip_safe=False,
)
