This is a python reporting tool for the Udacity logs analysis project.
The purpose of the project is to use psycopg2 to query a mock PostgreSQL database for a fictional news website.
The tool answers three questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
The database has three tables:
authorsincludes information about the authors of articlesarticlesincludes the articles themselveslogincludes one entry for each time a user has accessed the site
The program is designed to be run via the Udacity linux virtual machine command line.
You can install VirtualBox via this link and vagrant, the command line utility to manage the virtual machine here.
- Create a folder to store the files for this project and then open the folder with the terminal.
- newsdata.zip contains the newsdata.sql database structure, save this, create_views.sql and tool.py in your project folder.
- Save Vagrantfile in your vagrant folder.
- Type
vagrant init ubuntu/trusty64to tell Vagrant what kind of Linux VM you want to run - Run the virtual machine by running
vagrant upandvagrant sshto log in. - To load the data
cdinto the project folder and use the commandpsql -d news -f newsdata.sql - To load the views required to run the program run the command
psql -d news -f create_views.sql
If you get an error message saying:
psql: FATAL: database "news" does not existpsql: could not connect to server: Connection refused
This means the database server isn't running or isn't set up correctly, you may need to download the virual machine again into a fresh directory.
The views required to run the program can be imported from create_views.sql for ease of use.
The views are as follows:
CREATE VIEW top_three AS
SELECT path, count(*) AS num
FROM log
WHERE path LIKE '%article%'
GROUP BY path
ORDER BY num DESC
LIMIT 3;
CREATE VIEW author_titles AS
SELECT authors.name, articles.title, articles.slug
FROM articles
JOIN authors
ON authors.id = articles.author
ORDER BY name;
CREATE VIEW article_count AS
SELECT path, count(*) AS num
FROM log
WHERE path LIKE '%article%'
GROUP BY path;
CREATE VIEW requests_made AS
SELECT date_trunc('day', time), count(*) AS num_requests
FROM log
GROUP BY date_trunc('day', time);
CREATE VIEW request_errors AS
SELECT date_trunc('day', time), status, count(*) AS num_errors
FROM log
WHERE status != '200 OK'
GROUP BY date_trunc('day', time), status
ORDER BY date_trunc('day', time);
CREATE VIEW error_comparisons AS
SELECT to_char(requests_made.date_trunc, 'FMMonth DD, YYYY'),
round(cast(num_errors AS numeric)/num_requests*100,2) AS percentage
FROM request_errors
JOIN requests_made
ON request_errors.date_trunc = requests_made.date_trunc;To run the program run python tool.py in the command line
A plain text example of what is returned by the tool program can be seen in results.txt