SELECT refs.repository_id
FROM refs
NATURAL JOIN commits
WHERE commits.commit_author_name = 'Johnny Bravo'
AND refs.ref_name = 'HEAD';
SELECT *
FROM refs
WHERE ref_name = 'HEAD';
SELECT file_path,
ref_commits.repository_id
FROM commit_files
NATURAL JOIN ref_commits
WHERE ref_commits.ref_name = 'HEAD'
AND ref_commits.history_index = 0;
SELECT *
FROM
(SELECT COUNT(c.commit_hash) AS num,
c.commit_hash
FROM ref_commits r
NATURAL JOIN commits c
GROUP BY c.commit_hash) t
WHERE num > 1;
SELECT COUNT(commit_hash),
commit_hash
FROM ref_commits
NATURAL JOIN commits
NATURAL JOIN commit_blobs
WHERE ref_name = 'HEAD'
GROUP BY commit_hash;
SELECT YEAR,
MONTH,
repo_id,
committer_email,
COUNT(*) AS num_commits
FROM
(SELECT YEAR(committer_when) AS YEAR,
MONTH(committer_when) AS MONTH,
repository_id AS repo_id,
committer_email
FROM ref_commits
NATURAL JOIN commits
WHERE ref_name = 'HEAD') AS t
GROUP BY committer_email,
YEAR,
MONTH,
repo_id;
SELECT
LANGUAGE(file_path, blob_content) as lang,
SUM(JSON_EXTRACT(LOC(file_path, blob_content), '$.Code')) as code,
SUM(JSON_EXTRACT(LOC(file_path, blob_content), '$.Comments')) as comments,
SUM(JSON_EXTRACT(LOC(file_path, blob_content), '$.Blanks')) as blanks,
COUNT(1) as files
FROM commit_files
NATURAL JOIN refs
NATURAL JOIN blobs
WHERE ref_name='HEAD'
GROUP BY lang;
Files from first 6 commits from HEAD references that contains some key and are not in vendor directory
SELECT file_path,
repository_id,
blob_content
FROM files
NATURAL JOIN commit_files
NATURAL JOIN ref_commits
WHERE ref_name = 'HEAD'
AND ref_commits.history_index BETWEEN 0 AND 5
AND is_binary(blob_content) = FALSE
AND files.file_path NOT REGEXP '^vendor.*'
AND (blob_content REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]'
OR blob_content REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]'
OR blob_content REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]'
OR blob_content REGEXP 'AKIA[0-9A-Z]{16}'
OR blob_content REGEXP '(?i)reddit.*[\'\\"][0-9a-zA-Z]{14}[\'\\"]'
OR blob_content REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
OR blob_content REGEXP '.*-----BEGIN PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*');
You can create an index either on a specific column or on several columns:
CREATE INDEX commits_hash_idx ON commits USING pilosa (commit_hash);
CREATE INDEX files_commit_path_blob_idx ON commit_files USING pilosa (commit_hash, file_path, blob_hash);
Note that just one expression at a time is allowed to be indexed.
CREATE INDEX files_lang_idx ON files USING pilosa (language(file_path, blob_content));
DROP INDEX files_lang_idx ON files;
First of all, you should check out the bblfsh documentation to get yourself familiar with UAST concepts.
Also, you can take a look to all the UDFs and their signatures in the functions section
SELECT repository_id,
file_path,
uast_extract(uast(blob_content, LANGUAGE(file_path), '//uast:Import/Path'), "Value") AS imports
FROM commit_files
NATURAL JOIN refs
NATURAL JOIN blobs
WHERE ref_name = 'HEAD'
AND LANGUAGE(file_path) = 'Go'
AND ARRAY_LENGTH(imports) > 0;
SELECT file_path,
uast_extract(uast(blob_content, LANGUAGE(file_path), '//uast:Identifier'), "Name") name
FROM commit_files
NATURAL JOIN refs
NATURAL JOIN blobs
WHERE ref_name='HEAD' AND LANGUAGE(file_path) = 'Go';
As result, you will get an array showing a list of the retrieved information. Each element in the list matches a node in the given sequence of nodes having a value for that property. It means that the length of the properties list may not be equal to the length of the given sequence of nodes:
+-------------------------------------------------------------------------------------------------------------------+
| file_path | name |
+-------------------+-----------------------------------------------------------------------------------------------+
| _example/main.go | ["main","driver","NewDefault","sqle","createTestDatabase","AddDatabase","driver","auth"] |
+-------------------+-----------------------------------------------------------------------------------------------+
You can monitor the progress of a gitbase query (either a regular query or an index creation query using SHOW PROCESSLIST
).
Let's say we do the following query over a huge repository:
SELECT file_path, LANGUAGE(file_path) lang FROM commit_files;
With this query we can monitor its progress:
SHOW PROCESSLIST;
We'll get the following output:
+-----+------+-----------------+---------+---------+------+-------------------+--------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+-----------------+---------+---------+------+-------------------+--------------------------------------------------------------+
| 168 | root | 127.0.0.1:53514 | gitbase | query | 36 | commit_files(8/9) | SELECT file_path, LANGUAGE(file_path) lang FROM commit_files |
| 169 | root | 127.0.0.1:53514 | gitbase | query | 0 | running | show processlist |
+-----+------+-----------------+---------+---------+------+-------------------+--------------------------------------------------------------+
From this output, we can obtain some information about our query:
- It's been running for 36 seconds.
- It's querying commit_files table and has processed 8 out of 9 partitions.
To kill a query that's currently running you can use the value in Id
. If we were to kill the previous query, we would need to use the following query:
KILL QUERY 168;