Improve benchmark #212

Open
grossir opened this issue Feb 14, 2025 · 0 comments

grossir commented Feb 14, 2025

First, readability can be improved. Branches are named "branch1" and "branch2", but which is "main" and which is the PR?

Second, we could return some statistics that are already implicit in the processing.
In this recent PR there was a noticeable increase in processing time, which is an important metric for evaluating the changes.

Also, with the introduction of ReferenceCitations we get the possibility of overlapping citations; it would be useful to return the citation type in the JSON itself, in addition to what is already reported.
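A rough sketch of what adding the type could look like. The benchmark's actual JSON schema is not shown in this issue, so the field names (`type`, `text`, `span`) and the two stand-in classes are assumptions for illustration; with real eyecite objects the class name could play the same role.

```python
import json
from dataclasses import dataclass


# Stand-in classes for illustration only (hypothetical fields); the real
# benchmark would pass eyecite citation objects instead.
@dataclass
class FullCaseCitation:
    text: str
    start: int
    end: int


@dataclass
class ReferenceCitation:
    text: str
    start: int
    end: int


def citation_to_record(citation):
    """Build a JSON-ready dict that carries the citation's class name."""
    return {
        "type": type(citation).__name__,
        "text": citation.text,
        "span": [citation.start, citation.end],
    }


def find_overlaps(records):
    """Return pairs of records whose half-open [start, end) spans overlap."""
    ordered = sorted(records, key=lambda r: r["span"])
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["span"][0] >= first["span"][1]:
                break  # sorted by start, so no later record can overlap
            overlaps.append((first, second))
    return overlaps


if __name__ == "__main__":
    # Made-up citations and spans, only to show the output shape.
    records = [
        citation_to_record(FullCaseCitation("Foo v. Bar, 1 U.S. 1", 0, 20)),
        citation_to_record(ReferenceCitation("Foo", 0, 3)),
    ]
    print(json.dumps(records, indent=2))
    print(len(find_overlaps(records)), "overlapping pair(s)")
```

With the type in each record, an overlap between a ReferenceCitation and another citation can be told apart from two full citations colliding.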

Processing time can be calculated from the results JSON files themselves:

import statistics

import requests

url1 = "https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/203/results/8981703e7cc27067adcb39f66346dc62248974cf.json"
url2 = "https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/203/results/bb9ca00f64c5aa47d0eb85a16e38bc03a6bf0b61.json"

def get_time_stats(url):
    # Each item's 'time' is assumed to be cumulative, so the per-item
    # time is the difference from the previous item's 'time'
    results = requests.get(url).json()
    prev_start = results[0]['time']
    actual_times = [prev_start]
    for item in results[1:]:
        actual_times.append(item['time'] - prev_start)
        prev_start = item['time']

    print("Mean: ", sum(actual_times)/len(actual_times))
    print("Median: ", statistics.median(actual_times))
    print("Sample size: ", len(actual_times))

get_time_stats(url1)
get_time_stats(url2)

Yields

Mean:  0.08226925288831836
Median:  0.014779999999994686
Sample size:  779

Mean:  0.05099591142490372
Median:  0.009577000000000169
Sample size:  779
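To report these numbers from the benchmark itself, something along these lines might work. This is only a sketch: it assumes, as the snippet above does, that each item's `time` is cumulative, and the function names and the `baseline`/`candidate` labels are hypothetical (which result file belongs to main and which to the PR is not stated here).

```python
import statistics


def per_item_times(cumulative_times):
    """Convert cumulative timestamps into per-item durations."""
    if not cumulative_times:
        return []
    durations = [cumulative_times[0]]
    for previous, current in zip(cumulative_times, cumulative_times[1:]):
        durations.append(current - previous)
    return durations


def summarize(durations):
    """Collect the same three figures printed above into a dict."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "sample_size": len(durations),
    }


def relative_change(baseline, candidate):
    """Percent change of candidate versus baseline for mean and median."""
    return {
        key: 100 * (candidate[key] - baseline[key]) / baseline[key]
        for key in ("mean", "median")
    }
```

Usage would be roughly `relative_change(summarize(per_item_times(main_times)), summarize(per_item_times(pr_times)))`, where `main_times` and `pr_times` are the `time` values read from each results file. Labelling the two sides explicitly would also address the "branch1"/"branch2" ambiguity.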
flooie moved this to Buffer Zone in Case Law Sprint on Feb 18, 2025