feat(find_reference_citations_from_markup) #203

Merged
flooie merged 17 commits into main from 198-find-name-only-reference-citations on Feb 18, 2025

Contributor

@grossir grossir commented Feb 5, 2025

Solves #198
Also solves the problems made evident by the first iteration of this PR, described in #209

  • Implements a function to get name-only ReferenceCitations, taking advantage of i/em style tags in HTML sources
  • The new function is triggered by passing an extra argument to the main function find.get_citations
  • Refactors ReferenceCitation.is_valid_name to utils.is_valid_name
  • Adds regexes.PRE_FULL_CITATION_REGEX to account for single-name full case citations and single-name-and-pincite full case citations
  • Adds tests for the new function, checking both that it works standalone and that it does not collide with other citation types
  • Resolves a bug in match_on_tokens where MAX_MATCH_CHARS was used incorrectly
  • Updates tests that were invalidated, where what was identified as a Reference was actually part of the FullCaseCitation
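
For readers unfamiliar with the idea, here is a minimal sketch of how emphasized text in HTML can surface case-name candidates. It is not the code in this PR: `EMPHASIS_RE`, `looks_like_name`, and `emphasized_name_candidates` are hypothetical names, and the name check is a deliberately crude stand-in for `utils.is_valid_name`.

```python
import re

# Hypothetical pattern: text wrapped in <i> or <em>, with optional attributes.
EMPHASIS_RE = re.compile(r"<(i|em)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def looks_like_name(text: str) -> bool:
    """Crude illustrative check; NOT eyecite's utils.is_valid_name."""
    return len(text) > 2 and text[0].isupper() and not text.isdigit()


def emphasized_name_candidates(markup: str) -> list[str]:
    """Return emphasized strings that could be (short) case names."""
    candidates = []
    for match in EMPHASIS_RE.finditer(markup):
        # Drop surrounding punctuation that often sits inside the tag.
        text = match.group(2).strip(" ,.")
        if looks_like_name(text):
            candidates.append(text)
    return candidates


html = (
    "<p>See <em>Foo v. Bar</em>, 1 U.S. 1. As held in <i>Foo</i>, "
    "the rule is <i>id.</i> clear.</p>"
)
names = emphasized_name_candidates(html)
```

In this sketch the emphasized `id.` is filtered out, while both the full case name and the later name-only mention survive as candidates. Deciding which candidates are real references to an earlier full citation is the part the PR itself handles.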

@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 81b4fa2 to 2e6ae84 Compare February 5, 2025 23:36
@grossir grossir requested a review from flooie February 5, 2025 23:40
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 2e6ae84 to 24d6166 Compare February 5, 2025 23:44
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 24d6166 to d8eb198 Compare February 5, 2025 23:47
grossir added a commit to freelawproject/courtlistener that referenced this pull request Feb 6, 2025
…w uses find_reference_citations_from_markup

Adds logic to use freelawproject/eyecite#203
@flooie flooie assigned grossir and unassigned flooie Feb 7, 2025
flooie and others added 8 commits February 7, 2025 10:50
apply refactor from code review #206
…ent pincites

This will help disambiguate adjacent ReferenceCitations

- add `helpers.add_pre_citation`
- add regex needed
- add test_FindTest where this is used
- resolved a bug in match_on_tokens where MAX_MATCH_CHARS was used incorrectly
- updated tests that were invalidated, where what was identified as a Reference was actually part of the FullCaseCitation
This is passed to `extract_reference_citations`, which allows us to use `find_reference_citations_from_markup` inside that function, simplifying the calls
Solves #209

- add test cases for full case citation with antecedent and no pincite
- fix span calculation on add_pre_citation
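
As a rough illustration of the antecedent idea in these commits, the sketch below looks backwards from a full citation for a single name, with an optional pincite, inside a bounded window. Everything here is an assumption made for the example: `MAX_LOOKBACK_CHARS`, `ANTECEDENT_RE`, `find_antecedent`, and the `Name, at 332, ` text shape are hypothetical and are not `regexes.PRE_FULL_CITATION_REGEX` or `helpers.add_pre_citation`.

```python
import re

# Hypothetical window size and pattern, for illustration only.
MAX_LOOKBACK_CHARS = 40
ANTECEDENT_RE = re.compile(
    r"(?P<antecedent>[A-Z][A-Za-z'\-]+),\s*(?:at\s+(?P<pin_cite>\d+),\s*)?$"
)


def find_antecedent(text: str, citation_start: int):
    """Look in a bounded window before the citation for 'Name, ' or 'Name, at 12, '."""
    window_start = max(0, citation_start - MAX_LOOKBACK_CHARS)
    match = ANTECEDENT_RE.search(text[window_start:citation_start])
    if not match:
        return None
    return {
        "antecedent": match.group("antecedent"),
        "pin_cite": match.group("pin_cite"),
        # The match offsets are relative to the window, so shift them
        # back into whole-text coordinates.
        "span": (window_start + match.start(), citation_start),
    }


text = "The court followed Nobelman, at 332, 508 U.S. 324 in a later case."
start = text.index("508 U.S. 324")
result = find_antecedent(text, start)
```

The offset shift in `span` is the kind of detail that is easy to get wrong when matching against a slice rather than the whole text.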
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch 2 times, most recently from 8db0f0a to 71c42c6 Compare February 13, 2025 21:46
Bill noticed during testing that HTML extraction on real data was slow: we were creating a SpanUpdater for each full citation. The code is now refactored to create the SpanUpdaters once per Opinion.
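
The build-once pattern described here can be sketched as follows. `OffsetMapper` is a hypothetical, simplified stand-in for a span updater (it only handles stripped tags) and is not eyecite's `SpanUpdater`; the point is that the mapping is computed once per document and then reused for every span.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


class OffsetMapper:
    """Hypothetical stand-in for a span updater: maps offsets in
    tag-stripped text back to offsets in the original markup."""

    def __init__(self, markup: str):
        # The expensive part: done once per document, not once per citation.
        self.plain_to_markup = []
        pos = 0
        for tag in TAG_RE.finditer(markup):
            # Characters between tags survive stripping; record where they live.
            self.plain_to_markup.extend(range(pos, tag.start()))
            pos = tag.end()
        self.plain_to_markup.extend(range(pos, len(markup)))

    def update(self, plain_offset: int) -> int:
        return self.plain_to_markup[plain_offset]


markup = "See <em>Foo</em>, 1 U.S. 1, and <em>Bar</em>, 2 U.S. 2."
plain = TAG_RE.sub("", markup)

mapper = OffsetMapper(markup)  # built once, reused for all spans below
spans = [(m.start(), m.end()) for m in re.finditer(r"Foo|Bar", plain)]
markup_spans = [(mapper.update(s), mapper.update(e - 1) + 1) for s, e in spans]
found = [markup[s:e] for s, e in markup_spans]
```

Constructing the mapper inside the loop over citations would repeat the full pass over the markup for each one, which is the slowdown the commit describes.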
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 71c42c6 to 509c12a Compare February 13, 2025 21:48
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 38fde4f to 61ddc22 Compare February 13, 2025 22:39
@grossir grossir mentioned this pull request Feb 14, 2025
@grossir grossir force-pushed the 198-find-name-only-reference-citations branch from 45981a7 to 8f72df6 Compare February 14, 2025 22:22
Member

@quevon24 quevon24 left a comment

It took me some time to understand the changes and the existing code, but I think everything looks good. The code is well commented and structured, and I ran the tests without any problems.

I think it's ready

great job @grossir

Contributor

The Eyecite Report 👁️

Gains and Losses

There were 0 gains and 236 losses.

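| id | Gain | Loss |
| --- | --- | --- |
| 4746031 | | Llamas-Villa |
| 4746031 | | Pena |
| 4746031 | | Buie |
| 4799679 | | Akins |
| 4799679 | | Graham |
| 5066102 | | Fredericks |
| 5071459 | | Greene |
| 5071459 | | Burks |
| 5112424 | | Overmyer |
| 5112424 | | Jones |
| 5123092 | | Atwood |
| 5160500 | | Sibley |
| 5165179 | | McKinstrey |
| 5165179 | | Rodriguez |
| 5167616 | | Toney |
| 5618955 | | Widincamp |
| 5656104 | | Malouf |
| 5750897 | | Preston |
| 1996784 | | Caplin |
| 2014564 | | Hanreddy |
| 2060699 | | Frohlich |
| 1917661 | | Doucet |
| 3419420 | | Cunningham |
| 3419420 | | Martin |
| 3419420 | | Best |
| 2303811 | | Butzberger |
| 2303811 | | Lovett |
| 2303811 | | Campos |
| 2303811 | | Vanderweele |
| 2303811 | | Miller |
| 2303811 | | Campos |
| 2387663 | | Murray |
| 1662392 | | Jergnigan |
| 1744543 | | Solem |
| 1744543 | | Faretta |
| 1804094 | | Mercer |
| 1783747 | | Vallon |
| 1783747 | | Kaperonis |
| 2168388 | | Tomasek |
| 1853016 | | Tyler |
| 1137818 | | Cherney |
| 1137818 | | Payne |
| 1137818 | | Beekner |
| 1341018 | | Looney |
| 1537257 | | Pope |
| 1537257 | | Greger |
| 1546016 | | Vincenzi |
| 1546016 | | Pettit |
| 1546016 | | Davis |
| 1929026 | | Walker |
| 1940979 | | Wallace |
| 1941966 | | LeBrane |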

Time Chart

image

Generated Files

Branch 1 Output
Branch 2 Output
Full Output CSV

Contributor

flooie commented Feb 18, 2025

This looks great. Can't wait to rerun the tests.

@flooie flooie merged commit 645e527 into main Feb 18, 2025
13 checks passed
@flooie flooie deleted the 198-find-name-only-reference-citations branch February 18, 2025 15:42
Contributor

flooie commented Feb 18, 2025

@grossir @quevon24 thanks for this - this is going to be a great update
