Releases: PatentsView/PatentsView-DB
Historic Parser
v0.1 Raw Parser
Release for data update 06/30/2020
Data Changes
- H patents are temporarily removed from the database pending reparsing. Currently, H patent numbers are loaded incorrectly: the check digits from the raw data are included as part of the patent number. The data has been reparsed to remove the erroneous mapping and will be added to the database in the next update.
- Claims data has been reparsed to include newlines in the text, and improved dependent field extraction has been implemented. Data from 1976 - 2000 (newlines, dependent field improvement) and 2005 - 2020 (newlines) has been posted on the bulk download page. Data from 2001 - 2004 is being processed and will be posted as it becomes available.
Pregrant Publications Data
- A beta version of USPTO's pre-grant publication data is now available at www.patentsview.org/download/pregrantpublications.html. Users should note that this is a pre-release product and may be missing data elements. We encourage users to report any issues they find in the data.
API Changes
- The API has moved to Amazon's Beanstalk platform and consequently the URL has changed. The new URL is https://api.patentsview.org/. Previous URLs will redirect to the new URL, but POST requests will not work through the redirect. The redirection is a temporary failsafe, and users should update their requests to use the new URL.
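
As a rough illustration of the URL update, the sketch below builds a POST request against the new host directly instead of relying on the redirect. The `patents/query` path, the shape of the JSON payload, and the helper name `build_query_request` are assumptions made for illustration and are not taken from these notes. The request is only constructed, never sent.

```python
import json
import urllib.request

# New API base URL (domain assumed to be patentsview.org).
API_BASE = "https://api.patentsview.org/"


def build_query_request(endpoint, query, fields):
    """Build (but do not send) a JSON POST request to the new API host.

    `endpoint`, `query`, and `fields` are illustrative placeholders; check
    the API documentation for the real endpoint names and payload format.
    """
    payload = json.dumps({"q": query, "f": fields}).encode("utf-8")
    return urllib.request.Request(
        url=API_BASE + endpoint.lstrip("/"),
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "patents/query",  # hypothetical endpoint path
    {"_gte": {"patent_date": "2020-01-01"}},
    ["patent_number", "patent_date"],
)
print(request.get_method(), request.full_url)
```

Pointing POST clients at the new host explicitly avoids depending on the temporary redirect, which the note above says does not carry POST requests.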
Querytool Changes
- To reduce communication delays during a Querytool failure, we have implemented an email alert system. We hope this system will help us resolve Querytool errors more quickly.
Release for data update 03/31/2020
Bulk Download Changes
- Line Breaks retained in text data:
- Claims: all text from 2001 and later will have the line breaks in the text
- Brief Summary Text:
* Data from 2020 and later will have the line breaks retained in the text.
* Line breaks for older data will get included when the first opportunity to reparse older data arises.
- Detailed Description Text:
* Data from 2020 and later will have the line breaks retained in the text.
* Line breaks for older data will get included when the first opportunity to reparse older data arises.
- Draw Description Text: Line breaks are not included at this time.
- Location ID added to patent_assignee and patent_inventor
- Previously, identifying the location of a patent by way of the assignee required joining patent_assignee with location_assignee and then with the location table. A similar join was needed for the patent inventor. To reduce this complexity, the patent_assignee and patent_inventor tables will carry an additional field, location_id, which maps to the id field of the location table. This makes the data in location_assignee and location_inventor redundant, and future releases will not carry these two tables.
- Read In Scripts:
- Example Python & R scripts that demonstrate reading each bulk download file will be available here: Read In Scripts. This is a work in progress and will be updated over time.
- Planned changes after 2020.03.22v1 release (Documentation and details will be added with the release)
- Claims:
- Remove duplicates in some of the claims yearly files where the first set of records (about 300K) are duplicated.
- Remove NULL text data in some of the claims files.
- Recode NUM field and add documentation.
- Recode Exemplary field (replacing TRUE/FALSE with 0/1)
- Re-order header to be consistent with data dictionary
- Brief Summary Text:
- Break files into yearly files
- Draw Description Text:
- Break files into yearly files
- Include line breaks in the text
- Claims:
Table | File(s) | Data Contains Line Break | Field Separator | Quote Settings | Quote Character |
---|---|---|---|---|---|
claims | Yearly files from 1976 - 2005 | No | \t | Non Numeric Fields Quoted | " |
claims | Yearly files from 2005 - 2020 | Yes | \t | Non Numeric Fields Quoted | " |
brf_sum_text | Single bulk file | Yes | \t | Non Numeric Fields Quoted | " |
detail_desc_text | 2020 data file | Yes | \t | Non Numeric Fields Quoted | " |
detail_desc_text | 2019 data file | No | \t | Non Numeric Fields Quoted | " |
detail_desc_text | Yearly files from 1976 - 2018 | No | \t | Unquoted | N/A |
draw_desc_text | Single bulk file | No | \t | Non Numeric Fields Quoted | " |
all other tables | Single bulk file | No | \t | Non Numeric Fields Quoted | " |
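
The separator and quote settings in the table above map fairly directly onto Python's standard `csv` module. The following is a minimal sketch, not the project's own read-in script: the helper name `read_bulk_rows` and the sample column names and values are invented for illustration. It reads a tab-separated sample with non-numeric fields quoted (including a text field that contains line breaks) and an unquoted sample.

```python
import csv
import io


def read_bulk_rows(handle, quoted=True):
    """Return all rows from a tab-separated bulk download file.

    quoted=True  -> "Non Numeric Fields Quoted" files (quote character ").
    quoted=False -> unquoted files; quote characters are kept verbatim.
    For real files, open with newline="" so that line breaks embedded in
    quoted text reach the csv module unchanged.
    """
    if quoted:
        # Unquoted fields are treated as numbers and converted to float.
        reader = csv.reader(handle, delimiter="\t", quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC)
    else:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical claims-style sample: one record spanning three physical lines.
quoted_sample = (
    '"uuid"\t"patent_id"\t"text"\t"sequence"\n'
    '"abc"\t"10000000"\t"1. A device comprising:\na frame; and\na motor."\t1\n'
)
quoted_rows = read_bulk_rows(io.StringIO(quoted_sample))

# Hypothetical unquoted sample: the quote characters are ordinary text.
unquoted_sample = 'uuid\tpatent_id\ttext\nabc\t4000000\tHe said "go"\n'
unquoted_rows = read_bulk_rows(io.StringIO(unquoted_sample), quoted=False)

print(len(quoted_rows), len(unquoted_rows))
```

Counting physical lines (for example with `wc -l`) will overstate the record count for files that contain line breaks, so parsing with a quote-aware reader is the safer way to count rows.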
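
The location_id change described under Bulk Download Changes can be illustrated with a small in-memory SQLite database. This is a sketch under stated assumptions: only the table names and the `location_id` and `id` fields come from the notes above, while the other column names (`patent_id`, `assignee_id`, `city`, `state`), the join key used in the older query, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names other than id and location_id are assumed.
    CREATE TABLE location (id TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE location_assignee (location_id TEXT, assignee_id TEXT);
    CREATE TABLE patent_assignee (patent_id TEXT, assignee_id TEXT,
                                  location_id TEXT);
    INSERT INTO location VALUES ('loc1', 'Austin', 'TX'),
                                ('loc2', 'Boston', 'MA');
    INSERT INTO location_assignee VALUES ('loc1', 'a1'), ('loc2', 'a2');
    INSERT INTO patent_assignee VALUES ('p1', 'a1', 'loc1'),
                                       ('p2', 'a2', 'loc2');
""")

# Older approach: go through the location_assignee bridge table.
old_rows = conn.execute("""
    SELECT pa.patent_id, l.city, l.state
    FROM patent_assignee AS pa
    JOIN location_assignee AS la ON la.assignee_id = pa.assignee_id
    JOIN location AS l ON l.id = la.location_id
    ORDER BY pa.patent_id
""").fetchall()

# New approach: patent_assignee.location_id maps straight to location.id.
new_rows = conn.execute("""
    SELECT pa.patent_id, l.city, l.state
    FROM patent_assignee AS pa
    JOIN location AS l ON l.id = pa.location_id
    ORDER BY pa.patent_id
""").fetchall()

print(new_rows)
conn.close()
```

On this toy data both queries return the same rows. The same pattern would apply to patent_inventor and the location table.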