Commit 6aeb74f

Adds Form Element (#4272)
The FORM element type currently maps to NarrativeText. This change adds a new Form element class and maps FORM to it.
1 parent ac14f57 commit 6aeb74f

4 files changed: +22 -2 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+## 0.21.10
+- **Add Form Class**: Adds a new form class in elements.py to deal with forms
+
 ## 0.21.9
 - Add a fallback to use the filetype library to recover from incorrect results form libmagic
 

test_unstructured/staging/test_base.py

Lines changed: 11 additions & 0 deletions
@@ -21,6 +21,7 @@
     ElementMetadata,
     ElementType,
     FigureCaption,
+    Form,
     Image,
     Link,
     ListItem,
@@ -115,6 +116,16 @@ def test_elements_from_dicts():
     ]
 
 
+def test_elements_from_dicts_form():
+    element_dicts = [
+        {"text": "Applicant Name: Jane Doe", "type": "Form"},
+    ]
+
+    elements = base.elements_from_dicts(element_dicts)
+
+    assert elements == [Form(text="Applicant Name: Jane Doe")]
+
+
 def test_convert_to_csv(tmp_path: str):
     output_csv_path = os.path.join(tmp_path, "isd_data.csv")
     elements = [Title(text="Title 1"), NarrativeText(text="Narrative 1")]

unstructured/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.21.9"  # pragma: no cover
+__version__ = "0.21.10"  # pragma: no cover

unstructured/documents/elements.py

Lines changed: 7 additions & 1 deletion
@@ -900,6 +900,12 @@ class NarrativeText(Text):
     category = "NarrativeText"
 
 
+class Form(Text):
+    """An element for capturing form text."""
+
+    category = "Form"
+
+
 class ListItem(Text):
     """ListItem is a NarrativeText element that is part of a list."""
 
@@ -1000,7 +1006,7 @@ class DocumentData(Text):
     # this mapping favors ensures yolox produces backward compatible categories
     ElementType.ABSTRACT: NarrativeText,
     ElementType.THREADING: NarrativeText,
-    ElementType.FORM: NarrativeText,
+    ElementType.FORM: Form,
     ElementType.VALUE: NarrativeText,
     ElementType.LINK: NarrativeText,
     ElementType.LIST_ITEM: ListItem,
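
To illustrate the effect of the mapping change, here is a minimal, self-contained sketch. It does not import the library: the `Text`, `NarrativeText`, and `Form` classes are simplified stand-ins, and the two mapping dictionaries, the `"Form"` key, and this `elements_from_dicts` helper are hypothetical names chosen for the example, not the library's actual implementation.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, List, Type


@dataclass
class Text:
    """Simplified stand-in for the library's Text element (assumption)."""

    text: str
    category: ClassVar[str] = "UncategorizedText"


class NarrativeText(Text):
    category = "NarrativeText"


class Form(Text):
    """An element for capturing form text."""

    category = "Form"


# Hypothetical mappings mirroring the one-line change in the diff:
# before the commit FORM resolved to NarrativeText, afterwards to Form.
MAPPING_BEFORE: Dict[str, Type[Text]] = {"Form": NarrativeText}
MAPPING_AFTER: Dict[str, Type[Text]] = {"Form": Form}


def elements_from_dicts(
    element_dicts: List[dict], mapping: Dict[str, Type[Text]]
) -> List[Text]:
    """Build elements by looking up each dict's "type" in the mapping."""
    return [mapping[d["type"]](text=d["text"]) for d in element_dicts]


element_dicts = [{"text": "Applicant Name: Jane Doe", "type": "Form"}]

before = elements_from_dicts(element_dicts, MAPPING_BEFORE)
after = elements_from_dicts(element_dicts, MAPPING_AFTER)

print(before[0].category)  # category under the old mapping
print(after[0].category)   # category under the new mapping
```

Under these assumptions the same input dict yields a `NarrativeText` instance with the old mapping and a `Form` instance with the new one, which is the behavior the added test `test_elements_from_dicts_form` checks against the real library.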
