Address Coding Project

This project processes and codes address data from the CBDB address code table. The main script, code_addr.py, reads the input data, processes it, and writes the coded address information to output.txt.

Files

code_addr.py: Main script for processing and coding address data.
addr_data_schema.xlsx: Schema for address data.
ADDRESSES.txt: Processed address data.
cbdb_entity_address_types.csv: List of address types.
input_small.txt: Small input dataset for testing.
input.txt: Main input dataset.
output.txt: Output file containing coded address data.
ZZZ_ADDRESSES.xlsx: Original address data in Excel format.

Usage

Install Dependencies
Install the required dependencies with pip:
```
pip install pandas char-converter
```

Load Your Input Data
Prepare your input data based on the following schema:

id    dy    addr_name    addr_belong    time
1     宋    甌寧          建州            1279
2     清    江南太平府     no_info         no_info

Save the input data in input.txt.
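
Before running the script, it can help to confirm that the input matches the schema above. The following is a minimal sketch, not part of code_addr.py: the `read_rows` helper and the tab delimiter are assumptions made for illustration, so adjust the delimiter to whatever your file actually uses. It relies only on the Python standard library.

```python
import csv
import io

# Column names taken from the schema above.
EXPECTED_COLUMNS = ["id", "dy", "addr_name", "addr_belong", "time"]

# Assumption: fields are tab-separated; change the delimiter if your file differs.
SAMPLE = (
    "id\tdy\taddr_name\taddr_belong\ttime\n"
    "1\t宋\t甌寧\t建州\t1279\n"
    "2\t清\t江南太平府\tno_info\tno_info\n"
)


def read_rows(handle, delimiter="\t"):
    """Parse rows and fail early if the header does not match the schema."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    # Missing values are kept as the literal string "no_info".
    print(row["id"], row["addr_name"], row["addr_belong"], row["time"])
```

To check a real file instead of the in-memory sample, pass an open file handle, for example `open("input.txt", encoding="utf-8", newline="")`. The UTF-8 encoding is also an assumption.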

Run the Script
Execute the script to process the input data and generate the output:
```
python code_addr.py
```

Notes

To convert variant characters to simplified Chinese as a standardization step, edit code_addr.py and change
```
use_char_converter = False
```
to
```
use_char_converter = True
```
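
As a rough illustration of what a flag like this usually controls, the sketch below applies a character mapping only when the flag is enabled. It does not reproduce the logic in code_addr.py: the two-entry mapping table and the `standardize` function are hypothetical stand-ins, and the char-converter package's own API is deliberately not used here.

```python
# Hypothetical stand-in for a real conversion table (two characters only).
# The actual script presumably uses the char-converter package instead.
VARIANT_TO_SIMPLIFIED = str.maketrans({"甌": "瓯", "寧": "宁"})

use_char_converter = True  # same flag name as in code_addr.py


def standardize(name, enabled):
    """Return the name unchanged unless conversion is enabled."""
    return name.translate(VARIANT_TO_SIMPLIFIED) if enabled else name


print(standardize("甌寧", use_char_converter))  # 瓯宁
print(standardize("甌寧", False))               # 甌寧
```

Characters that are absent from the mapping pass through unchanged, so a name such as 建州 is left as it is.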
The script processes address data by reading from ZZZ_ADDRESSES.xlsx. You can download the latest version of ZZZ_ADDRESSES.xlsx from CBDB on Hugging Face.

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
