Structured Datasets for Singapore's Parliament Speeches

This project aims to make parliamentary speeches from Singapore's Parliament Hansard structured and accessible.

A structured format enables applications in computational linguistic analysis, classification, and political science (Dritsa et al., 2022). Further empirical research on parliamentary discourse and its wider societal impact is increasingly important, given the decisive role of parliaments and their rapidly changing relations with the public and the media (Erjavec et al., 2023).

This effort addresses the lack of a centralised dataset for analysing Singapore's parliamentary data. Rauh and Schwalbach (2020) observed that while more and more political text is available online in principle, bringing the various, often only loosely structured sources into a machine-readable format that is readily amenable to automated analysis still presents a major hurdle. This initiative seeks to overcome that hurdle for Singapore.

Disclaimer

Please note that this is an entirely independent effort. This initiative is not affiliated with the Singapore Parliament or the Singapore Government.

While best efforts are made to ensure the information is accurate, parsing errors may remain. Please use the information here with caution and check it against the underlying data.

This repository

This repository contains code for the data pipeline, which performs the following steps:

Extract information from the Singapore Parliament's API into a JSON file.
Transform the information, primarily by cleaning the speech text and standardising members' names.
Load the raw files into a database (BigQuery), where they are modelled in a dbt repository.
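
The transform step above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names, the record keys (`member_name`, `text`) and the honorific list are all assumptions made for the example, and the real cleaning rules may differ.

```python
import html
import json
import re

# Hypothetical list of honorifics; the real pipeline's rules are not documented here.
HONORIFICS = ("Mr", "Mrs", "Ms", "Mdm", "Dr", "Prof", "Assoc Prof")


def clean_speech_text(raw: str) -> str:
    """Strip HTML tags, decode HTML entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    decoded = html.unescape(no_tags)
    return re.sub(r"\s+", " ", decoded).strip()


def standardise_name(raw: str) -> str:
    """Drop a trailing parenthetical and a leading honorific, if present."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", raw.strip())
    # Try longer honorifics first so "Assoc Prof" wins over "Prof".
    for title in sorted(HONORIFICS, key=len, reverse=True):
        prefix = title + " "
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return re.sub(r"\s+", " ", name).strip()


def transform(record: dict) -> dict:
    """Apply both cleaning steps to one extracted record (assumed keys)."""
    return {
        "member_name": standardise_name(record["member_name"]),
        "text": clean_speech_text(record["text"]),
    }


if __name__ == "__main__":
    # Fictional record, for illustration only.
    extracted = {"member_name": "Mr  Tan Ah Kow (Example GRC)",
                 "text": "<p>Tax  &amp; spend\n</p>"}
    # One JSON object per line is a common shape for loading into a warehouse.
    print(json.dumps(transform(extracted)))
```

Keeping each cleaning rule in its own small function makes it easier to test the rules individually and to spot-check them against the underlying Hansard text.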

The raw data which is generated includes the following:

model	description
attendance	By member, by sitting date, whether the member attended the parliamentary sitting or not.
sittings	By sitting date, the associated parliamentary sitting information (parliament number, session number, etc.)
topics	Each row represents one topic which was discussed during the parliamentary sitting.
speeches	Each row represents one paragraph of text, based on the hansard, during the parliamentary sitting. This text corresponds to a speech (or part of a speech) made by a Member of Parliament on a given topic.
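
Because each row in speeches holds a single paragraph, analyses of whole speeches need the paragraphs recombined first. The sketch below shows one way to do that in plain Python. The column names (`topic_id`, `member_name`, `paragraph_order`) are hypothetical and may not match the actual schema.

```python
from collections import defaultdict

# Fictional rows with assumed column names, for illustration only.
rows = [
    {"topic_id": "t1", "member_name": "Member A", "paragraph_order": 2, "text": "Second paragraph."},
    {"topic_id": "t1", "member_name": "Member A", "paragraph_order": 1, "text": "First paragraph."},
    {"topic_id": "t1", "member_name": "Member B", "paragraph_order": 3, "text": "A reply."},
]


def combine_paragraphs(rows):
    """Join paragraph rows into one text per (topic, member), in paragraph order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["topic_id"], row["member_name"])].append(row)
    return {
        key: " ".join(
            part["text"]
            for part in sorted(parts, key=lambda part: part["paragraph_order"])
        )
        for key, parts in grouped.items()
    }


if __name__ == "__main__":
    for (topic, member), text in combine_paragraphs(rows).items():
        print(topic, member, text)
```

Note that this grouping merges everything a member said on a topic into one text, even when other members spoke in between. Separating individual turns would need an additional key.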

The following

Cloud Composer (Airflow) (Access required)

How to contribute

If you are interested in contributing, please reach out to [email protected].

References

Dritsa, K., Thoma, A., Pavlopoulos, I., & Louridas, P. (2022). A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis. Advances in Neural Information Processing Systems, 35, 28874-28888.
Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljubešić, N., Simov, K., Pančur, A., ... & Fišer, D. (2023). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation, 57(1), 415-448.
Rauh, C., & Schwalbach, J. (2020). The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies.
