
This repository contains a Python function for analyzing text, identifying the most frequent word and its occurrence count using regular expressions and the Counter class from the collections module.

Kinetics20/the_most_common_word

📚 The Most Common Word

Python PyCharm Ubuntu Kali Linux

This project provides a Python function that finds the most frequently occurring word in a given text and returns it together with its count. A pytest suite covers typical inputs and edge cases.

🛠️ Features

  • Extracts and counts words using regular expressions and Python's Counter class.
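  • Matches words case-insensitively, ignores punctuation and special characters, and covers edge cases: empty text returns ('', 0), and on a tie the word that appears first in the text wins.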
  • Fully tested with pytest and type-checked using mypy.
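
As a rough illustration of the features above, the following sketch walks through the same steps on a short made-up sentence (the sample text is an assumption chosen for this example, not taken from the project's tests). It relies only on the standard library.

```python
import re
from collections import Counter

# Hypothetical sample text, chosen to show case folding, punctuation and a tie.
text = "Fish swim; BIRDS fly. Birds and fish!"

# Step 1: lower-case the text so "BIRDS" and "Birds" count as the same word.
lowered = text.lower()

# Step 2: extract runs of word characters, which drops punctuation.
words = re.findall(r"\b\w+\b", lowered)
print(words)  # ['fish', 'swim', 'birds', 'fly', 'birds', 'and', 'fish']

# Step 3: count how often each word occurs.
counts = Counter(words)

# Step 4: take the top entry. "fish" and "birds" both occur twice;
# Counter lists equal counts in the order first encountered, so "fish" wins.
print(counts.most_common(1)[0])  # ('fish', 2)
```

The tie is resolved by position in the text, not alphabetically, which matches the "Tie Between Words" test case.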

🔧 Installation

This project uses Poetry for dependency management. To set up the environment, follow these steps:

  1. Clone the repository:
    git clone git@github.com:Kinetics20/the_most_common_word.git
    cd the_most_common_word

Install Dependencies

To install Poetry and add mypy and pytest to the project on Linux or macOS, run the following commands:

pipx install poetry
poetry add mypy
poetry add 'pytest==8.3.4'

🚀 Usage

The main function most_common_word analyzes a given text and returns the most frequent word.
Example Code:

from common_word import most_common_word

text = "Home is where I feel safe, but the house I grew up in will always be home to me."
result = most_common_word(text)
print(result)  # Output: ('home', 2)

✅ Tests

The project includes unit tests written using pytest to validate the function’s behavior.
Run Tests:

pytest test_common_word.py -vv

Test Results:

================================================================ test session starts =================================================================
platform linux -- Python 3.12.3, pytest-8.3.4
collected 6 items

test_common_word.py::test_most_common_word_basic PASSED                                                                                        
test_common_word.py::test_most_common_word_case_insensitivity PASSED                                                                           
test_common_word.py::test_most_common_word_empty_text PASSED                                                                                   
test_common_word.py::test_most_common_word_special_characters PASSED                                                                           
test_common_word.py::test_most_common_word_tie PASSED                                                                                          
test_common_word.py::test_most_common_word_single_word PASSED                                                                                  

================================================================= 6 passed in 0.02s ================================================================

🧪 Type Checking

Static typing was enforced with mypy for code clarity and safety.
Run mypy:

mypy common_word.py

📄 Function Description

import re
from collections import Counter

def most_common_word(txt: str) -> tuple[str, int]:
    """
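    Return the most frequently occurring word in the text and its count.

    Matching is case-insensitive. If several words share the highest count,
    the one that appears first in the text is returned.

    Args:
        txt (str): The text to analyse.

    Returns:
        tuple[str, int]: The most frequent word (lower-cased) and its count,
            or ('', 0) if the text contains no words.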
    """
    words: list[str] = re.findall(r'\b\w+\b', txt.lower())
    if not words:
        return '', 0

    word_counts: Counter[str] = Counter(words)
    return word_counts.most_common(n=1)[0]
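
It may help to see what the pattern `\b\w+\b` treats as a word. The sketch below applies the same pattern to a few hypothetical inputs (the sample strings are assumptions made for illustration). Since `\w` matches letters, digits and underscores, apostrophes and hyphens split a word in two, while digits and underscores are kept.

```python
import re

PATTERN = r"\b\w+\b"  # same pattern as in most_common_word

# Hypothetical inputs chosen to show where the pattern splits text.
samples = ["city's", "well-known", "snake_case", "3 cats", "$sun is$"]

# Lower-case first, as the function does, then tokenize.
tokens = {s: re.findall(PATTERN, s.lower()) for s in samples}

for sample, found in tokens.items():
    print(f"{sample!r} -> {found}")

# "city's"     -> ['city', 's']       (apostrophe splits the word)
# 'well-known' -> ['well', 'known']   (hyphen splits the word)
# 'snake_case' -> ['snake_case']      (underscore is a word character)
# '3 cats'     -> ['3', 'cats']       (digits count as words)
# '$sun is$'   -> ['sun', 'is']       (symbols are dropped)
```

This is why the "Case Insensitivity" test case counts the possessive form of "city" as another occurrence of "city".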

📝 Test Cases

Test Case | Input | Expected Output
----------|-------|----------------
Basic Word Count | "Home is where I feel safe, but the house I grew up in will always be home to me." | ('home', 2)
Case Insensitivity | "In a small city... the City of Angels... city’s vibrant energy." | ('city', 3)
Empty Text | "" | ('', 0)
Special Characters | "@ The sun rises... $sun is$ high in the sky... feel the warmth^ of the sun..." | ('the', 6)
Tie Between Words | "Birds fly high, and fish swim deep while birds and fish explore nature." | ('birds', 2)
Single Word | "Home" | ('home', 1)
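
The cases above could be checked with a test along the following lines. This is a minimal sketch, not the project's actual test_common_word.py: the `CASES` list and the test name are hypothetical, the function is inlined so the block runs on its own, and only the rows whose input is given in full are included (the abbreviated inputs are left out rather than guessed).

```python
import re
from collections import Counter


def most_common_word(txt: str) -> tuple[str, int]:
    """Copy of the README function, inlined so this sketch is self-contained."""
    words = re.findall(r"\b\w+\b", txt.lower())
    if not words:
        return "", 0
    return Counter(words).most_common(1)[0]


# (input text, expected result) pairs from the table rows with complete inputs.
CASES: list[tuple[str, tuple[str, int]]] = [
    ("Home is where I feel safe, but the house I grew up in "
     "will always be home to me.", ("home", 2)),
    ("", ("", 0)),
    ("Birds fly high, and fish swim deep while birds and fish "
     "explore nature.", ("birds", 2)),
    ("Home", ("home", 1)),
]


def test_most_common_word_cases() -> None:
    # pytest collects any function named test_*; plain asserts are enough.
    for text, expected in CASES:
        assert most_common_word(text) == expected, text
```

Saved to a file whose name starts with test_, this would be collected by pytest like the existing tests.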

💻 Technologies Used

  • Python (3.12.3)
  • Poetry for dependency management
  • pytest for unit testing
  • mypy for static type checking
  • re module for regular expressions
  • collections.Counter for word counting

🎉 Results

All six tests pass, confirming that the function behaves as expected for the scenarios listed above.

🧑‍💻 Author

Piotr Lipiński

Feel free to contribute, submit issues, or ask questions! 😊
