Assignment complete #1

printscreen · 2025-08-10T21:15:11Z

Solace Advocate Search Improvements

This document outlines the backend and data schema changes I made to improve scalability, maintainability, and performance for the Solace assignment. We'll start with the backend and data model, then cover the frontend in a later section.

NOTE: The goal of this exercise is to communicate my ability and proficiency in building systems. I am a big fan of small, digestible pull requests, but with the number of improvements I wanted to make, splitting the work into many small PRs would have made that goal hard to communicate. It would be taxing to ask a reviewer to stitch 5-8 small PRs together in their head while also trying to understand every change. Instead, I decided to spend more time on this document and step through everything I did.

Introduction

The original backend schema used a flat structure with embedded JSONB arrays for specialties and stored city and degree as plain text fields. While this works for small datasets, it does not scale well for larger, relational data or for efficient querying and filtering. My changes focus on leveraging relational database best practices, improving query performance, and making the API more maintainable and extensible. I learned early on that you are not smarter than the people who wrote PostgreSQL. This is relational data, and relational databases handle relational data really, really well.

Backend & Data Schema Changes

1. Normalized Data Model

Original Schema Example

const advocates = pgTable('advocates', {
  id: serial('id').primaryKey(),
  firstName: text('first_name').notNull(),
  lastName: text('last_name').notNull(),
  city: text('city').notNull(),
  degree: text('degree').notNull(),
  specialties: jsonb('payload').default([]).notNull(),
  yearsOfExperience: integer('years_of_experience').notNull(),
  phoneNumber: bigint('phone_number', { mode: 'number' }).notNull(),
  createdAt: timestamp('created_at').default(sql`CURRENT_TIMESTAMP`),
});

My Improved Schema

Cities and Degrees are now separate tables, referenced by foreign keys in the advocates table.
Specialties are modeled as a many-to-many relationship using a linking table (advocate_specialties).
Advocates reference cityId and degreeId as FKs, and their specialties are joined via the linking table.

export const cities = pgTable('cities', {
  id: integer('id').primaryKey(),
  name: varchar('name', { length: 255 }).notNull().unique(),
});

export const degrees = pgTable('degrees', {
  id: integer('id').primaryKey(),
  name: varchar('name', { length: 255 }).notNull().unique(),
});

export const specialties = pgTable('specialties', {
  id: integer('id').primaryKey(),
  name: varchar('name', { length: 255 }).notNull().unique(),
});

export const advocates = pgTable('advocates', {
  id: serial('id').primaryKey(),
  firstName: varchar('first_name', { length: 255 }).notNull(),
  lastName: varchar('last_name', { length: 255 }).notNull(),
  cityId: integer('city_id')
    .notNull()
    .references(() => cities.id),
  degreeId: integer('degree_id')
    .notNull()
    .references(() => degrees.id),
  yearsOfExperience: integer('years_of_experience').notNull(),
  phoneNumber: bigint('phone_number', { mode: 'number' }).notNull(),
  createdAt: timestamp('created_at').default(sql`CURRENT_TIMESTAMP`),
});

export const advocateSpecialties = pgTable('advocate_specialties', {
  advocateId: integer('advocate_id')
    .notNull()
    .references(() => advocates.id),
  specialtyId: integer('specialty_id')
    .notNull()
    .references(() => specialties.id),
});

Why Normalize?

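Performance: Filtering and joining on indexed integer foreign key (FK) columns is much faster than scanning and matching strings or arrays in a JSONB column. Note that PostgreSQL does not index the referencing column of a foreign key automatically, so city_id, degree_id, and the linking table columns need explicit indexes to get this benefit.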
Scalability: As the dataset grows, relational queries with proper indexes remain fast, while JSONB array scans become slow.
Data Integrity: Referential integrity is enforced by the database, preventing orphaned or invalid references.
Flexibility: Adding, removing, or updating specialties, cities, or degrees is trivial and does not require updating every advocate record.

Why Not JSONB for Specialties?

While Postgres can index JSONB, querying for "all advocates with specialty X" requires converting arrays and using special operators, which is more complex and less efficient than a join on a linking table. Converting the JSONB to an array and then dealing with all the scalar headaches isn't worth it. Plus, we would be doing that conversion on every query, which is unnecessary repeated work. I went with the KISS approach (keep it simple, stupid).
Relational databases are designed to handle relationships, and using a linking table is idiomatic, efficient, and easy to query.

2. API Endpoints for Reference Data

Created separate API routes for cities, degrees, and specialties (e.g., /api/cities, /api/degrees, /api/specialties).
These endpoints return lists of options for use in dropdowns and filters.
This separation allows for easy caching and reduces unnecessary database load.

3. Caching for Static Data

Implemented a simple in-memory cache for endpoints like cities, degrees, and specialties since this data rarely changes.
This reduces database queries and improves response times.
For production, a distributed cache like Redis would be recommended for scalability and multi-instance support. For this exercise, I wrote a simple in-memory cache.
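As a rough illustration of the idea (not the code from this PR), an in-memory cache with a time-to-live could look like the sketch below. The names `TtlCache` and `getCities`, the cache key, and the injected clock are all assumptions made for the example.

```typescript
// Minimal in-memory cache with a time-to-live (TTL).
// TtlCache, getCities, and the 'cities' key are hypothetical names.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it so the caller reloads
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical usage in a route handler: serve from the cache when possible,
// otherwise load from the database and remember the result.
async function getCities(
  cache: TtlCache<string[]>,
  loadFromDb: () => Promise<string[]>,
): Promise<string[]> {
  const hit = cache.get('cities');
  if (hit) return hit;
  const fresh = await loadFromDb();
  cache.set('cities', fresh);
  return fresh;
}
```

Because the map lives in a single process, each server instance would hold its own copy, which is why a shared cache such as Redis fits better once there are multiple instances.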

4. Advocate Query Improvements

The /api/advocates endpoint supports filtering by all fields, including specialties, cities, degrees, and years of experience.
Used efficient SQL queries with joins and filters on indexed columns.
For specialties, used a subquery to filter advocates by selected specialties via the linking table.
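To make the specialty subquery concrete, here is a small sketch that builds the parameterized condition as plain SQL text. It deliberately does not use Drizzle's query builder; the helper name `specialtyCondition` and the `SqlFragment` type are assumptions, while the table and column names come from the schema above.

```typescript
// A parameterized SQL fragment: text with $n placeholders plus its values.
type SqlFragment = { text: string; values: number[] };

// Builds a condition that keeps advocates linked to at least one of the
// selected specialties through the advocate_specialties linking table.
// Hypothetical helper, shown only to illustrate the query shape.
function specialtyCondition(
  specialtyIds: number[],
  firstParam = 1,
): SqlFragment | null {
  if (specialtyIds.length === 0) return null; // no specialty filter selected
  const placeholders = specialtyIds
    .map((_, i) => `$${firstParam + i}`)
    .join(', ');
  return {
    text:
      'advocates.id IN (' +
      'SELECT advocate_id FROM advocate_specialties ' +
      `WHERE specialty_id IN (${placeholders}))`,
    values: specialtyIds.slice(),
  };
}
```

Filtering through a subquery rather than a plain join keeps one row per advocate even when an advocate matches several of the selected specialties.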

Counting Results

To get the total count for pagination, I run a separate SELECT COUNT(*) query with the same filters.
While Postgres supports window functions to get the count in a single query, Drizzle's documentation recommends two queries for clarity and compatibility.
If performance becomes an issue, we could switch to a window function like:
```
SELECT *, COUNT(*) OVER() as total_count
FROM advocates
LEFT JOIN ...
WHERE ...
LIMIT ... OFFSET ...
```
This would return the total count with each row, eliminating the need for a second query.
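Whichever way the count is obtained, turning it into pagination values is simple arithmetic. The sketch below assumes 1-based page numbers and a hypothetical helper named `pageInfo`; neither is taken from the PR.

```typescript
type PageInfo = { limit: number; offset: number; totalPages: number };

// Converts a total row count and a requested page into LIMIT/OFFSET values.
// Assumes pages are numbered from 1 and pageSize is a positive integer.
function pageInfo(totalCount: number, page: number, pageSize: number): PageInfo {
  const totalPages = Math.max(1, Math.ceil(totalCount / pageSize));
  const current = Math.min(Math.max(1, page), totalPages); // clamp to a valid page
  return { limit: pageSize, offset: (current - 1) * pageSize, totalPages };
}

// 45 matching advocates, 20 per page, page 3 requested.
const example = pageInfo(45, 3, 20);
// example is { limit: 20, offset: 40, totalPages: 3 }
```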

5. Indexing for Fast Search

Added GIN indexes with the pg_trgm extension on first_name and last_name to support fast, case-insensitive substring search (ILIKE '%name%').
This is critical for performance at scale and is handled via a migration file, not in the schema definition.
Without a GIN index, the Postgres query planner would perform a sequential scan of the entire table, evaluating every record for filtering if no other selective indexes are present. This results in linear time complexity (O(n)), which does not scale well as the table grows.
Chose varchar over text for first and last name fields to semantically indicate these are short strings. In PostgreSQL, both types are stored inline for small values, but varchar(255) makes the intent clearer and rejects oversized values at write time, which helps prevent accidental misuse for large text data. If a large value were stored in an unbounded text column, PostgreSQL would move it to the table's TOAST storage (in the pg_toast schema) for out-of-line storage, which adds overhead and can negatively impact query performance.
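For reference, the statements in such a migration might look like the sketch below, kept here as TypeScript strings. The index names are hypothetical, and `toContainsPattern` is an illustrative helper (not from this PR) that escapes LIKE wildcards in user input before it is wrapped in `%...%`.

```typescript
// Hypothetical migration statements; the index names are illustrative.
const trigramMigration: string[] = [
  'CREATE EXTENSION IF NOT EXISTS pg_trgm;',
  'CREATE INDEX IF NOT EXISTS advocates_first_name_trgm_idx ' +
    'ON advocates USING gin (first_name gin_trgm_ops);',
  'CREATE INDEX IF NOT EXISTS advocates_last_name_trgm_idx ' +
    'ON advocates USING gin (last_name gin_trgm_ops);',
];

// Turns a search term into an ILIKE "contains" pattern. Backslash is the
// default LIKE escape character in PostgreSQL, so %, _ and \ typed by the
// user are escaped to match literally instead of acting as wildcards.
function toContainsPattern(term: string): string {
  const escaped = term.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `%${escaped}%`;
}
```

The resulting pattern would be passed as a bound parameter, for example `WHERE first_name ILIKE $1`, rather than concatenated into the SQL text.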

6. Preventing Too Many Database Connections in Development

During development, I encountered the error PostgresError: sorry, too many clients already. This happens because, in environments with hot reloading (like Next.js), the backend code can be re-executed multiple times, causing a new Postgres client to be created on each reload. As a result, the database quickly hits its connection limit, leading to this error.

To fix this, I used the globalThis object to store the Postgres client and Drizzle instance. Before creating a new client, the code checks if one already exists on globalThis and reuses it if available. This ensures that only a single connection pool is maintained during development, preventing connection leaks and excessive client creation. This pattern is widely recommended for Node.js/Next.js projects. It works because globalThis survives module reloads in development; in production there is no hot reloading, so the module is evaluated once per process and the client is created a single time.
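The pattern can be sketched in isolation as follows. This is not the PR's code: `getOrCreateGlobal`, the key name, and the `FakeClient` stand-in are assumptions, and in the real module the factory would construct the Postgres client and wrap it with Drizzle.

```typescript
// Creates a value once and keeps it on globalThis, so that re-evaluating
// this module (as hot reloading does) reuses the same instance.
function getOrCreateGlobal<T>(key: string, create: () => T): T {
  const registry = globalThis as unknown as Record<string, unknown>;
  if (registry[key] === undefined) {
    registry[key] = create();
  }
  return registry[key] as T;
}

// Stand-in for a real database client, used only to count creations.
type FakeClient = { id: number };
let clientsCreated = 0;
const makeClient = (): FakeClient => ({ id: ++clientsCreated });

// The second call simulates the module being evaluated again after a reload.
const firstClient = getOrCreateGlobal('__demoDbClient', makeClient);
const secondClient = getOrCreateGlobal('__demoDbClient', makeClient);
// firstClient and secondClient are the same object; makeClient ran once.
```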

Summary

Normalized the schema for scalability and performance.
Split out reference data (cities, degrees, specialties) into their own tables and API endpoints.
Used a linking table for specialties to support efficient many-to-many queries.
Implemented caching for static data.
Optimized advocate queries for filtering and pagination.
Added proper indexes for fast search.

Frontend & UX Improvements

The original frontend was a simple React page with a single search box, a reset button, and a table that displayed all advocates. Filtering was done entirely on the client, and all data was fetched and held in memory. While this works for small datasets, it does not scale and does not provide a modern, user-friendly experience.

1. Advanced Filtering & UX Rationale

Specialty, Degree, and Location Filters:
I introduced multi-select dropdowns for specialties, degrees, and cities. This allows users to filter advocates by any combination of these attributes.
OR Logic: I chose to implement all filters (specialties, degrees, and locations) as an "OR" condition. This means advocates are shown if they match any of the selected specialties, any of the selected degrees, or any of the selected cities. I intentionally avoided mixing "AND" and "OR" logic between filter types, as this can be confusing for users, especially since some fields (like location and degree) are mutually exclusive for a single advocate. Keeping all filters as "OR" conditions provides a consistent and predictable experience, similar to what users expect from major platforms like Amazon.
Pillboxes for Active Filters:
I added pillboxes to visually represent each active filter. This provides immediate feedback to the user about what filters are applied and allows for quick removal of any filter by clicking the "X" on the pill. This is a familiar pattern from e-commerce and search UIs, reducing cognitive load and making the interface more intuitive.
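The OR semantics described above can be illustrated with a small in-memory predicate. This is only a sketch of the matching rule: the type names and the choice to show every advocate when nothing is selected are assumptions, and the actual filtering happens on the server.

```typescript
type AdvocateRow = { city: string; degree: string; specialties: string[] };
type Filters = { cities: string[]; degrees: string[]; specialties: string[] };

// An advocate is shown if it matches ANY selected city, degree, or specialty.
function matchesAnyFilter(advocate: AdvocateRow, filters: Filters): boolean {
  const nothingSelected =
    filters.cities.length + filters.degrees.length + filters.specialties.length === 0;
  if (nothingSelected) return true; // assumption: no filters means show everyone
  return (
    filters.cities.includes(advocate.city) ||
    filters.degrees.includes(advocate.degree) ||
    advocate.specialties.some((s) => filters.specialties.includes(s))
  );
}
```

With this rule, selecting the city "Austin" and the degree "MD" returns advocates in Austin as well as advocates with an MD elsewhere, so adding a filter can only widen the result set.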

2. Pagination for Scalability

Why Pagination:
The original implementation fetched all advocates and filtered them on the client. This approach does not scale; fetching and rendering millions of records is not feasible and would result in poor performance and user experience.
Server-Side Pagination:
I implemented server-side pagination, fetching only the advocates needed for the current page. This keeps the UI fast and responsive, regardless of the total dataset size, and is a best practice for scalable web applications. We can control how beefy our servers are, but we can't control the specs of the client's computer. It is always better to push the heavy lifting onto systems we control.

3. Multi-Select Dropdowns

Componentization:
I built a reusable MultiSelectFilter component for specialties, degrees, and cities. This component supports selecting multiple options, displays the count of selected items, and is keyboard accessible.
UX:
Multi-select dropdowns are a familiar and efficient way for users to select multiple filters without cluttering the UI with dozens of checkboxes.

4. Branding & Visual Design

Tailwind Customization:
I configured Tailwind CSS to use the Solace brand color (#347866) throughout the UI for buttons, pillboxes, and highlights. This ensures a cohesive and professional look that matches the company's identity. One concern I did not address is the accessibility (a11y) of the color scheme: it might not work well for people with red-green color blindness.
Brand Logo:
I extracted the SVG logo from the Solace website and created a reusable React component for it, placing it prominently in the page header.

5. Other UX Enhancements

Clear Filter Feedback:
The UI always shows which filters are active, and users can remove any filter with a single click.
Responsive Layout:
The filter section and table are responsive and look good on various screen sizes.
Accessible Controls:
All form controls are labeled, and the dropdowns are keyboard navigable.
No Sorting for Now:
I chose not to implement column sorting, as most fields are text-based and do not lend themselves to meaningful ordering. The only numeric field, years of experience, can be filtered by setting a minimum value, which is more useful for this context.

6. Specialties Display in the Table

Reducing Visual Noise:
Initially, displaying all specialties for each advocate in the table made the UI cluttered and overwhelming, especially for advocates with many specialties. To improve readability and reduce noise, I truncated the list to show only the first two specialties by default. If an advocate has more specialties, a "+N more" link appears, allowing users to expand and view the full list on demand. This keeps the table clean and focused, while still making all information accessible.
Prioritizing Filtered Specialties:
When a user filters by specialty, advocates who match the filter may have many specialties. To make the filtered results more meaningful and user-friendly, I biased the display order so that the specialties matching the user's filter appear first in the truncated list. This ensures that the most relevant information is immediately visible, and users can quickly see why a particular advocate matched their search criteria, even before expanding the full list.
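The truncation and ordering rules above can be sketched as a pure function. The name `previewSpecialties`, the `SpecialtyPreview` type, and the sample specialty names are assumptions for the example; only the "first two, matching ones first, +N more" behavior comes from the description.

```typescript
type SpecialtyPreview = { visible: string[]; hiddenCount: number };

// Orders specialties so the ones matching the active filter come first,
// then keeps the first maxVisible and reports how many are hidden.
function previewSpecialties(
  all: string[],
  selected: string[],
  maxVisible = 2,
): SpecialtyPreview {
  const matching = all.filter((s) => selected.includes(s));
  const rest = all.filter((s) => !selected.includes(s));
  const ordered = [...matching, ...rest]; // original order kept within each group
  return {
    visible: ordered.slice(0, maxVisible),
    hiddenCount: Math.max(0, ordered.length - maxVisible),
  };
}

// Filtering by "Trauma" moves it to the front of the truncated list.
const preview = previewSpecialties(['ADHD', 'Anxiety', 'Trauma', 'Sleep'], ['Trauma']);
// preview is { visible: ['Trauma', 'ADHD'], hiddenCount: 2 }, rendered as "+2 more"
```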

7. Summary of Frontend Improvements

Modern, scalable filtering UI with multi-selects and pillboxes.
Server-side pagination for performance and scalability.
Brand-consistent design with custom colors and logo.
Component-based architecture for maintainability and reuse.
User-centric UX decisions inspired by best practices from leading platforms.

Assignment complete

187aa81
