
Compress search index files using gzip 🗜 #400

Merged
CoralPink merged 7 commits into main from use-brotli on Apr 2, 2025

Conversation

@CoralPink (Owner) commented Apr 2, 2025

This PR prepares a compressed search index file using gzip and also modifies the site to use it.

This dramatically reduces the index file size!!
2,265,296 → 404,117 bytes (a whole digit shorter...! 🤣)

Using DecompressionStream may limit which browsers can use the search function, but this is not a problem for browsers released after May 2023.

refs: https://developer.mozilla.org/ja/docs/Web/API/DecompressionStream

I really wanted to use brotli instead of gzip, but after creating the PR I found out that it can't be used as-is in Firefox and Chromium...😓

However, with brotli the same index file compresses even further, down to `243971` bytes.

I'll rethink how to use brotli after I merge this PR!

Summary by CodeRabbit

  • New Features

    • Enhanced search functionality by loading a compressed configuration file, which may improve performance and reliability.
  • Chores

    • Updated the deployment process to include a new step for compressing search index files and improved cleanup of unnecessary files.
    • Upgraded caching mechanisms to version 7.0.0, ensuring users receive the most up-to-date resources.

@coderabbitai coderabbitai bot (Contributor) commented Apr 2, 2025

Walkthrough

The changes update the deployment and search configuration processes. In the GitHub Actions workflow, a new "Compress Search Index" step is added to compress the generated JSON file, and the cleanup step now removes additional files. The debug script similarly adds compression and expands its file cleanup. In the search module, a new asynchronous function is introduced to fetch and decompress a compressed file, while the service worker’s cache version is updated.

Changes

File(s) Change Summary
.github/workflows/gh-pages.yml, debug.sh Workflow & Script Updates: Added a step to compress searchindex.json using gzip; updated cleanup to remove searchindex.js.
js/searcher.ts Search Feature Update: Introduced a new async function fetchAndDecompress to fetch compressed search index data and modified initSearch to load searchindex.json.gz via decompression.
js/serviceworker.ts Service Worker Cache Update: Updated CACHE_VERSION from v6.1.0 to v7.0.0.
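On the build side, the workflow and debug.sh shell out to the gzip CLI. A Node equivalent of that step (hypothetical; the PR itself does not include this script) looks like:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';
import { gzipSync } from 'node:zlib';

// Compress searchindex.json into searchindex.json.gz, using maximum
// compression like a `gzip -9` step in a deploy workflow would.
const compressIndex = (path: string): string => {
  const out = `${path}.gz`;
  writeFileSync(out, gzipSync(readFileSync(path), { level: 9 }));
  return out;
};
```

The uncompressed searchindex.json and searchindex.js are then removed by the cleanup step, so only the compressed artifact is deployed.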

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant GitHub as GitHub Actions
    participant Build as Build Process

    Dev->>GitHub: Push commit (any branch)
    GitHub->>Build: Trigger build-and-deploy job
    Build->>Build: Execute build steps
    Build->>Build: "Compress Search Index" (gzip, compress searchindex.json)
    Build->>Build: Cleanup (remove searchindex.js)
    Build->>GitHub: Deploy site changes
sequenceDiagram
    participant Client as Browser
    participant Searcher as fetchAndDecompress()
    participant Network as fetchRequest()

    Client->>Searcher: Invoke initSearch()
    Searcher->>Network: fetchRequest(url: searchindex.json.gz)
    Network-->>Searcher: Return compressed data
    Searcher->>Searcher: Decompress using DecompressionStream
    Searcher->>Client: Return parsed JSON


Poem

I’m a rabbit in the code field, leaping with delight,
Compressing files with gzip, making everything right.
The branch filter’s gone, cleanup is neat,
Search data’s decompressed – a truly nifty feat!
Hop on through deployment, where bugs take flight,
Celebrating changes with a bounce and byte! 🐰


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88290ad and 11acd0f.

📒 Files selected for processing (3)
  • .github/workflows/gh-pages.yml (2 hunks)
  • debug.sh (2 hunks)
  • js/searcher.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • debug.sh
  • .github/workflows/gh-pages.yml
  • js/searcher.ts


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
js/searcher.ts (1)

223-224: Consider adding proper TypeScript type definition

The lint ignore comment for explicit any type suggests TypeScript doesn't have proper type definitions for the 'brotli' format in DecompressionStream.

Consider creating a custom type declaration for DecompressionStream that includes 'brotli' as a valid format to avoid using 'any':

- /* biome-ignore lint: no-explicit-any */
- const stream = response.body.pipeThrough(new DecompressionStream('brotli' as any));
+ // Add to a .d.ts file:
+ // declare type CompressionFormat = 'gzip' | 'deflate' | 'deflate-raw' | 'brotli';
+ // declare class DecompressionStream {
+ //   constructor(format: CompressionFormat);
+ // }
+ const stream = response.body.pipeThrough(new DecompressionStream('brotli'));
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0b12b and e56f51c.

📒 Files selected for processing (4)
  • .github/workflows/gh-pages.yml (3 hunks)
  • debug.sh (2 hunks)
  • js/searcher.ts (2 hunks)
  • js/serviceworker.ts (1 hunks)
🔇 Additional comments (8)
js/serviceworker.ts (1)

3-3: CACHE_VERSION update appropriately reflects major changes

The cache version has been bumped from v6.1.0 to v7.0.0, which is appropriate for a significant change in the caching strategy. This major version increment ensures that clients will fetch the new compressed search index file rather than using cached versions of the uncompressed file.

debug.sh (2)

54-54: Brotli compression matches the PR objective

Adding Brotli compression for the search index after generation is consistent with the goal of reducing file size. This addition correctly compresses the file and creates a .br extension file that will be used by the site.


66-67: Cleanup of original files is appropriate

Uncommented lines to remove both the original JavaScript and JSON files after compression is good practice to keep only the compressed version in the final build.

js/searcher.ts (2)

244-244: Search index file path correctly updated

Updating the file path to fetch the compressed .br file matches the changes made in the build process. This change ensures the site uses the smaller compressed file, achieving the PR's primary objective.


216-228: ✅ Verification successful

New decompression implementation looks good

The implementation of fetchAndDecompress correctly handles fetching and decompressing the Brotli-compressed search index file. The function properly:

  1. Uses the existing fetchRequest function with error handling
  2. Checks for null response body
  3. Decompresses the data using DecompressionStream
  4. Parses the result as JSON

🌐 Web query:

What browsers support DecompressionStream with brotli?

💡 Result:

Brotli decompression via the DecompressionStream API is supported in most modern browsers as of 2025, though implementation details vary. Below is a breakdown of support across major browsers:

Browser Support for DecompressionStream with Brotli

  • Chrome: Supported from version 50 onward[5].
  • Edge: Supported from version 15 onward[5].
  • Firefox: Supported from version 44 onward[5].
  • Safari: Supported from version 11 onward (including iOS 11+)[5][1].
  • Opera: Supported from version 38 onward[5].
  • Android Browser: Supported from version 97 onward[1].

Key Considerations

  • HTTPS Requirement: Brotli compression only works over HTTPS connections[2][7].
  • Performance: Brotli typically achieves 18-26% better compression ratios than Gzip, especially at higher compression levels (e.g., level 11)[2][7].
  • Fallback Handling: Browsers without Brotli support automatically fall back to Gzip or uncompressed content[2][5].

Limitations

  • Some security/firewall solutions like Palo Alto Networks only added Brotli decompression support in November 2024[8].
  • Older browsers (e.g., IE, Safari ≤10) and legacy mobile browsers lack support[1][5].
  • Stream decompression in Web Workers requires WASM polyfills like brotli-wasm for full cross-browser compatibility[3][6].

For developers, the recommended implementation pattern is:

const decompressionStream = new DecompressionStream('br');
const decompressedStream = response.body.pipeThrough(decompressionStream);

This works natively in supported browsers, while unsupported environments can use the brotli-wasm library[3].


Decompression Implementation Verified: Browser Support Confirmed

The updated fetchAndDecompress function in js/searcher.ts (lines 216–228) correctly handles fetching, error checking, decompression via the DecompressionStream API, and JSON parsing of the Brotli-compressed search index file. Our verification confirms that browsers including Chrome (v50+), Edge (v15+), Firefox (v44+), Safari (v11+), Opera (v38+), and Android Browser (v97+) support stream decompression with Brotli as of 2025.

  • The function properly checks for a null response body.
  • It decompresses the fetched data and decodes it into JSON.
  • The explicit cast ('brotli' as any) bypasses type-checking; consider using the standard identifier ('br') if possible for clarity and conformity with common usage.

Overall, the implementation meets the requirements.

.github/workflows/gh-pages.yml (3)

277-277: Verify if branch restriction removal is intentional

Commenting out the branch filter for the build-and-deploy job means it will run on any branch or PR, not just main. This could lead to deployment attempts from non-main branches.

Is this intentional? Typically, GitHub Pages deployments are restricted to specific branches. Check if this change is meant for testing during development and should be restored before merging.


344-349: Brotli integration looks good

The new step to install and use Brotli for compressing the search index in the CI/CD pipeline is well-implemented. Using Homebrew to install Brotli is appropriate for the macOS runner.


361-362: Appropriate cleanup of original files

The removal of both the original JavaScript and JSON search index files is appropriate since they're no longer needed after compression, keeping the deployment clean and minimizing unnecessary files.

@CoralPink (Owner Author) commented

I don't know whether this is a case of "I'm glad I noticed it before you did" or "you should have noticed it before the PR",
but apparently it only works properly in Safari...

I'll have to think about what to do...😓
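One way to reconcile brotli's better compression ratio with its patchy DecompressionStream support is a runtime feature probe (a hypothetical sketch, not code from this PR; it assumes both compressed variants would be published):

```typescript
// Returns true if this runtime's DecompressionStream accepts the given format.
// The constructor throws for formats it does not support, so probing is cheap.
const supportsDecompressionFormat = (format: string): boolean => {
  try {
    /* biome-ignore lint: no-explicit-any — 'br' is not in the standard type yet */
    new DecompressionStream(format as any);
    return true;
  } catch {
    return false;
  }
};

// Pick the best index variant the browser can actually decompress.
const indexUrl = supportsDecompressionFormat('br')
  ? 'searchindex.json.br'
  : 'searchindex.json.gz';
```

The gzip file then acts as the universal fallback, while Safari (and any future engines that add 'br') would get the smaller brotli file.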

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
js/searcher.ts (2)

223-224: Improve type safety by avoiding as any cast

The linter ignore comment suggests a problem with type definitions. TypeScript may not recognize 'gzip' as a valid format string for DecompressionStream.

Consider creating a proper type definition or using a more type-safe approach:

-  /* biome-ignore lint: no-explicit-any */
-  const stream = response.body.pipeThrough(new DecompressionStream('gzip' as any));
+  // Define acceptable compression formats
+  type CompressionFormat = 'gzip' | 'deflate' | 'br';
+  const format: CompressionFormat = 'gzip';
+  const stream = response.body.pipeThrough(new DecompressionStream(format));

226-227: Optimize TextDecoder usage

A new TextDecoder instance is created for each function call, which is inefficient for repeated usage.

Consider creating a reusable TextDecoder instance:

+// Create a reusable TextDecoder instance
+const decoder = new TextDecoder();
+
const fetchAndDecompress = async (url: string) => {
  const response = await fetchRequest(url);

  if (!response.body) {
    throw new Error('Response body is null');
  }

  /* biome-ignore lint: no-explicit-any */
  const stream = response.body.pipeThrough(new DecompressionStream('gzip' as any));
  const decompressed = await new Response(stream).arrayBuffer();

-  return JSON.parse(new TextDecoder().decode(decompressed));
+  return JSON.parse(decoder.decode(decompressed));
};
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e56f51c and 8c6302b.

⛔ Files ignored due to path filters (1)
  • js/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • .github/workflows/gh-pages.yml (3 hunks)
  • debug.sh (2 hunks)
  • js/searcher.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • debug.sh
  • .github/workflows/gh-pages.yml
🔇 Additional comments (1)
js/searcher.ts (1)

244-244: ✅ Verification successful

Update file extension to match compression algorithm

The file extension should match the compression algorithm being used.

Verify that the build process is correctly generating a file with the appropriate extension. If using gzip, the file should be searchindex.json.gz; if using Brotli, it should be searchindex.json.br.


🏁 Script executed:

#!/bin/bash
# Check if the correct compressed file exists and what compression is being used

# Look for compressed search index files
echo "Compressed search index files:"
fd "searchindex.json" --extension gz
fd "searchindex.json" --extension br

# Examine file type of compressed files if they exist
echo "File type information:"
fd "searchindex.json.gz" --exec file {} \; 2>/dev/null || echo "No .gz file found"
fd "searchindex.json.br" --exec file {} \; 2>/dev/null || echo "No .br file found"

Length of output: 291


File extension matches the current compression setup

The verification indicates that only a gzip-compressed file (using the .gz extension) is present. Since no Brotli-compressed (.br) version was found, the current reference to searchindex.json.gz is appropriate. If the compression algorithm changes in the future, please update the file extension accordingly.

@CoralPink CoralPink changed the title from "Use brotli" to "Compress search index files using gzip 🗜" on Apr 2, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6302b and 88290ad.

📒 Files selected for processing (1)
  • .github/workflows/gh-pages.yml (3 hunks)
🔇 Additional comments (1)
.github/workflows/gh-pages.yml (1)

277-277: Clarify Branch Restriction Removal

The branch filter condition on line 277 has been commented out. Please confirm that removing the restriction (i.e. allowing the build-and-deploy job to run from every branch instead of only main) is intentional. This could potentially lead to deployments from unintended branches if not monitored carefully.

@CoralPink (Owner Author) commented

This makes me a little nervous, but I guess it's OK...

Let's merge!!

@CoralPink CoralPink merged commit ffe585d into main Apr 2, 2025
12 checks passed
@CoralPink CoralPink deleted the use-brotli branch April 2, 2025 09:48