This repository was archived by the owner on Sep 30, 2024. It is now read-only.
Syntactic indexing with heuristics #58727
Closed as not planned
Description
Subpart of https://github.com/sourcegraph/sourcegraph/issues/58005
For this quarter:
- Add a CLI that can generate SCIP indexes based on heuristics https://github.com/sourcegraph/sourcegraph/pull/57664
- Build and publish CLI. The CLI executable can be consumed directly via Bazel, so this isn't needed.
- Syntactic indexing: add an option to CLI for reading data directly from git objects #60048
- Create new worker for batch indexing
- Scaffold the worker and Bazel build for it: https://github.com/sourcegraph/sourcegraph/pull/59747
- Add new DB table to hold an indexing queue https://github.com/sourcegraph/sourcegraph/issues/59794
- Create scheduling worker (similar to autoindexing) to periodically schedule jobs for matching repositories https://github.com/sourcegraph/sourcegraph/issues/59795
- MVP (1/2): add indexing functionality to the worker, with eager repo cloning as part of MVP, moving to a better strategy later https://github.com/sourcegraph/sourcegraph/issues/59802
- MVP (2/2): add upload functionality to the worker https://github.com/sourcegraph/sourcegraph/issues/59801
- Implement monitoring and Grafana dashboards for the worker https://github.com/sourcegraph/sourcegraph/issues/59796
- Implement persistent Git caching (we can consider the searcher component for inspiration) https://github.com/sourcegraph/sourcegraph/issues/59800
- Add gRPC (or just HTTP?) endpoint to the worker service to queue a reindex (internal API) (for gRPC see https://sourcegraph.com/docs/dev/background-information/grpc_tutorial)
- Add a GraphQL mutation to schedule reindexing (public API)
- Add worker Kubernetes configuration: https://github.com/sourcegraph/deploy-sourcegraph/tree/master/base
- Add worker docker-compose configuration: https://github.com/sourcegraph/deploy-sourcegraph-docker/
- Add worker to single-container cmd/server configuration: https://github.com/sourcegraph/sourcegraph/tree/main/cmd/server
- Update architecture diagram: https://docs.sourcegraph.com/dev/background-information/architecture
- Update documentation as per the new service checklist: https://docs.sourcegraph.com/dev/background-information/architecture/introducing_a_new_service
- Support ingesting & storing syntactic SCIP indexes
- https://github.com/sourcegraph/sourcegraph/issues/61422
- Update "precise-code-intel-worker" (name is now wrong?)
Stretch goals (these will most likely be pushed to the next quarter)
- Update worker with policies & scheduling for batch indexing jobs
- New GraphQL resolver or fields to request batch indexing
- Database schema changes (+ associated migration) for tracking heuristics-based indexes
- Changes to frontend code to prefer heuristics-based results before falling back to text-based search
- Observability
- Analytics
- User-facing documentation
Once the feature works end-to-end (hopefully sometime next quarter), we can start rolling out support for the most popular languages first: Java, TypeScript, JavaScript, Python, Go.