https://cs.uwaterloo.ca/~jimmylin/publications/Busch_etal_ICDE2012.pdf
References
Summary:
- Requirements:
- low latency, high throughout query evaluation, real-time ability to ingest content rapidly.
- 2B queries a day → 2 * 10^9 / 10^5 = 2 * 10^4 = ~20k QPS
- Indexes only last week worth of Tweets in memory. No historical index
- Replaced MySQL based inverted index that leveraged B-Trees and transactions.
- Shards built on top of Lucene with single-writer multiple-reader lock free algorithm. Support for real-time indexing and highly-concurrent query evaluation
- Simple solution to concurrent reads and writes can be deploying indexes in atomic swaps.
- Queries are first routed to appropriate DC based on various query considerations (query load, link latencies, cost of electricity, etc). Each DC has cluster with hundreds to thousands of machines. Machines are organized in a replicated, broker-coordinated, document-partitioned
- Search engines typically have 2 phases → Cheap algo to generate candidate lists and more expensive one to resort.
- Partitioning strategies
- Verticals → news, images, etc → can improve customization and throughtput but makes merging and sorting of different verticals in the results set hard.
- Divide indexes into tiers in terms of quality. Not used in twitter
- 3 types of signal
- static, added at indexing time
- Resonance → dynamically updated over time
- info about the searcher, provided with the query at query time
- At any one time at most 1 index segment is updated by a single thread
- Server data
- 64 GB heap, Segment is 16M documents, 9 segments up to 140M documents on a single machine. ( 10 machines is 1.5 B documents , for 1 Trillion documents → 10 000 machines, assumes document is capped at 140 characters.)
- 1 Segment → 17k QPS with 8 searcher threads. Fully loaded server (9 segments) → 5k QPS with ~ 120 ms latency p95
Strategy
- Isolate the scope of index updates. E.g only 1 segment is updated at any 1 time. The rest are read-only optimized data structure
Decisions
- Sort the posting list in reverse chronological order so that the query workers will always traverse the newest documents first. (This is opposed to the default, also some strategies sort by the frequency and other metrics)
- Have write optimized data structure for the live segment and convert it to immutable read-optimized data structure once rdy.