
Yahoo Sherpa

Cloud storage platform
Sherpa
Developer(s): Yahoo!
Written in: C++, PHP
Operating system: Linux
Type: key-value store

Sherpa is a cloud storage platform developed by Yahoo!. It is a hosted, distributed, and geographically replicated key-value data store. The service is a NoSQL system that addresses the scalability, availability, and latency needs of Yahoo!'s websites. Its features include elastic growth, multi-tenancy, a global footprint for low-latency local access, asynchronous replication, representational state transfer (REST) based web service APIs, per-record consistency controls, high availability, compression, secondary indexes, and record-level replication.

Architecture

Sherpa is a multi-tenant system. An application stores data in a table, which is a collection of records. A table is sharded into smaller pieces called tablets. Data is sharded either by the hash value of the key or by key range. Tablets are stored on nodes referred to as storage units. A software routing layer keeps track of the mapping between an application's tablets and the storage units that hold them. Applications send requests to the router, which forwards them to the correct storage unit based on the tablet map. Clients can get, set, delete, and scan records via unique record primary keys.
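
The routing step described above can be illustrated with a minimal Python sketch. It is not Sherpa's implementation: the `Router` class, the choice of MD5 as the hash function, and the round-robin assignment of tablets to storage units are all assumptions made for illustration.

```python
import hashlib


class Router:
    """Toy routing layer: maps a record key to a tablet, then to a storage unit."""

    def __init__(self, num_tablets, storage_units):
        self.num_tablets = num_tablets
        # Tablet map: tablet id -> storage unit name.
        # Round-robin assignment is an assumption of this sketch.
        self.tablet_map = {
            tablet: storage_units[tablet % len(storage_units)]
            for tablet in range(num_tablets)
        }

    def tablet_for(self, key):
        # Hash the primary key so that records spread across tablets.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_tablets

    def route(self, key):
        # Look up the storage unit that currently holds the key's tablet.
        return self.tablet_map[self.tablet_for(key)]


router = Router(num_tablets=8, storage_units=["su-1", "su-2", "su-3"])
print(router.route("user:42"))
```

Because the client only ever talks to the router, a tablet can be moved to another storage unit by updating one entry in the tablet map.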

Data model

Sherpa's data model is a key-value store where data is stored as JSON blobs. Data is organized in tables where primary key uniqueness can be enforced, but other than that, there are no fixed schemas. It supports single-table scans with predicates. Customers can choose among several table types: distributed hash table, distributed ordered table, and mastered and merging tables. Application-specific access patterns determine the suitability of each table type, and query patterns in turn affect how keys are defined.
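
As a rough illustration of this data model, the following Python sketch stores schemaless records as JSON strings under unique primary keys and supports a scan with a predicate. The `Table` class and its method names are hypothetical and do not reflect Sherpa's actual API.

```python
import json


class Table:
    """Toy schemaless table: unique primary keys mapped to JSON blobs."""

    def __init__(self):
        self._records = {}  # primary key -> JSON string

    def set(self, key, value):
        # Serialize to JSON; no schema is checked beyond the key itself.
        self._records[key] = json.dumps(value)

    def get(self, key):
        blob = self._records.get(key)
        return None if blob is None else json.loads(blob)

    def delete(self, key):
        self._records.pop(key, None)

    def scan(self, predicate=lambda record: True):
        # Single-table scan in key order, keeping records that match.
        for key in sorted(self._records):
            record = json.loads(self._records[key])
            if predicate(record):
                yield key, record


users = Table()
users.set("u1", {"name": "Ada", "age": 36})
users.set("u2", {"name": "Bob"})  # a different shape is allowed
adults = list(users.scan(lambda r: r.get("age", 0) >= 18))
```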

Features

Scalability

Sherpa scales by partitioning data. Each customer-defined table is partitioned into tablets, so tablets are the units of both work assignment and tenancy. Each tablet contains a range of records. Sherpa can scale to very large numbers of tables, tablets, and records.

Elasticity

The system scales horizontally as new machines are added, with no downtime for applications. Other elasticity operations include the assignment, reassignment, and splitting of data partitions.
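
Partition splitting can be sketched for a range-partitioned tablet as follows. This is an illustration only: the `Tablet` class and the choice of the median key as the split point are assumptions, not a description of how Sherpa picks split points.

```python
class Tablet:
    """Toy range-partitioned tablet covering keys in [low, high)."""

    def __init__(self, low, high, records=None):
        self.low = low    # inclusive lower bound; None means unbounded
        self.high = high  # exclusive upper bound; None means unbounded
        self.records = dict(records or {})

    def split(self):
        # Split at the median key so both halves hold similar record counts.
        keys = sorted(self.records)
        if len(keys) < 2:
            raise ValueError("need at least two records to split")
        mid = keys[len(keys) // 2]
        left = Tablet(self.low, mid,
                      {k: v for k, v in self.records.items() if k < mid})
        right = Tablet(mid, self.high,
                       {k: v for k, v in self.records.items() if k >= mid})
        return left, right


tablet = Tablet(None, None, {"a": 1, "b": 2, "c": 3, "d": 4})
left, right = tablet.split()
```

After a split, the two halves can be assigned to different storage units, which is how added machines take on load.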

Fault-tolerance

Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Single-node failure is transparent to the applications. Sherpa relies on a reliable transaction message bus for replicating transactions. This message bus guarantees at-least-once delivery of transaction messages.
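
At-least-once delivery means that a replica may receive the same transaction message more than once, so applying messages must be idempotent. The sketch below shows one common way to achieve this, using per-key sequence numbers. The message format and the `Replica` class are assumptions for illustration; the article does not describe Sherpa's actual mechanism.

```python
class Replica:
    """Toy replica that applies replication messages idempotently."""

    def __init__(self):
        self.data = {}
        self.applied = {}  # key -> highest sequence number applied so far

    def apply(self, message):
        key, seq, value = message["key"], message["seq"], message["value"]
        # Skip duplicates and stale redeliveries from the message bus.
        if seq <= self.applied.get(key, 0):
            return False
        self.data[key] = value
        self.applied[key] = seq
        return True


replica = Replica()
first = {"key": "k", "seq": 1, "value": "a"}
second = {"key": "k", "seq": 2, "value": "b"}
for message in (first, second, first):  # the bus redelivers `first`
    replica.apply(message)
```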

Tunable consistency

Sherpa supports different levels of consistency, ranging from per-record timeline consistency, where all writes to a record are serialized through a master copy, to eventual consistency.
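
The difference between the two ends of this range can be sketched for a single record. In the toy model below, every write goes through the master copy and receives the next version number, while a replica catches up asynchronously. The names `read_latest`, `read_any`, and `sync` are hypothetical, chosen for this sketch rather than taken from Sherpa's API.

```python
class Record:
    """Toy record with a master copy and one asynchronous replica."""

    def __init__(self):
        self.master = (0, None)   # (version, value)
        self.replica = (0, None)

    def write(self, value):
        # All writes are serialized through the master, which assigns versions.
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def sync(self):
        # Stands in for asynchronous replication catching up.
        self.replica = self.master

    def read_latest(self):
        # Stronger read: always served by the master copy.
        return self.master

    def read_any(self):
        # Weaker read: may be stale, but never moves backwards in the timeline.
        return self.replica


record = Record()
record.write("v1")
record.sync()
record.write("v2")
stale = record.read_any()       # still the first version until the next sync
fresh = record.read_latest()
```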

Selective record replication

Replication can be configured at the granularity of individual records or of whole tables.

Backup

The backup feature saves multiple older copies of a full table to offline storage. From this offline storage, customers can retrieve old versions of individual records.

Secondary indexes

Many applications need to access data via non-primary key data fields. Sherpa supports asynchronous secondary indexes.
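
An asynchronous secondary index is updated after the primary write completes, so lookups through the index can briefly lag behind the table. The following Python sketch models that lag with an explicit queue. The `IndexedTable` class and its `drain` method are assumptions for illustration and not Sherpa's design.

```python
from collections import defaultdict, deque


class IndexedTable:
    """Toy table with one asynchronously maintained secondary index."""

    def __init__(self, field):
        self.field = field              # non-primary-key field to index
        self.records = {}               # primary key -> record
        self.index = defaultdict(set)   # field value -> primary keys
        self.pending = deque()          # index updates not yet applied

    def set(self, key, record):
        # The primary write completes immediately; the index update is queued.
        old = self.records.get(key)
        self.records[key] = record
        self.pending.append((key, old, record))

    def drain(self):
        # Stands in for a background worker applying queued index updates.
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None and self.field in old:
                self.index[old[self.field]].discard(key)
            if self.field in new:
                self.index[new[self.field]].add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, ()))


table = IndexedTable("city")
table.set("u1", {"city": "Oslo"})
before = table.lookup("Oslo")  # index has not caught up yet
table.drain()
after = table.lookup("Oslo")
```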

References

  1. Klems, Markus; Silberstein, Adam; Chen, Jianjun; Mortazavi, Masood; Albert, Sahaya Andrews; Narayan, P.P.S.; Tumbde, Adwait; Cooper, Brian (2012-10-29). "The Yahoo!". Proceedings of the fourth international workshop on Cloud data management. CloudDB '12. New York, NY, USA: Association for Computing Machinery. pp. 33–40. doi:10.1145/2390021.2390028. ISBN 978-1-4503-1708-5. S2CID 16219164.

