YCSB paper states: We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible—it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.
Sample customer document

Document key: 100_advjson

    {
      "_id": "100_advjson",
      "doc_id": 100,
      "gid": "48a8e177-15e5-5116-95d0-41478601bbdd",
      "first_name": "Stella",
      "middle_name": "Jackson",
      "last_name": "Toy",
      "ballance_current": "$1084.94",
      "dob": "2016-05-11",
      "email": "[email protected]",
      "isActive": true,
      "linear_score": 31,
      "weighted_score": 40,
      "phone_country": "fr",
      "phone_by_country": "01 80 03 25 39",
      "age_group": "child",
      "age_by_group": 12,
      "url_protocol": "http",
      "url_site": "twitter",
      "url_domain": "gov",
      "url": "http://www.twitter.gov/Stella",
      "devices": [ "EE-245", "FF-012", "GG-789", "HH-246" ],
      "linked_devices": [
        [ "AA-038", "BB-577" ],
        [ "OO-565", "KK-448", "FF-281" ],
        [ "BB-495", "AA-374" ],
        [ "BB-609", "VV-899", "LL-675", "BB-291" ],
        [ "CC-048" ]
      ],
      "address": {
        "street": "6392 Crona Rue Curve",
        "city": "Simeonland",
        "zip": "98316",
        "country": "Bahrain",
        "prev_address": {
          "street": "9063 Johns Islands Divide",
          "city": "South Jayme",
          "zip": "34950-8194",
          "country": "Bulgaria",
          "property_current_owner": {
            "first_name": "Weston",
            "middle_name": "Clyde",
            "last_name": "Considine",
            "phone": "(665) 343-9468"
          }
        }
      },
      "children": [
        { "first_name": "Darrel", "gender": null, "age": 10 },
        { "first_name": "Shea", "gender": null, "age": 6 }
      ],
      "visited_places": [
        { "country": "Iran", "cities": [ "Heidenreichshire", "West Luciano", "Haroldmouth", "West Jakeburgh" ] },
        { "country": "Comoros", "cities": [ "New Valliemouth", "East Kaleighland" ] },
        { "country": "Israel", "cities": [ "East Kali", "Pabloport" ] },
        { "country": "French Guiana", "cities": [ "North Zachary", "Kielmouth" ] }
      ]
    }

See the appendix for the YAML file used to define the data model and domain.
KV=true: KV call to insert a single document.
KV=false: INSERT INTO customer VALUES(...)
KV=true: KV call to update a single document.
KV=false: UPDATE customer USE KEYS [documentkey] SET field1 = value

Read: Read a JSON document, either one randomly chosen field in the document or all the fields.
KV=true: KV call to fetch a single document.
KV=false: SELECT * FROM customer USE KEYS [documentkey]
KV=true: KV call to delete a single document.
KV=false: DELETE FROM customer USE KEYS [documentkey]
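The KV=false statements for the basic operations can be gathered into a single lookup. The following is a minimal Python sketch, not the benchmark's actual driver code: the name `n1ql_statement`, the `STATEMENTS` table, and the `$1`/`$2` positional placeholders are assumptions made for illustration, and the INSERT uses the N1QL `(KEY, VALUE)` form.

```python
# Hypothetical helper: maps a basic operation name to a parameterized
# N1QL statement (the KV=false path). $1 and $2 are positional
# parameters that the caller is assumed to bind (document key, value).
STATEMENTS = {
    "insert": "INSERT INTO customer (KEY, VALUE) VALUES ($1, $2)",
    "update": "UPDATE customer USE KEYS [$1] SET field1 = $2",
    "read": "SELECT * FROM customer USE KEYS [$1]",
    "delete": "DELETE FROM customer USE KEYS [$1]",
}


def n1ql_statement(operation: str) -> str:
    """Return the statement template for one basic operation."""
    try:
        return STATEMENTS[operation.lower()]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation}") from None
```

Keeping the templates in one table makes it easy to compare the KV and query paths operation by operation.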
KV=true: SELECT META().id FROM customer WHERE META().id > "val" ORDER BY META().id LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE META().id > "val" ORDER BY META().id LIMIT <num>
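The KV=true scan is a two-step pattern: a query returns only document keys, and the driver then fetches each document by key. The sketch below models that flow in Python with an in-memory dictionary standing in for the customer keyspace. The names `scan_ids`, `kv_get`, and `scan_kv` are hypothetical; a real driver would issue the query and the KV fetches against the database.

```python
from typing import Dict, List

# In-memory stand-in for the customer keyspace (document key -> document).
Store = Dict[str, dict]


def scan_ids(store: Store, start_key: str, limit: int) -> List[str]:
    """Emulates: SELECT META().id FROM customer
    WHERE META().id > start_key ORDER BY META().id LIMIT limit."""
    return sorted(k for k in store if k > start_key)[:limit]


def kv_get(store: Store, key: str) -> dict:
    """Emulates a single KV fetch by document key."""
    return store[key]


def scan_kv(store: Store, start_key: str, limit: int) -> List[dict]:
    """KV=true scan: query for keys only, then fetch each document by key."""
    return [kv_get(store, k) for k in scan_ids(store, start_key, limit)]
```

Keys are compared as strings here, so "101_advjson" sorts after "100_advjson" but also after "1000_advjson".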
All customers in a given address.zip, retrieved with a randomly chosen OFFSET and LIMIT in SQL or N1QL.
KV=true: SELECT META().id FROM customer WHERE address.zip = "value" OFFSET <num> LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE address.zip = "value" OFFSET <num> LIMIT <num>
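As an illustration of how a driver might pick the random OFFSET and LIMIT for the page query, here is a small Python sketch. The function name `page_query`, the bounds `max_offset` and `max_limit`, and the use of `$1`..`$3` positional parameters are assumptions for this example; the source does not state the ranges the benchmark actually uses.

```python
import random
from typing import List, Tuple


def page_query(zip_code: str, rng: random.Random, max_offset: int = 100,
               max_limit: int = 50, keys_only: bool = False) -> Tuple[str, List]:
    """Build the page query and its parameters.

    keys_only=True corresponds to the KV=true variant, which selects only
    META().id and leaves the document fetch to KV calls.
    """
    projection = "META().id" if keys_only else "*"
    offset = rng.randint(0, max_offset)  # assumed range: 0..max_offset
    limit = rng.randint(1, max_limit)    # assumed range: 1..max_limit
    statement = (f"SELECT {projection} FROM customer "
                 "WHERE address.zip = $1 OFFSET $2 LIMIT $3")
    return statement, [zip_code, offset, limit]
```

Passing a seeded `random.Random` keeps a run reproducible, which helps when comparing systems on the same sequence of pages.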
All customers WHERE (country = "value1" AND age_group = "value2" AND YEAR(dob) = "value"), retrieved with a randomly chosen OFFSET and LIMIT in SQL or N1QL.
KV=true: SELECT META().id FROM customer WHERE country = "value1" AND age_group = "value2" AND YEAR(dob) = "value" ORDER BY country, age_group, YEAR(dob) OFFSET <num> LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE country = "value1" AND age_group = "value2" AND YEAR(dob) = "value" ORDER BY country, age_group, YEAR(dob) OFFSET <num> LIMIT <num>
KV=true: SELECT META().id FROM customer WHERE address.prev_address.zip = "value" LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE address.prev_address.zip = "value" LIMIT <num>
Find all customers who have a device with a given value, e.g. FF-012.
Sample devices field: "devices": [ "EE-245", "FF-012", "GG-789", "HH-246" ]
KV=true: SELECT META().id FROM customer WHERE ANY v IN devices SATISFIES v = "FF-012" END ORDER BY META().id LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE ANY v IN devices SATISFIES v = "FF-012" END ORDER BY META().id LIMIT <num>
KV=true: SELECT META().id FROM customer WHERE ANY v IN visited_places SATISFIES v.country = "France" AND ANY c IN v.cities SATISFIES c = "Paris" END END ORDER BY META().id LIMIT <num>
KV=false: SELECT * FROM customer WHERE ANY v IN visited_places SATISFIES v.country = "France" AND ANY c IN v.cities SATISFIES c = "Paris" END END ORDER BY META().id LIMIT <num>
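The nested ANY ... SATISFIES predicate requires the country and the city to match inside the same visited_places element. A minimal Python sketch of that semantics follows; the function name `array_deep_match` is hypothetical and the code only models the predicate, it does not query a database.

```python
def array_deep_match(doc: dict, country: str, city: str) -> bool:
    """True if some visited_places entry has the given country AND
    that same entry's cities array contains the given city."""
    return any(
        place.get("country") == country
        and any(c == city for c in place.get("cities", []))
        for place in doc.get("visited_places", [])
    )
```

With the sample document, Iran paired with West Luciano matches, while Iran paired with Pabloport does not, because Pabloport appears only under Israel.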
Each customer has multiple orders; each order document holds the order details.
KV=true: Not easily possible without a significant performance impact.
KV=false: SELECT * FROM customer c INNER JOIN orders o ON (META(o).id IN c.order_list) WHERE c.address.zip = "val"
ANSI JOIN with HASH join: SELECT * FROM customer c INNER JOIN orders o USE HASH (probe) ON (META(o).id IN c.order_list) WHERE c.address.zip = "val"
KV=true: Requires writing a program.
KV=false, ANSI join:
SELECT o.day, c.zip, SUM(o.salesamt) FROM customer c INNER JOIN orders o ON (META(o).id IN c.order_list) WHERE c.zip = "value" AND o.day = "value" GROUP BY o.day, c.zip ORDER BY SUM(o.salesamt)
KV=false, ANSI join with HASH join:
SELECT o.day, c.zip, SUM(o.salesamt) FROM customer c INNER JOIN orders o USE HASH (probe) ON (META(o).id IN c.order_list) WHERE c.zip = "value" AND o.day = "value" GROUP BY o.day, c.zip ORDER BY SUM(o.salesamt)
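To give a sense of what "write a program" means for the KV path, the sketch below performs the join, filter, grouping, and ordering on the client in Python. It is an assumption-laden illustration: the function name `sales_by_day_and_zip` is hypothetical, customers are passed as plain dictionaries with `zip` and `order_list` fields, and orders as a dictionary keyed by document key with `day` and `salesamt` fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sales_by_day_and_zip(customers: Iterable[dict], orders: Dict[str, dict],
                         zip_code: str, day: str) -> List[Tuple[str, str, float]]:
    """Client-side equivalent of the join + GROUP BY + ORDER BY query."""
    totals = defaultdict(float)
    for c in customers:
        if c.get("zip") != zip_code:            # WHERE c.zip = "value"
            continue
        for order_id in c.get("order_list", []):
            o = orders.get(order_id)            # join on the order's document key
            if o is not None and o.get("day") == day:   # AND o.day = "value"
                totals[(o["day"], c["zip"])] += o["salesamt"]
    rows = [(d, z, total) for (d, z), total in totals.items()]
    return sorted(rows, key=lambda row: row[2])  # ORDER BY SUM(o.salesamt)
```

Every order lookup in the inner loop would be a separate KV fetch in a real driver, which is where the performance cost of the KV path comes from.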
GROUP BY query 1:
SELECT c.zip, COUNT(1) FROM customer c WHERE c.zip BETWEEN "value1" AND "value2" GROUP BY c.zip
GROUP BY query 2:
SELECT o.day, SUM(o.salesamt) FROM orders o WHERE o.day BETWEEN "value1" AND "value2" GROUP BY o.day;
| Workload | Operations | Record selection | Application example |
|---|---|---|---|
| SA: Update heavy | Read: 50%, Update: 50% | Zipfian | Session store recording recent actions in a user session |
| SB: Read heavy | Read: 95%, Update: 5% | Zipfian | Photo tagging; adding a tag is an update, but most operations are to read tags |
| SC: Read only | Read: 100% | Zipfian | User profile cache, where profiles are constructed elsewhere (e.g., Hadoop) |
| SD: Read latest | Read: 95%, Insert: 5% | Latest | User status updates; people want to read the latest statuses |
| SE: Short ranges | Scan: 95%, Insert: 5% | Zipfian/Uniform | Threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id) |
| SF: Read, modify, write | Read: 50%, Write: 50% | Zipfian | User database, where user records are read and modified by the user or to record user activity |
| SG: Page heavy | Page: 90%, Insert: 5%, Update: 5% | Zipfian | User database, where new users are added, existing records are updated, and pagination queries run on the system |
| SH: Search heavy | Search: 90%, Insert: 5%, Update: 5% | Zipfian | User database, where new users are added, existing records are updated, and search queries run on the system |
| SI: NestScan heavy | Nestscan: 90%, Insert: 5%, Update: 5% | Zipfian | User database, where new users are added, existing records are updated, and nestscan queries run on the system |
| SJ: Arrayscan heavy | Arrayscan: 90%, Insert: 5%, Update: 5% | Zipfian | |
| SK: ArrayDeepscan heavy | ArrayDeepScan: 90%, Insert: 5%, Update: 5% | Zipfian | |
| SL: Report | Report: 100% | | |
| SL: Report2 | Report2: 100% | | |
| SLoad: Load | Load: 100% | Everything | Data load to set up SoE |
| SN: Aggregate (SN1, SN2) | Aggregation: 90%, Insert: 5%, Update: 5% | | |
| SMIX: Mixed workload | Page: 20%, Search: 20%, Nestscan: 15%, Arrayscan: 15%, ArrayDeepscan: 10%, Aggregate: 10%, Report: 10% | See below. | |
| SSync: Sync | Sync: 100%, Merge/Update: 70%, New/Insert: 30% | | Continuous sync of data from other systems to systems of engagement. See below. |
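A workload's operation mix can be read as a set of weights for choosing the next operation. The sketch below shows one way to draw operations for the SMIX mix in Python. The names `SMIX` and `next_operation` are hypothetical, and weighted random selection is an assumption about how a driver might apply the percentages, not a description of the benchmark's implementation.

```python
import random
from typing import Dict

# Percentages of the SMIX mixed workload, taken from the table.
SMIX: Dict[str, int] = {
    "page": 20,
    "search": 20,
    "nestscan": 15,
    "arrayscan": 15,
    "arraydeepscan": 10,
    "aggregate": 10,
    "report": 10,
}


def next_operation(rng: random.Random, mix: Dict[str, int] = SMIX) -> str:
    """Pick the next operation with probability proportional to its share."""
    operations = list(mix)
    weights = [mix[op] for op in operations]
    return rng.choices(operations, weights=weights, k=1)[0]
```

Over a long run the observed shares converge on the configured percentages, so short runs can deviate noticeably from the nominal mix.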
    recordcount=1000
    operationcount=1000
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    Filternumlow = 2
    Filternumhigh = 14
    Sortnumlow = 3
    Sortnumhigh = 6
    page1propotion=0.95
    insertproportion=0.05
    requestdistribution=zipfian
    maxscanlength=100
    scanlengthdistribution=uniform
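For readers who want to inspect such a workload file programmatically, the following Python sketch parses `key=value` settings into a dictionary. It assumes one property per line in the style of a Java properties file, tolerates spaces around the `=`, and skips blank lines and comment lines; the function name `parse_properties` is hypothetical and this is not how YCSB itself loads the file.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; values are kept as strings."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):  # blank or comment line
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' separator
            props[key.strip()] = value.strip()
    return props
```

For example, a line such as `Filternumlow = 2` yields the entry `"Filternumlow": "2"`, which the caller can convert to an integer as needed.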
    name: AdvJSON
    type: object
    key: _id
    data:
      fixed: 10000
    properties:
      _id:
        type: string
        data:
          post_build: "return '' + this.doc_id + '_advjson';"
      doc_id:
        type: integer
        description: The document id
        data:
          build: "return document_index + 1"
      gid:
        type: string
        description: "guid"
        data:
          build: "return chance.guid();"
      first_name:
        type: string
        description: "First name - string, linked to url as the personal page"
        data:
          fake: "{{name.firstName}}"
      middle_name:
        type: string
        description: "Middle name - string"
        data:
          build: "return chance.bool() ? chance.name({middle: true}).split(' ')[1] : null;"
      last_name:
        type: string
        description: "Last name - string"
        data:
          fake: "{{name.lastName}}"
      ballance_current:
        type: string
        description: "currency"
        data:
          build: "return chance.dollar();"
      dob:
        type: string
        description: "Date"
        data:
          build: "return chance.bool() ? new Date(faker.date.past()).toISOString().split('T')[0] : null;"
      email:
        type: string
        description: "email"
        data:
          fake: "{{internet.email}}"
      isActive:
        type: boolean
        description: "active boolean"
        data:
          build: "return chance.bool();"
      linear_score:
        type: integer
        description: "integer 0 - 100"
        data:
          build: "return chance.integer({min: 0, max: 100});"
      weighted_score:
        type: integer
        description: "integer 0 - 100 with zipf distribution"
        data:
          build: "return chance.weighted([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0.4, 0.3, 0.25, 0.2, 0.17, 0.13, 0.11, 0.1, 0.09]) * 10 + chance.integer({min: 0, max: 10});"
      phone_country:
        type: string
        description: "field linked to phone, choices: us, uk, fr"
        data:
          build: "return chance.pickone(['us', 'uk', 'fr']);"
      phone_by_country:
        type: string
        description: "phone number by country code, linked to phone_country field"
        data:
          post_build: "return chance.phone({country: this.phone_country});"
      age_group:
        type: string
        description: "field linked to age, choices: child, teen, adult, senior"
        data:
          build: "return chance.pickone(['child', 'teen', 'adult', 'senior']);"
      age_by_group:
        type: integer
        description: "age by group, linked to age_group field"
        data:
          post_build: "return chance.age({type: this.age_group});"
      url_protocol:
        type: string
        description: "linked to url"
        data:
          build: "return chance.pickone(['http', 'https']);"
      url_site:
        type: string
        description: "linked to url"
        data:
          build: "return chance.pickone(['twitter', 'facebook', 'flixter', 'instagram', 'last', 'linkedin', 'xing', 'google', 'snapchat', 'tumblr', 'pinterest', 'youtube', 'vine', 'whatsapp']);"
      url_domain:
        type: string
        description: "linked to url"
        data:
          build: "return chance.pickone(['com', 'org', 'net', 'int', 'edu', 'gov', 'mil', 'us', 'uk', 'ft', 'it', 'de']);"
      url:
        type: string
        description: "user profile url, linked to other document fields"
        data:
          post_build: "return '' + this.url_protocol + '://www.' + this.url_site + '.' + this.url_domain + '/' + this.first_name;"
      devices:
        type: array
        description: "Array of strings - device"
        items:
          $ref: '#/definitions/Device'
          data:
            min: 2
            max: 6
      linked_devices:
        type: array
        description: "Array of array of string"
        items:
          $ref: '#/definitions/Device'
          data:
            min: 3
            max: 6
            submin: 1
            submax: 4
      address:
        type: object
        description: An object of the Address
        schema:
          $ref: '#/definitions/Address'
      children:
        type: array
        description: "An array of Children objects"
        items:
          $ref: '#/definitions/Children'
          data:
            min: 0
            max: 5
      visited_places:
        type: array
        description: "Array of objects with arrays"
        items:
          $ref: '#/definitions/Visited_places'
          data:
            min: 1
            max: 4
    definitions:
      Device:
        type: string
        description: "string AA-001 with zipf step distribution"
        data:
          build: "return chance.weighted(['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ', 'KK', 'LL', 'MM', 'NN', 'OO', 'PP', 'QQ', 'RR', 'SS', 'TT', 'UU', 'VV', 'WW', 'XX', 'YY', 'ZZ'], [1, 0.5, 0.333, 0.25, 0.2, 0.167, 0.143, 0.125, 0.111, 0.1, 0.091, 0.083, 0.077, 0.071, 0.067, 0.063, 0.059, 0.056, 0.053, 0.050, 0.048, 0.045, 0.043, 0.042, 0.04, 0.038]).concat('-').concat(chance.string({length: 3, pool: '0123456789'}));"
      Address:
        type: object
        properties:
          street:
            type: string
            description: The address 1
            data:
              build: "return faker.address.streetAddress() + ' ' + faker.address.streetSuffix();"
          city:
            type: string
            description: The locality
            data:
              build: "return faker.address.city();"
          zip:
            type: string
            description: The zip code / postal code
            data:
              build: "return faker.address.zipCode();"
          country:
            type: string
            description: The country
            data:
              build: "return faker.address.country();"
          prev_address:
            type: object
            description: An object of the Address
            schema:
              $ref: '#/definitions/Previous_address'
      Previous_address:
        type: object
        properties:
          street:
            type: string
            description: The address 1
            data:
              build: "return faker.address.streetAddress() + ' ' + faker.address.streetSuffix();"
          city:
            type: string
            description: The locality
            data:
              build: "return faker.address.city();"
          zip:
            type: string
            description: The zip code / postal code
            data:
              build: "return faker.address.zipCode();"
          country:
            type: string
            description: The country
            data:
              build: "return faker.address.country();"
          property_current_owner:
            type: object
            description: "owner object"
            schema:
              $ref: '#/definitions/Property_owner'
      Children:
        type: object
        properties:
          first_name:
            type: string
            description: "first name - string"
            data:
              fake: "{{name.firstName}}"
          gender:
            type: string
            description: "gender M or F"
            data:
              build: "return chance.bool({likelihood: 50})? faker.random.arrayElement(['M', 'F']) : null;"
          age:
            type: integer
            description: "age - 1 to 17"
            data:
              build: "return chance.integer({min: 1, max: 17})"
      Visited_cities:
        type: string
        description: "city"
        data:
          build: "return faker.address.city();"
      Visited_places:
        type: object
        properties:
          country:
            type: string
            data:
              build: "return faker.address.country();"
          cities:
            type: array
            description: "Array of strings - device id"
            items:
              $ref: '#/definitions/Visited_cities'
              data:
                min: 1
                max: 5
      Property_owner:
        type: object
        properties:
          first_name:
            type: string
            description: "First name - string, linked to url as the personal page"
            data:
              fake: "{{name.firstName}}"
          middle_name:
            type: string
            description: "Middle name - string"
            data:
              build: "return chance.bool() ? chance.name({middle: true}).split(' ')[1] : null;"
          last_name:
            type: string
            description: "Last name - string"
            data:
              fake: "{{name.lastName}}"
          phone:
            type: string
            description: "phone"
            data:
              build: "return chance.phone();"
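The Device definition draws a two-letter prefix with weights that fall off roughly as 1/rank (1, 0.5, 0.333, 0.25, and so on), then appends three random digits. The Python sketch below reproduces that idea for experimentation. It is an approximation, not the model's generator: the names `PREFIXES`, `WEIGHTS`, and `make_device` are hypothetical, and it uses exact 1/rank weights instead of the rounded values listed in the model.

```python
import random
import string

# 'AA', 'BB', ..., 'ZZ' in rank order, most frequent first.
PREFIXES = [letter * 2 for letter in string.ascii_uppercase]
# Assumed weights: exactly 1/rank, close to the rounded list in the model.
WEIGHTS = [1.0 / rank for rank in range(1, len(PREFIXES) + 1)]


def make_device(rng: random.Random) -> str:
    """Return a device id such as 'AA-001': weighted prefix plus 3 digits."""
    prefix = rng.choices(PREFIXES, weights=WEIGHTS, k=1)[0]
    digits = "".join(rng.choice(string.digits) for _ in range(3))
    return f"{prefix}-{digits}"
```

Because low-rank prefixes dominate, array scan queries on a value such as an AA device will match far more documents than queries on a ZZ device, which is worth keeping in mind when interpreting latency numbers.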