YCSB

An extensible workload generator

The goal of the Yahoo! Cloud Serving Benchmark (YCSB) project is to develop a framework and a common set of workloads for evaluating the performance of different "key-value" and "cloud" serving stores.


Loading YCSB:

$ ./bin/ycsb load (database name) -P workloads/workloada -p recordcount=10000000

A property file (for example, a .dat file) can be created to set the recordcount property.

large.dat: recordcount = 100000000

$ ./bin/ycsb load (database name) -P workloads/workloada -P large.dat

When a property file is used to set recordcount during the load phase, the same file must also be passed during the run/transaction phase.

To see output while loading data:

For a large data load, periodic status updates help confirm that everything is going well.

-s makes the client print status reports on stderr.

The > character redirects the client's standard output (the load results) to a file (load.dat in the example below). The -s status reports still appear on the terminal because they go to stderr.

$ ./bin/ycsb load (database name) -P workloads/workloada -P large.dat -s > load.dat

Command Line Options (Load):

  • -P: load property files

Running Multiple Clients in Parallel:

  • -p insertstart: the index of the record to start at
  • -p insertcount: the number of records to insert

Executing YCSB:

Executing a workload/running transaction phase:

$ ./bin/ycsb run (database name) -P workloads/workloada -P large.dat -s -threads 10 -target 100 -p operationcount=50000000 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat

Command Line Options (Tx):

  • -threads: number of client threads (default: 1)
  • -target: throttle to this many ops/sec (default: unthrottled); used to generate latency vs. throughput curves
  • -p: set a property
  • -s ... > transactions.dat: status output on stderr, results in transactions.dat
  • -p operationcount=10000000: number of operations to run
  • -p maxexecutiontime=300: maximum run length in seconds; the run stops when this time elapses even if the operation count has not been reached (default: off)
  • -p measurementtype=timeseries: report latency as a time series (default: histogram)
  • -p timeseries.granularity=2000: time series reporting interval in milliseconds (default: 1000)
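Since -target is described above as the knob for latency vs. throughput curves, one way to use it is to repeat the run at several target rates and keep each result in its own file. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name print_target_sweep, the chosen target values, and the transactions_<target>.dat file names are illustrative assumptions, not part of YCSB.

```shell
# print_target_sweep DB TARGET...
# Dry run: print one "ycsb run" command per target rate (ops/sec).
# The output file naming scheme is a hypothetical convention.
print_target_sweep() {
    db=$1
    shift
    for target in "$@"; do
        echo "./bin/ycsb run $db -P workloads/workloada -P large.dat -s -threads 10 -target ${target} > transactions_${target}.dat"
    done
}

# Example: three throttled runs against the mongodb binding.
print_target_sweep mongodb 100 500 1000
```

Each resulting file then holds the latency figures for one throughput level, which gives one point per run on the curve.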

Running Multiple Clients in Parallel:

Loading database from multiple clients:

Loading the database from multiple clients is done by partitioning the workload records.

Normally YCSB loads all the records from a single client, but the insertstart and insertcount properties listed under Command Line Options (Load) let each client load only its own slice of the records.

Example: Loading 10 million records with 2 clients

First Client: -p insertstart=0 -p insertcount=5000000

Second Client: -p insertstart=5000000 -p insertcount=5000000
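For more clients, or a record count that does not divide evenly, the same split can be computed rather than worked out by hand. This is a minimal sketch; the function name partition_load is hypothetical, and handing the leftover records to the first clients is just one reasonable choice.

```shell
# partition_load TOTAL_RECORDS NUM_CLIENTS
# Print the insertstart/insertcount properties for each client so that
# the slices are contiguous and together cover every record exactly once.
partition_load() {
    total=$1
    clients=$2
    base=$((total / clients))     # records every client gets
    extra=$((total % clients))    # leftovers, one each to the first clients
    start=0
    i=0
    while [ "$i" -lt "$clients" ]; do
        count=$base
        if [ "$i" -lt "$extra" ]; then
            count=$((base + 1))
        fi
        echo "client $((i + 1)): -p insertstart=$start -p insertcount=$count"
        start=$((start + count))
        i=$((i + 1))
    done
}

# Reproduces the two-client split of 10 million records shown above.
partition_load 10000000 2
```

Each printed pair is appended to that client's load command.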

Executing from multiple clients:

Run the transaction phase of the workload from multiple client machines. Start up multiple clients, each running the same workload targeted at the same database server.

DEPRECATED (use the "Executing remotely" format below instead):
1. Get the connection string from the host server
2. Boot the client and install YCSB
3. Connect to the server using the connection string and mongosh

Executing remotely: Add: -p mongodb.url="mongodb://{serverIP}:27017/{dbname}?w=0"

Ex. Synchronous

$ ./bin/ycsb run mongodb -P workloads/workloada -P large.dat -s -threads 10 -p mongodb.url="mongodb://10.7.0.18:27017/ycsb?w=0" > transactions.dat

Data Collection:

YCSB reports the following metrics, grouped under these tags:

OVERALL:

  • Total Execution time (ms)
  • Average throughput across all threads (ops/s)
  • Garbage Collection data

UPDATE/CLEANUP/READ:

  • Total operations (num operations)
  • Average latency (ms)
  • Max latency (ms)
  • 95th percentile latency (ms)
  • 99th percentile latency (ms)
  • Return code counts (num codes)
  • Histogram/Time series (optional) of operation times
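To pull a single figure out of a results file such as transactions.dat, a small awk filter is usually enough. The sketch below assumes each summary line has the form "[TAG], Metric, value"; the exact metric labels and units differ between YCSB versions, so pass the label exactly as it appears in your own file. The function name ycsb_metric is hypothetical, and the sample file contains made-up numbers used only to exercise the function.

```shell
# ycsb_metric FILE TAG METRIC
# Print the value from the first line of the form "[TAG], METRIC, value".
ycsb_metric() {
    awk -F', ' -v tag="[$2]" -v metric="$3" \
        '$1 == tag && $2 == metric { print $3; exit }' "$1"
}

# Illustrative sample only: these values are invented, not measured.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[OVERALL], RunTime(ms), 10110
[OVERALL], Throughput(ops/sec), 98.91
[READ], Operations, 509
[UPDATE], Operations, 491
EOF

ycsb_metric "$sample" OVERALL "Throughput(ops/sec)"
ycsb_metric "$sample" UPDATE Operations
rm -f "$sample"
```

Running the same function over the result file of each client or each run makes it easy to collect the figures into one table.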

MongoDB-Specific Config Options:

mongodb.batchsize - Submits inserts in batches, which improves throughput. Good for insert-heavy workloads. Default is 1.


Resources:

YCSB Properties:

https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties#core-workload-package-properties
https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads#running-the-workloads

Sharding (For Implementation of Client/Server):

https://www.mongodb.com/resources/products/capabilities/database-sharding-explained
https://www.youtube.com/watch?v=aBaD0qHK1as&list=PLIRAZAlr4cfY1gugVw2enf6uVXyJaWwwv
https://github.com/neerajg5/mongodb-tutorial/blob/main/mongodb-sharding-ubuntu-git.txt
https://www.mongodb.com/docs/manual/sharding/#shard-keys
https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload-in-Parallel
https://github.com/brianfrankcooper/YCSB/blob/master/mongodb/README.md
https://www.digitalocean.com/community/tutorials/how-to-configure-remote-access-for-mongodb-on-ubuntu-20-04

Helpful MongoDB commands:


# Stop/start/restart/status for the MongoDB server; which command applies depends on whether your system uses service or systemctl
sudo service mongod {command}
sudo systemctl {command} mongod

# If the server won't start after the VM it runs on has been restarted, resetting ownership of these paths is often a good fix
sudo chown -R mongodb:mongodb /var/lib/mongodb
sudo chown mongodb:mongodb /tmp/mongodb-27017.sock

# Handy command for counting attached clients per IP address (run inside the mongo shell):
db.currentOp(true).inprog.reduce((accumulator, connection) => { const ipaddress = connection.client ? connection.client.split(":")[0] : "Internal"; accumulator[ipaddress] = (accumulator[ipaddress] || 0) + 1; accumulator["TOTAL_CONNECTION_COUNT"]++; return accumulator; }, { TOTAL_CONNECTION_COUNT: 0 })

# Drop YCSB database
mongosh --host localhost --eval "use ycsb" --eval "db.dropDatabase()"