Using TPC-DS to generate test data

Source: Internet
Author: User
Tags: benchmark

Original article: http://www.innovation-brigade.com/index.php?module=Content&type=user&func=display&tid=1&pid=3&lang=en


If you ever find yourself needing to generate massive quantities of benchmark data to test your database's data-import or query performance, the TPC (Transaction Processing Performance Council) provides a handy tool which can easily generate gigabytes of data. Yes, the data it generates and the queries it provides are geared towards decision support applications, but that doesn't prevent these scripts from being a good testing ground for your database, especially if you wish to compare performance on several database platforms.

While the TPC provides a whole range of benchmark suites for various purposes, the TPC-DS benchmark is probably the easiest to implement and use. Best of all, it's free (but does require you to submit your details in a download request) and, on a modern Linux box, compiles out of the box without having to resort to any hacking. If you're using Mac OS X, it's not quite as easy, as it will generate compile errors which you'll have to fix manually. So for the purposes of using the TPC-DS benchmark, do yourself a favour and use a Linux box to generate the data.

Here's what you need to do in order to set the TPC-DS benchmark up (on a Linux box):

- Download the DSGen utility (duh!)
- Extract the downloaded archive
- cd tpc-ds
- cd tools
- make (ignore the compile warnings, just ensure the build process completes successfully)

OK, you've now built the utilities required for the benchmark. At this point it's probably useful to download and read the provided documentation in order to better understand the scope of what the benchmark provides, but here's a quick rundown of how to generate the test data.

./dsdgen

would simply generate the test data (which is written to |-delimited text files with the extension *.dat) at the default scale factor (which is 1). Each scale factor corresponds to roughly 1 GB of data, so, for example, the command

./dsdgen -scale 5 -force

would generate 5 GB of data, and the -force option would overwrite previously generated data. Without the -force option, dsdgen will refuse to overwrite existing test data and simply do nothing.
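If you generate data at several scale factors, it can help to assemble the dsdgen command line from a small script. The following is a minimal Python sketch, not part of the TPC-DS kit: the function names are made up for illustration, it assumes it is run from the tools directory where dsdgen was built, and it only uses the -scale and -force options shown above plus the -dir option listed in the tool's help output.

```python
import subprocess
from typing import List, Optional


def build_dsdgen_command(scale: int = 1, force: bool = False,
                         out_dir: Optional[str] = None,
                         binary: str = "./dsdgen") -> List[str]:
    """Assemble a dsdgen argument list (hypothetical helper, not part of the kit)."""
    if scale < 1:
        raise ValueError("scale must be a positive integer (roughly GB of data)")
    cmd = [binary, "-scale", str(scale)]
    if out_dir is not None:
        # -dir: generate the tables in the given directory
        cmd += ["-dir", out_dir]
    if force:
        # -force: overwrite previously generated data files
        cmd.append("-force")
    return cmd


def generate(scale: int = 1, force: bool = False,
             out_dir: Optional[str] = None) -> None:
    """Run dsdgen and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_dsdgen_command(scale, force, out_dir), check=True)


if __name__ == "__main__":
    # Prints the same command as the example above.
    print(" ".join(build_dsdgen_command(scale=5, force=True)))
```

Calling generate(scale=5, force=True) would then run the same command as the example above.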

Now you have your test data ready and can load it into your database. For MySQL this roughly involves the following:

mysql -u <your_mysql_user> -p < tpcds.sql

Then for each *.dat file which is generated, do the following (see this page for details):

LOAD DATA INFILE 'your_dat_filename'
INTO TABLE table_the_dat_file_is_for
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';

to load the data. See this previous blog post on how to set up MySQL with InnoDB tables for good import performance, so you don't spend lots of time waiting just because your database writes to disk after every insert.

So that is your quick 'n' dirty guide on how to use TPC-DS to generate and load lots of test data. Hopefully someone out there finds this useful. Enjoy.
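Writing the LOAD DATA statement by hand for every *.dat file gets tedious, so the per-file step can be scripted. Here is a minimal Python sketch that prints one statement per file. It assumes each file is named after the table it belongs to (for example, a store_sales.dat file for a store_sales table), which you should check against your own output directory. The function names are hypothetical.

```python
from pathlib import Path


def load_statement(dat_file: Path) -> str:
    """Build a LOAD DATA statement for one .dat file.

    Assumption: the file name without its extension is the target table name.
    """
    table = dat_file.stem
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name derived from {dat_file}")
    # Use an absolute path and double any single quotes for the SQL literal.
    path = dat_file.resolve().as_posix().replace("'", "''")
    return (
        f"LOAD DATA INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY '|'\n"
        "LINES TERMINATED BY '\\n';"
    )


def load_script(data_dir: str) -> str:
    """Return LOAD DATA statements for every .dat file in data_dir, sorted by name."""
    files = sorted(Path(data_dir).glob("*.dat"))
    return "\n\n".join(load_statement(f) for f in files)


if __name__ == "__main__":
    # Print the statements for the current directory; pipe them into the mysql client.
    print(load_script("."))
```

The printed script can be saved to a file and fed to the mysql client in the same way as tpcds.sql above.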


Related parameters:


[root@ht-hadoop3 tools]# ./dsdgen --help
Error: option '-help' or its argument unknown.
DBGEN2 Population Generator (Version 2.0.0)
Copyright Transaction Processing Performance Council (TPC) 2001-2015

Usage: dbgen2 [options]

Note: When defined in a parameter file (using -p), parmeters should
use the form below. Each option can also be set from the command
line, using a form of '-param [optional argument]'
Unique anchored substrings of options are also recognized, and
case is ignored, so '-sc' is equivalent to '-SCALE'

General Options
===============
ABREVIATION = <s>    -- build table with abreviation <s>
DIR = <s>            -- generate tables in directory <s>
HELP = <n>           -- display this message
PARAMS = <s>         -- read parameters from file <s>
QUIET = [Y|N]        -- disable all output to stdout/stderr
SCALE = <n>          -- volume of data to generate in GB
TABLE = <s>          -- build only table <s>
UPDATE = <n>         -- generate update data set <n>
VERBOSE = [Y|N]      -- enable verbose output
PARALLEL = <n>       -- build data in <n> separate chunks
CHILD = <n>          -- generate <n>th chunk of the parallelized data
RELEASE = [Y|N]      -- display the release information
_FILTER = [Y|N]      -- output data to stdout
VALIDATE = [Y|N]     -- produce rows for data validation

Advanced Options
===============
DELIMITER = <s>      -- use <s> as output field separator
DISTRIBUTIONS = <s>  -- read distributions from file <s>
FORCE = [Y|N]        -- over-write data files without prompting
SUFFIX = <s>         -- use <s> as output file suffix
TERMINATE = [Y|N]    -- end each record with a field delimiter
VCOUNT = <n>         -- set number of validation rows to be produced
VSUFFIX = <s>        -- set file suffix for data validation
RNGSEED = <n>        -- set RNG seed
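
The PARALLEL and CHILD options listed above suggest that a large data set can be split into chunks and generated by several dsdgen processes at once. A minimal Python sketch of that idea follows. It assumes chunks are numbered from 1 to the PARALLEL value and that every child is started with the same scale factor. Neither assumption is confirmed by the help text alone, so check the TPC-DS documentation before relying on it. The function names are hypothetical.

```python
import subprocess
from typing import List


def parallel_commands(scale: int, chunks: int,
                      binary: str = "./dsdgen") -> List[List[str]]:
    """Build one dsdgen command per chunk (assumes chunks are numbered from 1)."""
    if scale < 1 or chunks < 1:
        raise ValueError("scale and chunks must be positive integers")
    return [
        [binary, "-scale", str(scale),
         "-parallel", str(chunks), "-child", str(child)]
        for child in range(1, chunks + 1)
    ]


def run_parallel(scale: int, chunks: int) -> None:
    """Start every chunk at once and wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in parallel_commands(scale, chunks)]
    failed = [p.args for p in procs if p.wait() != 0]
    if failed:
        raise RuntimeError(f"dsdgen failed for: {failed}")


if __name__ == "__main__":
    # Show the commands that would be started for 4 chunks at scale factor 5.
    for cmd in parallel_commands(scale=5, chunks=4):
        print(" ".join(cmd))
```

Running one process per CPU core is a reasonable starting point, since the processes have nothing to coordinate once they are started.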
