Generating sample data in PostgreSQL with EuroDB tools

Generating sample data in PostgreSQL with EuroDB tools

Random data generation is one of the most important features of databases used in companies. It is used on various technical as well as business levels. It provides an opportunity for secure testing of databases and helps to develop competence in database administration.

Random data generation is one of the most important features of databases used in companies. It is used on various technical as well as business levels. It provides an opportunity for secure testing of databases and helps to develop competence in database administration.

It is well known that a successful IT system consists of a database structure, carefully prepared tests and well-trained staff able to use the full potential of the database. Random data generation plays a huge role in all of this. Today we will show how the tools available within the EuroDB database platform make this process easier.

The application of data generator

The first and most important use of the data generator is to create test data on which to validate the database system, to test its performance and robustness to borderline cases, without the fear of losing important data. Another equally important aspect is the ability to introduce new team members into the database structure without having to expose production data. This is especially important in the era of GDPR, when an administrator can accidentally disclose personal data of customers and, in critical situations, even data of key partners of the company. This is an unacceptable situation with potential legal consequences. Of course, we cannot forget about inexperienced administrators who start their adventure with databases. Thanks to data generation, they can develop their skills without exposing the production infrastructure to danger or downtime.

PostgreSQL database data generator

The random data generator included in the EuroDB set of tools has been created with every customer in mind. Therefore, it is very intuitive and easy to use. Despite this simplicity, it is a powerful tool with many data generation options, the most interesting of which are:

  • generation of data with a predefined offset
  • per-table scaling (e.g. 100 products, 10,000 customers, one million session IDs)
  • setting percentage of NULL value (e.g. no customer phone number)
  • concatenation of values of a few generators
  • BLOB creation
  • a generator compatible with Luhn algorithm (generating valid credit card numbers).

What’s interesting, the generation of “interruptions in data access” simulating failures of dependent systems is also supported. This provides even more possibilities for database testing.

How it works?

With the help of a tool from the EuroDB package, an administrator can generate random data based on the database schema itself. The algorithm responsible for this searches, among others, the types of fields, their parameters and foreign keys contained in the mentioned schema, creating a virtual overview of the database. Based on the image it generates data in accordance with the schema requirements. The main data types are supported, including, importantly, IP addresses and MAC addresses. The administrator has the possibility to fine-tune the generated data by means of parameters or by using regular expressions.

Practical usage

Creating sample data using the generator available in the EuroDB package is very easy. Just run the following command after installing it:

> eurodb-datagenerator schema.sql > result.sql

In place of schema.sql, specify the path to the database schema for which you want to generate sample data. In place of result.sql give the name of the output file.

To transfer the generated data to the database, run the following command, first changing database_name to the name of the database to which we are going to add freshly generated random data. In addition, it is worth noting that you should change the user used for logging into the database console. The default is postgres. If when calling the previous command in place of the output file we specify our own name, other than result.sql, be sure to change it here as well.

> psql -U postgres -d database_name -a -f result.sql

Podsumowanie

The generation of sample data in a database is the foundation for the entire structure of the organization. Therefore, it is not surprising that the database platform EuroDB is equipped with a mechanism for generating random data, which makes testing and implementing personnel into the database structure simple and safe.

Authors

The blog articles are written by people from the EuroLinux team. We owe 80% of the content to our developers, the rest is prepared by the sales or marketing department. We make every effort to ensure that the content is the best in terms of content and language, but we are not infallible. If you see anything that needs to be corrected or clarified, we'd love to hear from you.