OrientDB – Caching
Caching keeps copies of database records in memory so that user applications can read them without going back to storage every time. OrientDB has several caching mechanisms at different levels.
The following illustration gives an idea about what caching is.
In the above illustration DB1, DB2, DB3 are the three different database instances used in an application.
Level-1 cache is a local cache that stores all the entities known to a specific session. If the session runs three transactions, the cache holds all entities used by all three. The cache is cleared when you close the session or call the “clear” method. It reduces the I/O between the application and the database, which in turn improves performance.
Level-2 cache is a real cache that works through a third-party provider. You have full control over its contents: you can specify which entries should be removed, which ones should be kept longer, and so on. It is fully shared among multiple threads.
The storage model is simply the storage device: disk, memory, or a remote server.
How Cache Works in OrientDB?
OrientDB caching works differently in different environments. It is mainly used to speed up database transactions by reducing their processing time. The following flow diagrams show how caching works in local mode and client-server mode.
Local Mode (Embedded Database)
The following flow diagram shows how a record travels between the storage and the application in local mode, i.e., when the database runs on your local machine, embedded in the application.
When the client application asks for a record, OrientDB checks the following −
- If a transaction has begun, it searches the transaction for the changed record and returns it if found.
- If the local cache is enabled and contains the requested record, it returns it.
- If at this point the record is not in the cache, it asks the storage (disk, memory) for it.
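The lookup order above can be sketched as a small in-memory model. This is only an illustration: the class, its fields, and the rule that a record read from storage is copied into the cache are assumptions made for the sketch, not OrientDB's actual classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the local-mode lookup order; not OrientDB's real classes.
class LocalLookupSketch {
    final Map<String, String> txChanges = new HashMap<>();  // records changed in the open transaction
    final Map<String, String> localCache = new HashMap<>(); // Level-1 (session) cache
    final Map<String, String> storage = new HashMap<>();    // stands in for disk or memory storage
    boolean txActive = false;
    boolean cacheEnabled = true;
    String lastSource = "none"; // which layer served the last read

    Optional<String> load(String rid) {
        // 1. An open transaction wins: return the changed record if present.
        if (txActive && txChanges.containsKey(rid)) {
            lastSource = "transaction";
            return Optional.of(txChanges.get(rid));
        }
        // 2. Otherwise try the local cache, if it is enabled.
        if (cacheEnabled && localCache.containsKey(rid)) {
            lastSource = "cache";
            return Optional.of(localCache.get(rid));
        }
        // 3. Fall back to storage; caching the result here is an assumption of this sketch.
        String record = storage.get(rid);
        lastSource = "storage";
        if (record != null && cacheEnabled) {
            localCache.put(rid, record);
        }
        return Optional.ofNullable(record);
    }

    public static void main(String[] args) {
        LocalLookupSketch db = new LocalLookupSketch();
        db.storage.put("#12:0", "v1");
        db.load("#12:0");
        System.out.println(db.lastSource); // storage
        db.load("#12:0");
        System.out.println(db.lastSource); // cache
    }
}
```

The second read never reaches storage, which is the I/O saving the local cache is meant to provide.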
Client Server Mode (Remote Database)
The following flow diagram shows how a record travels between the storage and the application in client-server mode, i.e., when the database server is in a remote location.
When the client application asks for a record, OrientDB checks the following −
- If a transaction has begun, it searches the transaction for the changed record and returns it if found.
- If the local cache is enabled and contains the requested record, it returns it.
- If at this point the record is not in the cache, it asks the server for it through a TCP/IP call.
- On the server, if the local cache is enabled and contains the requested record, the server returns it.
- If the record is still not cached on the server, the server asks the storage (disk, memory) for it.
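To make the extra hop visible, here is a minimal sketch of the client-server chain that counts network calls and storage reads. All names are hypothetical, the transaction step is left out for brevity, and both caches are assumed to be enabled and to keep whatever they load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the client-server lookup chain; not OrientDB's real classes.
class RemoteLookupSketch {
    final Map<String, String> clientCache = new HashMap<>(); // client-side local cache
    final Map<String, String> serverCache = new HashMap<>(); // server-side local cache
    final Map<String, String> storage = new HashMap<>();     // stands in for disk or memory storage
    int networkCalls = 0; // TCP/IP round trips from client to server
    int storageReads = 0; // reads that reached the storage layer

    // Client side: the client cache first, then one network call to the server.
    String clientLoad(String rid) {
        String record = clientCache.get(rid);
        if (record == null) {
            networkCalls++;
            record = serverLoad(rid);
            if (record != null) clientCache.put(rid, record);
        }
        return record;
    }

    // Server side: the server cache first, then storage.
    String serverLoad(String rid) {
        String record = serverCache.get(rid);
        if (record == null) {
            storageReads++;
            record = storage.get(rid);
            if (record != null) serverCache.put(rid, record);
        }
        return record;
    }

    public static void main(String[] args) {
        RemoteLookupSketch db = new RemoteLookupSketch();
        db.storage.put("#12:0", "v1");
        db.clientLoad("#12:0"); // misses both caches: 1 network call, 1 storage read
        db.clientLoad("#12:0"); // served by the client cache: no extra calls
        db.clientCache.clear(); // simulate a fresh client session
        db.clientLoad("#12:0"); // served by the server cache: 1 more network call
        System.out.println(db.networkCalls + " network calls, " + db.storageReads + " storage reads");
    }
}
```

In this run the record is requested three times, but only two requests cross the network and only one reaches storage.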
OrientDB – Performance Tuning
In this chapter, you can get some general tips on how to optimize your application that uses OrientDB. There are three ways to increase the performance for different types of database.
- Document Database Performance Tuning − It uses a technique that avoids creating a new document object for every new document.
- Object Database Performance Tuning − It uses generic techniques to improve performance.
- Distributed Configuration Tuning − It uses different methodologies to improve performance in a distributed configuration.
You can achieve generic performance tuning by changing the Memory, JVM, and Remote connection settings.
Memory Settings
There are different strategies in memory setting to improve performance.
Server and Embedded Settings
These settings are valid for both the Server component and the JVM that runs a Java application using OrientDB in Embedded mode, i.e., by directly using plocal.
The most important part of tuning is making sure the memory settings are correct. What can make a real difference is the right balance between the heap and the virtual memory used by Memory Mapping, especially on large datasets (GBs, TBs and more) where the in-memory cache structures count less than raw IO.
For example, if you can assign a maximum of 8GB to the Java process, it's usually better to assign a small heap and a large disk cache buffer (off-heap memory).
The following command assigns a small heap and a large disk cache buffer.
java -Xmx800m -Dstorage.diskCache.bufferSize=7200 ...
The storage.diskCache.bufferSize setting (with the old “local” storage it was file.mmap.maxMemory) is in MB and tells how much memory to use for the Disk Cache component. By default it is 4GB.
NOTE − If the sum of the maximum heap and the disk cache buffer is too high, the OS may start swapping, which causes a huge slowdown.
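The budget check behind the note can be written down as a few lines of Java. This is a sketch under assumptions: the helper class and the 8GB budget are hypothetical, and the disk cache size is read from the same system property that the -D flag sets, falling back to the 4GB default mentioned above.

```java
// Hypothetical helper: checks that heap plus disk cache fits a memory budget.
class MemoryBudgetSketch {
    // Returns true when heap and disk cache (both in MB) fit within the budget.
    static boolean fits(long heapMb, long diskCacheMb, long budgetMb) {
        return heapMb + diskCacheMb <= budgetMb;
    }

    public static void main(String[] args) {
        // Maximum heap as set with -Xmx, converted from bytes to MB.
        long heapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        // Value of -Dstorage.diskCache.bufferSize, or 4096 MB when the flag is absent.
        long diskCacheMb = Long.getLong("storage.diskCache.bufferSize", 4096L);
        long budgetMb = 8 * 1024; // assumed: 8GB available to the Java process
        System.out.println("heap=" + heapMb + "MB diskCache=" + diskCacheMb
                + "MB fits=" + fits(heapMb, diskCacheMb, budgetMb));
    }
}
```

With the values from the command above, 800 MB of heap plus 7200 MB of disk cache is 8000 MB, which stays inside an 8GB (8192 MB) budget.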
JVM Settings
JVM settings are encoded in server.sh (and server.bat) batch files. You can change them to tune the JVM according to your usage and hw/sw settings. Add the following line in server.bat file.
-server -XX:+PerfDisableSharedMem
This setting disables writing JVM debug (performance) information to a shared memory file. If you need to profile the JVM, just remove this setting.
Remote Connections
There are many ways to improve performance when you access the database using a remote connection.
Fetching Strategy
When you work with a remote database, you have to pay attention to the fetching strategy used. By default, the OrientDB client loads only the records contained in the result set. For example, if a query returns 100 elements and you then traverse the links of these elements from the client, the client lazily loads the linked records with one more network call to the server for each missing record.
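The cost of lazy loading is easy to underestimate, so here is a small model that only counts network calls. It is not the OrientDB client API: the class, the record ids, and the assumption that every element links to a distinct record outside the result set are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model that counts network calls; not the OrientDB client API.
class LazyFetchSketch {
    final Set<String> loaded = new HashSet<>(); // records already on the client
    int networkCalls = 0;

    // One network call returns the whole result set.
    List<String> query(int size) {
        networkCalls++;
        List<String> result = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            String rid = "#10:" + i;
            loaded.add(rid);
            result.add(rid);
        }
        return result;
    }

    // Following a link costs one extra call when the target is not loaded yet.
    void followLink(String targetRid) {
        if (loaded.add(targetRid)) {
            networkCalls++;
        }
    }

    public static void main(String[] args) {
        LazyFetchSketch client = new LazyFetchSketch();
        List<String> result = client.query(100);
        for (int i = 0; i < result.size(); i++) {
            client.followLink("#11:" + i); // each element links to a distinct, unloaded record
        }
        System.out.println(client.networkCalls); // 101
    }
}
```

One query plus 100 lazy loads gives 101 round trips. A fetching strategy that ships the linked records together with the result set would bring this back toward a single call.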
Network Connection Pool
Each client, by default, uses only one network connection to talk with the server. Multiple threads on the same client share the same network connection pool.
When you have multiple threads, there could be a bottleneck since a lot of time is spent waiting for a free network connection. This is the reason why it is important to configure the network connection pool.
The configuration is very simple, just 2 parameters −
- minPool − It is the initial size of the connection pool. The default value is taken from the global parameter “client.channel.minPool”.
- maxPool − It is the maximum size the connection pool can reach. The default value is taken from the global parameter “client.channel.maxPool”.
If all the pool connections are busy, then the client thread will wait for the first free connection.
The following example configures the pool through database properties.
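import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// Set the pool size, then open the remote database.
ODatabaseDocumentTx database = new ODatabaseDocumentTx("remote:localhost/demo");
database.setProperty("minPool", 2);
database.setProperty("maxPool", 5);
database.open("admin", "admin");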
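The waiting behaviour described above (a thread waits when every pooled connection is busy) can be modelled with a semaphore. This is a stand-in, not OrientDB's pool: the class name is hypothetical, and a timeout is used instead of an unbounded wait so the effect can be shown in a single thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a bounded connection pool; not OrientDB's implementation.
class PoolWaitSketch {
    private final Semaphore free;

    PoolWaitSketch(int maxPool) {
        this.free = new Semaphore(maxPool, true); // fair: the longest-waiting thread goes first
    }

    // Waits up to the timeout for a free connection; returns false if none became free.
    boolean acquire(long timeoutMs) {
        try {
            return free.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    void release() {
        free.release();
    }

    public static void main(String[] args) {
        PoolWaitSketch pool = new PoolWaitSketch(2);
        System.out.println(pool.acquire(10)); // true
        System.out.println(pool.acquire(10)); // true
        System.out.println(pool.acquire(10)); // false: both connections are busy
        pool.release();
        System.out.println(pool.acquire(10)); // true: a connection was freed
    }
}
```

With many threads and a small maxPool, most of the elapsed time is spent in the waiting branch, which is the bottleneck the pool settings are meant to relieve.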
Distributed Configuration Tuning
There are many ways to improve performance in a distributed configuration.
Use Transactions
Even when you update graphs, you should always work in transactions. OrientDB allows you to work outside of them; common cases are read-only queries, or massive, non-concurrent operations that can be restored in case of failure. When you run in a distributed configuration, using transactions helps to reduce latency, because the distributed operation happens only at commit time. Distributing one big operation is much more efficient than transferring many small operations, because of the latency.
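A back-of-the-envelope model shows why a single commit helps. The numbers and method names below are illustrative assumptions, and the model counts only round-trip latency, ignoring the time needed to transfer the payload itself.

```java
// Hypothetical latency model; the numbers are illustrative, not measurements.
class CommitLatencySketch {
    // Total network latency when each operation is distributed on its own.
    static long perOperationMs(int operations, long roundTripMs) {
        return operations * roundTripMs;
    }

    // Total network latency when all operations are distributed once, at commit time.
    static long perTransactionMs(long roundTripMs) {
        return roundTripMs;
    }

    public static void main(String[] args) {
        System.out.println(perOperationMs(1000, 2)); // 2000
        System.out.println(perTransactionMs(2));     // 2
    }
}
```

For 1000 updates over a link with a 2 ms round trip, the latency cost drops from about 2000 ms to about 2 ms under these assumptions.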
Replication vs Sharding
By default, the OrientDB distributed configuration is set to full replication. Having multiple nodes with the same copy of the database is important for scaling reads, because each server executes reads and queries independently. If you have 10 server nodes, the read throughput can approach 10x.
With writes, it's the opposite: having multiple nodes with full replication slows down the operations if the replication is synchronous. In this case, sharding the database across multiple nodes allows you to scale up writes, because only a subset of the nodes is involved in each write. Furthermore, you can have a database that is bigger than the disk of a single server node.
Scale up on Writes
If you have a slow network and use synchronous (default) replication, you pay the cost of latency. When OrientDB runs synchronously, it waits for at least the writeQuorum. This means that if the writeQuorum is 3 and you have 5 nodes, the coordinator server node (where the distributed operation is started) has to wait for the answer from at least 3 nodes before it can answer the client.
In order to maintain consistency, the writeQuorum should be set to the majority. If you have 5 nodes, the majority is 3. With 4 nodes, it is still 3. Setting the writeQuorum to 3 instead of 4 or 5 reduces the latency cost and still maintains consistency.
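The majority rule is a one-line formula. The helper below is hypothetical (OrientDB reads writeQuorum from its distributed configuration; it is not computed by this class), but it reproduces the numbers in the text and shows why a majority is enough: any two majorities must share at least one node.

```java
// Hypothetical helper; not part of OrientDB.
class QuorumSketch {
    // Smallest number of nodes that forms a strict majority.
    static int majority(int nodes) {
        return nodes / 2 + 1;
    }

    // Two quorums of this size always share a node when 2 * quorum > nodes.
    static boolean quorumsOverlap(int nodes, int quorum) {
        return 2 * quorum > nodes;
    }

    public static void main(String[] args) {
        System.out.println(majority(5));          // 3
        System.out.println(majority(4));          // 3
        System.out.println(quorumsOverlap(5, 3)); // true
        System.out.println(quorumsOverlap(5, 2)); // false
    }
}
```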
Asynchronous Replication
To speed things up, you can set up Asynchronous Replication to remove the latency bottleneck. In this case, the coordinator server node executes the operation locally and gives the answer to the client. The entire replication happens in the background. If the quorum is not reached, the changes are rolled back transparently.
Scale up on Reads
If you have already set the writeQuorum to the majority of nodes, you can leave the readQuorum at 1 (the default). This speeds up all the reads.