My colleagues at Serpro are worried about the viability of taking the infrastructure concepts of Tine into our production environment. This concern relates almost exclusively to the option of storing the headers of all messages in the database.
I bring you the facts and considerations of our infrastructure team:
About the performance:
- With PostgreSQL, a single user was able to drive a processor to 100%. This occurs especially when clicking on mailboxes with more than 3,000 messages.
- They changed the database to MySQL and obtained a visible gain. However, we still have the problem of processing resources being exhausted when accessing a mailbox with more than 3,000 emails. The period of exhaustion is shorter, but it still occurs.
We can improve the performance of the database. An index created on the field "message_id" of the table "tine20_felamimail_cache_message_flag", for example, improved its query by almost 50%. Our development team is already cross-checking the application queries against the indexes that are created automatically when we install the product.
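To illustrate the kind of index mentioned above, here is a minimal sketch. It uses an in-memory SQLite database only as a stand-in for PostgreSQL or MySQL; the table and column names "tine20_felamimail_cache_message_flag" and "message_id" come from this post, while the "flag" column, the index name and the sample data are assumptions. It does not reproduce our 50% measurement, it only shows the query plan switching from a full scan to an index search.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS. The "flag" column and the
# sample rows are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tine20_felamimail_cache_message_flag ("
    " message_id TEXT NOT NULL, flag TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tine20_felamimail_cache_message_flag VALUES (?, ?)",
    [(f"msg-{i}", "\\Seen") for i in range(3000)],
)

QUERY = "SELECT flag FROM tine20_felamimail_cache_message_flag WHERE message_id = ?"


def plan(connection):
    """Return the human-readable query plan lines for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("msg-42",)).fetchall()
    return [row[-1] for row in rows]  # last column holds the plan description


plan_before = plan(conn)  # without an index: a scan over the whole table

# Hypothetical index name; the indexed column is the one named in the post.
conn.execute(
    "CREATE INDEX idx_cache_message_flag_message_id"
    " ON tine20_felamimail_cache_message_flag (message_id)"
)
plan_after = plan(conn)  # with the index: a search using the index

print(plan_before)
print(plan_after)
```

On a production database the equivalent check would be done with the EXPLAIN facility of PostgreSQL or MySQL against the real queries issued by the application.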
- We compared the size of the database before and after caching 10 thousand messages of 7 kB each. We obtained an increase of 7 MB in the database. Assuming the email body is never stored in the database, even when queries and manipulations are performed on a message, this value is a good basis for some extrapolations:
- Today we have 19.4 million files in our IMAP. At 0.7 kB of cache per message, storing the current cache (ignoring the tables for contacts, preferences, schedules and others) would need about 13.6 GB in the database.
Our email database is now 0.8 GB. Since email quotas were recently increased, we are using only 21.7% of our available quota. Assuming that 90% of the quota were occupied at migration time, the cache database would be 56 GB, while the database of the current groupware (based on EGroupware 1) would stay at the same 0.8 GB (70 times smaller).
- If we migrate to Tine 2.0 now, some tables (tine20_felamimail_cache_message and tine20_felamimail_cache_message_to, for example) would each have 19.4 million entries, while the largest table of the current webmail (phpgw_cc_contact_conns) has 650,000 records (30 times smaller).
- Migration of all customers to the cloud with Tine 2.0:
Today that would be 32 million files (19.4 million of ours and 12.6 million from our customers), which would require a database of 22.4 GB.
- We envision a future cloud environment with more than 600 thousand users, and we are selecting solutions that support this scenario. Today we need to migrate only our own company, which has about 10 thousand users.
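The size extrapolations above follow from simple arithmetic, which can be reproduced with the short sketch below. All input figures are the ones quoted in this post; decimal units (1 GB = 1,000,000 kB) and a linear growth of the cache with the number of messages are assumptions.

```python
# Measured in our test: caching 10 thousand messages grew the database by 7 MB.
CACHE_GROWTH_KB = 7_000
CACHED_MESSAGES = 10_000
KB_PER_MESSAGE = CACHE_GROWTH_KB / CACHED_MESSAGES  # 0.7 kB of cache per message


def cache_size_gb(messages: float) -> float:
    """Linear extrapolation of the cache size, in GB, for a number of messages."""
    return messages * KB_PER_MESSAGE / 1_000_000


own = cache_size_gb(19.4e6)             # our own mailboxes today
with_customers = cache_size_gb(32e6)    # ours plus our customers'
at_90_percent = own * 90 / 21.7         # if quota usage rose from 21.7% to 90%

print(f"own mailboxes:     {own:.1f} GB")
print(f"with customers:    {with_customers:.1f} GB")
print(f"at 90% quota used: {at_90_percent:.1f} GB")
```

With these inputs the sketch gives about 13.6 GB, 22.4 GB and 56.3 GB respectively.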
- We need to make some considerations about the option of storing the message cache in the database:
- The Cyrus IMAP system is extremely scalable. It caches the message headers in files distributed under the subfolders of each user. In other words, its database is highly distributed, and subsequent reads of each of these cache files are managed by the Linux kernel, so there is a very high probability that such files are already in memory.
- If we continue with the option of caching the headers in the database, we must somehow distribute the database more and more, and we still will not get even close to the performance of Cyrus. Our infrastructure team considers the cache in a relational database to be a limitation inherent in the current design of the Tine 2.0 infrastructure.
Below are some findings on the performance of Tine 2.0 in the international community. It should be noted that, as of June 2011, Tine 2.0 had no large deployment, at least none the size of our company (from a question asked by our colleague Victor Beust).
- The size of the database cache is questioned on the Tine 2.0 site at viewtopic.php?f=12&t=10322&start=20#p40234. This post reports a great loss of database performance because a single 30 GB mailbox generates 2.4 GB of database cache.
- A bug report arising from the poor performance, forge.tine20.org/mantisbt/view.php?id=5540, shows that the application uses "count(*)" (with no WHERE clause), a known performance killer for transactional ACID DBMSs.
- The URLs http://www.mysqlperformanceblog.com/200 ... db-tables/ and http://www.mysqlperformanceblog.com/200 ... -countcol/ discuss tuning and also try to find workarounds to improve performance. At the end of the bug report, patches were submitted that change some files to avoid count(*) in those situations. There may be other occurrences. These patches were applied to some files (Message.php, Abstract.php and MessageFilter.php).
- The discussion in viewtopic.php?f=12&t=10322 questions Tine 2.0 performance even on MySQL, with a user running i7 machines with 16 GB and 32 GB of RAM. It also presents several optimization tips, such as using ramdisks, to try to work around the problem.
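As an illustration of why count(*) is worth avoiding, the sketch below shows one common workaround: keeping a per-folder counter up to date with triggers, so that the total is read from a single row instead of being counted on every request. This is not necessarily what the patches in the bug report do. The table names "cache_message" and "folder_counter" are hypothetical, and in-memory SQLite is used only to keep the example runnable; it does not reproduce the cost of count(*) on a transactional engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables; not the real Tine 2.0 schema.
CREATE TABLE cache_message (id INTEGER PRIMARY KEY, folder_id INTEGER NOT NULL);
CREATE TABLE folder_counter (folder_id INTEGER PRIMARY KEY, total INTEGER NOT NULL);

-- Keep the counter in step with every inserted or deleted message.
CREATE TRIGGER cache_message_added AFTER INSERT ON cache_message
BEGIN
    INSERT OR IGNORE INTO folder_counter (folder_id, total) VALUES (NEW.folder_id, 0);
    UPDATE folder_counter SET total = total + 1 WHERE folder_id = NEW.folder_id;
END;

CREATE TRIGGER cache_message_removed AFTER DELETE ON cache_message
BEGIN
    UPDATE folder_counter SET total = total - 1 WHERE folder_id = OLD.folder_id;
END;
""")

# 3000 messages in folder 1 (ids 1..3000), then 10 messages in folder 2.
conn.executemany(
    "INSERT INTO cache_message (folder_id) VALUES (?)",
    [(1,)] * 3000 + [(2,)] * 10,
)
conn.execute("DELETE FROM cache_message WHERE id <= 5")  # removes 5 from folder 1


def total_by_scan(folder_id: int) -> int:
    """Count the rows on demand, as a count(*) query does."""
    sql = "SELECT COUNT(*) FROM cache_message WHERE folder_id = ?"
    return conn.execute(sql, (folder_id,)).fetchone()[0]


def total_by_counter(folder_id: int) -> int:
    """Read the maintained counter: a single primary-key lookup."""
    sql = "SELECT total FROM folder_counter WHERE folder_id = ?"
    row = conn.execute(sql, (folder_id,)).fetchone()
    return row[0] if row else 0


print(total_by_scan(1), total_by_counter(1))
print(total_by_scan(2), total_by_counter(2))
```

The trade-off is extra write work on every insert and delete in exchange for a constant-time read of the total.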
Gentlemen, these are the facts.
Felamimail is strongly coupled to the cache database. We understand that this was a decision made to improve performance. However, in the environment presented here, this solution is limited.
My questions now are about the best way to make Tine 2.0 work in our environment and become a success case for the Tine 2.0 community. I would like your advice on choosing an option that is aligned with the roadmap or that is architecturally interesting for the Tine 2.0 community.
There are some paths we can take:
- Alternative 1: To maintain compatibility with the current implementation of Felamimail, we can create a fake database backend. In other words, we can create a Tinebase_Backend_Imap_Abstract that would have the same interface as Tinebase_Backend_Sql_Abstract. That class would behave as if it were talking to the database, but would translate the SQL statements into IMAP expressions.
- Alternative 2: Decouple the cache from Felamimail, turning it into an option that can be enabled and disabled via the configuration file. I think it would be easier to create a new application than to do this.
- Alternative 3: Create an adapter for a No(or New)SQL database, more appropriate for distributed and cloud environments, and decouple Felamimail from the relational model. The SQL statements would be replaced by an abstraction (TQL - Tine Query Language) that translates expressions into SQL or into NoSQL languages. This could be the first step and first experience in preparing Tine 2.0 to work with new database models.
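To make the idea behind Alternatives 1 and 3 more concrete, here is a minimal sketch of a backend-neutral filter that is translated either into a SQL statement or into IMAP SEARCH criteria. It is written in Python only for brevity; the real implementation would be in PHP inside Tine 2.0. The names "Condition", "to_sql" and "to_imap_search" and the column names "from_email" and "subject" are hypothetical. Only the table name comes from this post, and FROM and SUBJECT are standard IMAP SEARCH keys.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Condition:
    """One backend-neutral filter term: the field contains the value."""
    field: str  # "from" or "subject" in this sketch
    value: str


# Hypothetical column names in the cache table.
SQL_COLUMNS = {"from": "from_email", "subject": "subject"}
# Standard IMAP SEARCH keys; both match substrings of the header field.
IMAP_KEYS = {"from": "FROM", "subject": "SUBJECT"}


def to_sql(conditions: Sequence[Condition]) -> Tuple[str, List[str]]:
    """Translate the filter into a parameterized SQL query."""
    clauses = [f"{SQL_COLUMNS[c.field]} LIKE ?" for c in conditions]
    params = [f"%{c.value}%" for c in conditions]
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT id FROM tine20_felamimail_cache_message WHERE {where}", params


def to_imap_search(conditions: Sequence[Condition]) -> str:
    """Translate the same filter into IMAP SEARCH criteria (terms are ANDed)."""
    if not conditions:
        return "ALL"
    parts = []
    for c in conditions:
        # Escape backslashes and double quotes for an IMAP quoted string.
        quoted = c.value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{IMAP_KEYS[c.field]} "{quoted}"')
    return " ".join(parts)


query = [Condition("from", "serpro"), Condition("subject", "report")]
print(to_sql(query))
print(to_imap_search(query))  # FROM "serpro" SUBJECT "report"
```

The point of the sketch is that Felamimail would build only the neutral filter, and each backend (relational, IMAP or NoSQL) would provide its own translation.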
Well, what do you think about this madness? We would like to build a solution for the community, not something that solves only our problem. The last resort would be to create another application that does not use Felamimail, but we do not wish to create a competing implementation.