Felamimail: alternatives to the use of caching

Discussion about the PHP backend based on Zend-Framework

Postby fgsl » Mon Apr 30, 2012 8:06 pm

Friends,

My colleagues at Serpro are worried about the viability of bringing the infrastructure concepts of Tine into our production environment. The concern relates almost exclusively to the option of storing the headers of all messages in the database.

I bring you the facts and considerations of our infrastructure team:

About the performance:
  • With PostgreSQL, a single user was able to drive one processor to 100%. This happens especially when opening mailboxes with more than 3,000 messages.
  • They changed the database to MySQL and saw a visible gain. However, processing resources are still exhausted when a mailbox with more than 3,000 emails is accessed. The period of exhaustion is shorter, but it still occurs.
    We can improve the performance of the database. An index created on the field "message_id" of the table "tine20_felamimail_cache_message_flag", for example, improved the corresponding query by almost 50%. Our development team is already cross-checking the application queries against the indexes created automatically when the product is installed.
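To illustrate the kind of index mentioned above, here is a minimal sketch. It uses SQLite as a stand-in for MySQL/PostgreSQL, and apart from the table and column names quoted in the post, the schema, the index name and the sample data are assumptions made up for the example. It only shows that the planner switches from a table scan to an index lookup; it says nothing about the 50% figure.

```python
import sqlite3

# SQLite stands in for MySQL/PostgreSQL here; only the table and column
# names come from the post, the rest of the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tine20_felamimail_cache_message_flag ("
    " message_id TEXT NOT NULL,"
    " flag TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tine20_felamimail_cache_message_flag VALUES (?, ?)",
    [(f"msg-{i}", "\\Seen") for i in range(3000)],
)

QUERY = (
    "SELECT flag FROM tine20_felamimail_cache_message_flag"
    " WHERE message_id = ?"
)


def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("msg-42",))
    return " ".join(row[3] for row in rows)


plan_before = query_plan(conn)  # without an index: a scan of the whole table

# Hypothetical index name; the indexed column is the one named in the post.
conn.execute(
    "CREATE INDEX idx_flag_message_id"
    " ON tine20_felamimail_cache_message_flag (message_id)"
)

plan_after = query_plan(conn)  # with the index: a lookup through it

print(plan_before)
print(plan_after)
```

On MySQL or PostgreSQL the equivalent check would be done with their own EXPLAIN output against the real queries issued by the application.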
Size of database:
    We compared the size of the database before and after caching 10 thousand messages of 7 kB each, and measured an increase of 7 MB in the database. Assuming the email body is never stored in the database, even when queries and manipulations are performed on a message, this value is a good basis for some extrapolations:
      We envision a future cloud environment with more than six hundred thousand users, and we are selecting solutions that support this scenario. Today we need to migrate only our own company, which has about 10 thousand users.
      • Today we have 19.4 million files in our IMAP. Storing the current cache (ignoring the tables for contacts, preferences, schedules and others) would need nearly 15.5 GB in the database.
        Our email database currently holds 0.8 GB. Because email quotas were increased recently, we are using only 21.7% of our available quota. Assuming that 90% of the quota is occupied after the migration, the cache database would be 56 GB, while the database of the current groupware (based on EGroupware 1) would stay at the same 0.8 GB (70 times smaller).
      • If we migrated to Tine 2.0 now, some tables (tine20_felamimail_cache_message and tine20_felamimail_cache_message_to, for example) would each have 19.4 million entries, while the largest table of the current webmail (phpgw_cc_contact_conns) has 650,000 records (30 times smaller).
    • Migration of all customers to the cloud with Tine 2.0:
      Today that would be 32 million files (19.4 million ours and 12.6 million from our customers), which would require a database of 22.4 GB.
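The extrapolation above can be redone as a short calculation. This is only a sketch assuming a purely linear relation between message count and cache size, with decimal units (1 MB = 1000 kB); the variable names are mine. Under that assumption the 22.4 GB and 56 GB figures are reproduced, while the estimate for the 19.4 million messages comes out near 13.6 GB rather than the 15.5 GB quoted, so a slightly different per-message figure may have been used for that one.

```python
# Figures quoted in the post (decimal units: 1 MB = 1000 kB, 1 GB = 1e6 kB).
CACHE_GROWTH_MB = 7        # database growth after caching the sample
SAMPLE_MESSAGES = 10_000   # messages in the sample
OWN_MESSAGES = 19.4e6      # messages in the company's IMAP today
CLOUD_MESSAGES = 32e6      # 19.4 million own + 12.6 million from customers
QUOTA_USED_NOW = 0.217     # share of the quota in use today
QUOTA_USED_ASSUMED = 0.90  # share assumed after migration

# Cache cost per message, assuming it scales linearly.
kb_per_message = CACHE_GROWTH_MB * 1000 / SAMPLE_MESSAGES


def cache_size_gb(messages):
    """Estimated cache size in GB for a given number of messages."""
    return messages * kb_per_message / 1e6


own_gb = cache_size_gb(OWN_MESSAGES)
cloud_gb = cache_size_gb(CLOUD_MESSAGES)
# Scale the company's own figure up to 90% quota usage.
own_at_90_gb = own_gb * QUOTA_USED_ASSUMED / QUOTA_USED_NOW

print(round(kb_per_message, 2))  # 0.7 kB of cache per message
print(round(own_gb, 1))          # 13.6
print(round(cloud_gb, 1))        # 22.4
print(round(own_at_90_gb, 1))    # 56.3
```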
Cache database versus cache IMAP:
    We need to make some observations about the option of storing the message cache in the database:

    • The Cyrus IMAP system is extremely scalable. It keeps caches of the message headers distributed across the subfolders of each user. In other words, its database is highly distributed, and subsequent reads of each of these cache files are managed by the Linux kernel, so there is a very high probability that such files are already in memory.
    • If we keep the option of caching the headers in the database, we must somehow distribute the database more and more, and even then we will not get close to the performance of Cyrus. Our infrastructure team considers the cache in a relational database to be a limitation inherent in the current design of the Tine 2.0 infrastructure.

Below is some research on the performance of Tine 2.0 in the international community. It should be noted that, as of June 2011, Tine 2.0 had no large deployment (at least none the size of our company), according to a question asked by our colleague Victor Beust.

  • The size of the database cache is questioned on the Tine 2.0 site at viewtopic.php?f=12&t=10322&start=20#p40234. This post reports a great loss of database performance because a single 30 GB mailbox generates 2.4 GB of database cache.
  • A bug report resulting from the poor performance, forge.tine20.org/mantisbt/view.php?id=5540, shows that the application uses "count(*)" (with no WHERE clause), a known performance killer for transactional ACID DBMSs.
  • The URLs http://www.mysqlperformanceblog.com/200 ... db-tables/ and http://www.mysqlperformanceblog.com/200 ... -countcol/ cover tuning and also try to find workarounds to improve performance. At the end of the bug report, patches were sent that change some files to avoid count(*) in those situations. There may be others. These patches were applied to some files (Message.php, Abstract.php and MessageFilter.php).
  • The discussion in viewtopic.php?f=12&t=10322 questions Tine 2.0 performance even on MySQL, with a user running i7 machines with 16 GB and 32 GB of RAM. It also presents several optimization tips, such as using ramdisks, to try to work around the problem.

Gentlemen, these are the facts.

Felamimail is strongly coupled to the cache database. We understand that this was a decision made to improve performance. However, in the environment presented here, this solution is limited.

My questions now are about the best way to make Tine 2.0 work in our environment and become a success case for the Tine 2.0 community. I would like your advice in choosing an option that is aligned with the roadmap or that is architecturally interesting for the Tine 2.0 community.

We see a few paths we could take:

  • Alternative 1: To maintain compatibility with the current implementation of Felamimail, we can create a fake database backend. In other words, we can create a Tinebase_Backend_Imap_Abstract that has the same interface as Tinebase_Backend_Sql_Abstract. That class would behave as if it were talking to the database, but would translate the SQL statements into IMAP expressions.
  • Alternative 2: Decouple the cache in Felamimail, turning it into an option that can be enabled and disabled via the configuration file. I think it would be easier to create a new application than to do this.
  • Alternative 3: Create an adapter for a No(or New)SQL database, more appropriate for distributed and cloud environments, and decouple Felamimail from the relational model. The SQL statements would be replaced by an abstraction (TQL - Tine Query Language) that translates expressions into SQL or NoSQL languages. This could be the first step and first experience in preparing Tine 2.0 to work with new database models.
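To make the translation idea in Alternative 1 a bit more concrete, here is a minimal sketch that turns a list of simple filters into IMAP SEARCH criteria. The filter format and the function names are assumptions invented for this example and are not Tine 2.0's filter model; only the search keys (FROM, SUBJECT, SEEN, UNSEEN, SINCE, ALL) are standard IMAP. Note that SINCE matches messages on or after the given date, and that several criteria in one SEARCH are combined with AND.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_quote(text):
    """Wrap text as an IMAP quoted string, escaping backslash and quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def imap_date(day):
    """Format a date the way IMAP SEARCH expects it, e.g. 30-Apr-2012."""
    # Built by hand so the month name does not depend on the locale.
    return f"{day.day}-{MONTHS[day.month - 1]}-{day.year}"


def to_imap_search(filters):
    """Translate hypothetical (field, value) filters into SEARCH criteria."""
    parts = []
    for field, value in filters:
        if field == "from":
            parts.append("FROM " + imap_quote(value))
        elif field == "subject":
            parts.append("SUBJECT " + imap_quote(value))
        elif field == "seen":
            parts.append("SEEN" if value else "UNSEEN")
        elif field == "received_since":
            parts.append("SINCE " + imap_date(value))
        else:
            raise ValueError(f"unsupported filter field: {field}")
    # An empty filter list means "every message in the mailbox".
    return " ".join(parts) if parts else "ALL"


criteria = to_imap_search([
    ("from", "fgsl"),
    ("seen", False),
    ("received_since", date(2012, 4, 30)),
])
print(criteria)  # FROM "fgsl" UNSEEN SINCE 30-Apr-2012
```

Translating full SQL statements would be considerably harder than this; sorting, paging and joins have no direct SEARCH equivalent and would need separate handling.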

Well, what do you think about this madness? We would like to build a solution for the community, not something that solves only our problem, because the last resort would be to create another application that does not use Felamimail. But we do not wish to create a competing implementation.

Avengers, assemble!
Flávio Gomes da Silva Lisboa
BS in computer science
postgraduate degree in enterprise applications using object oriented programming and Java technology

Zend PHP Certified Engineer
Zend Framework Certified Engineer

Re: Felamimail: alternatives to the use of caching

Postby lkneschke » Wed May 09, 2012 10:54 am

Hello Flavio!

You are right. The current situation is not perfect and needs improvement.

I would suggest starting with alternative 1. This way we can focus on improving the performance of the IMAP client by eliminating the cache. If we make good progress, we can think about how to redesign Felamimail to work without a cache in general.
Do you agree with that?

Some background information.

During the life cycle of Felamimail, we switched back and forth multiple times between an implementation with an SQL cache and one without. There are situations where a cache is useful and situations where it is not.
Removing the cache makes Tine 2.0 only as fast as the IMAP server. If you have a slow IMAP server, Felamimail will be slow too.

During our initial tests the database outperformed the IMAP server. But now that we have too much data in the database, it gets slower than the IMAP server and also consumes too much memory.
Maybe it is better to store only a limited set of data (does the email have attachments?) in the database and read all other information (for example the headers) on demand. The latter information can be put in the Tine 2.0 cache, which is stored on the filesystem. This way we can keep the results of expensive calculations in the database, while the cheap data is read from the IMAP server on demand and cached in the Tine 2.0 cache for faster access.
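The on-demand part of this idea could look roughly like the following sketch: a read-through cache that asks the IMAP server only on a miss and keeps the result in a file. The class name, the JSON file layout and the fake fetch function are assumptions for illustration; this is not the Tine 2.0 cache API, and the small set of data that would stay in the database is left out.

```python
import json
import tempfile
from pathlib import Path


class HeaderFileCache:
    """Read-through cache: headers are fetched on demand and kept on disk."""

    def __init__(self, directory, fetch_headers):
        self.directory = Path(directory)
        self.fetch_headers = fetch_headers  # callable: message uid -> dict

    def get(self, uid):
        path = self.directory / f"{uid}.json"
        if path.exists():
            # Cache hit: no round trip to the IMAP server.
            return json.loads(path.read_text(encoding="utf-8"))
        headers = self.fetch_headers(uid)  # cache miss: ask the server
        path.write_text(json.dumps(headers), encoding="utf-8")
        return headers


# Stand-in for a real IMAP header fetch; records how often it is called.
calls = []


def fake_imap_fetch(uid):
    calls.append(uid)
    return {"subject": f"Message {uid}", "from": "someone@example.org"}


cache = HeaderFileCache(tempfile.mkdtemp(), fake_imap_fetch)
first = cache.get(101)   # fetched from the (fake) server and written to disk
second = cache.get(101)  # served from the file
print(len(calls))        # 1
```

A real implementation would also need invalidation when flags or folders change, which this sketch ignores.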

These are just thoughts from someone who has already implemented Felamimail from scratch multiple times and still has not found the best implementation.

We are very curious about your implementation.

I would propose the following: we create a new branch in our Gerrit to which you can push your code. This way we can also have a look at it and find the best implementation together.
Lars Kneschke
Head of Tine 2.0

Visit tine20.com for commercial support / consulting / development.
Visit tine20.net for Tine 2.0 hosting.

Re: Felamimail: alternatives to the use of caching

Postby zapa » Fri May 11, 2012 2:39 am

Tchë! Lars and Flavio...

Does ActiveSync use the cache structure and tinebase_backend_sql_abstract? If so, that should be taken into account: any alternative would need to include ActiveSync, so that mobile devices can sync without the cache.

Thanks.
==================================================================
http://softwarelivre.org/fisl13, the largest free software forum in Latin America.
http://trac.expressolivre.org/wiki/NovoExpresso, the new Express, tine20 based groupware
==================================================================

Re: Felamimail: alternatives to the use of caching

Postby ph_il » Mon May 14, 2012 10:44 am

zapa wrote: Does ActiveSync use the cache structure and tinebase_backend_sql_abstract?


yes, activesync uses the same backends/controllers (folder/message/account) as felamimail.
Philipp Schüle
Tine 2.0 Core Developer

Visit http://www.tine20.com (commercial support, consulting and development)
Visit http://www.officespot20.com (Tine 2.0 hosting)

Re: Felamimail: alternatives to the use of caching

Postby ph_il » Mon Aug 06, 2012 3:03 pm

hi flavio,

did you already make some progress with this?

we talked a little bit about this in our last meetings and would like to do some profiling first. we still think that the email cache is generally a good idea (almost all email clients are doing it). but we need to improve the performance of the queries and decrease the database space used. i already detected a column that might be unnecessary for the cache: 'structure' (it uses about 1/3 of the total table space). perhaps it would be a good first step to move this data to the filesystem/memory cache and only keep it for 1 day.

what do you think?
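As a rough illustration of "only keep it for 1 day", here is a sketch of an in-memory cache whose entries expire after a fixed lifetime. The class and its methods are assumptions made up for this example, not the Tine 2.0 cache API; the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time

ONE_DAY_SECONDS = 24 * 60 * 60


class ExpiringCache:
    """In-memory cache whose entries are dropped after a fixed lifetime."""

    def __init__(self, ttl_seconds=ONE_DAY_SECONDS, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Too old: forget it so the caller re-reads it from IMAP.
            del self._entries[key]
            return None
        return value


# A fake clock lets the example jump forward in time.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("msg-1", {"structure": "multipart/mixed"})

fresh = cache.get("msg-1")   # still within one day
now[0] += ONE_DAY_SECONDS
stale = cache.get("msg-1")   # one day later: expired
print(fresh, stale)
```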

Re: Felamimail: alternatives to the use of caching

Postby zapa » Sat Sep 08, 2012 2:55 am

Buenas Tchê!

We have an alpha version of millan that handles email messages without the database cache, but we will have to integrate it with a future version of tine20. It is perfectly functional, but we are still in the testing phase. We still need to add a configuration option to use the cache or not.

As already discussed in this forum, whether or not to use the cache is a matter of service policy, i.e. relying on an external IMAP server versus caching headers locally to manage this service.

Our future service assumes that we have total control of our customers' emails, including backup and recovery, and therefore control of the IMAP domain.

We are testing alpha versions of two changes: using PostgreSQL as the database backend and accessing IMAP (Cyrus Murder) directly, without the cache. We believe that these two changes are essential to our business in the final implementation.

Re: Felamimail: alternatives to the use of caching

Postby zapa » Tue Oct 09, 2012 3:05 pm

Tchê!

Today I learned that the choice between direct IMAP access and the database cache is configured in config.inc.

Thanks, Cassiano!

Re: Felamimail: alternatives to the use of caching

Postby fgsl » Thu Dec 13, 2012 2:34 pm

Hello, friends.

After some time spent like Sheldon and Rajesh [1], I still owe you answers to some questions.

The implementation of the direct IMAP access backend is available in the ExpressoLivre3 Git repository:

https://github.com/explivre/ExpressoLivre3/tree/master/tine20/Felamimail/Backend

The backend is selected based on the configuration item returned by Tinebase_Core::getConfig()->messagecache.

The idea is to keep the same interface and to determine the backend through the configuration file (key 'messagecache'):

https://github.com/explivre/ExpressoLivre3/blob/master/tine20/config.inc.php.dist

This makes the SQL database cache optional and does not require any structural change to the current implementation.
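The general pattern described here, one interface with the concrete backend chosen from a configuration key, can be sketched as follows. This is written in Python for brevity and is not the ExpressoLivre3 code: the class names, the 'sql' and 'imap' values and the default are assumptions; only the key name 'messagecache' comes from the post.

```python
from abc import ABC, abstractmethod


class MessageBackend(ABC):
    """Common interface so callers do not care where headers come from."""

    @abstractmethod
    def get_headers(self, folder):
        ...


class SqlCacheBackend(MessageBackend):
    def __init__(self, cached_rows):
        self.cached_rows = cached_rows  # stand-in for the cache tables

    def get_headers(self, folder):
        return self.cached_rows.get(folder, [])


class DirectImapBackend(MessageBackend):
    def __init__(self, fetch):
        self.fetch = fetch  # stand-in for a live IMAP fetch

    def get_headers(self, folder):
        return self.fetch(folder)


def create_backend(config, cached_rows, fetch):
    """Pick the backend from the 'messagecache' key (values hypothetical)."""
    choice = config.get("messagecache", "sql")
    if choice == "sql":
        return SqlCacheBackend(cached_rows)
    if choice == "imap":
        return DirectImapBackend(fetch)
    raise ValueError(f"unknown messagecache setting: {choice}")


config = {"messagecache": "imap"}
backend = create_backend(
    config,
    cached_rows={},
    fetch=lambda folder: [f"header from {folder}"],
)
headers = backend.get_headers("INBOX")
print(type(backend).__name__, headers)
```

Because both classes expose the same method, the calling code stays unchanged when the configuration value is switched.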

[1] http://www.youtube.com/watch?v=ggophThnBFE

Re: Felamimail: alternatives to the use of caching

Postby ph_il » Wed Dec 19, 2012 5:39 pm

hi flavio,

that's great news!

we would like to include that in the upcoming Kristina version. can you help us bring this into our current git master (using gerrit for example)?

Re: Felamimail: alternatives to the use of caching

Postby fgsl » Wed May 22, 2013 7:20 pm

Hi, friends.

At first we had some difficulty working with the current review system, but it is getting better now. Almost all the Expresso guys are submitting changes directly.

However, the changes that were not submitted gave rise to a new mail application, called Expressomail.

I tried to remove the dependency on Felamimail from the other applications, so that a different mail application could be chosen, but their current architecture limits this. In other words, I only confirmed that the coupling between Felamimail and the other applications is too strong to make it easy to provide mail service to them through another application. [1]

I am writing this to my colleagues now, and I will suggest to them that we follow Philipp's initial guidance. I think the changes should now be submitted to master, targeting the Collin version.

[1] https://forge.tine20.org/mantisbt/view.php?id=8370

