[MTOS-dev] Performance
Brad Choate
brad at sixapart.com
Mon Mar 10 11:42:23 PDT 2008
On Mar 9, 2008, at 3:57 PM, Alvar Freude wrote:
> The following tests are made with the latest code from SVN and the
> dev branch. As the last time, my desciption is from the SQL point of
> view.
>
>
> For all tables with auto_increment there are Sequences generated
> when using PostgreSQL; the next values from the sequences are
> queried manually and gthen inserted into the xxx_id field.
>
> It would be better (and easyer) to change the datatype from INTEGER
> to SERIAL in this cases. This can be used in the same way as an
> auto_increment column in MySQL (with the exception, that you can
> have as much as you want, not only one).
> This generates automagically a sequence and a default value of
> "nextval('sequence_name')" for this column.
>
> Alternatively the sequence may created manually, but the default
> value should be changed to the nextval statement. With this it isn't
> necessary to do it manually.
>
> ...
> id INTEGER DEFAULT nextval('sequence_name') NOT NULL,
> ...
What versions of PostgreSQL support that syntax? All 7.x and 8.x?
Also, I think 'unique' or 'primary key' should also be specified to
ensure the column is unique.
> There are a lot of queries of the following style:
>
> SELECT xxx_id FROM mt_xxx WHERE <where> ORDER BY <order>
> SELECT <everything> FROM mt_xxx WHERE xxx_id IN (<result from above>);
>
>
> The ORDER BY gets lost in the second query, the result is usually
> not in the same order then the IDs in the subselect!
>
>
> And this could be written as the following, which is faster and the
> result is in the right order:
>
> SELECT <needed columns>
> FROM mt_xxx
> WHERE xxx_id IN (SELECT xxx_id FROM mt_xxx WHERE <where>)
> ORDER BY <order>
>
> Which is the same as this simple statement:
>
> SELECT <needed columns>
> FROM mt_xxx
> WHERE <where>
> ORDER BY <order>
>
>
> This is about twice as fast as the two separate statements.
>
>
> Such statements seem to appear for nearly everything, e.g.
> mt_category, mt_author, mt_permission, mt_session, ...
The separate queries are a mechanism of Data::ObjectDriver. It gathers
IDs of objects first and then checks internally for any cached objects
for those IDs. Then, for objects that are not cached, it will fetch
those from the database with a subsequent SELECT. So if it gathers a
list of IDs like 4, 8, 15, 16, 23, 42 and finds that IDs 23 and 42 are
cached, it will only SELECT from the database rows with the IDs 4, 8,
15 and 16. The order of the returned objects from the second query is
irrelevant; the object driver uses the order from the first to return
the objects in the right order.
Changing all that into a query with a subselect would defeat the
caching mechanisms in D::OD. In other words, D::OD doesn't always
optimize for the database.
> When adding a category, the new category is inserted first, and
> after this there is an update for the category basename. (similar
> when inserting a new entry)
MT first saves the category (and entry, and soon, authors), then
attempts to calculate a unique basename. For locales where the
dirified category label returns an empty string (ie, a Japanese
label), the ID of the category is used to set the basename. So the
object has to be saved first to have the id, then it is updated with
the basename value.
> As the queries are all single transactions (AutoCommit), this is not
> only slow, but also dangerous: when the process crashes or gets
> killed after the insert and before the update, then the database
> consistency is broken.
>
> In total there are 24 queries for adding a category and 26 "SELECT
> 'DBD::Pg ping test'". I think the minimum would be two queries: one
> for the session data and one insert.
Agreed, but I believe this is a behavior in DBD::Pg and out of our
control.
> So, as a conclusion: I think the most important things for better
> performance are:
>
> * reducing the queries for in template processing
We're working on this.
> * use transactions and guarantee data integrity
Transaction support is interesting and we'll consider it. There are
very few cases where it is actually warranted, but cascading deletions
is one of them (ie, removing all related objects of an entry record,
then removing the entry itself). And Data::ObjectDriver already
supports transactions. The driver supplies 'begin_work', 'rollback'
and 'commit' calls already. They're used by TheSchwartz in particular.
> As far as I understand the mails here in the last days (ups, I did
> not read all), there is something on the way for this two things.
>
> The next important things are IMHO:
>
> * better PostgreSQL support, e.g. change the manual sequence
> processing to SERIAL data type (like auto_increment in MySQL)
> or DEFAULT nextval('...')
>
> * concentrate the SQL queries and reduce redundant queries
>
> * Memory Leaks and FastCGI/mod_perl support
And thanks for the feedback!
-Brad
--
Brad Choate
Engineering Manager, Movable Type, Six Apart, Ltd.
http://www.sixapart.com/movabletype/
Mobile: (918) 271-0105 - AIM: bschoate
More information about the MTOS-dev
mailing list