[MTOS-dev] Queries on Very Large Tables
Brad Choate
brad at sixapart.com
Mon May 12 18:19:48 PDT 2008
My first guess is that you're missing an index of blog_id +
created_on, if you're specifying blog_id as a term. (Any other terms
in use here?) Find the query that is being generated (with the MT 4.15
beta, you can use DebugMode 4 to see all queries that are run; or as
suggested, take a look at the mysql slow queries log). Use the
'explain' MySQL command to analyze the query and see which indexes
MySQL uses for it. If the right indexes aren't used, MySQL will scan
the whole set of 3,000,000 rows to process the query.
You can create a concatenated index with MT 4.1 and later, adding this
to your object properties:
MyClass->install_properties({
.....
indexes => {
blog_created => {
columns => [ 'blog_id', 'created_on' ],
},
blog_id => 1,
created_on => 1,
},
});
-Brad
On May 12, 2008, at 11:19 AM, Mark Carey wrote:
> MT version: 4.1
>
> While troubleshooting some mt.cgi timeouts, in my Visitor Stats Pro
> plugin on a site, I have isolated the delay to a number of container
> tags. The plugin adds two graphs to the dashboard for "Visitors" and
> "Views" (the graphs are *not* the problem in this case). Below each
> graph, I have a list for "recent visitors" and "recent views", each
> list of 10 most-recent objects is powered by a container tag
> (<mt:Visits>, <mt:Views>), which use fairly typical syntax. For
> example a snippet from the "visits" routine:
>
> if ($blog_id) {
> $vs_terms->{blog_id} = $blog_id;
> }
> my $lastn = $args->{lastn} || 10;
> $vs_args->{limit} = $lastn;
> $vs_args->{sort} = 'created_on';
> $vs_args->{direction} = 'descend';
> use VisitorStats::Visitor;
> my $visitor_iter = VisitorStats::Visitor->load_iter($vs_terms,
> $vs_args);
>
> I timed this tag, and it took 41 seconds to render on this site.
>
> I should mention that the site in question has more than 3,000,000
> records in the Visitor table. (The schema for this object includes
> indexes on blog_id and created_on). I have not noticed such a delay
> on other sites with much fewer records, so my first thought it that
> the cause is the size of the table -- but the query seems relatively
> simple, asking for only 10 items sorted by date.
>
> The query that counts the data for graph, on the other hand, seems
> like a more complicated one, grouping the number of visits by date,
> for the most recent 120 days (there are over 2.5 million records in
> the past 120 days). But this query takes about 5-6 seconds.
>
> So I am confused. A few things come to mind as being different in
> each case:
>
> -one uses an iterator, the other a count with group_by
> -one happens as part of the template engine (being a template
> container tag), and the other happens independent of the template
> rendering process (i actually calculate the graph data via a schedule
> task, and thus, via cron)
>
> Thoughts? I would have thought that pulling a very small number of
> recent records from a large table would be efficient, but I am not a
> database expert...
>
> Thanks,
>
> Mark
> _______________________________________________
> MTOS-dev mailing list
> MTOS-dev at sixapart.com
> http://www.sixapart.com/mailman/listinfo/mtos-dev
--
Brad Choate
Engineering Manager, Movable Type, Six Apart, Ltd.
http://www.sixapart.com/movabletype/
Mobile: (925) 271-0105 - AIM: bschoate
More information about the MTOS-dev
mailing list