[MTOS-dev] Queries on Very Large Tables
David Scott
ds94103 at earthlink.net
Mon May 12 18:52:08 PDT 2008
...bearing in mind that indexes can be expensive if you have a lot of
inserts. A good rule of thumb is to use as many as you need...but no more.
d
Brad Choate wrote:
> My first guess is that you're missing an index of blog_id +
> created_on, if you're specifying blog_id as a term. (Any other terms
> in use here?) Find the query that is being generated (with the MT 4.15
> beta, you can use DebugMode 4 to see all queries that are run; or as
> suggested, take a look at the mysql slow queries log). Use the
> 'explain' MySQL command to analyze the query and see which indexes
> MySQL uses for it. If the right indexes aren't used, MySQL will scan
> the whole set of 3,000,000 rows to process the query.
>
> You can create a concatenated index with MT 4.1 and later, adding this
> to your object properties:
>
> MyClass->install_properties({
> .....
> indexes => {
> blog_created => {
> columns => [ 'blog_id', 'created_on' ],
> },
> blog_id => 1,
> created_on => 1,
> },
> });
>
> -Brad
>
>
> On May 12, 2008, at 11:19 AM, Mark Carey wrote:
>
>
>> MT version: 4.1
>>
>> While troubleshooting some mt.cgi timeouts, in my Visitor Stats Pro
>> plugin on a site, I have isolated the delay to a number of container
>> tags. The plugin adds two graphs to the dashboard for "Visitors" and
>> "Views" (the graphs are *not* the problem in this case). Below each
>> graph, I have a list for "recent visitors" and "recent views", each
>> list of 10 most-recent objects is powered by a container tag
>> (<mt:Visits>, <mt:Views>), which use fairly typical syntax. For
>> example a snippet from the "visits" routine:
>>
>> if ($blog_id) {
>> $vs_terms->{blog_id} = $blog_id;
>> }
>> my $lastn = $args->{lastn} || 10;
>> $vs_args->{limit} = $lastn;
>> $vs_args->{sort} = 'created_on';
>> $vs_args->{direction} = 'descend';
>> use VisitorStats::Visitor;
>> my $visitor_iter = VisitorStats::Visitor->load_iter($vs_terms,
>> $vs_args);
>>
>> I timed this tag, and it took 41 seconds to render on this site.
>>
>> I should mention that the site in question has more than 3,000,000
>> records in the Visitor table. (The schema for this object includes
>> indexes on blog_id and created_on). I have not noticed such a delay
>> on other sites with much fewer records, so my first thought it that
>> the cause is the size of the table -- but the query seems relatively
>> simple, asking for only 10 items sorted by date.
>>
>> The query that counts the data for graph, on the other hand, seems
>> like a more complicated one, grouping the number of visits by date,
>> for the most recent 120 days (there are over 2.5 million records in
>> the past 120 days). But this query takes about 5-6 seconds.
>>
>> So I am confused. A few things come to mind as being different in
>> each case:
>>
>> -one uses an iterator, the other a count with group_by
>> -one happens as part of the template engine (being a template
>> container tag), and the other happens independent of the template
>> rendering process (i actually calculate the graph data via a schedule
>> task, and thus, via cron)
>>
>> Thoughts? I would have thought that pulling a very small number of
>> recent records from a large table would be efficient, but I am not a
>> database expert...
>>
>> Thanks,
>>
>> Mark
>> _______________________________________________
>> MTOS-dev mailing list
>> MTOS-dev at sixapart.com
>> http://www.sixapart.com/mailman/listinfo/mtos-dev
>>
>
> --
> Brad Choate
> Engineering Manager, Movable Type, Six Apart, Ltd.
> http://www.sixapart.com/movabletype/
> Mobile: (925) 271-0105 - AIM: bschoate
>
>
>
>
> _______________________________________________
> MTOS-dev mailing list
> MTOS-dev at sixapart.com
> http://www.sixapart.com/mailman/listinfo/mtos-dev
>
>
More information about the MTOS-dev
mailing list