We had a website with more than 500,000 old revisions in its database. There is a nice module for knocking out small numbers of revisions, Revision Deletion (http://drupal.org/project/revision_deletion), but to quickly remove a large set we wrote this script.
The general idea is to delete all revisions older than a certain date, then remove the orphaned rows from each CCK table. No matter what date limit you pick, the current revision of each node is never deleted, because its vid is still referenced from the node table.
WARNING: the script makes assumptions about your table naming conventions -- use it only on a backup.
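Before running anything destructive like this, take a dump you can restore from. A minimal sketch with mysqldump -- the database name, credentials file, and dump path are assumptions you will need to adjust for your site:

```shell
# Hypothetical database name -- replace with your Drupal database.
DB_NAME="drupal"
# Timestamped, compressed dump; --single-transaction avoids locking InnoDB tables.
mysqldump --single-transaction "$DB_NAME" | gzip > "drupal-backup-$(date +%Y%m%d).sql.gz"
```

Restoring is the reverse: `gunzip -c drupal-backup-YYYYMMDD.sql.gz | mysql "$DB_NAME"`.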
<?php
// Note: this is for Drupal 6.x.
function remove_revisions() {
  // Delete revisions older than this many days.
  $days_ago = 30;
  $limit = time() - (60 * 60 * 24 * $days_ago);

  // Delete old revisions from node_revisions. The subquery keeps each
  // node's current revision, since node.vid points at it.
  db_query("DELETE FROM {node_revisions} WHERE timestamp < %d AND vid NOT IN (SELECT vid FROM {node})", $limit);
  $rows = db_affected_rows();
  drupal_set_message("$rows revisions deleted -- node_revisions");

  // Remove orphaned rows from the CCK content tables.
  $table_types = array('content_field_%', 'content_type_%');
  foreach ($table_types as $table_type) {
    $result = db_query("SHOW TABLES LIKE '$table_type'");
    while ($row = db_fetch_array($result)) {
      // SHOW TABLES returns one column whose name varies; current() grabs it.
      $table = current($row);
      db_query("DELETE FROM $table WHERE vid NOT IN (SELECT vid FROM {node_revisions})");
      $rows = db_affected_rows();
      drupal_set_message("$rows revisions deleted -- $table");
    }
  }
}
?>
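One way to run the function without building a page for it is through Drush, assuming Drush is installed and the function has been saved to a file (the filename here is hypothetical):

```shell
# Run the script file through a bootstrapped Drupal, then call the function.
drush scr remove_revisions.php
drush eval "remove_revisions();"
```

Running it from the command line also avoids PHP's web-request time limit, which matters when you are deleting hundreds of thousands of rows.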