I have a table with ~500,000 rows and I receive a new copy of the file every day with a small number of changes. I want to update the table with the changes and track the changes so that I can retrieve all the rows that have been updated since any date.
e.g. select * from testdata where last_updated > ’2011-02-01′
It needs to include new rows and updated rows. I’ll never delete rows, but it should be pretty obvious how to add this in if required.
CREATE TABLE IF NOT EXISTS `testdata` (
`testdata_id` int(11) NOT NULL,
`data1` varchar(100) NOT NULL,
`data2` varchar(100) NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`testdata_id`));
CREATE TABLE IF NOT EXISTS `testdata_staging` (
`testdata_id` int(11) NOT NULL,
`data1` varchar(100) NOT NULL,
`data2` varchar(100) NOT NULL,
PRIMARY KEY (`testdata_id`))
Bulk load the data into testdata_staging using whatever method is required.
Then to update the table:
/* updates */
update testdata, (select ts.*,now()
from testdata_staging ts
inner join testdata t
on ts.testdata_id = t.testdata_id
where ((ts.data1 != t.data1)
or (ts.data2 != t.data2))
) chg
set testdata.data1 = chg.data1,
testdata.data2 = chg.data2,
testdata.last_updated = now()
where testdata.testdata_id = chg.testdata_id;
/* inserts */
insert into testdata
select ts.*,now()
from testdata_staging ts
left outer join testdata t
on ts.testdata_id = t.testdata_id
where t.testdata_id is null;
There you go.