This article is contributed. See the original author and article here.

Due to performance and locking reasons, change datatype column can be a long-running operation.

 

Supose table PRU with bigserial id column and column A, with integer data saved as Text that you want to change to Integer type:

 

 

 

 

SELECT * FROM pg_stat_statements_reset();

CREATE TABLE PRU (id bigserial, A TEXT);
INSERT INTO PRU (A) VALUES ('111');
INSERT INTO PRU (A) VALUES ('111');



 

 

 

Generate rows until 2M, repeating this sentence:

 

 

 

INSERT INTO PRU SELECT * FROM PRU;

 

 

 

 

We want to change the column datatype:

 

 

 

 

ALTER /*optionA*/ TABLE PRU ALTER COLUMN A TYPE INTEGER USING A::INTEGER;

 

 

 

 

We could review stats from this command with following command:

 

 

 

 

select * from pg_stat_statements where query like '%optionA%';

 

 

 

 

This method is the easiest one, but could generate high contention due to required exclusive lock for the table that could generate errors in applications, most of them will require stop applications to perform this type of long running operations.

 

Another approach to change the datatype of the column could be to add some extra columns, migrate the data to those columns and finally drop old column and rename new one.

 

The advantages of doing in this way is that the customer has more control about the process, it can be executed during multiples hours/days.

 

 

 

 

//
// Add 2 auxiliary columns
//
ALTER TABLE PRU ADD COLUMN A1 INTEGER, ADD COLUMN A1_CHANGED BOOLEAN;

//
// Trigger to take care of ongoing changes from the applications
//
CREATE OR REPLACE FUNCTION set_a1()
  RETURNS TRIGGER AS
$func$
BEGIN
    IF (TG_OP='INSERT') THEN
        NEW.a1:=NEW.a::integer;
    ELSEIF (TG_OP='UPDATE') THEN
        IF (NEW.a <> OLD.a) THEN
            NEW.a1:=NEW.a::integer;
        ELSEIF (NEW.a is null and OLD.a is not null) THEN
            NEW.a1:=null;
        ELSEIF (NEW.a is not null and OLD.a is null) THEN        
            NEW.a1:=NEW.a::integer;
        END IF;
    END IF;
    NEW.a1_changed:=true;
    RETURN NEW;
END
$func$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS set_a1 ON PRU;

CREATE TRIGGER set_a1
BEFORE INSERT OR UPDATE ON pru
FOR EACH ROW EXECUTE PROCEDURE set_a1();

//
// Update sentence with limit number or fows to update in single transaction
//
// This sentence must be repeated multiple times until all rows were updated
//

UPDATE /*optionB*/ PRU SET A1=A::INTEGER, A1_CHANGED=true WHERE id IN (SELECT id FROM PRU WHERE A1_CHANGED is null limit 100000);

//
// Check the process
//
// Current changed rows:

SELECT COUNT(1) FROM PRU WHERE A1_CHANGED=true;

// Current pending rows:
SELECT COUNT(1) FROM PRU WHERE A1_CHANGED is null;

//
// Final work
//
// After no rows need changes, we can switch the columns
BEGIN WORK;
LOCK TABLE PRU IN SHARE MODE;
ALTER /*optionB*/ TABLE PRU DROP COLUMN A;
ALTER /*optionB*/ TABLE PRU DROP COLUMN A1_CHANGED;
ALTER /*optionB*/ TABLE PRU RENAME A1 TO A;
DROP TRIGGER set_a1 ON PRU;
DROP FUNCTION set_a1();
COMMIT WORK;

 

 

 

We could review stats from this command with following command:

 

 

 

 

select * from pg_stat_statements where query like '%optionB%';

 

 

 

 

Here we could observe more resources used, more space allocated in table, but application has been working most of the time without high contention locking.

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.