07 — BigQuery Migration

BigQuery Migration Guide

Migration patterns and technical mappings for moving the legacy SQL Server LR database to BigQuery. Documents what needs to be done and why, based on actual legacy system analysis. Does NOT prescribe a specific target schema — that should be determined during design.

11
Migration Topics
18
Triggers Mapped
4
BQ Patterns
13
HI* Tables
1 Critical Pre-Migration Prerequisite: LRNO → IMO Mapping
Most important migration task. The legacy system uses LRNO (Lloyd's Register 7-char identifier) as the universal primary key, while international standards use IMO Number. In VSL_ tables, IMO_NUM and LRNO contain the same value — but this is not documented anywhere in the schema itself, only in trigger code and views.

Required action before any BigQuery load — build the complete cross-reference:

-- Build the complete cross-reference SELECT ABSD_OVGE.LRNO AS lrno, VSL_DESIGNATION.IMO_NUM AS imo_number, VSL_DESIGNATION.VESSEL_NAME AS vessel_name FROM ABSD_OVGE JOIN VSL_DESIGNATION ON ABSD_OVGE.LRNO = VSL_DESIGNATION.IMO_NUM
All BigQuery tables should use IMO number as the primary vessel identifier, with LRNO retained as a legacy cross-reference column for traceability.
2 Date Format Conversion
All ABSD_ dates are stored as NVARCHAR, not SQL date types. The conversion must be applied during ETL. See 01 Overview §Date Storage for the full special value table.
Legacy FormatLengthExampleBigQuery TypeConversion Rule
YYYYMM6 chars202603DATEPARSE_DATE('%Y%m', value) — set day=01
YYYYMMDD8 chars20260315DATEPARSE_DATE('%Y%m%d', value)
YYYYMMDD (special)8 chars20260300DATEDay=00 → set to last day of month
YYYYMMDD (special)8 chars20269900DATEMonth=99 → treat as December
YYMMDD6 chars260315DATETIMEJ06_LNCHDATE — prepend '20' for year
000000 / 00000000NULLMap to NULL
999999 / 99999999NULLMap to NULL (open-ended)
198099NULLSpecial sentinel — means "not set"

BigQuery UDF equivalent of fnLRDateToRealDate:

CREATE OR REPLACE FUNCTION maritime.lr_date_to_date(lr_date STRING) AS ( CASE WHEN lr_date IS NULL OR lr_date IN ('000000','00000000','999999','99999999') THEN NULL WHEN LENGTH(lr_date) = 6 THEN PARSE_DATE('%Y%m%d', SUBSTR(lr_date,1,4) || CASE WHEN SUBSTR(lr_date,5,2) IN ('00','99') THEN '01' ELSE SUBSTR(lr_date,5,2) END || '01') WHEN LENGTH(lr_date) = 8 THEN DATE( CAST(SUBSTR(lr_date,1,4) AS INT64), CASE WHEN SUBSTR(lr_date,5,2) IN ('00','99') THEN 1 ELSE CAST(SUBSTR(lr_date,5,2) AS INT64) END, CASE WHEN SUBSTR(lr_date,7,2) IN ('00','99') THEN 1 ELSE CAST(SUBSTR(lr_date,7,2) AS INT64) END ) ELSE NULL END );
3 SEQNO History Model → Bi-Temporal Pattern

The legacy system is mono-temporal: SEQNO '00' = current, '01'+ = historical descending. It tracks valid time but does NOT systematically track transaction time (though J06_LNCHDATE provides a rudimentary marker).

Target BigQuery bi-temporal structure:

-- Example: ownership history bi-temporal table CREATE TABLE maritime.vessel_ownership_history ( lrno STRING, imo_number STRING, owner_code STRING, -- Valid time (real-world effective period) valid_from DATE, -- from H01_EFD (converted) valid_to DATE, -- from next record's H01_EFD, or NULL for current -- Transaction time (system ingestion) record_loaded_at TIMESTAMP, -- when this row was written to BQ record_superseded_at TIMESTAMP, -- when replaced, or NULL -- Source tracking source_system STRING, confidence_code STRING, source_code STRING );

Conversion rule for SEQNO history (using LEAD window function):

SELECT LRNO, H01_OWNER_CODE AS owner_code, maritime.lr_date_to_date(H01_EFD) AS valid_from, -- valid_to = next record's EFD - 1 day DATE_SUB( LEAD(maritime.lr_date_to_date(H01_EFD)) OVER (PARTITION BY LRNO ORDER BY H01_EFD), INTERVAL 1 DAY ) AS valid_to, H01_CC AS confidence_code, H01_SRCE AS source_code FROM ABSD_HIOW ORDER BY LRNO, H01_EFD

The SEQNO='00' record will have valid_to = NULL (still current). This pattern applies to all 13 HI* history tables:

ABSD_HIOW, ABSD_HIMA, ABSD_HISM, ABSD_HIOP, ABSD_HIPP, ABSD_HIFL, ABSD_HITL, ABSD_HIST, ABSD_HIFC1, ABSD_HIBBC, ABSD_HIGBO, ABSD_HIBR, ABSD_STSE

4 Trigger Logic → BigQuery Equivalents

Triggers cannot be replicated directly in BigQuery (no DML triggers). Each trigger's business logic must be reimplemented in one of four patterns:

Pattern A — ETL Pipeline Validation
Trigger validations become pipeline-level data quality checks.

Example: OVGE callsign must match flag country range → dbt test or Dataform assertion checking callsign against country range table.
Pattern B — Scheduled BigQuery Job
Cascade updates become scheduled BigQuery jobs/stored procedures.

Example: HIOW_Update recalculates OWST statistics → Scheduled job that recalculates fleet_statistics table nightly.
Pattern C — BigQuery Stored Procedure
Complex multi-step logic becomes BigQuery stored procedures.

Example: CBPPROOT trigger syncs port details to CBPD and CBTO1 → BQ SP: merge_port_details(port_key).
Pattern D — Application Layer
Rules that protect UI data entry become API/application validations.

Example: Current name must be UPPERCASE → API layer: validate name.equals(name.toUpperCase()).
Legacy TriggerEffectBigQuery Approach
ABSD_OVGE_UpdateUpdates SHIP_SEARCH callsign, officialNoPattern B ETL pipeline sync job
ABSD_OVGE_UpdateValidates callsign vs flagPattern A dbt test + Dataform assertion
ABSD_OVNA_UpdateEnforces name uniqueness per vesselPattern D Pipeline validation + application layer
ABSD_OVNA_UpdateUpdates SHIP_SEARCH vessel namePattern B ETL sync job
ABSD_HIOW_UpdateOwner-manager conflict preventionPattern D Application layer validation
ABSD_HIOW_UpdateRecalculates OWST fleet countsPattern B Scheduled fleet_stats job
ABSD_HIOW_UpdateCascades to HIPP/HITPPattern C Ownership resolution procedure
ABSD_CAGE1_UpdateCross-vessel incident creationPattern D Application layer logic
ABSD_CAGE1_UpdateManages disposal/casualty mutual exclusionPattern D Application layer validation
ABSD_CBCOROOT_UpdateSyncs company data to CBSB, BLDR_TEXTPattern B Company master sync job
ABSD_CBPPROOT_UpdateSyncs port to CBPD, CBTO1 with coordinatesPattern C Port master sync job
ABSD_FUGE_UpdateUpdates SHIP_SEARCH capacity fieldsPattern B ETL capacity sync job
ABSD_TANKER TANKER_UPDATEHull type manages FUSF2 feature codesPattern B Ship type feature management job
All triggersJ06_LNCHDATE updateBigQuery _record_updated_at TIMESTAMP column
All triggers*_UPDATES staging tablesBigQuery streaming insert change feed or Pub/Sub
All triggerstblChanges audit logBigQuery audit log table (append-only, partitioned by date)
5 Function Catalog → BigQuery UDFs

All SQL Server scalar functions can be migrated to BigQuery SQL UDFs or JavaScript UDFs. See 05 Functions for the full function catalog.

CategoryMigration ApproachPriorityNotes
Date (fnLRDateToRealDate etc.) BigQuery SQL UDFs High Full UDF code in Section 2 above. Used 200+ times across the codebase.
Point-in-time ownership (fnGetRegOwnerAtDate) Inline BigQuery window queries High ROW_NUMBER() OVER (PARTITION BY LRNO ORDER BY H01_EFD DESC) pattern replaces scalar UDF.
String normalization (fnTidyShipname etc.) BigQuery JavaScript UDFs Medium JS regex-based replacement: name.replace(/[.,?']/g, ' ').replace(/\s+/g, ' ').trim()
Fuzzy matching (LevenshteinEditDistance, HowSimilar) Built-in ML.EDIT_DISTANCE Medium BigQuery has native ML.EDIT_DISTANCE(str1, str2) as of 2024. For NYSIIS phonetic matching, use a Cloud Function or Dataproc UDF.
Geographic (fnConvertDegreesMinsToDecimal, fnDistanceCalc) BigQuery native geography types Medium ST_GEOGPOINT(lon, lat) and ST_DISTANCE(pt1, pt2) are native. fnMarsdenGrid has no native equivalent — needs custom UDF.
Classification (fnCLASSLIST etc.) BigQuery table lookups Medium STRING_AGG(class_notation, ', ') FROM vessel_classification_history WHERE is_current = TRUE
Narrative generation (fnAPSNarrativePrimeMoverGrouped etc.) Application layer or Vertex AI Low Options: (1) BQ stored procedure building text strings, (2) Go/Python application layer, (3) Vertex AI Gemini for natural language generation (recommended for modernization).

Point-in-time ownership inline pattern (replaces fnGetRegOwnerAtDate):

-- fnGetRegOwnerAtDate → inline query pattern SELECT LRNO, H01_OWNER_CODE FROM ( SELECT LRNO, H01_OWNER_CODE, ROW_NUMBER() OVER (PARTITION BY LRNO ORDER BY H01_EFD DESC) AS rn FROM maritime.vessel_ownership_history WHERE maritime.lr_date_to_date(H01_EFD) < @query_date ) WHERE rn = 1
6 Stored Procedure → BigQuery Equivalents

GROUP_FLEET Suite → BigQuery Scheduled Procedure:

-- BigQuery equivalent as a stored procedure CREATE OR REPLACE PROCEDURE maritime.rebuild_fleet_hierarchy() BEGIN -- Step 1: Build working ship groups (sp_GROUP_FLEET_p_wsg equivalent) CREATE OR REPLACE TEMP TABLE work_wsg AS SELECT vh.lrno, ow.owcode, ... FROM maritime.vessel_ownership_history vh JOIN maritime.vessel_status_current vs ON vh.lrno = vs.lrno AND vs.status IN ('S','L','R','T','C','V','P','O','U','F','E') ...; -- Step 2: EFD enrichment (sp_GROUP_FLEET_p_wsg_update equivalent) -- Step 3: Deduplication (sp_GROUP_FLEET_remove_groups_dups equivalent) -- Step 4: Write final result MERGE maritime.fleet_hierarchy AS target USING work_wsg AS source ON target.lrno = source.lrno WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ...; END; -- Schedule via BigQuery Scheduled Queries or Cloud Composer/Airflow

spAnnotateChange → BigQuery Audit Log Table:

CREATE TABLE maritime.audit_change_log ( change_id STRING DEFAULT GENERATE_UUID(), table_name STRING, field_name STRING, record_id STRING, old_value STRING, new_value STRING, user_email STRING, source_system STRING, changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) PARTITION BY DATE(changed_at) CLUSTER BY table_name, record_id;
spABSD_SHIP_SEARCH_SYNC should be replaced by: (1) a BigQuery materialized view for the compliance summary, (2) a CDC pipeline (Datastream or Debezium) to capture SQL Server changes, and (3) a BigQuery scheduled query to apply compliance flag calculations.
7 Data Quality Rules to Migrate

The following validations are enforced by triggers and must be reimplemented in the target system.

Identity / Format Validation

RuleLegacy LocationPriority
LRNO is 7 charsSchema constraintHigh
OWCODE is 7 charsSchema constraintHigh
Callsign valid for flag countryOVGE triggerHigh
MMSI valid for flag countryfnMMSICheckHigh
Official number: no hyphen for PanamaOVGE triggerMedium
Date YYYYMMDD is validfnDateIsInvalid, trigger validationsHigh
SEQNO '00' name must be uppercaseOVNA triggerMedium
One LNCHD_AS='Y' per vesselOVNA triggerMedium
One COMP_AS='Y' per vesselOVNA triggerMedium
Tank coating: sub-type requires parentTANKER triggerMedium

Cross-Table Consistency

RuleLegacy LocationPriority
Owner ≠ Manager (unless code '9991001')HIOW triggerHigh
Name EFD ≥ Construction EFD (STSE)OVNA triggerMedium
Winch count > 0 only if lifting gear type DE or AFFUGE triggerLow
Casualty and disposal are mutually exclusiveCAGE1 triggerHigh
Only one demolition record per vesselCAGE1 triggerHigh
D22_MANBOW_LEN ≤ LBP or LOAFUDI1 triggerLow
Cargo tank coating required if sub-coatings setTANKER triggerMedium
AC generator requires frequency 50 or 60 HzHIGE triggerLow

Reference Integrity

RuleLegacy LocationPriority
Hull section code in CBUB1 FLDI='42'OVNA triggerMedium
Casualty code in CBUB1 FLDI='57'OVCS triggerMedium
Generator manufacturer in CBEBHIGE triggerLow
Flag country in T_REF_COUNTRYHIFL (application)High
Owner code in OWGEHIOW triggerHigh
Port key exactly 7 charsCBPPROOT triggerMedium
9 EDM Integration Handling

The legacy database is mid-migration to an EDM (Enterprise Data Management) platform. Views already use:

Migration recommendation: For entity data (companies, countries, ports), source directly from EDM — it will become the master. For vessel data (all OV*, HI*, FU* tables), source from ABSD_ tables.

The legacy database uses mixed collations:

CollationBehaviourWhere used
Latin1_General_CI_AICase-insensitive, accent-insensitiveMost ABSD_ tables
SQL_Latin1_General_CP1_CI_ASCase-insensitive, accent-sensitiveSome columns
SQL_Latin1_General_CP1_CS_ASCase-sensitiveRare
BigQuery strings are Unicode (UTF-8) and case-sensitive by default. String comparisons that were case-insensitive in SQL Server require explicit LOWER() wrapping in BigQuery. This affects:
• All owner code lookups: WHERE LOWER(owcode) = LOWER(@param)
• LRNO cross-reference joins: WHERE LOWER(lrno) = LOWER(imo_num)
• Name matching queries

Recommended CDC architecture for the migration:

SQL Server
LRUpdate DB
Debezium
or Azure Event Hubs CDC
Cloud Pub/Sub
topics
Dataflow
streaming pipeline
BQ Bronze
raw, immutable
BQ Silver
validated, normalized
BQ Gold
business-ready views
Each ABSD_ table maps to a BigQuery bronze table preserving all original columns plus CDC metadata: _op (INSERT/UPDATE/DELETE), _source_ts, _kafka_offset. dbt or Dataform transforms bronze → silver → gold.
LayerContentsTooling
BronzeRaw CDC events, immutable, all original columns + _op, _source_ts, _kafka_offsetDataflow streaming + BQ streaming insert
SilverValidated, normalized, date-converted, LRNO→IMO mapped, collation-correcteddbt or Dataform transformation
GoldBusiness-ready views: current vessel state, ownership history, fleet hierarchy, compliance flagsDataform materialization + BigQuery scheduled queries