BigQuery Migration Guide
Migration patterns and technical mappings for moving the legacy SQL Server LR database to BigQuery. Documents what needs to be done and why, based on actual legacy system analysis. Does NOT prescribe a specific target schema — that should be determined during design.
LRNO (Lloyd's Register 7-char identifier) as the universal primary key, while international standards use IMO Number. In VSL_ tables, IMO_NUM and LRNO contain the same value — but this is not documented anywhere in the schema itself, only in trigger code and views.Required action before any BigQuery load — build the complete cross-reference:
| Legacy Format | Length | Example | BigQuery Type | Conversion Rule |
|---|---|---|---|---|
| YYYYMM | 6 chars | 202603 | DATE | PARSE_DATE('%Y%m', value) — set day=01 |
| YYYYMMDD | 8 chars | 20260315 | DATE | PARSE_DATE('%Y%m%d', value) |
| YYYYMMDD (special) | 8 chars | 20260300 | DATE | Day=00 → set to last day of month |
| YYYYMMDD (special) | 8 chars | 20269900 | DATE | Month=99 → treat as December |
| YYMMDD | 6 chars | 260315 | DATETIME | J06_LNCHDATE — prepend '20' for year |
000000 / 00000000 | NULL | Map to NULL | ||
999999 / 99999999 | NULL | Map to NULL (open-ended) | ||
198099 | NULL | Special sentinel — means "not set" |
BigQuery UDF equivalent of fnLRDateToRealDate:
The legacy system is mono-temporal: SEQNO '00' = current, '01'+ = historical descending. It tracks valid time but does NOT systematically track transaction time (though J06_LNCHDATE provides a rudimentary marker).
Target BigQuery bi-temporal structure:
Conversion rule for SEQNO history (using LEAD window function):
The SEQNO='00' record will have valid_to = NULL (still current). This pattern applies to all 13 HI* history tables:
ABSD_HIOW, ABSD_HIMA, ABSD_HISM, ABSD_HIOP, ABSD_HIPP, ABSD_HIFL, ABSD_HITL, ABSD_HIST, ABSD_HIFC1, ABSD_HIBBC, ABSD_HIGBO, ABSD_HIBR, ABSD_STSE
Triggers cannot be replicated directly in BigQuery (no DML triggers). Each trigger's business logic must be reimplemented in one of four patterns:
Example: OVGE callsign must match flag country range → dbt test or Dataform assertion checking callsign against country range table.
Example: HIOW_Update recalculates OWST statistics → Scheduled job that recalculates fleet_statistics table nightly.
Example: CBPPROOT trigger syncs port details to CBPD and CBTO1 → BQ SP:
merge_port_details(port_key).Example: Current name must be UPPERCASE → API layer:
validate name.equals(name.toUpperCase()).| Legacy Trigger | Effect | BigQuery Approach |
|---|---|---|
ABSD_OVGE_Update | Updates SHIP_SEARCH callsign, officialNo | Pattern B ETL pipeline sync job |
ABSD_OVGE_Update | Validates callsign vs flag | Pattern A dbt test + Dataform assertion |
ABSD_OVNA_Update | Enforces name uniqueness per vessel | Pattern D Pipeline validation + application layer |
ABSD_OVNA_Update | Updates SHIP_SEARCH vessel name | Pattern B ETL sync job |
ABSD_HIOW_Update | Owner-manager conflict prevention | Pattern D Application layer validation |
ABSD_HIOW_Update | Recalculates OWST fleet counts | Pattern B Scheduled fleet_stats job |
ABSD_HIOW_Update | Cascades to HIPP/HITP | Pattern C Ownership resolution procedure |
ABSD_CAGE1_Update | Cross-vessel incident creation | Pattern D Application layer logic |
ABSD_CAGE1_Update | Manages disposal/casualty mutual exclusion | Pattern D Application layer validation |
ABSD_CBCOROOT_Update | Syncs company data to CBSB, BLDR_TEXT | Pattern B Company master sync job |
ABSD_CBPPROOT_Update | Syncs port to CBPD, CBTO1 with coordinates | Pattern C Port master sync job |
ABSD_FUGE_Update | Updates SHIP_SEARCH capacity fields | Pattern B ETL capacity sync job |
ABSD_TANKER TANKER_UPDATE | Hull type manages FUSF2 feature codes | Pattern B Ship type feature management job |
| All triggers | J06_LNCHDATE update | BigQuery _record_updated_at TIMESTAMP column |
| All triggers | *_UPDATES staging tables | BigQuery streaming insert change feed or Pub/Sub |
| All triggers | tblChanges audit log | BigQuery audit log table (append-only, partitioned by date) |
All SQL Server scalar functions can be migrated to BigQuery SQL UDFs or JavaScript UDFs. See 05 Functions for the full function catalog.
| Category | Migration Approach | Priority | Notes |
|---|---|---|---|
| Date (fnLRDateToRealDate etc.) | BigQuery SQL UDFs | High | Full UDF code in Section 2 above. Used 200+ times across the codebase. |
| Point-in-time ownership (fnGetRegOwnerAtDate) | Inline BigQuery window queries | High | ROW_NUMBER() OVER (PARTITION BY LRNO ORDER BY H01_EFD DESC) pattern replaces scalar UDF. |
| String normalization (fnTidyShipname etc.) | BigQuery JavaScript UDFs | Medium | JS regex-based replacement: name.replace(/[.,?']/g, ' ').replace(/\s+/g, ' ').trim() |
| Fuzzy matching (LevenshteinEditDistance, HowSimilar) | Built-in ML.EDIT_DISTANCE |
Medium | BigQuery has native ML.EDIT_DISTANCE(str1, str2) as of 2024. For NYSIIS phonetic matching, use a Cloud Function or Dataproc UDF. |
| Geographic (fnConvertDegreesMinsToDecimal, fnDistanceCalc) | BigQuery native geography types | Medium | ST_GEOGPOINT(lon, lat) and ST_DISTANCE(pt1, pt2) are native. fnMarsdenGrid has no native equivalent — needs custom UDF. |
| Classification (fnCLASSLIST etc.) | BigQuery table lookups | Medium | STRING_AGG(class_notation, ', ') FROM vessel_classification_history WHERE is_current = TRUE |
| Narrative generation (fnAPSNarrativePrimeMoverGrouped etc.) | Application layer or Vertex AI | Low | Options: (1) BQ stored procedure building text strings, (2) Go/Python application layer, (3) Vertex AI Gemini for natural language generation (recommended for modernization). |
Point-in-time ownership inline pattern (replaces fnGetRegOwnerAtDate):
GROUP_FLEET Suite → BigQuery Scheduled Procedure:
spAnnotateChange → BigQuery Audit Log Table:
The following validations are enforced by triggers and must be reimplemented in the target system.
Identity / Format Validation
| Rule | Legacy Location | Priority |
|---|---|---|
| LRNO is 7 chars | Schema constraint | High |
| OWCODE is 7 chars | Schema constraint | High |
| Callsign valid for flag country | OVGE trigger | High |
| MMSI valid for flag country | fnMMSICheck | High |
| Official number: no hyphen for Panama | OVGE trigger | Medium |
| Date YYYYMMDD is valid | fnDateIsInvalid, trigger validations | High |
| SEQNO '00' name must be uppercase | OVNA trigger | Medium |
| One LNCHD_AS='Y' per vessel | OVNA trigger | Medium |
| One COMP_AS='Y' per vessel | OVNA trigger | Medium |
| Tank coating: sub-type requires parent | TANKER trigger | Medium |
Cross-Table Consistency
| Rule | Legacy Location | Priority |
|---|---|---|
Owner ≠ Manager (unless code '9991001') | HIOW trigger | High |
| Name EFD ≥ Construction EFD (STSE) | OVNA trigger | Medium |
| Winch count > 0 only if lifting gear type DE or AF | FUGE trigger | Low |
| Casualty and disposal are mutually exclusive | CAGE1 trigger | High |
| Only one demolition record per vessel | CAGE1 trigger | High |
| D22_MANBOW_LEN ≤ LBP or LOA | FUDI1 trigger | Low |
| Cargo tank coating required if sub-coatings set | TANKER trigger | Medium |
| AC generator requires frequency 50 or 60 Hz | HIGE trigger | Low |
Reference Integrity
| Rule | Legacy Location | Priority |
|---|---|---|
Hull section code in CBUB1 FLDI='42' | OVNA trigger | Medium |
Casualty code in CBUB1 FLDI='57' | OVCS trigger | Medium |
| Generator manufacturer in CBEB | HIGE trigger | Low |
Flag country in T_REF_COUNTRY | HIFL (application) | High |
| Owner code in OWGE | HIOW trigger | High |
| Port key exactly 7 chars | CBPPROOT trigger | Medium |
ABSD_SHIP_SEARCH is a denormalized search index maintained by 20+ triggers. In BigQuery, three options:
| Option | Approach | When to use |
|---|---|---|
| Option A — BigQuery SEARCH | SELECT * FROM maritime.vessel_search_index WHERE SEARCH(vessel_search_index, @query_term) |
Simple keyword/token search. Requires maintaining a vessel_search_index table updated by batch job. |
| Option B — Vertex AI Search | Index vessels in Vertex AI Search for semantic and full-text search | Recommended for full-text and semantic search requirements (vessel name matching, company name fuzzy search). |
| Option C — Materialized View | BigQuery materialized view joining ABSD_OVGE + ABSD_OVNA + ABSD_HIFL + ABSD_HITL + ABSD_HIOW at SEQNO='00' | Structured attribute queries (filter by GT, flag, status). Not suitable for text search. |
The legacy database is mid-migration to an EDM (Enterprise Data Management) platform. Views already use:
EDM.dbo.T_ENT_MASTERinstead ofABSD_OWGEEDM.dbo.T_REF_COUNTRYinstead ofABSD_CBCYEDM.dbo.T_REF_CBUB3instead ofABSD_CBUB3
The legacy database uses mixed collations:
| Collation | Behaviour | Where used |
|---|---|---|
Latin1_General_CI_AI | Case-insensitive, accent-insensitive | Most ABSD_ tables |
SQL_Latin1_General_CP1_CI_AS | Case-insensitive, accent-sensitive | Some columns |
SQL_Latin1_General_CP1_CS_AS | Case-sensitive | Rare |
LOWER() wrapping in BigQuery. This affects:• All owner code lookups:
WHERE LOWER(owcode) = LOWER(@param)• LRNO cross-reference joins:
WHERE LOWER(lrno) = LOWER(imo_num)• Name matching queries
Recommended CDC architecture for the migration:
LRUpdate DB
or Azure Event Hubs CDC
topics
streaming pipeline
raw, immutable
validated, normalized
business-ready views
_op (INSERT/UPDATE/DELETE), _source_ts, _kafka_offset. dbt or Dataform transforms bronze → silver → gold.| Layer | Contents | Tooling |
|---|---|---|
| Bronze | Raw CDC events, immutable, all original columns + _op, _source_ts, _kafka_offset | Dataflow streaming + BQ streaming insert |
| Silver | Validated, normalized, date-converted, LRNO→IMO mapped, collation-corrected | dbt or Dataform transformation |
| Gold | Business-ready views: current vessel state, ownership history, fleet hierarchy, compliance flags | Dataform materialization + BigQuery scheduled queries |