At the outset, we told ourselves this was an infrastructure migration.
That is usually how these things start. An aging integration estate. Data center contracts nearing renewal. Middleware vendors becoming more expensive and somehow less persuasive every year. A few outage postmortems that nobody is eager to revisit. Then a new analytics program arrives and asks for elastic compute, event ingestion, and near-real-time data products as if the current estate had simply been waiting for a cloud landing zone all along.
In our case, the trigger was fairly typical for a mid-to-large energy enterprise. Two on-prem data centers. A legacy ESB doing far more than it ever should have. Market messaging, meter data, outage notifications, trading interfaces, batch orchestration, partner integrations—all threaded through a landscape shaped more by acquisition and necessity than by any deliberate design. At the same time, the digital edge had already moved on: customer apps were consuming some cloud-native APIs, product teams were asking for managed services, and leadership had decided—sensibly enough—that the next five years should not be funded as though the previous fifteen had gone exactly to plan.
I was leading integration architecture, and at first I assumed the difficult part would be platform choice. Managed integration versus containers. Kafka versus a cloud event service. API gateway strategy. IAM federation. Network patterns into operational sites.
It wasn’t.
The real problem was visibility. We had no credible way to show what would break, what could move, what had to stay, and what would need to coexist for much longer than anyone wanted to say out loud. Every stakeholder held part of the truth. Operations knew which flows mattered when a storm hit. Security knew which regional data constraints were simply non-negotiable. Application teams knew their own interfaces—mostly. Program leadership wanted sequencing, funding, and risk reduction. But nobody had the full dependency story in one place.
That is where ArchiMate 3 stopped being optional.
Not because ArchiMate is magical. It isn’t. In the wrong hands, it turns into a diagramming exercise with delusions of rigor. But for this migration, with too many stakeholder views and too many conflicting assumptions, it gave us something we genuinely needed: one coherent model connecting business impact, application dependencies, technology choices, constraints, and transition states.
And that last part matters most. Cloud migration is not a future-state picture. It is a set of temporary architectures that have to be safe enough to run in production.
This article is a step-by-step account of how we modeled that migration. Not the polished version. The version with mistakes, rework, awkward stakeholder sessions, and a few expensive lessons about what not to model.
The architecture problem we actually had
The starting point was messy. It is worth saying that plainly, because too many migration articles quietly begin from a baseline that already looks curated.
Ours did not.
The enterprise service bus supported market messaging, meter reads, outage notifications, and a pile of transformations that had drifted a long way from their original owners. There was an on-prem integration platform running brittle batch jobs, some old enough that nobody wanted to touch the scheduling logic unless absolutely necessary. SCADA and operational platforms were deeply embedded in control processes and could not simply be lifted and shifted because somebody had a cloud KPI to meet. Meanwhile, customer-facing channels had already gone partly modern, consuming APIs through a gateway that sat outside the old integration center of gravity.
Then add the energy-sector specifics. Regulatory reporting deadlines. Regional data residency constraints. OT and IT boundaries that are technically clear in some places and maddeningly blurred in others. Telemetry that looks like enterprise data until the moment it turns out to be operationally critical. Partner ecosystems still dependent on file exchanges because market operators, for all their modernization rhetoric, often move at the speed of governance.
Everybody wanted something different.
Business leaders wanted speed and flexibility. Security wanted control and traceability. Operations wanted stability and fallback paths. Delivery teams wanted simpler patterns, not another metamodel presentation from architecture.
And the usual artifacts did not answer the real questions. Spreadsheets captured application inventories but not behavioral dependencies. Static diagrams looked neat but aged almost instantly. CMDB extracts were useful as raw input and nearly useless as migration logic. None of them helped us answer questions like: if we move this API domain first, which overnight reconciliation path gets exposed? If we replace this broker path with event streaming, where do replay and ordering stop being technical concerns and start becoming business risks?
That is the point where architecture either becomes useful or decorative.
Why ArchiMate 3 helped, specifically
I’m not going to turn this into a notation tutorial. There are plenty of those already, and most of them lose people before they get to the useful part.
What mattered in practice was that ArchiMate 3 let us connect layers without inventing our own semantics every week. We could represent drivers and constraints. We could tie business processes and services to application services and data objects. We could map technology services without jumping straight into vendor logos. And, crucially, we could model implementation and migration in a way that made transition states first-class, instead of an afterthought tacked on at the end.
That combination made the difference.
There is also an opinion here, earned the expensive way: ArchiMate is only valuable if you resist the temptation to model everything. The notation invites completeness. Migration programs do not need completeness. They need decision support.
So before drawing anything, we forced ourselves to define the questions the model had to answer.
Before drawing anything: define the migration questions
This is where a lot of architecture efforts quietly fail. Not because the diagrams are wrong, but because the model was never designed to support decisions in the first place.
We wrote down the migration questions first. Around ten of them.
Which integration capabilities had to move first to reduce operational risk? Which systems were realistic candidates for rehost, refactor, replace, or retain? Where were the hard OT/IT boundaries, not the conceptual ones? Which business processes would be exposed if the migration sequence was wrong? Which interfaces needed coexistence patterns during transition? What did “cloud” actually mean for each workload—basic IaaS, managed integration, Kafka-style event streaming, container platform, managed API gateway, cloud IAM?
That last question created more friction than expected. People say “move it to cloud” as if cloud were a single destination. In reality, the migration choices were very different for customer APIs, batch settlement jobs, meter event processing, and OT-adjacent dispatch interfaces.
We also narrowed scope early. This was not a full enterprise reinvention. It was an integration-heavy migration focused on selected value streams in energy operations and customer service. That sounds obvious in hindsight. At the time, it saved us from a lot of bad modeling behavior.
My rule of thumb now is simple: define 8 to 12 decision questions before choosing viewpoints. If an element does not help answer one of those questions, it probably does not belong in the migration model.
Harsh, maybe. But useful.
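The rule is even enforceable with a trivial script over a repository export. A minimal sketch, assuming a hypothetical export in which each element is tagged with the decision questions it supports (all names and question IDs here are illustrative):

```python
# Keep only model elements that support at least one migration
# decision question. Element names and question IDs are illustrative.

DECISION_QUESTIONS = {"Q1", "Q2", "Q3"}  # e.g. Q1 = "what moves first?"

elements = [
    {"name": "Outage Communications", "supports": {"Q1"}},
    {"name": "Settlement File Transfer", "supports": {"Q2", "Q3"}},
    {"name": "Corporate Intranet", "supports": set()},  # answers nothing: drop it
]

def migration_relevant(elements, questions):
    """An element earns its place by helping answer a question."""
    return [e for e in elements if e["supports"] & questions]

kept = [e["name"] for e in migration_relevant(elements, DECISION_QUESTIONS)]
print(kept)  # the intranet does not make the cut
```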
The case-study baseline: an energy integration landscape
The landscape in scope covered five business capability areas: meter-to-cash, outage management, energy trading settlement, field work dispatch, and asset monitoring.
On the application side, we modeled the Customer Information System, Meter Data Management platform, Outage Management System, Trading and Risk platform, Enterprise Service Bus, API gateway, data lake, and identity provider. There were others around the edges, naturally, but those were the core components driving migration decisions.
Technology was equally mixed. Two on-prem data centers. A legacy MQ broker cluster. Batch integration servers. A Kubernetes platform in the cloud. Managed API services. Cloud event streaming. Direct network links to substations and operational sites. IAM spanning enterprise identity and service-to-service trust. Enough moving parts to confuse anyone trying to jump straight to target state.
The energy-specific complication was this: not every critical dependency was visible at the business layer. Some low-level telemetry and dispatch interfaces looked small in architecture inventories and enormous in operational consequence. We had to model for that asymmetry. Otherwise the migration would optimize what was visible and put the wrong things at risk.
Step 1 — Model the business impact first, not the target platform
This was the first thing we got right, after nearly getting it wrong.
Instead of beginning with cloud services and deployment choices, we started with a business impact view. Business actors included grid operations, customer service, market operations, and field crews. We mapped the business processes most exposed to integration changes: outage restoration, meter read validation, settlement reconciliation. Then we modeled the business services those processes depended on, such as outage communications, consumption data exchange, and dispatch support.
This is not abstract housekeeping. It changes the migration conversation.
For example, outage management depended on near-real-time integration with field systems and customer notification channels. If a cloud move introduced sequencing issues or a subtle latency penalty in the wrong place, restoration performance would suffer. Not in theory. In production. During bad weather. With executives suddenly becoming very interested in middleware.
We initially made the classic mistake of modeling capabilities too high up. “Customer operations.” “Grid management.” “Trading support.” All true, all useless for migration sequencing. Those abstractions hid the differences between settlement tolerance and outage-response urgency. Once we dropped to business-process and service level, the tradeoffs became sharper and much more honest.
Tie every business service to migration concerns you can measure, or at least assess: latency sensitivity, resilience expectations, compliance exposure, operational windows, user impact. If you don’t, the business layer becomes decorative wallpaper.
A simple sketch of the logic looked like this:
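Compressed into text, with indicative element names:

```
Grid Operations (business actor)
  performs → Outage Restoration (business process)
    depends on → Outage Communications (business service)
      realized by → Notification Integration (application service)
        concerns: latency-sensitive · sequencing-sensitive · storm-critical
```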
Not a full ArchiMate view, obviously, but the point is sequence: actor, process, service, impact.
Step 2 — Trace application collaboration and integration dependencies
Once the business impact was clear enough, we moved into application architecture.
Here we built a cooperation and dependency view showing which applications provided and consumed integration services, where interactions were synchronous versus asynchronous, which batch interfaces still mattered, and where shared middleware created bottlenecks.
This is where hidden truth surfaced.
Meter Data Management published validated reads to billing, forecasting, and regulatory reporting. Straightforward enough. But the ESB also enriched those messages using customer master data, and a nightly file transfer supported trading reconciliation under a settlement deadline that nobody outside a small operational circle seemed to remember. The file transfer looked archaic, but its business timing was immovable.
In ArchiMate terms, we used application components, application services, data objects, and relationships like serving, access, and flow. The exact notation mattered less than the discipline: keep interface contracts separate from transport technologies. That was one of our early corrections, and it mattered more than I expected.
We had initially collapsed APIs, message queues, and file drops into one fuzzy “integration interface” concept. It made the diagrams look cleaner and the migration planning much worse. A customer API exposed through a managed gateway, an MQ-based async integration, and a scheduled settlement file are not interchangeable just because they all move data between systems.
We started tagging integration paths visually: critical, fragile, replaceable, transitional. That helped conversations with delivery teams because they could see where we expected redesign, where we expected coexistence, and where we were simply trying not to break something before quarter-end.
A rough dependency slice looked something like this:
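Compressed into text; the tag assignments here are indicative:

```
MDM ──validated reads──▶ Billing                  [critical]
MDM ──validated reads──▶ Forecasting              [replaceable]
MDM ──validated reads──▶ Regulatory Reporting     [critical]
ESB ──enrichment via customer master data──▶ (all read flows)   [fragile]
MDM ──nightly settlement file──▶ Trading Reconciliation   [critical, fixed timing]
```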
Again, simplified. But even that level of explicitness was enough to expose bad assumptions.
Step 3 — Model the data and information objects nobody wants to talk about
Cloud migration discussions love platforms and tend to underplay data. That is understandable. It is also usually a mistake.
In our case, the critical information objects included meter readings, customer account events, outage tickets, asset telemetry, settlement files, and market messages. We did not model every enterprise data domain. We modeled the information objects that materially affected migration sequencing, compliance, or integration design.
That cut through a lot of noise.
Meter readings, for instance, had multiple lives. Some belonged in cloud-managed analytics and event pipelines. Some needed local processing or buffering because of timing, intermittency, or operational constraints. Some triggered retention and sovereignty questions depending on region and regulatory interpretation. Asset telemetry was even more sensitive because it lived in a grey zone: useful for enterprise analytics, yes, but also close enough to operational control that nobody sensible wanted to centralize it blindly.
This was also where Kafka-style thinking entered the model properly. People often ask, “Should we use Kafka?” That is the wrong first question. The better question is: which information objects benefit from ordered event streams, replay, decoupled consumers, and elastic retention—and which ones do not? For customer events and some meter domains, event streaming made architectural sense. For dispatch-adjacent control flows, we were much more conservative.
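We ended up treating that better question as a scoring exercise rather than a debate. A rough sketch of the idea, with illustrative criteria and profiles — the real assessment had more dimensions and far less tidy numbers:

```python
# Score information objects on how much they benefit from what a
# log-based event broker actually provides. Everything here is
# illustrative, not a real assessment.

CRITERIA = ("ordering", "replay", "many_consumers", "elastic_retention")

objects = {
    "customer account events": {"ordering": 1, "replay": 1, "many_consumers": 1, "elastic_retention": 1},
    "meter readings":          {"ordering": 1, "replay": 1, "many_consumers": 1, "elastic_retention": 0},
    "dispatch control flow":   {"ordering": 1, "replay": 0, "many_consumers": 0, "elastic_retention": 0},
}

def streaming_fit(profile, threshold=3):
    """Candidate for streaming only if it needs most of the properties."""
    return sum(profile[c] for c in CRITERIA) >= threshold

for name, profile in objects.items():
    verdict = "stream" if streaming_fit(profile) else "be conservative"
    print(f"{name}: {verdict}")
```

Dispatch-adjacent flows score low on purpose: ordering alone does not justify moving a control-adjacent path onto shared streaming infrastructure.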
ArchiMate helped because data objects could be traced to application services, constraints, and migration plateaus. Compliance stopped being a note in the margin and became something we could model explicitly.
A table we wish we had at the beginning
Early on, we kept relearning the same lessons. Eventually we summarized them in a table that became oddly useful in workshops. A simplified version:

| Integration style | Example from our estate | Migration assumption | Risk if treated generically |
|---|---|---|---|
| Synchronous API | Customer notification API | Move early behind the cloud gateway | Latency regressions in customer channels |
| Async messaging | Meter events over MQ | Bridge to event streaming; prove replay first | Lost ordering and replay under partial failure |
| Scheduled file | Settlement reconciliation file | Retain until timing constraints are modeled | Missed settlement cut-offs |
| OT-adjacent flow | Dispatch integration | Retain locally; replicate selectively | Operational risk in control processes |

If I were starting again, I would use this on day two.
Step 4 — Add the technology layer, but only after the dependency story is clear
Only then did we model the technology layer.
That order matters more than people think. If you start with cloud architecture diagrams, the migration quietly turns into a platform design exercise. Useful, yes. Complete, no.
Our technology view represented retained on-prem nodes, a cloud landing zone, managed integration services, event broker capabilities, API management, container runtime, identity and secrets services, and connectivity into OT and edge environments. We also modeled technology services before naming products. That forced better conversations.
For example, instead of debating one API vendor against another too early, we asked: what technology services do we actually require? External API exposure, policy enforcement, throttling, analytics, certificate handling, OAuth integration with enterprise IAM, regional deployment controls. Once those were explicit, product choices became grounded rather than tribal.
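Once the required technology services are explicit, evaluating a product becomes a gap check rather than a preference contest. A simplified sketch — the service names follow the list above, and "Gateway X" is a hypothetical candidate:

```python
# Compare required technology services against a candidate product's
# capabilities. The candidate and its capability set are illustrative.

REQUIRED = {
    "external API exposure", "policy enforcement", "throttling",
    "usage analytics", "certificate handling", "OAuth with enterprise IAM",
    "regional deployment controls",
}

candidate = {
    "name": "Gateway X",  # hypothetical product
    "provides": {
        "external API exposure", "policy enforcement", "throttling",
        "usage analytics", "certificate handling", "OAuth with enterprise IAM",
    },
}

# The gap is what the procurement conversation should be about.
gaps = REQUIRED - candidate["provides"]
print(f"{candidate['name']} gaps: {sorted(gaps)}")
```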
Hybrid reality showed up very clearly. Some integration paths would remain on-prem for years. Some could be replatformed quickly. Some needed redesign into cloud-native patterns. Customer-facing APIs moved first to managed cloud API services. OT-adjacent integrations stayed on-prem, but we introduced buffered event replication and selective exposure patterns around them. IAM also became a first-class concern here, because hybrid integration without coherent identity quickly becomes a trust problem disguised as networking.
And yes, Kafka came back in this layer too. We did not model “Kafka” everywhere. We modeled event streaming as a technology service, then mapped where a managed Kafka-compatible service fit, where a cloud-native event hub was sufficient, and where neither solved the real problem.
Keep deployment detail out of executive views. That sounds obvious, but teams forget it all the time.
The first serious mistake: we modeled the target state too early
We did this. It was elegant. It was also nearly useless.
The architecture team produced a polished target-state diagram showing the future hybrid-cloud integration platform. APIs fronted by managed gateway services, event streams replacing old broker patterns, containerized mediation where necessary, IAM centralized, observability improved—all the right shapes in all the right places.
Stakeholders nodded politely.
Then the useful questions started.
How do we get there without breaking coexistence? Which funding wave pays for the event bridge? What remains on-prem in phase one? Which release train carries customer notification migration? What has to happen before trading interfaces can move? Which domains are technically movable but operationally too risky right now?
Our beautiful target-state diagram answered none of that.
That was the turning point. Cloud migration is not one architecture. It is a sequence of temporary architectures. If you cannot represent those temporary states, your model may impress architects and still fail delivery.
So we pivoted hard into implementation and migration modeling.
Step 5 — Use plateaus to represent transitional reality
This turned out to be the most valuable part of the whole model.
We used plateaus to define meaningful transition states:
- Plateau A: current-state on-prem integration backbone
- Plateau B: cloud API front door with legacy ESB retained
- Plateau C: event-driven coexistence for meter and customer domains
- Plateau D: selective retirement of batch and broker components
The key was not the labels. It was what changed between plateaus.
In each one, we showed which application services had moved, which interfaces were redirected, which technology services had been introduced, and which business risks were reduced or newly created. That gave program leadership a credible roadmap and gave operations teams something even more important: visibility into temporary complexity.
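In practice we kept each plateau as a delta from its predecessor rather than a full snapshot, which is also easy to maintain as structured data. A minimal sketch with illustrative content:

```python
# Represent a plateau as the delta from its predecessor, not as a
# full architecture snapshot. Names and content are illustrative.

plateau_b = {
    "name": "B: cloud API front door, legacy ESB retained",
    "services_moved": ["customer notification API"],
    "interfaces_redirected": ["mobile app -> cloud gateway"],
    "tech_introduced": ["managed API gateway", "cloud IAM federation"],
    "risks_reduced": ["shared ESB coupling for customer channels"],
    "risks_created": ["hybrid latency on notification path"],
}

def plateau_summary(plateau):
    """One line per non-empty delta category, for governance reviews."""
    return [
        f"{key}: {', '.join(values)}"
        for key, values in plateau.items()
        if key != "name" and values
    ]

for line in plateau_summary(plateau_b):
    print(line)
```

The "risks_created" entry is the one most models omit, and the one operations teams care about most.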
Our first attempt at plateaus was wrong, though. We defined them by infrastructure milestones. Landing zone ready. Cluster deployed. IAM integrated. Those are engineering achievements, but they are not transition architectures in a business sense.
We reworked them around operationally meaningful outcomes instead. Customer channels decoupled from shared ESB transformations. Meter event coexistence with replay capability established. Dispatch-adjacent flows retained locally with monitored replication. Those were safer and much easier to govern.
Do not invent a plateau for every sprint. That way lies chaos. Plateaus should represent business-safe transition points.
Step 6 — Model work packages, gaps, and dependencies in plain language
Once plateaus were stable enough, we added work packages, deliverables, gaps, and outcomes.
And we named them in language delivery leads would actually recognize.
Not “Implement target-state integration decoupling framework.” Nobody wants that.
Instead:
- Move customer notification APIs to cloud gateway
- Introduce event bridge between on-prem broker and cloud stream
- Retire duplicate transformation logic
- Establish observability for hybrid integration flows
- Implement service identity pattern for cloud-to-on-prem integration
- Add replay capability for meter event processing
That last one started as a gap. We had plans for event streaming but no credible replay design for meter data processing under partial failure. Modeling that gap explicitly did two things: it made the risk visible, and it justified investment without architecture sounding precious.
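The shape of the replay requirement is worth pinning down, because "replay" means different things to different teams. A toy sketch of the contract we needed — an ordered log with per-consumer checkpoints, so processing can resume or be re-run after partial failure. Names are illustrative, and the real design sat on a managed streaming service rather than anything hand-rolled:

```python
class MeterEventLog:
    """Toy append-only log with per-consumer committed offsets."""

    def __init__(self):
        self.events = []     # ordered, never mutated
        self.committed = {}  # consumer -> next offset to read

    def append(self, event):
        self.events.append(event)

    def read(self, consumer, limit=100):
        """Return (offset, event) pairs from the last commit onward."""
        start = self.committed.get(consumer, 0)
        return list(enumerate(self.events[start:start + limit], start))

    def commit(self, consumer, offset):
        self.committed[consumer] = offset + 1

    def replay_from(self, consumer, offset):
        """After a partial failure, rewind and reprocess."""
        self.committed[consumer] = offset


log = MeterEventLog()
for reading in ("r1", "r2", "r3"):
    log.append(reading)

batch = log.read("billing")
log.commit("billing", batch[1][0])  # processed r1 and r2, then failed
log.replay_from("billing", 0)       # rewind and reprocess everything
print([event for _, event in log.read("billing")])
```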
In the outage domain, one work package isolated outage messaging from shared ESB transformations. Another delivered secure cloud-to-substation connectivity patterns. These were not theoretical constructs. They aligned directly to release planning and budget decisions.
This is a subtle but important point: implementation elements should not exist to complete the model. They should exist to justify and sequence actual work.
Step 7 — Bring in motivation and constraint elements where they matter
We used motivation elements selectively.
The main drivers were data center exit, faster digital service delivery, resilience improvement, and cost transparency. The main constraints were OT latency sensitivity, regulatory reporting deadlines, data sovereignty, and change freeze windows during peak seasonal demand.
Too much motivation modeling becomes unreadable very quickly. Too little, and the migration looks arbitrary. The balance is to use drivers, requirements, constraints, and assessments where they explain a real architectural choice.
One concrete example: settlement cut-off windows materially changed migration sequence for trading integrations. It was tempting to move some of those interfaces earlier because their technical dependencies looked manageable. But once the timing constraint was modeled explicitly, it became obvious that a failed transition would hit operational and regulatory commitments in ways the business would not tolerate. So those interfaces stayed on-prem longer.
That was not architectural cowardice. It was judgment.
What the final model looked like for different stakeholders
There was no one master diagram. I do not believe in that, and in practice it almost always fails.
What we ended up with was one underlying repository and several purposeful views:
- an executive migration heatmap showing domains, risk, and sequencing
- a business-impact view for operations leaders
- an application dependency view for domain architects
- a technology coexistence view for platform teams
- a transition roadmap view for program governance
That is the only sane approach. If a single ArchiMate view tries to satisfy executives, operations, security, platform engineering, and delivery teams at once, it usually satisfies none of them.
Repository discipline matters here. One model. Several views. Tight semantics. Minimal vanity.
A concrete walkthrough: outage management migration slice
This slice became the narrative center of the program because everyone understood it.
Current state first: the Outage Management System was integrated through the ESB with an SMS/email provider, a field dispatch system, the customer mobile app, and GIS services. The pain points were familiar—shared transformations, poor observability, difficult failover, and too much hidden coupling between customer notifications and dispatch-adjacent processes.
The target intent was more nuanced than “move outage to cloud.” Customer-facing channels would shift to cloud API management and event-driven notification services. Dispatch-adjacent control integration would remain local. That distinction mattered a great deal.
We modeled it step by step.
At the business layer, we linked outage restoration and customer communications to the relevant services and actors. Then at the application layer, we separated notification services from dispatch support services and traced their dependencies. Data and event flows came next: outage ticket updates, customer status events, dispatch messages, GIS context. Then we added the technology coexistence view showing cloud API gateway, event broker, IAM trust, and the retained local integration paths needed for field dispatch.
Only after that did we define the plateau transition:
- In Plateau B, customer notification APIs moved behind the cloud gateway.
- In Plateau C, notification events were published through a cloud event service with replay and monitoring.
- Dispatch integration remained on-prem, with selective event replication for visibility and analytics.
- In Plateau D, some shared ESB transformations were retired.
The caution here is worth underlining: customer notification channels were relatively easy to move. Dispatch-adjacent services were not. Treating them as one migration domain would have been a serious mistake.
That asymmetry is common in energy. The architecture model needs to make it visible.
What we chose not to migrate, and why that mattered
This is the part many migration stories skip because it sounds less ambitious.
We deliberately retained certain broker functions near OT environments. We kept some legacy file exchanges because external market participants still required them. We retained local buffering services where field connectivity was intermittent and operationally sensitive.
Retain is not a failure state. It is an architecture decision.
In fact, one of the signs that a cloud migration model is credible is that it includes stable hybrid boundaries. Not everything should move on the same timeline, and some things should not move at all until the surrounding operational context changes.
That realism improved trust in the model. Operations teams, in particular, stopped treating architecture as a cloud sales function and started engaging with it as a risk management tool.
Common modeling traps in cloud migration programs
A few traps showed up repeatedly.
Modeling products instead of services and responsibilities. This happens the moment diagrams fill with vendor icons and nobody can explain what capability is actually required.
Treating all integrations as equal. They are not. A nightly settlement file and a customer preference API do not deserve the same migration assumptions.
Ignoring transition architecture. This is probably the biggest one. If you only model current and target, you hide the dangerous part.
Missing non-functional constraints. Latency, cut-off windows, resilience expectations, sovereignty, IAM trust boundaries. These are not side notes.
Confusing application service with interface endpoint. Very common. Usually leads to poor transition planning.
Overmodeling until diagrams become unusable. ArchiMate absolutely allows this. Resist it.
And a trap specific to sectors like energy: assuming cloud-native target patterns can be adopted uniformly across OT and IT. They cannot. Sometimes not yet. Sometimes not ever.
Practical guidance for integration architecture leads
If you are doing this for real, start with two or three migration slices, not the whole estate. Pick areas where the dependency story matters and where stakeholders will actually engage.
Use color, tags, or viewpoints to show criticality and transition status. It sounds basic, but it helps enormously.
Validate dependency models with operations and support teams, not just app owners. App owners know intent; support teams know what fails at 2 a.m.
Model fallback paths for operationally critical services. Especially in outage, dispatch, and settlement contexts.
Keep vendor-specific target detail in separate artifacts unless the audience genuinely needs it.
And revisit plateaus after every major release wave. Transitional architecture drifts quickly if nobody tends it.
One opinion, because I have seen this play out more than once: the best migration model is the one delivery teams argue with and then use, not the one architects quietly admire.
Measuring whether the model was actually useful
We tried to judge usefulness by outcomes, not aesthetics.
Did it reduce ambiguity in migration sequencing? Yes. Did it cut down surprise dependencies discovered late? Not entirely, but enough to matter. Did it create a clearer distinction between cloud-ready and cloud-constrained workloads? Absolutely. Did it improve steering decisions about coexistence investment? Very much so. And perhaps most importantly, did it improve communication between platform teams, operations leaders, and business stakeholders? More than any single previous artifact we had used.
It was not perfect.
Some legacy integrations remained poorly understood. Data ownership issues took longer than expected to resolve. Vendor roadmaps shifted mid-program, which meant some technology assumptions had to be revisited. That is normal. A useful architecture model does not eliminate uncertainty; it makes uncertainty visible enough to manage.
Closing: model cloud migration as a sequence of operationally safe decisions
What changed for me in this program was simple.
I stopped thinking about cloud migration as a destination architecture and started thinking about it as a sequence of operationally safe decisions. That sounds less glamorous, and it is much closer to the truth.
ArchiMate 3 was useful because it let us put business impact, integration dependencies, hybrid technology reality, and transition states into one connected model. Not one giant unreadable view. One underlying model with views that different stakeholders could actually use to make decisions.
For integration leaders in energy, that is the real challenge. The hard part is not drawing the future cloud. Most teams can do that by week three.
The hard part is making the interim states visible enough that the business can move without breaking the services it depends on.
And if your model can do that, people will forgive the notation.
FAQ
How much ArchiMate detail is enough for a migration program?
Enough to answer the migration decisions in front of you. Usually less than architects first think and more than executives initially expect.
Should cloud vendors appear directly in the model?
Sometimes, but not too early. Model required technology services first. Add vendor mapping where procurement, engineering, or deployment decisions need it.
How do you model OT/IT boundaries in ArchiMate?
Usually through a combination of application, technology, and constraint elements, plus explicit communication paths and deployment boundaries. The key is to show where trust, latency, and operational control assumptions change.
When should a legacy integration platform be retained rather than replaced?
When operational risk, external dependency, or transition cost outweighs the immediate benefits of replacement. Retain should be deliberate, not accidental.
Are plateaus worth the effort in fast-moving programs?
Yes. Especially in fast-moving programs. Without plateaus, teams tend to confuse roadmap slides with executable transition architecture.
Suggested assets for the article
If this were published with supporting visuals, I’d include:
- a migration overview diagram
- an application dependency view
- a transition/plateau roadmap view
- an outage-management slice view
- the migration concerns table above
Those five artifacts would carry most of the message. The rest is conversation, which is exactly what architecture should enable.
Frequently Asked Questions
How do you model a cloud migration with ArchiMate?
Model baseline and target architectures as Application and Technology layer elements. Use the Implementation and Migration layer to define transition architectures, work packages, and migration events showing the journey between states.
What ArchiMate viewpoints are most useful for cloud migration?
Application Portfolio views (what moves vs retires), Technology Stack views (current vs target infrastructure), and Migration Roadmap views (plateaus and work packages). The Motivation layer links migration to business drivers.
How does ArchiMate handle multi-cloud architectures?
Each cloud provider appears as Technology Services or Nodes. Application Components are assigned to these nodes. Multi-cloud dependencies show through Serving and Flow relationships. Data residency constraints appear in the Motivation layer.