User Story

Service providers want to deploy PortaSwitch across several colocations to eliminate (or at least minimize) losses caused by an outage (or unrecoverable failure) of a single colocation. A geo-redundant configuration is intended to address this requirement; however, it assumes that:

  • all business-critical components (in particular data storage, PortaBilling, the PortaBilling API and the PortaSIP Media Server Cluster) support this type of configuration;
  • the system provides quick fail-over with minimal risk, i.e. the human factor is excluded from the fail-over process (fail-over is performed automatically) and critical billing and service configuration data is replicated to the remote colocation;
  • the system provides fail-over that leads to zero or minimal revenue loss, i.e. fail-over takes the minimum possible time (seconds, or a few minutes at most); an inconsistent state of billing entities (balances, XDRs) is inevitable but shall be minimized, and double billing must be eliminated.


Use Cases

Use case #1: Geo-redundant configuration of PortaSIP Media Cluster

Roles: ITSP, PortaSwitch, PortaSIP Media Server Cluster (PSMSC), Configurator, Administrator.

Preconditions

  • PortaSwitch servers are installed in two geographically distant colocations.
  • Administrator assigned (via Configurator) the servers in the two colocations to the primary site ("Main") and the secondary site ("Secondary").
  • Administrator deployed database servers as follows:
    • the primary billing database was deployed on the Main site (DB-BILLING@Main for further reference);
    • the primary SIP and media databases were deployed on the Main site (DB-SIP@Main and DB-UM@Main, respectively, for further reference);
    • an r/o replica of the primary billing database was deployed on the Secondary site (DB-BILLING@Secondary for further reference);
    • an r/o replica of the SIP database was deployed on the Secondary site (DB-SIP@Secondary for further reference); this replica is optional;
    • an r/o replica of the media database was deployed on the Secondary site (DB-UM@Secondary for further reference);
    • an r/w auxiliary billing database holding deltas and interim XDRs was deployed on the Secondary site (DB-BILLING-DELTA@Secondary for further reference);
    • an r/w SIP database was deployed on the Secondary site (DB-SIP-DELTA@Secondary for further reference). Note: it is not a replica of DB-SIP@Main.
  • Administrator added PortaBE instances (at least one per site).
  • Administrator added PortaAdmin instances (at least one per site).
  • Administrator configured the virtual billing environment "Telco".
  • Administrator reserved two VIP addresses for the geo-redundant PSMSC configuration, exactly one VIP per site (VIP1@Main and VIP2@Secondary, respectively, for further reference).
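
The preconditions above can be expressed as a small data model and checked mechanically before the cluster is configured. The sketch below is illustrative only: the `Database` and `Deployment` types, the `check_preconditions` helper and the assumption that every primary database on the Main site is read/write are introduced here and are not part of Configurator. Only the database names, access modes and per-site rules are taken from the list above.

```python
from dataclasses import dataclass

# Hypothetical model of the deployment preconditions; not a Configurator API.
MAIN, SECONDARY = "Main", "Secondary"


@dataclass(frozen=True)
class Database:
    name: str
    site: str
    access: str  # "rw" or "ro"
    optional: bool = False


# Database layout from the preconditions (primaries on Main assumed r/w).
DATABASES = [
    Database("DB-BILLING@Main", MAIN, "rw"),
    Database("DB-SIP@Main", MAIN, "rw"),
    Database("DB-UM@Main", MAIN, "rw"),
    Database("DB-BILLING@Secondary", SECONDARY, "ro"),
    Database("DB-SIP@Secondary", SECONDARY, "ro", optional=True),
    Database("DB-UM@Secondary", SECONDARY, "ro"),
    Database("DB-BILLING-DELTA@Secondary", SECONDARY, "rw"),
    Database("DB-SIP-DELTA@Secondary", SECONDARY, "rw"),  # not a replica
]

EXPECTED_RW = {
    "DB-BILLING@Main", "DB-SIP@Main", "DB-UM@Main",
    "DB-BILLING-DELTA@Secondary", "DB-SIP-DELTA@Secondary",
}


@dataclass
class Deployment:
    databases: list[Database]
    porta_be: dict[str, int]       # site -> number of PortaBE instances
    porta_admin: dict[str, int]    # site -> number of PortaAdmin instances
    vips: dict[str, list[str]]     # site -> reserved VIP addresses


def check_preconditions(d: Deployment) -> list[str]:
    """Return the violated preconditions; an empty list means all hold."""
    errors = []
    for site in (MAIN, SECONDARY):
        if d.porta_be.get(site, 0) < 1:
            errors.append(f"no PortaBE instance on {site}")
        if d.porta_admin.get(site, 0) < 1:
            errors.append(f"no PortaAdmin instance on {site}")
        if len(d.vips.get(site, [])) != 1:
            errors.append(f"{site} must have exactly one VIP")
    # Only the Main primaries and the two Secondary delta databases are writable.
    writable = {db.name for db in d.databases if db.access == "rw"}
    if writable != EXPECTED_RW:
        errors.append(f"wrong r/w set, differs in: {sorted(writable ^ EXPECTED_RW)}")
    return errors
```

Comparing the set of writable databases, rather than counting them, catches the easy mistake of deploying DB-SIP-DELTA@Secondary as a read-only replica of DB-SIP@Main.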

Use scenario #1.1 Configuration of PortaSIP Media Clusters (cluster per site)

Scenario:

  • Administrator makes a copy of an existing configuration and opens it.
  • Administrator adds the required number of DispatchingNode instances, at least two per site (DispatchingNode@Main and DispatchingNode@Secondary for further reference). All DispatchingNode instances are added to the "Telco" billing environment.
  • Administrator assigns VIP1@Main to all instances of DispatchingNode@Main.
  • Administrator assigns VIP2@Secondary to all instances of DispatchingNode@Secondary.
  • Administrator adds the required number of ProcessingNode instances, at least two per site (ProcessingNode@Main and ProcessingNode@Secondary for further reference). All ProcessingNode instances are added to the "Telco" billing environment.
  • Administrator configures additional parameters (if required) and applies the configuration.
  • Once the configuration is applied, PSMSC is up and running, and two addresses can be used for registration and calling services: VIP1@Main and VIP2@Secondary.
  • ITSP provides services and provisions its customers' devices either by specifying primary/secondary addresses (VIP1, VIP2) or by setting up DNS so that the proxy name resolves to both addresses (VIP1, VIP2). For users located closer to the Main site, ITSP may provision VIP1@Main as the primary address and VIP2@Secondary as the fallback address; for other users, the addresses may be provisioned the other way around.
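
The sizing and VIP rules of this scenario can be sketched as a validation pass over the list of added instances, together with the proximity-based choice of primary and fallback addresses. The `Node` type, `validate_cluster` and `provisioned_addresses` below are hypothetical helpers written for illustration, not Configurator functionality.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical helpers; the VIP per site comes from the preconditions.
SITE_VIPS = {"Main": "VIP1", "Secondary": "VIP2"}
ROLES = ("DispatchingNode", "ProcessingNode")


@dataclass(frozen=True)
class Node:
    role: str                  # "DispatchingNode" or "ProcessingNode"
    site: str                  # "Main" or "Secondary"
    environment: str           # billing environment, e.g. "Telco"
    vip: Optional[str] = None  # only DispatchingNode instances carry a VIP


def validate_cluster(nodes: list[Node], environment: str = "Telco") -> list[str]:
    """Check the per-site rules of scenario #1.1; return a list of violations."""
    errors = []
    for site, vip in SITE_VIPS.items():
        for role in ROLES:
            count = sum(1 for n in nodes if n.role == role and n.site == site)
            if count < 2:
                errors.append(f"{role}@{site}: {count} instance(s), need at least 2")
        for n in nodes:
            if n.role == "DispatchingNode" and n.site == site and n.vip != vip:
                errors.append(f"DispatchingNode@{site} must use {vip}, got {n.vip}")
    errors += [f"{n.role}@{n.site} is not in '{environment}'"
               for n in nodes if n.environment != environment]
    return errors


def provisioned_addresses(closest_site: str) -> tuple[str, str]:
    """Primary and fallback proxy addresses for a device, nearest site first."""
    other = "Secondary" if closest_site == "Main" else "Main"
    return SITE_VIPS[closest_site], SITE_VIPS[other]
```

For example, a device near the Secondary site would get `("VIP2", "VIP1")`: VIP2@Secondary as its primary address and VIP1@Main as the fallback.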


Use case #2: Site failovers

Roles: User, PortaSwitch, PSMSC

Preconditions:

  • the same as in "Use case #1"
  • PSMSC is configured (according to Use scenario #1.1)

Use scenario #2.1 Enable standalone mode

Main flow

  • PortaSwitch runs in normal mode.
  • The datacenter hosting the primary site's servers is struck by a severe power outage, so all servers of the primary site go offline (alternative flow: AF-2.1).
  • The Secondary site detects the Main site outage, and a decision is made to enable standalone mode.
  • PSMSC@Secondary is signaled about the mode change from "normal" to "standalone".
  • PSMSC@Secondary operates in standalone mode.
  • Services are provided according to Use scenario #3.2.

Alternative flows
AF-2.1 The network connection between the primary and secondary sites breaks. The rest of the scenario repeats the main flow.
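
The scenario leaves the detection mechanism open. One common way to implement it is a periodic health probe with hysteresis, sketched below under explicit assumptions: the `SiteMonitor` class and both thresholds are hypothetical. Note that a probe run from the Secondary site cannot distinguish a power outage on the Main site from a broken inter-site link (AF-2.1); both look like unanswered probes, which is why the two flows end the same way and why the split-brain scenarios #3.3 and #3.4 have to be handled.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    NORMAL = "normal"
    STANDALONE = "standalone"


class SiteMonitor:
    """Hypothetical mode-change decision on the Secondary site.

    Assumption: the Main site is probed periodically; FAIL_THRESHOLD
    consecutive failed probes trigger standalone mode, and
    RECOVER_THRESHOLD consecutive successful probes trigger normal mode.
    """

    FAIL_THRESHOLD = 3
    RECOVER_THRESHOLD = 5

    def __init__(self) -> None:
        self.mode = Mode.NORMAL
        self._failures = 0
        self._successes = 0

    def on_probe(self, main_reachable: bool) -> Optional[Mode]:
        """Record one probe result; return the new mode if it changed."""
        if main_reachable:
            self._successes += 1
            self._failures = 0
        else:
            self._failures += 1
            self._successes = 0

        if self.mode is Mode.NORMAL and self._failures >= self.FAIL_THRESHOLD:
            self.mode = Mode.STANDALONE  # signal PSMSC@Secondary: normal -> standalone
            return self.mode
        if self.mode is Mode.STANDALONE and self._successes >= self.RECOVER_THRESHOLD:
            self.mode = Mode.NORMAL  # signal PSMSC@Secondary: standalone -> normal
            return self.mode
        return None
```

Requiring more consecutive successes to return to normal mode than failures to leave it keeps a flapping link from bouncing PSMSC@Secondary between modes.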

 


Use scenario #2.2 Enable normal mode

  • PSMSC@Secondary runs in standalone mode.
  • The Main site comes back online after the outage.
  • The Secondary site detects that the Main site is back, and a decision is made to enable normal mode.
  • PSMSC@Secondary is signaled about the mode change from "standalone" to "normal".
  • PSMSC@Secondary dumps the accounting data of all active calls into DB-BILLING@Main.
  • PSMSC@Secondary switches to normal mode.
  • Services are provided according to Use scenario #3.1.
  • Data accumulated in DB-BILLING-DELTA@Secondary is merged back into DB-BILLING@Main to update balances and load the XDRs accumulated while the system was operating in standalone mode.
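
The merge in the last step has to respect the requirement that double billing be eliminated. A minimal in-memory sketch, assuming that DB-BILLING-DELTA@Secondary stores a signed balance change per account and XDRs keyed by a unique id (both assumptions; the actual schema is not described here): balance changes are added to the main balances, XDRs are loaded only if not already present, and the deltas are cleared so that repeating the merge is harmless.

```python
from dataclasses import dataclass, field
from decimal import Decimal


# Hypothetical in-memory stand-ins; the real database schemas are not shown here.
@dataclass
class MainBilling:
    """Stand-in for DB-BILLING@Main."""
    balances: dict[str, Decimal] = field(default_factory=dict)
    xdrs: dict[str, dict] = field(default_factory=dict)  # XDR id -> XDR


@dataclass
class BillingDelta:
    """Stand-in for DB-BILLING-DELTA@Secondary."""
    balance_changes: dict[str, Decimal] = field(default_factory=dict)  # signed
    xdrs: dict[str, dict] = field(default_factory=dict)


def merge_back(main: MainBilling, delta: BillingDelta) -> None:
    """Apply standalone-mode deltas to the main billing data exactly once."""
    for account, change in delta.balance_changes.items():
        main.balances[account] = main.balances.get(account, Decimal(0)) + change
    for xdr_id, xdr in delta.xdrs.items():
        main.xdrs.setdefault(xdr_id, xdr)  # an XDR already loaded is not loaded again
    # Clearing the deltas makes a repeated merge a no-op (no double billing).
    delta.balance_changes.clear()
    delta.xdrs.clear()
```

In a real database the apply-and-clear steps would have to run in a single transaction; otherwise a crash between them could still apply the same balance change twice.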

Use case #3: Providing service during fail-overs and while the system operates in different modes

Roles: User, PortaSwitch, PSMSC, PortaBE

Preconditions:

  • the same as in "Use case #1"
  • PSMSC is configured (according to Use scenario #1.1)

Use scenario #3.1 Processing call when primary and secondary sites are up and running normally

Main flow

  • PortaSwitch runs in normal mode.
  • The proxy located on the primary site is configured as preferred in User's phone, so User registers on the Main site (alternative flow: AF-3.1).
  • PSMSC@Main caches User's registration record in DB-SIP@Main.
  • User places an outgoing call; the call is dispatched and processed by PSMSC@Main and authorized by PortaBE@Main.
  • When User finishes the call, PortaBE@Main adds/updates billing information in DB-BILLING@Main.


Alternative flows
AF-3.1 
  • The proxy located on the secondary site is configured as preferred in User's phone, so User registers on the Secondary site.
  • PSMSC@Secondary caches User's registration record in DB-SIP@Main.
  • User places an outgoing call; the call is dispatched and processed by PSMSC@Secondary and authorized by PortaBE@Main.
  • When User finishes the call, PortaBE@Main receives accounting data from PSMSC@Secondary and adds/updates billing information in DB-BILLING@Main.

Use scenario #3.2 Processing call that overlaps with primary site failure

  • PSMSC@Secondary runs in standalone mode.
  • User attempts to register at PSMSC@Main. Registration fails since the primary site is unavailable.
  • User attempts to register at PSMSC@Secondary. Registration succeeds, and PSMSC@Secondary caches the registration record in DB-SIP-DELTA@Secondary.
  • User places an outgoing call. The call is dispatched and processed by PSMSC@Secondary and authorized by PortaBE@Secondary.
  • When User finishes the call, PortaBE@Secondary receives accounting data from PSMSC@Secondary and adds/updates billing information in DB-BILLING-DELTA@Secondary.
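
Scenarios #3.1, AF-3.1, #3.2 and #3.3 together define where a call's registration, authorization and billing data go. The hypothetical `route` function below summarizes that mapping as a lookup; it is a reading aid, not PSMSC code. Standalone mode is treated as a property of PSMSC@Secondary only, so the Main site keeps using its own resources even while the Secondary site runs standalone (#3.3).

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    STANDALONE = "standalone"


# Hypothetical summary of the resource mapping in use case #3.
MAIN_RESOURCES = {
    "registration": "DB-SIP@Main",
    "authorization": "PortaBE@Main",
    "billing": "DB-BILLING@Main",
}
STANDALONE_RESOURCES = {
    "registration": "DB-SIP-DELTA@Secondary",
    "authorization": "PortaBE@Secondary",
    "billing": "DB-BILLING-DELTA@Secondary",
}


def route(site: str, secondary_mode: Mode) -> dict[str, str]:
    """Resources used for a call handled by the PSMSC on `site`."""
    if site not in ("Main", "Secondary"):
        raise ValueError(f"unknown site: {site}")
    if site == "Main":
        # #3.1 and #3.3: Main keeps using its own resources in either case.
        return MAIN_RESOURCES
    if secondary_mode is Mode.NORMAL:
        # AF-3.1: PSMSC@Secondary registers and bills via the Main site.
        return MAIN_RESOURCES
    # #3.2: standalone Secondary writes only to its r/w delta databases.
    return STANDALONE_RESOURCES
```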

 

Use scenario #3.3 Processing call that overlaps with site's failover ("split-brain" case, call started on the primary site)

  • User registers at PSMSC@Main (see Use scenario #3.1)
  • User places an outgoing call; the call is authorized by PortaBE@Main.
  • User talks on the phone.
  • The connection between the primary and secondary sites breaks; the Secondary site detects this as an outage of the primary site, so standalone mode is enabled for PSMSC@Secondary.
  • Since the Main site is up and running as usual (only the connection to the secondary site is broken), User continues talking.
  • User hangs up; PortaBE@Main receives accounting data from PSMSC@Main and bills the call normally (billing information is saved in DB-BILLING@Main).

Use scenario #3.4 Processing call that overlaps with site's failover ("split-brain" case, call started on the secondary site)

  • PSMSC@Secondary runs in standalone mode (the network connection between the Main and Secondary sites is broken, and the sites function independently).
  • User registers at PSMSC@Secondary (see Use scenario #3.2).
  • User places an outgoing call; the call is authorized by PortaBE@Secondary.
  • User talks on the phone.
  • The connection between the primary and secondary sites is restored; PSMSC@Secondary is notified about the mode change, so it dumps the accounting data of active calls to PSMSC@Main and continues to operate in normal mode.
  • User continues talking and hangs up after a while.
  • PortaBE@Main receives accounting data from PSMSC@Secondary and bills the call normally (billing information is saved in DB-BILLING@Main).
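
Scenario #3.4 implies that a call authorized while the Secondary site ran standalone must end up billed once, in DB-BILLING@Main, and not also in DB-BILLING-DELTA@Secondary. The sketch below models that hand-over. The `SecondaryCluster` class, its fields and the idea of retargeting each active call's final accounting record are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical model of the active-call hand-over in scenario #3.4.
MAIN_BILLING = "DB-BILLING@Main"
DELTA_BILLING = "DB-BILLING-DELTA@Secondary"


@dataclass
class ActiveCall:
    call_id: str
    authorized_by: str   # PortaBE instance that authorized the call
    billing_target: str  # database that will receive the final accounting data


@dataclass
class SecondaryCluster:
    """Stand-in for the active-call table of PSMSC@Secondary."""
    standalone: bool = True
    calls: dict[str, ActiveCall] = field(default_factory=dict)
    handed_over: list[str] = field(default_factory=list)  # ids dumped to Main

    def start_call(self, call_id: str) -> None:
        if self.standalone:
            call = ActiveCall(call_id, "PortaBE@Secondary", DELTA_BILLING)
        else:
            call = ActiveCall(call_id, "PortaBE@Main", MAIN_BILLING)
        self.calls[call_id] = call

    def switch_to_normal(self) -> None:
        # Dump every active call to the Main site and retarget its final
        # accounting, so each call is billed once, in DB-BILLING@Main only.
        for call in self.calls.values():
            if call.billing_target != MAIN_BILLING:
                call.billing_target = MAIN_BILLING
                self.handed_over.append(call.call_id)
        self.standalone = False

    def end_call(self, call_id: str) -> str:
        """Finish a call; return where its final accounting data is written."""
        return self.calls.pop(call_id).billing_target
```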

Use scenario #3.5 Processing call that overlaps with site's failover (primary site failure, call started on primary site)

  • User registers at PSMSC@Main (see Use scenario #3.1)
  • User places an outgoing call; the call is authorized by PortaBE@Main.
  • User is talking on the phone when the primary site fails and goes offline entirely.
  • The call can no longer go through and is therefore aborted. It is not billed at this point, since both the processing node that was handling the call and billing are down.
  • The secondary site detects the outage of the primary site, so PSMSC@Secondary switches to standalone mode.
  • After a while, the primary site comes back online.
  • PSMSC@Main starts and finds a persistent call accounting record in DB-SIP@Main.
  • PSMSC@Main reads the persistent call accounting record (identity, call connect time, etc.), derives the call duration (from the call start time and the time when the Main site failed) and sends accounting data to PortaBE@Main.

    Since a failure of the whole cluster differs in several important aspects from a failure of one or several nodes, it requires specific handling, which will be implemented in a separate project.

  • PortaBE@Main bills the call normally (billing information is added/updated in DB-BILLING@Main).
  • PSMSC@Main removes persistent call accounting record from DB-SIP@Main.
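
The duration derivation in the recovery step is simple arithmetic: the call is assumed to have ended when the Main site failed, so its duration is the failure time minus the stored connect time. In the sketch below, the `PersistentCallRecord` fields, the way the failure time is obtained (it is simply passed in) and the rounding down to whole seconds are assumptions; as noted above, the real handling of a whole-cluster failure belongs to a separate project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical shape of the record PSMSC@Main finds in DB-SIP@Main.
@dataclass(frozen=True)
class PersistentCallRecord:
    call_id: str
    account: str
    connect_time: datetime


def recovered_accounting(record: PersistentCallRecord, failure_time: datetime) -> dict:
    """Build the accounting data sent to PortaBE@Main for an aborted call."""
    if failure_time < record.connect_time:
        raise ValueError("failure_time precedes connect_time")
    duration = failure_time - record.connect_time
    return {
        "call_id": record.call_id,
        "account": record.account,
        "connect_time": record.connect_time,
        "disconnect_time": failure_time,  # the site failure ends the call
        # Round down to whole seconds (the rounding rule is an assumption).
        "duration_s": int(duration.total_seconds()),
    }
```

For a call connected at 12:00:00 UTC with the Main site failing at 12:02:30.4 UTC, this yields a duration of 150 seconds.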

Use scenario #3.6 Processing call that overlaps with site's failover (primary site failure, call started on secondary site)

  • This use scenario is the same as Use scenario #3.4.