{"id":269213,"date":"2023-07-10T11:21:54","date_gmt":"2023-07-10T16:21:54","guid":{"rendered":"https:\/\/www.webscale.com\/?p=269196"},"modified":"2023-12-29T15:31:00","modified_gmt":"2023-12-29T20:31:00","slug":"building-for-the-inevitable-next-cloud-outage-part-1","status":"publish","type":"post","link":"https:\/\/www.webscale.com\/blog\/building-for-the-inevitable-next-cloud-outage-part-1\/","title":{"rendered":"Building for the Inevitable Next Cloud Outage \u2013 Part 1"},"content":{"rendered":"

The following is based on a talk by Pavel Nikolov of Section (acquired by Webscale in 2023)<\/a> at the KubeCon+CloudNativeCon Europe 2022 event. This first post will discuss the challenges in building for the next cloud outage. Part Two will demonstrate how to deploy a Kubernetes application across clusters in multiple clouds and regions with built-in failover to automatically adapt to cloud outages. You can also read Pavel\u2019s column on this topic in TechBeacon<\/a>.<\/em><\/p>\n

Every few months we read about the widespread impact of a major cloud outage. These events are unpredictable and inevitable, and, quite frankly, keep site reliability engineering (SRE) teams up at night. Whatever your type of business, it is prohibitively expensive to guarantee high availability by deploying your applications everywhere around the world at the same time.<\/p>\n

Public cloud remains the most popular data center approach among the cloud native community, with multi-cloud growing in adoption. However, adopting a multi-cloud strategy isn\u2019t as simple as hitting the \u201cgo\u201d button. What\u2019s more, despite best efforts at building out redundancy, the cloud providers cannot guarantee 100% uptime. As such, it\u2019s not a question of if your servers or services will go down but rather when. And it will probably happen when you are unprepared or least expect it (hello, middle-of-the-night support calls).<\/p>\n

This is true for a number of reasons. For one, there are external factors, such as your Domain Name System (DNS) going down or upstream internet provider connectivity issues, that are outside the control of the public clouds. Then, too, there are the human factors involved, like when we make mistakes in code deployment that can be difficult to roll back. Of course, there are also natural disasters that can take down entire regions or cause significant headaches for services around the globe.<\/p>\n

As a result, organizations spend a significant amount of time and money on disaster recovery plans in preparation for that next inevitable cloud outage.<\/p>\n

Disaster Recovery to the Rescue (maybe)<\/h3>\n

The vast majority of organizations fall into one of four disaster recovery categories when it comes to responding to an outage:<\/p>\n

    \n
  1. Active \/ active deployment strategy:<\/strong> If your primary server goes down, you flip the switch on your DNS and your request goes to a second active server. While this is the fastest and least-disruptive disaster recovery, you\u2019re among the lucky few if your IT budget supports this option!<\/li>\n
  2. Active \/ passive deployment strategy:<\/strong> This is very similar to active \/ active but it\u2019s cheaper because you\u2019re not paying for the hosting of the passive instance or cluster when you\u2019re not using it. However, you have to spin up the passive instance and flip the switch on your DNS before service is restored, delaying the return to service.<\/li>\n
  3. Periodic backup of your databases:<\/strong> In this instance, when your service goes down you must first spin up your code, restore the backups, and then continue serving as normal. While viable, this is not a rapid response and can extend a service outage beyond 24 hours. The only thing worse is\u2026<\/li>\n
  4. No disaster recovery strategy:<\/strong> Truth be told, far too many organizations fall into this category. It\u2019s understandable; you\u2019re busy building features and don\u2019t have time to think about disaster recovery. When something happens, you\u2019ll figure it out!<\/li>\n<\/ol>\n
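To compare the options above side by side, here is a minimal sketch that records the recovery steps each strategy implies. The dictionary keys, the step wording, and the steps_for helper are all hypothetical names chosen for illustration; the steps themselves are paraphrased from the list.

```python
# Recovery steps per strategy, paraphrased from the list above.
# Keys and step wording are illustrative assumptions, not a standard.
RECOVERY_STEPS = {
    'active/active': ['switch DNS to the second active server'],
    'active/passive': ['spin up the passive instance', 'switch DNS to it'],
    'periodic backup': ['spin up the code', 'restore the backups', 'resume serving'],
}


def steps_for(strategy):
    # Returns the ordered steps, or None when no plan exists
    # (the fourth category: no disaster recovery strategy).
    return RECOVERY_STEPS.get(strategy)


for name in ['active/active', 'active/passive', 'periodic backup', 'none']:
    steps = steps_for(name)
    count = 'no plan' if steps is None else len(steps)
    print(name, count)
```

Fewer steps generally means a shorter outage, which is the trade-off against hosting cost described in the list.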

    The challenge with any of these disaster recovery strategies (except for the fourth one, of course) is that they require a high level of discipline. Your entire team needs to understand what will happen and know what they must do when an outage occurs, and even the best laid plans will likely require some level of human intervention to restore service. In addition, as you add new features or components to your system, you\u2019ll need to retest your disaster recovery plan to account for the changes. Ideally, this should happen at least every quarter, preferably every month, but it\u2019s easy to get caught up in day-to-day delivery deadlines and put off reviewing the disaster recovery plan until it\u2019s too late.<\/p>\n

    Multi-Cluster Disaster Recovery<\/h3>\n

    Since you\u2019re reading this blog, let\u2019s assume you\u2019re running a modern Kubernetes containerized application. Let\u2019s further assume that your application is running on multiple distributed clusters to maximize availability and performance. How does that impact disaster recovery?<\/p>\n

    Just because you have multiple clusters does not mean automatic failover during an outage. The culprit is often DNS. First off, DNS servers can (and often do) become unavailable. But even if the servers themselves don\u2019t go down, DNS configuration can cause problems during outages. Each DNS record carries a TTL (time to live) that tells resolvers how long they may cache the answer, and the problem is that there is no guarantee that, worldwide, all providers will honor your TTL. A resolver that keeps serving a stale record continues to send users to the failed cluster, which means that your healthy clusters can be available but effectively invisible during an outage.<\/p>\n
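The effect of TTL on failover time can be sketched with some simple arithmetic. This is a rough model under stated assumptions, not a measurement: the function name, its parameters, and the idea that a non-compliant resolver enforces its own minimum cache time are all illustrative.

```python
def worst_case_failover_seconds(detect_s, record_ttl_s, resolver_min_ttl_s=0):
    # detect_s: time to notice the outage and update the DNS record.
    # record_ttl_s: the TTL you published on the record.
    # resolver_min_ttl_s: hypothetical minimum cache time enforced by a
    # resolver that does not honor your TTL (0 means it honors it).
    effective_ttl = max(record_ttl_s, resolver_min_ttl_s)
    # A client that cached the old answer just before the change
    # keeps using it for the full effective TTL.
    return detect_s + effective_ttl


# Resolver honors a 60 second TTL, outage detected in 30 seconds.
print(worst_case_failover_seconds(30, 60))
# Same record, but the resolver caches for at least 300 seconds.
print(worst_case_failover_seconds(30, 60, resolver_min_ttl_s=300))
```

Under this model, lowering your own TTL only helps for resolvers that respect it; the second call shows how a resolver with its own floor dominates the recovery time.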

    But what if there was another approach to disaster recovery? In our next post we\u2019ll discuss a strategy using BGP + Anycast to significantly improve availability and recovery. If you\u2019re eager to jump ahead, feel free to watch Pavel\u2019s KubeCon talk<\/a>.<\/p>\n

    Webscale CloudFlow\u2019s Cloud-Native Hosting Solution Addresses Reliability (and much more)<\/h3>\n

    On the other hand, if you need a solution today, why not turn to Webscale CloudFlow? As we know all too well, outages will happen eventually. It can be prohibitively expensive and labor intensive to maintain disaster recovery strategies for your organization. Fortunately, Webscale CloudFlow offers a wide range of Cloud-Native Hosting solutions that address the complexity of building and operating distributed networks. The complexities of routing across multi-layer edge-cloud topologies are perhaps the most daunting when it comes to building distributed systems. This is why organizations are increasingly turning to solutions like Webscale CloudFlow that take care of this for you.<\/p>\n

    In particular, Webscale CloudFlow\u2019s Kubernetes Edge Interface (KEI)<\/a>, Adaptive Edge Engine (AEE)<\/a> and Composable Edge Cloud (CEC)<\/a> work together to improve application availability. With KEI you can set policy-based controls using simple commands in tools like kubectl that control, among other things, cluster reliability and availability. AEE uses advanced artificial intelligence to interpret those commands and automatically handle configuration and routing in the background. Finally, Webscale CloudFlow\u2019s Composable Edge Cloud features a heterogeneous mix of different cloud providers worldwide, ensuring application availability even when a provider network goes down.<\/p>\n

    To learn more, get in touch<\/a> and we\u2019ll show you how the Webscale CloudFlow platform can help you achieve the reliability, scalability, speed, security or other custom edge compute functionality that your applications demand.<\/p>\n

    Read Part 2 \u00bb<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

    The following is based on a talk by Pavel Nikolov of Section (acquired by Webscale in 2023) at the KubeCon+CloudNativeCon Europe 2022 event. This first post will discuss the challenges in building for the next cloud outage. Part Two will demonstrate how to deploy a Kubernetes application across clusters in multiple clouds and regions with […]<\/p>\n","protected":false},"author":36,"featured_media":269286,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","_aioseo_description":"","_aioseo_keywords":"","_aioseo_og_article_section":"","_aioseo_og_article_tags":"","_aioseo_og_description":"","_aioseo_og_title":"","_aioseo_title":"","_aioseo_twitter_description":"","_aioseo_twitter_title":"","_author_photo":"field_6513304084a08","_doc_url":"","_dp_original":"269192","_et_autogenerated_title":"","_et_body_layout_enabled":"","_et_body_layout_id":"","_et_builder_dynamic_assets_loading_attr_threshold":"2","_et_builder_module_features_cache":[],"_et_builder_version":"","_et_default":"","_et_enabled":"","_et_footer_layout_enabled":"","_et_footer_layout_id":"","_et_header_layout_enabled":"","_et_header_layout_id":"","_et_pb_ab_current_shortcode":"","_et_pb_ab_subjects":"","_et_pb_built_for_post_type":"","_et_pb_custom_css":"","_et_pb_enable_shortcode_tracking":"","_et_pb_excluded_global_options":"","_et_pb_first_image":"","_et_pb_gutter_width":"","_et_pb_module_type":"","_et_pb_page_layout":"et_right_sidebar","_et_pb_page_z_index":"","_et_pb_post_hide_nav":"default","_et_pb_row_layout":"","_et_pb_show_page_creation":"","_et_pb_show_title":"on","_et_pb_side_nav":"off","_et_pb_static_css_file":"","_et_pb_truncate_post":"","_et_pb_truncate_post_date":"","_et_post_bg_color":"#ffffff","_et_post_bg_layout":"light","_et_template":[],"_et_theme_builder_marked_as_unused":"","_et_use_on":"","_gallery_link_target":"","_global_colors_info
":"","_lh_copy_from_url-original_file":"","_version_history":"","_wp_old_date":["2023-06-25"],"_wpcode_auto_insert":"","_wpcode_auto_insert_number":"","_wpcode_conditional_logic":[],"_wpcode_conditional_logic_enabled":"","_wpcode_library_id":"","_wpcode_library_version":"","_wpcode_location_extra":"","_wpcode_note":"","_wpcode_priority":"","_wpcode_shortcode_attributes":[],"_wpmf_gallery_custom_image_link":"","ao_post_optimize":[],"author_photo":"268246","doc_url":"","et_enqueued_post_fonts":[],"rank_math_contentai_score":{"wordCount":"100","linkCount":"0","headingCount":"100","mediaCount":"62.22"},"rank_math_description":"","rank_math_facebook_image":"","rank_math_facebook_image_id":"","rank_math_internal_links_processed":["1","1","1","1","1"],"rank_math_og_content_image":{"check":"429cdaeaa6fd567e5fb48c8a6508571a","images":[]},"rank_math_seo_score":"24","rank_math_title":"","version_history":"","wp-smpro-smush-data":[],"wp-smush-animated":"","wpmf_filetype":"","wpmf_order":"","wpmf_size":"","_":"","_bj_lazy_load_skip_post":[],"_divi_filters_post_type":"","_et_dynamic_cached_attributes":[],"_et_dynamic_cached_shortcodes":[],"_et_pb_ab_bounce_rate_limit":"","_et_pb_ab_stats_refresh_interval":[],"_et_pb_content_area_background_color":"","_et_pb_dark_text_color":"","_et_pb_light_text_color":"","_et_pb_section_background_color":"","_job_location":"","_job_locations":"","_links_to":"","_links_to_target":"","_product_image_gallery":"","_schema_code":"","_synced_version":"","_wp_attachment_context":"","_wp_attachment_image_alt":[],"_wpie_source_url":"","_yoast_wpseo_content_score":"","_yoast_wpseo_focuskeywords":"","_yoast_wpseo_metadesc":"","_yoast_wpseo_opengraph-image":"","_yst_prominent_words_version":"","inline_featured_image":[],"job_location":[],"job_locations":"","options":"","original-file":"","post_views_count":"","rank_math_analytic_object_id":"","rank_math_canonical_url":"","rank_math_focus_keyword":[],"rank_math_news_sitemap_robots":"","rank_math_primary_cate
gory":"1","rank_math_primary_ccategory":"","rank_math_primary_job_locations":"","rank_math_primary_partners_category":"","rank_math_primary_pr_category":"","rank_math_primary_press_release_year":"","rank_math_rich_snippet":"","rank_math_robots":[],"rank_math_schema_Article":[],"rank_math_schema_Organization":[],"rank_math_schema_VideoObject":[],"rank_math_shortcode_schema_s-23675683-fff5-4300-88fe-da8afc8b1bb9":"","rank_math_shortcode_schema_s-307bbc91-c6b1-41aa-950d-c50d435a949c":"","rank_math_shortcode_schema_s-63a052dbc0384":"","rank_math_shortcode_schema_s-63a052dbc039d":"","rank_math_shortcode_schema_s-63a052dbc03a6":"","rank_math_shortcode_schema_s-63a052dbc03aa":"","rank_math_shortcode_schema_s-63a052dbc03b5":"","rank_math_shortcode_schema_s-63a052dbc03ba":"","rank_math_shortcode_schema_s-63a052dbc03bd":"","rank_math_shortcode_schema_s-63b6dd7d53a96":"","rank_math_shortcode_schema_s-63b6dd7d53a9f":"","rank_math_shortcode_schema_s-63b6dd7d53aa2":"","rank_math_shortcode_schema_s-63b6dd7d53aa4":"","rank_math_shortcode_schema_s-63b6dd7d53aa7":"","rank_math_shortcode_schema_s-63b6dd7d53aa9":"","rank_math_shortcode_schema_s-63b6dd7d53aab":"","rank_math_shortcode_schema_s-63b6dd7d53aad":"","rank_math_shortcode_schema_s-63b6dd7d53aaf":"","rank_math_shortcode_schema_s-63c15fcf43311":"","rank_math_shortcode_schema_s-63c15fcf43322":"","rank_math_shortcode_schema_s-63c15fcf43325":"","rank_math_shortcode_schema_s-63c15fcf43327":"","rank_math_shortcode_schema_s-63c15fcf43329":"","rank_math_shortcode_schema_s-63c15fcf4332a":"","rank_math_shortcode_schema_s-63c15fcf4332c":"","rank_math_shortcode_schema_s-63c15fcf4332e":"","rank_math_shortcode_schema_s-63c15fcf43330":"","rank_math_shortcode_schema_s-63f52c5ed40bb":"","rank_math_shortcode_schema_s-6409f40a9b7d5":"","rank_math_shortcode_schema_s-64354a3892419":"","rank_math_shortcode_schema_s-6440158136148":"","rank_math_shortcode_schema_s-6446d2f9353ee":"","rank_math_shortcode_schema_s-6446d2f9353f3":"","rank_math_shortcode_sc
hema_s-6447c0fe4673c":"","rank_math_shortcode_schema_s-64e4d743542d7":"","schema_code":"","smush-complete":"","smush-info":"","smush-stats":[],"synced_version":"","wpmf_remote_video_link":"","_exp":"","_inc":"","_mc4wp_settings":[],"_post-subtitle":"","_pwh_dcfh_contact_email":"","_pwh_dcfh_contact_form_id":"","_pwh_dcfh_form_fields":"","_pwh_dcfh_ip_address":"","_pwh_dcfh_page_id":"","_pwh_dcfh_read_by":"","_pwh_dcfh_referer_url":"","_pwh_dcfh_user_agent":[],"_section1_col1":"","_section1_col2":"","_section1_col3":"","_section1_col4":"","_section2_col1":"","_section2_col2":"","_section2_col3":"","_section2_col4":"","_section2_col5":"","_section2_col6":"","_section3_col1":"","_section3_col2":"","_section3_col3":"","_section3_col4":"","_section3_col5":"","_section3_col6":"","_section4_col1":"","_section4_col2":"","_section4_col3":"","_section4_col4":"","_section4_col5":"","_section4_col6":"","_section5_col1":"","_section5_col2":"","_section5_col3":"","_section5_col4":"","_section5_col5":"","_section5_col6":"","_section6_col1":"","_section6_col2":"","_section6_col3":"","_section6_col4":"","_section6_col5":"","_section6_col6":"","_select_author":"","_test":"","_wp_attachment_backup_sizes":[],"_yoast_wpseo_estimated-reading-time-minutes":[],"_yoast_wpseo_focuskw":[],"_yoast_wpseo_focuskw_text_input":[],"_yoast_wpseo_linkdex":[],"_yoast_wpseo_meta-robots-nofollow":[],"_yoast_wpseo_meta-robots-noindex":[],"_yoast_wpseo_primary_category":[],"_yoast_wpseo_title":[],"_yoast_wpseo_wordproof_timestamp":"","exp":"","inc":"","post-subtitle":[],"rank_math_schema_BlogPosting":[],"section1_col1":"","section1_col2":"","section1_col3":"","section1_col4":"","section2_col1":"","section2_col2":"","section2_col3":"","section2_col4":"","section2_col5":"","section2_col6":"","section3_col1":"","section3_col2":"","section3_col3":"","section3_col4":"","section3_col5":"","section3_col6":"","section4_col1":"","section4_col2":"","section4_col3":"","section4_col4":"","section4_col5":"","section4_
col6":"","section5_col1":"","section5_col2":"","section5_col3":"","section5_col4":"","section5_col5":"","section5_col6":"","section6_col1":"","section6_col2":"","section6_col3":"","section6_col4":"","section6_col5":"","section6_col6":"","select_author":"","test":"","footnotes":""},"categories":[1,124],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts\/269213"}],"collection":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/users\/36"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/comments?post=269213"}],"version-history":[{"count":0,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts\/269213\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/media\/269286"}],"wp:attachment":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/media?parent=269213"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/categories?post=269213"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/tags?post=269213"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}