Thursday, October 23, 2014

RESOLVED: Missing NETLOGON and SYSVOL shares on Azure domain controller

So I recently built a VM on Azure in Southeast Asia to do some testing for a customer. I had a few issues connecting the server to AD but eventually figured out that Windows Firewall and a bad DNS configuration were the culprits. Soon after joining the domain and promoting the box to a DC, I started my Lync Server 2013 installation.

This was when things went sideways...

I noticed the Lync Server setup tossing errors about AD being only "partially" prepared, which was odd since I had a well-built environment in North America. I thought it might have something to do with latency between Singapore and North America, but soon realized that wasn't the issue. When I attempted to install the local configuration store, the wizard wouldn't get past the first page and produced an error about not being able to find the configuration store. However, running Get-CsManagementConnection did return the SCP for SQL.

I moved on to basic troubleshooting of the DC by running "dcdiag", which revealed errors about the server being unsuitable and another error relating to a missing NETLOGON share. Ah hah!! I noticed SYSVOL was missing as well. Coupled with those two were countless errors about DFS Replication in Event Viewer. After some digging I found the following worked for me:

Follow the instructions in this KB article: http://support.microsoft.com/kb/947022
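
If memory serves, the change that article walks you through amounts to flipping the SysvolReady registry flag so NETLOGON re-publishes the SYSVOL share. A rough PowerShell equivalent of the KB's steps (verify against the article before running it):

##Toggle the SysvolReady flag so NETLOGON re-creates the SYSVOL share (per KB947022)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters" -Name SysvolReady -Value 0
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters" -Name SysvolReady -Value 1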

In doing so, the SYSVOL share magically appeared!

Once the share was created I restarted the NETLOGON service, only to notice errors in Event Viewer about folders missing beneath the SYSVOL share.

I manually created the "scripts" and "policies" folders in File Explorer and restarted the NETLOGON service again.
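
For the PowerShell-inclined, the same fix scripted; I'm assuming the default SYSVOL path and using contoso.com as a stand-in for your domain's FQDN:

##Create the missing folders under the default SYSVOL path (substitute your domain's FQDN)
New-Item -ItemType Directory -Path "C:\Windows\SYSVOL\sysvol\contoso.com\scripts"
New-Item -ItemType Directory -Path "C:\Windows\SYSVOL\sysvol\contoso.com\policies"
##Restart NETLOGON so it publishes the shares
Restart-Service Netlogon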

This worked and my Lync installation continued without issue!

FIXED: Lync Server 2013 pre-requisites fail to install on Windows Server 2012 R2 in Azure

So I've noticed recently that pre-patched builds of Windows Server 2012 R2 in Azure IaaS fail during the pre-requisite installation for Lync Server 2013. The same thing surfaced during a SQL Server 2012 installation, and both stem from the same issue.

During the feature installation process, the attempt to install .NET Framework 3.5 fails with an error stating that the 'source' location wasn't specified or that the files can't be found.


Warning seen from the wizard when installing the feature:


How to resolve: 

Install the following patch: http://support2.microsoft.com/kb/3005628 

...and retry your feature installation.
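
As an aside, if you have Server 2012 R2 media handy, pointing the installer at the side-by-side store is another workaround I've seen used; the drive letter below is just an example:

##Install .NET 3.5 from mounted media instead of Windows Update (path is an example)
Install-WindowsFeature NET-Framework-Core -Source "D:\sources\sxs"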

Install Lync Server 2013 pre-reqs:

Install-WindowsFeature RSAT-ADDS, Web-Server, Web-Static-Content, Web-Default-Doc, Web-Http-Errors, Web-Asp-Net, Web-Net-Ext, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Http-Logging, Web-Log-Libraries, Web-Request-Monitor, Web-Http-Tracing, Web-Basic-Auth, Web-Windows-Auth, Web-Client-Auth, Web-Filtering, Web-Stat-Compression, Web-Dyn-Compression, NET-WCF-HTTP-Activation45, Web-Asp-Net45, Web-Mgmt-Tools, Web-Scripting-Tools, Web-Mgmt-Compat, Windows-Identity-Foundation, Desktop-Experience, Telnet-Client, BITS

Tuesday, October 7, 2014

Unwinding a cocked-up Azure deployment

First off, for those of you who believe my title is vulgar, read this. For everyone else, read on...

I started deploying Virtual Machines (VMs) to Azure only recently and found myself giddy as a schoolboy anticipating what I could build. However, this quickly descended into a befuddling exercise that left one heck of a mess in my Azure portal. As with any software offering with many knobs and buttons, and very little in the way of a traditional 'wizard' to walk new admins through the right way forward, you can get yourself into trouble quickly.

Working with customers in this area I've seen many different ways to deploy Cloud Services, VNets, VMs, and Storage Accounts. Because little guidance exists on how these various components impact each other, people generally forge ahead with little planning and eventually make mistakes.

This post will help demystify the core components of Azure IaaS and give you guidance on how to fix a cocked-up deployment. After all, you need to know what you're dealing with before you attempt to fix it. We'll begin by identifying and describing the building blocks of an Azure IaaS deployment:

1. Naming Conventions

This topic seems somewhat self-explanatory, however it's worth mentioning since a good convention makes working in PowerShell easier and helps other Azure admins adopt your topology. Consider using a format such as this:

[object name]-[location]-[data]

For example, if I'm creating a Virtual Network in South Central US with a network starting at 10.2.x.x then I might use:

vnet-southcentralus-10.2

Key Takeaway: Certainly don't take my suggestion as the 'right' way to do this either. I'm still working out and perfecting my own ideas on how descriptive these objects should be. You'll probably start with something similar, change it, then change it many more times before you get it right.

2. Virtual Networks

From my perspective, vNETs are one of the most important components to get right, so I'll start with them. Like most people I just wanted to get started building VMs in Azure, so I didn't pay much attention to this aspect. I can't say it enough now though: planning your vNET topology is foundational to a successful implementation!

Virtual Networks in Azure define the access boundary and the geographic location of your virtual machines. Start by planning the network address space you'll use in Azure, including any segmentation via subnets.

 

Key takeaway: Create your vNETs first, define your address range, then carve out your subnets. Any VM on any subnet in the same vNET will be able to talk (route) without any intervention from you.

If you need VMs in different geographic regions, create those vNETs in those regions first. You can perform vNET to vNET routing through the use of Dynamic Routing Gateways (http://msdn.microsoft.com/en-us/library/azure/dn690122.aspx).
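
In the current PowerShell module there isn't a single "new vNET" cmdlet; vNETs live in a subscription-wide network configuration file that you export, edit, and re-import. A minimal sketch, assuming an example file path:

##Export the subscription's current network configuration
Get-AzureVNetConfig -ExportToFile "C:\temp\netcfg.xml"
##Edit netcfg.xml to add your new VirtualNetworkSite, address space, and subnets, then re-import it
Set-AzureVNetConfig -ConfigurationPath "C:\temp\netcfg.xml"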

3. Storage Accounts

Since your VMs will need a place to reside, you should create the location in advance even though it can be created at the time you build your VM. Refer to my first point above about naming conventions for the reason why: you likely want control over what these objects are called rather than having the Azure service generate a cryptic name like "p12asgd798asfg98weiop" for your storage account. You also get to control where that account is created geographically.


Once your Storage Account is created, add a container for your VHD files (i.e. vhds) to hold your VM disks.

Key takeaway: If you're building VMs in East and West US datacenters, create the associated storage accounts in those datacenters as well.
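
Here's a sketch of pre-creating a storage account and its "vhds" container with the Azure PowerShell module. Note that storage account names must be lowercase letters and numbers only, so the dash convention from earlier doesn't apply here; the names below are examples:

##Create the storage account in the same region as the vNET and VMs it will serve
New-AzureStorageAccount -StorageAccountName "storwestus01" -Location "West US"
##Build a storage context from the account key and create the container for VM disks
$key = Get-AzureStorageKey -StorageAccountName "storwestus01"
$ctx = New-AzureStorageContext -StorageAccountName "storwestus01" -StorageAccountKey $key.Primary
New-AzureStorageContainer -Name "vhds" -Context $ctx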

4. Disks

In Azure IaaS, Disks are somewhat self-explanatory: they are built on the basis of a VHD file and can be attached to a VM at creation time. If you delete a VM and want to recreate it, you'll need to define a disk based on the VHD before doing so....more on this later.

5. Cloud Services

This mysterious object has been at the root of many bunged-up deployments and remains, for many people, a source of frustration in terms of understanding its importance and relevance to the overall architecture. The Cloud Service is basically a container for related VMs which a business would collectively define as a "service". Let's take Lync Server 2013 for example...

In a Lync environment you have Front-End, Back-End (SQL), File shares, Edge servers, Office Web App servers, and Mediation servers. An Azure admin running a Lync pool would conceivably put all these VMs into a single Cloud Service. You might decide to have a Cloud Service for "core" bits such as Active Directory servers, and yet another for client VMs or development boxes. The point is, each Cloud Service should encase all that makes up that defined service. Why? Because you can start and stop a cloud service to bring an environment online or offline on demand more easily than with any other method.


Be aware, though, that you can only perform one VM operation at a time within each Cloud Service (such as a start/stop/update). For this reason I've seen some people create a Cloud Service for each VM, which can have long-term consequences in terms of the number of CS objects and the overall management overhead.
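
Creating the Cloud Service ahead of time is a one-liner, and the "take the whole service offline" trick is just a pipeline. A sketch with example names:

##Create the Cloud Service container in the desired region (names are examples)
New-AzureService -ServiceName "cs-lync-westus" -Location "West US"
##Take every VM in the service offline; -StayProvisioned keeps the deployment and addresses (but continues billing)
Get-AzureVM -ServiceName "cs-lync-westus" | Stop-AzureVM -StayProvisioned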

6. Virtual Machines

Now we arrive at the last item; the Virtual Machine itself. Up to this point we've defined everything we need to establish the basis for a solid Azure environment in which the VM will reside. If you've planned the environment to your liking, read no further.


The Virtual Machine will consume disk and network (vNET) resources in the geographic region you've created them in. Remember, a vNET in "West US" should have its Storage Account in "West US" so the VM ends up residing in "West US" as well.
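
Putting the pieces together, here's a hypothetical sketch of creating a VM from a gallery image into an existing Cloud Service, vNET, and subnet; every name is an example, and $galleryImageName/$adminPassword are placeholders you'd set beforehand:

##Compose and create a VM inside an existing Cloud Service and vNET
##(-VNetName applies when this is the first VM deployed into the service)
New-AzureVMConfig -Name "vm-dc01" -InstanceSize Small -ImageName $galleryImageName |
    Add-AzureProvisioningConfig -Windows -AdminUsername "azadmin" -Password $adminPassword |
    Set-AzureSubnet -SubnetNames "subnet-servers" |
    New-AzureVM -ServiceName "cs-core-westus" -VNetName "vnet-westus-10.1"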

Now on to what we came here for....

Now...how to fix your Azure environment

I'll give you an example of what I did to resolve my many issues; some of it you may want to implement and some you may simply toss away. My problem stemmed from a lack of conceptual understanding of the geographic placement of vNETs and Cloud Services. I wanted some VMs to run in the West coast datacenter and some in the East. My storage accounts were all in the West, while some of my networking was East and some was West. A mess, really...

To fix it I followed these "simple" steps (a scripted sketch of steps 5 through 8 follows the list):
  1. Define your naming convention for networks, VMs, storage accounts, etc.
  2. Build the vNET and define the address range, including all subnets, making sure no overlap exists with your existing environment.
  3. Build the storage accounts and set up the "vhds" container.
  4. Create the Cloud Service objects.
  5. Stop the VMs.
  6. Copy the VHD files to the new storage accounts (renaming them along the way). For more information on how to do this check out: http://jasonshave.blogspot.ch/2014/09/how-to-copy-vhd-file-between-storage.html
  7. Delete the existing VMs, making sure to KEEP the attached disk objects.
  8. Create new disk objects based on the new home location of the VHD files from step 6 above.
  9. Recreate each deleted VM by choosing "New VM", "From Gallery", then selecting "My Disks" to locate the objects from step 8 above.
  10. Be sure to deploy the VM into an existing Cloud Service and choose the correct vNET and Storage Account.
  11. Repeat this for all your VMs!
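
Here's the promised sketch of steps 5 through 8; every name below is an example from a hypothetical environment, so substitute your own:

##Step 5: stop the VM (-Force confirms stopping the last VM in the deployment)
Stop-AzureVM -ServiceName "cs-old" -Name "vm-dc01" -Force
##Step 7: remove the VM object; the underlying VHD blob stays in the storage account
Remove-AzureVM -ServiceName "cs-old" -Name "vm-dc01"
##Step 8: after copying the VHD to its new home (step 6), register it as a disk
Add-AzureDisk -DiskName "disk-vm-dc01-os" -MediaLocation "https://storeastus01.blob.core.windows.net/vhds/vm-dc01-os.vhd" -OS Windows
##Steps 9-10: recreate the VM from that disk into the new Cloud Service and vNET
New-AzureVMConfig -Name "vm-dc01" -InstanceSize Small -DiskName "disk-vm-dc01-os" |
    New-AzureVM -ServiceName "cs-core-eastus" -VNetName "vnet-eastus-10.3"
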
While this seems like a daunting task, consider this a lesson learned. I know I did!!

Friday, October 3, 2014

Here is something strange....HP gets 10-year ban in Canada

HP gets a 10-year ban in Canada following a corruption charge and subsequent conviction. Try going to cnn.com and searching for the story, or anything close to it.

You won't find anything. Hmmmm. Strange?

This isn't an old story either. It was published on a Canadian web site just last Friday!

Thursday, September 18, 2014

HOW TO: Copy VHD file between storage accounts in Azure

I recently set out to rebuild and restructure my Azure IaaS environment and had to figure out an easy way to copy VHDs and keep track of progress. The following script copies data between two storage accounts in my Azure environment and displays progress throughout the copy process.

Requirements: Azure PowerShell module (http://azure.microsoft.com/en-us/documentation/articles/install-configure-powershell/)

Note: The copy process may appear to complete sooner than expected; a 125 GB VHD file holding only 60 GB of data will appear to finish at roughly the 50% mark.



Start by pinning the Azure PowerShell module to your taskbar once you've installed it. Right-click the shortcut and choose "Run ISE as Administrator"; this opens an editor window where you can paste in the script below:

##Set variables for the copy
$storageAccount1 = "source storage account here"
$storageAccount2 = "destination storage account here"
$srcBlob = "source file (blob)"
$srcContainer = "source container"
$destBlob = "destination file (blob)"
$destContainer = "destination container"

##Get the keys used to build the storage contexts
$srcStorageAccountKey = Get-AzureStorageKey -StorageAccountName $storageAccount1
$destStorageAccountKey = Get-AzureStorageKey -StorageAccountName $storageAccount2

##Set the storage contexts for the source and destination accounts
$varContext1 = New-AzureStorageContext -StorageAccountName $storageAccount1 -StorageAccountKey ($srcStorageAccountKey.Primary)
$varContext2 = New-AzureStorageContext -StorageAccountName $storageAccount2 -StorageAccountKey ($destStorageAccountKey.Primary)

##Start the copy operation (it runs server-side; the cmdlet returns right away)
$varBlobCopy = Start-AzureStorageBlobCopy -SrcContainer $srcContainer -Context $varContext1 -SrcBlob $srcBlob -DestBlob $destBlob -DestContainer $destContainer -DestContext $varContext2

##Get the copy state so we can loop and show progress
$status = Get-AzureStorageBlobCopyState -Context $varContext2 -Blob $destBlob -Container $destContainer

While ($status.Status -eq "Pending") {

    $status = Get-AzureStorageBlobCopyState -Context $varContext2 -Blob $destBlob -Container $destContainer

    [int]$varTotal = ($status.BytesCopied / $status.TotalBytes * 100)
    [int]$varCopiedBytes = ($status.BytesCopied / 1mb)
    [int]$varTotalBytes = ($status.TotalBytes / 1mb)

    $msg = "Copied $varCopiedBytes MB out of $varTotalBytes MB"
    $activity = "Copying blob: $destBlob"

    Write-Progress -Activity $activity -Status $msg -PercentComplete $varTotal -CurrentOperation "$varTotal% complete"

    Start-Sleep -Seconds 2
}

Some things to remember...
  • An aborted or failed file copy may leave a zero-byte file in your destination container. Use the GUI, or the "Get-AzureStorageBlob" and "Remove-AzureStorageBlob" commands shown below, to view and remove the failed data. Failure to do this can result in a hung PowerShell session.
  • Making a copy of a file within the same container is practically instantaneous.
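
Reusing the variables from the script above, cleaning up a failed copy looks something like this:

##List the blobs in the destination container to spot the zero-byte leftover
Get-AzureStorageBlob -Container $destContainer -Context $varContext2
##Remove the failed destination blob
Remove-AzureStorageBlob -Blob $destBlob -Container $destContainer -Context $varContext2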

Understanding client sign in behavior before and after a failure

In this post I'll cover the changes in Lync 2013's sign in process and discuss the various scenarios which could result in an outage or client-facing impact to the user experience.

If you've studied the differences in the sign-in behavior of Lync 2010 versus Lync 2013 you'll know the new "lyncdiscover" process improves the discovery of your pool's Edge server. For those of you who have not studied this change, I'll explain it briefly here.

How lyncdiscover works:
Expanding on Lync Server 2010's Mobility feature which came as a Cumulative Update (CU), the lyncdiscover process involves the publishing of two DNS names pointing to either your reverse proxy (when Internet facing) or to your front-end server/HLB VIP (when LAN facing).

The Lync 2013 client will query lyncdiscoverinternal.contoso.com first; if it resolves and the client can connect, it assumes your endpoint is on the inside of the network (LAN). If the host name is unresolvable, the client queries lyncdiscover.contoso.com and connects there. In either case, if we authenticate, an XML response is returned to the endpoint containing the user's home pool SIP proxy and internal pool name, among other things.

We then attempt a connection based on the response given in the XML document and cache the successful connection in a file called "EndpointConfiguration.cache" located in the user's profile.

The key takeaway is that the response is customized to your authenticated query and cached by the client. This is important when you have multiple Edge servers in your topology capable of proxying remote users to internal pools since you now have the ability to direct users to their regional Edge resources.

NOTE: This behavior is observed in the Lync 2010/2013 mobility clients, the Lync 2013 desktop client, and the Modern UI client for Windows 8.x. However only the desktop client will fall back to SRV lookup.

How does this compare with Lync 2010?
When we compare this capability to Lync 2010, that client would always query a DNS SRV record first (i.e. _sip._tls.contoso.com), which returns a static response. You could publish multiple SRV records with varying weights, however this would always return a statically ordered list. The result: all users in the organization would sign in through the same Edge proxy unless an outage occurred.

The following TechNet article explains the behavior well: http://technet.microsoft.com/en-us/library/gg398758.aspx
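
For illustration, publishing a weighted pair of Edge SRV records on a Windows DNS server might look like this; the names, priorities, and weights are examples, and for external users these records would live in your public-facing zone:

##Preferred Edge proxy (lower priority wins; weight breaks ties among equal priorities)
Add-DnsServerResourceRecord -Srv -ZoneName "contoso.com" -Name "_sip._tls" -DomainName "sip-nyc.contoso.com" -Priority 0 -Weight 100 -Port 443
##Backup Edge proxy
Add-DnsServerResourceRecord -Srv -ZoneName "contoso.com" -Name "_sip._tls" -DomainName "sip-lon.contoso.com" -Priority 10 -Weight 100 -Port 443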

Failure scenarios and recovery options:
So you've done your reading about the sign in process and differences between Lync 2010 and 2013 and by now you're wondering what happens if the various components fail.

Failure: Query to lyncdiscover fails
  Behavior: Falls back to SRV lookup
  Recovery options: Use a GeoDNS solution to provide a response to the query

Failure: SRV lookup fails
  Behavior: Queries another SRV record in DNS
  Recovery options: Plan for a DNS SRV record for each Edge proxy, adjusting weights to give priority

Failure: Next hop from Edge proxy fails
  Behavior: Client sign-in will fail
  Recovery options: No automatic recovery is possible

Example scenario #1:
Let's say you have a front-end pool and an Edge pool in both New York and London. All systems are up except for the Edge in New York. Alice is a user homed on the New York pool and is connecting externally with the Lync 2010 client for the first time. Her client queries the DNS SRV record "_sip._tls.contoso.com" and receives a weighted response with New York's Edge proxy first, followed by London's. Alice queries the FQDN provided in the SRV record and attempts a connection to New York's Edge, however this proxy is down. The Lync 2010 client then uses the second DNS SRV record and attempts a connection to London's Edge proxy. Alice's connection is proxied to the next hop server, which is London's front-end pool, which in turn proxies her connection to New York's front-end pool. Lastly, the Lync client caches the London Edge proxy and the New York internal registrar in her EndpointConfiguration.cache file.

Now, in this scenario Alice has signed in and will appear fully functional, however she won't be able to establish media (voice/video/desktop or app sharing) with a user who doesn't have a public IP or an IP on the same LAN. There are a couple of reasons for this; one has to do with the New York pool's association with an Edge pool for media. The media relationship is established in the Lync topology when the administrator creates the pool and makes a static association. Lync users authenticating to the pool receive a Media Relay Access Server (MRAS) token which is used in the STUN/ICE/TURN candidate exchange/negotiation, making it possible to establish media sessions across NAT devices. If the Lync Edge pool in New York is down, the front-end pool cannot issue an MRAS token for the user, and as such there will be no relay candidate for media establishment. When the Lync client negotiates media with another endpoint/user, a check is first performed using the host IP (LAN/WiFi) to see if the two users are on the same network or are at least both directly reachable (both using public IPs). In the case where both IPs are routable, media negotiation should succeed.

Example scenario #2:
Given the above example topology of New York and London sites, assume all infrastructure is operational with the exception of the London pool, which incurred an outage overnight. Bob is a user homed on the London pool and is connecting externally using the Lync 2010 client. This isn't Bob's first time connecting, so the client reads the EndpointConfiguration.cache file for the internal pool name/IP and attempts a connection, since it doesn't yet know whether Bob is internal or external. This will fail since Bob is Internet-facing in this scenario. The Lync client will then use the same cache file and attempt a connection to the London Edge proxy/IP, which will succeed; however, since the Edge server cannot communicate with its next hop (the London pool), his sign in fails.

It's important to understand there is no automatic recovery option available here. Since the connection to the Lync Edge in London was successful, but the next hop pool is down, Bob's Lync client will not fall back to a DNS SRV lookup. The Lync 2013 client behavior is the same given this scenario regardless of an administrator initiating a pool failover.

So you're probably wondering: if Bob were the CTO and you absolutely needed him logged in, how could you make this happen? You could obviously walk him through manually configuring Lync to connect to the New York Edge server by specifying the Edge FQDN (i.e. lync2013nyc-access-edge.contoso.com:443), however the next hop front-end pool wouldn't be able to register the user until a pool failover was invoked. This puts pressure on the Lync administrator to invoke the pool failover command (see the sketch below). Ultimately some intervention is required, and this may include shutting down the VIP for the Edge pool or the servers themselves. Alternatively you could change the next hop to an available server and publish the topology. Just remember your firewall rules need to permit inbound TCP/5061 to the next hop.
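
The failover itself is the Invoke-CsPoolFailOver cmdlet, run from the Lync Server Management Shell. A sketch, using a made-up FQDN for the failed London pool:

##Fail the users of the downed London pool over to its backup pool
Invoke-CsPoolFailOver -PoolFqdn "lonpool01.contoso.com" -DisasterMode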

Sticky Edge?
As I dug into this further for a customer of mine recently I thought about the scenario where a Lync remote user has signed into an Edge server which wasn't their "home Edge". In Alice's case I thought the London Edge proxy would be cached in the EndpointConfiguration.cache file making it nearly impossible for her to ever connect back to her home Edge proxy even if the service was restored.

Consider Alice's scenario again, but this time on a large scale... you have a population of 150,000 users signing into an Edge pool in London because a failure in New York pushed their DNS SRV lookup to London. If the Lync client cached their Edge proxy sign in, in what scenario would they ever sign back into the New York proxy? In the worst case you could have the majority of those 150,000 users signing into London's Edge pool alongside the London population, with no realistic way of getting them back to New York!

The Good News!
Well, the good news is that the Edge proxy information in the EndpointConfiguration.cache file "expires" if it has been more than 24 hours since the last access time of the file. The same is true for internal registrar pools after 14 days. This means that if Alice, being a New York user, finds the London Edge through a DNS SRV query and signs in successfully, she will stay signed into the London Edge for 24 hours no matter how many times she signs in or out, regardless of the New York Edge server's state. Once this time has expired, the client will query lyncdiscover again, find the New York Edge, and sign Alice into that proxy if it's available.

Some Guidance...
If you require full redundancy of remote user sign in between two geographic locations with automatic failover, you won't achieve it. The biggest hurdle is the next hop from the Edge to the next proxy or registrar, and even from TMG (or a similar reverse proxy) to its next hop. The best you can do is make the DNS names available through GeoDNS/GSLB and ensure the Lync infrastructure is redundant (Enterprise pools with redundant servers/NICs/etc.).
  • Use sub-domains for your web farm FQDNs (i.e. externalpoolname.lyncweb.contoso.com). Using something like "lyncweb" as a subdomain of your SIP domain allows you to delegate it to GeoDNS.
  • Publish DNS SRV records for all possible Edge proxies for external users.
  • Publish DNS SRV records for all possible internal registrars for internal users.
  • Test everything!
Hopefully this sheds some light on the external access behavior of the various failure scenarios. Cheers!