Have a JSON headache in KQL? Try mv-expand or mv-apply 15TH MAY 2023/MZORICH One of the more difficult things to learn in KQL (apart from joining tables together) is how to deal with multi-value sets of data. If you work in particular types of data, such as Azure AD sign in data, or Security Alert data, you will see lots of these data sets too. There is no avoiding them. What do I mean by multi-value? We are talking about a set of data that is a JSON array and has multiple objects within it. Those objects may even have further nested arrays. It can quickly get out of hand and become difficult to make sense of. Let’s look at an example. Azure AD Conditional Access policies. Each time you sign into Azure AD, it will evaluate all the policies in your tenant. Some it will apply, and maybe enforce MFA, some may not be applied because you aren’t in scope of the policy. Others may be in report only mode or be disabled entirely. At the end of your sign in, the logs will show the outcome of all your policies. The data looks a bit like this: We get this massive JSON array, then within that we get an object for each policy, showing the relevant outcome. In this example we have 6 policies, starting from 0. Within each object, you may have further arrays, such as the ‘enforcedGrantControls’. This is because a policy may have multiple controls, such as requiring MFA and a compliant device. You can have a look in your own tenant simply enough by looking at just the Conditional Access data. SigninLogs | take 10 | project ConditionalAccessPolicies Where multi-value data can get tricky is that the order of the data, or the location of particular data can change. If we again take our Conditional Access data, it can change order depending on the outcome of the sign in. Any policies that are successful, such as a user completing MFA, or a policy that failed, by a user failing MFA, will be moved to the top of the JSON array. So, when I successfully complete MFA on an Azure management web site, the ‘CA006: Require multi-factor authentication for Azure management’ (as seen above) policy will go to the top of the array. When I sign into something other than Azure, that policy will be ‘notApplied’ and be in a different location within the array. Why is this a problem? KQL is very specific, so if we want to run an alert when someone fails ‘CA006: Require multi-factor authentication for Azure management’ we need to make sure our query is accurate. If we right-click on our policy and do the built-in ‘Include’ functionality in Sentinel: It gives us the following query: SigninLogs | project ConditionalAccessPolicies | where ConditionalAccessPolicies[3].displayName == “CA006: Require multi-factor authentication for Azure management” We can see in this query that we are looking for when ConditionalAccessPolicies[3].displayName == “CA006: Require multi-factor authentication for Azure management”. The [3] indicated shows that we are looking for the 4th object in our array (we start counting at 0). So, what happens when someone fails this policy? It will move up the JSON array into position 0, and our query won’t catch it. So how do we deal with these kinds of data? I present to you, mv-expand and mv-apply. mv-expand mv-expand, or multi-value expand, at its most basic, takes a dynamic array of data and expands it out to multiple rows. When we use mv-expand, KQL expands out the dynamic data, and simply duplicates any non-dynamic data. Leaving us with multiple rows to use in our queries. mv-expand is essentially the opposite of summarize operators such as make_list and make_set. With those we are creating arrays, mv-expand we are reversing that, and expanding arrays. As an example, let’s find the sign-in data for my account. In the last 24 hours, I have had 3 sign-ins into this tenant. Within each of those, as above, we have a heap of policies that are evaluated. I have cut the screenshot off for the sake of brevity, but I can tell you that in this tenant 22 policies are evaluated on each sign in. Now to see what mv-expand does, we add that to our query. SigninLogs | project TimeGenerated, UserPrincipalName, Location, ConditionalAccessPolicies | mv-expand ConditionalAccessPolicies If we run our query, we will see each policy will be expanded out to a new record. The timestamp, location and username are simply duplicated, because they are not dynamic data. In my tenant, I get 22 records per sign in, one for each policy. If we look at a particular record, we can see the Conditional Access policy is no longer positional within a larger array, because we have a separate record for each entry. Now, if we are interested in our same “CA006: Require multi-factor authentication for Azure management” policy, and any events for that. We again do our right-click ‘Include’ Sentinel magic. We will get the following query SigninLogs | project TimeGenerated,UserPrincipalName, Location, ConditionalAccessPolicies | mv-expand ConditionalAccessPolicies | where ConditionalAccessPolicies.displayName == “CA006: Require multi-factor authentication for Azure management” This time our query no longer has the positional [3] we saw previously. We have expanded our data out and made it more consistent to query on. So, this time if we run our query, we will get a hit for every time the policy name is “CA006: Require multi-factor authentication for Azure management”, regardless of where in the JSON array it is. When we run that query, we get 3 results, as we would expect. One policy hit per sign in for the day. Once you have expanded your data out, you can then create your hunting rules knowing the data is in a consistent location. So, returning to our original use case, if we want to find out where this particular policy is failing, this is our final query: SigninLogs