Articles Blog

Highline Excel 2016 Class 18: Clean & Transform Data: Replace, Flash Fill, Text To Columns, Formulas

Highline Excel 2016 Class 18: Clean & Transform Data: Replace, Flash Fill, Text To Columns, Formulas


Welcome to Highline Excel
2016 class video number 18. OK, if you want to
download this file– Busn218-Video18 or the PDF
file– click on the link below the video. Hey, we got to talk more about
cleaning and transforming data. Now, back in video
number 3, we already saw how to use Power Query to
clean, transform, and import data. But sometimes we
don’t need Power Query because Find and Replace,
Flash Fill, text-to-column, and formulas can do the
job more efficiently. Now, for example,
Find and Replace– if it’s a simple character, we
want to use Find and Replace– not Power Query. Flash Fill up– we have a
simple data cleaning task, Flask Fill is amazing. Text-to-columns has a few
amazing efficient examples of when we should
use text-to-columns, and not Power Query. And of course, formulas–
the hallmark of formulas is if you want the solution
to update instantly, formulas are your ticket. Now, let’s go over
to this sheet Replace and see our first example. Now here we have some
data we need to clean. Here is product codes
our system doesn’t use a dash it uses a forward slash. So we simply want to
find all the dashes and replace them
with forward slash. We can highlight to
isolate this range. Home, Editing, Find and
Select– there it is, Find. There it is, Replace. Now, the keyboard for
Find is Control-F. If you only memorize
that– because Control-F works in all systems,
including web pages and Word. But if you remember
Control-F, you can always click back
and find Replace. Now, of course, there is
a keyboard for Replace. It is Control-H. Now find what? I’m going to type a
dash, tab, forward slash. And I could say, Find Next. You might want to do that if
there were dashes, sometimes you didn’t want to replace. But in general, we’re
using this feature because we got this output
from a different system. We know there’s always dashes. So we simply click Replace All. And instantly, it
says replaced all 14. Click OK. Click Close. Now over here, I
want to try this with Flash Fill, which
is another way we could solve this problem. In our prerequisite
class we used Flash Fill. We haven’t seen it
in this class yet. But it is amazing. I’m going to type Carlota
and then simply type my forward slash 3039 Enter. And as soon as I start
to type the next example, if there is a consistent
pattern Flash Fill gives us this ghost
list with suggestions. As soon as I hit Enter– boom! The column is filled. Now, would we use Flash
Fill or Find and Replace? In general, if it’s a simple
single character like this, I think I could use
Find and Replace more quickly than I could
type everything out and use Flash Fill. But both will work. Now I’m going to delete this. We can also use Find
and Replace, Control-H, with not single characters
but multiple characters. So if I wanted to find the
dash and I wanted to replace it with space, forward
slash, space– now, I can click Replace All. 14 replacements. Click Close and boom! There it is. So Find and Replace,
what does it do? It replaces one
or more characters with a new set of one
or more characters. When to use it? When you have a simple set of
characters you want to replace. Now let’s go over to
the sheet Flash Fill. We’ll talk about Flash Fill. What does Flash Fill do? It can clean or transform data
based on an example you give it next to the data set. When do you use it? When you have a quick
one-time data cleaning task. For example, here’s
first and last names. If we simply want first–
I’m going to type M-O Enter. As soon as I type a G, I see
that ghost Flash Fill list. Enter. And there, I have
extracted first name. Now how does it do that? Microsoft calls it
program by example. That means behind the scenes, as
soon as we gave it an example, it built a little program
to extract first name. Now the reason that
this worked so easily is because there was a
simple pattern over here, a space separated the two
words in every single instance. Now I’m going to
scroll down here. Here’s a similar
example, but notice we want to get the initial
for the first name– so S, J, G, and the
last name Ripper, Mac. But notice we don’t have
a consistent pattern. So in order to get
Flash Fill to work, you have to do two things. First is you have to be familiar
with your data in the column. And you have to give it
enough examples for it to learn the pattern. Now, notice our situation is
1, 2, 3 words with 2 spaces, 3 words with 2 spaces,
2 words with 1 space. So this is a small column. So we can visually see that
there are only two situations. If you had a huge
column you would have to be familiar with
what data is in the column. Now here, I’m going to
give it S– first letter of the first name. And then Ripper, the last name. And I’m going to
choose to go down two, because there’s
only two examples. I have to give it
an example for both. So G and then Tran, Enter. Now notice there’s
no ghost list. So we’re going to have to
force Flash Fill to look to the left– look
at our examples by going to Data ribbon tab. And there it is, Flash Fill or
there’s the keyboard Control-E. And just like that,
we have extracted first letter of the
first name and last name. We knew the data. We gave it enough examples. Flash Fill works like a charm. Now, what if we wanted the
last name and change the case? Well this one’s easy, right? Jim. Enter. And I’m just going to
use Control-E. What if we wanted the initials? M space D and I’m going to
assume that it will guess it right– Enter Control-E. Whoa! That’s one drawback
with Flash Fill, it is trying to build a
program behind the scenes and sometimes it gets it wrong. Control-Z– Delete. Let’s try this again
with a ghost list. Capital M space D, Enter. Capital G, oh, look at that. The ghost list got
it wrong again. Space, but now I’m going
to insist that we use T. And notice, it’s
got that R there. I’m going to delete it. Now when I hit Enter,
I’m pretty sure if I Control-E that will work. Sometimes you have to
mess around with it. Now here we want to
reverse the names, insert a comma and a
space, and change the case. So I’m going to type Dim
comma space M-O Enter. And I’m going to try Control-E.
And sure enough, that works. So if you have a
simple situation, Flash Fill is amazing. And here’s another
example, if we wanted to insert parentheses
and a dash for our phone number, not using custom number
formatting or something like that. I can simply type
open parentheses 206 close parentheses space
587 dash 4545 Enter Control-E. It should get that one
right, and it does. Now here’s another situation
similar to this one. If we want the last color–
if we were to just type red Enter Control-E.
Of course, it would assume we wanted the first one. So we need to give
it two examples. I’m going to give it red. And then green. And sure enough, Control-E,
it should get it right. We gave it enough examples,
meaning red green. It knew what to do, which was
get everything after the dash. Another example of when
Flash Fill is just awesome. We have asset ID and in order
to import this into the system it requires a lead apostrophe
so that it knows that its text. I’m going to type a lead
apostrophe 90513 Enter. And this is simple enough. Control-E. And sure enough,
if you click in every cell, you can see it’s got
that lead apostrophe. Another example of where Flash
Fill is better than anything else. We’re not using Power
Query or formulas. This is simply
something I dumped and all I want is the number. It’s consistent enough. I type 1574 Enter
Control-E. And boom! There it is. All of the views extracted. Now let’s go look at an example
where Flash Fill might not be so good. Here we have some
comma separated data. The delimiter is a comma. I have 1, 2, 3 fields. If I wanted to separate this
out into three separate columns, I could try to use Flash Fill. Carlota Enter. I’m going to Control-E
because this one’s easy. This is, hey, get
everything before the comma. But watch what happens for date. If I type 05 slash
05 slash 2016 Enter. If I were to Control-E, right
here– what in the world? Dates are known to have
a problem with Flash Fill when it has leading zeros. Control-Z. The only way you
can use Flash Fill that I know of in this situation
is you’d actually have to highlight all of the
cells, Control-1 and pre-apply the right number formatting. So Custom and I’d have to come
here and do mm for months, with a leading zero slash,
dd, leading zero for day, slash yyyy Enter. And now if I come here and
Control-E, it knows what to do. That’s a lot of hassle. If we were still using Flash
Fill to try and separate this out, the region right
here, if I typed west and Enter Control-E. Of
course, that will work. And if this was our goal to
get product, date, and region we could certainly
do it that way. Our next example,
we’re going to learn there’s a feature
called text-to-columns that is much better for
this particular situation. All right, so Flash
Fill– we definitely want to use Flash Fill to clean. Not often is it
going to transform data sets, but the
cleaning part– absolutely. And it’s a quick one
time cleaning task. We don’t need it
linked like we could do with formulas or Power Query. Now let’s go over
to text-to-columns. Now what does
text-to-columns do? It splits a single
column of text into multiple columns based
on a delimiter or fixed width. Now fixed width, we almost
always can use Flash Fill. Like for example, extracting
state abbreviation– we just type those two letters,
Control-E, and we’re done. But delimiter– we just
saw in our last example how Flash Fill had
trouble with splitting the data into multiple columns. Ah, but text-to-columns
does two things that nothing else does better. It can take text dates
such as ISO dates and convert them
to serial numbers more quickly than
any other feature. And text-to-columns
can take text numbers and convert them back to numbers
faster than any other feature. And when to use? Well Flash Fill for breaking
apart certain data and Power Query for separating
and cleaning mostly replaced text-to-columns. But as we mentioned
up here, there’s a couple of really good uses. Now, let’s start over
here with the example we just did with Flash Fill. Here’s our column and we
want to break it apart based on the delimite we
have here, which is a dash. And we don’t want the second
column, which is the product. We want the department,
the date, and the amount. Now text-to-columns, you
highlight the column. We do not need a field
name for this feature. Data and there it
is text-to-columns. We can also use the
keyboard Alt-A-E. And there’s our dialog
box for text-to-columns. It’s been around for
decades and it is amazing. Step 1 is always what
is the delimiter? What is going to split the data? Either delimiter or fixed width. Delimiter. Now notice the
preview down here. Let’s click Next for step 2. Step 2 is just, what
is your delimiter? Now notice the preview. I’m going to uncheck Tab
and say Other and do a dash. And just like that, our preview
shows that it’s broken it apart into 1, 2, 3, 4. Now, one limitation
for text-to-columns is we’re only allowed
one delimiter. If we use Power Query–
earlier in the class we’ve had a Power Query
example, where our delimiter was multiple characters. In the old days,
before Power Query, we’d have to run
text-to-columns multiple times when we had multiple
delimiters or do other tricks. But here, it’s broken
it apart perfect. So step 2 is delimiter. Next. Now, step 3 gives us the
options of a couple things. We can format and we can tell
it to skip certain columns. Now let’s talk
about this format. Look at the default,
it’s General. And you can read this General
converts numeric values to numbers. And what they
really meant to say was text numbers back to
values, and date values to dates, which really should
have been text dates or ISO dates back to
serial number dates. That’s why this
feature is so amazing. The default is set up perfectly
to handle text, dates, and numbers. The other thing is when
we get to our ISO data is we can actually
use this feature and tell it what
form of date we have. Now for us, we’re just going
to leave it as General. We want to come down here. And for the second
column– notice we can click on
whichever columns we want and change
the formatting. We say Skip for
our second column. And there it is, it
says skip column. Now we’re ready to go
because this General works perfect for our three text,
date, and number columns. Now the other part to
step 3 is very important. If you want to replace
it like with ISO dates then that’s the
default. It actually says I will replace
everything in this column, but that’s not what we want. Notice the collapse box. I’m going to click inside here
and very carefully click– and I’ve already put
the field names, right? I’m going to click right
below Department in cell L16. When I click Finish,
text-to-columns that is beautiful. Now remember, we could have
done this with Power Query. But if you have a column
and all you want to do is split it apart and you don’t
need a link to the source data, text-to-columns
is simply amazing. Here are the other two great
uses for text-to-columns. I’m going to highlight this
whole column of ISO dates– year, month, day Alt-A-E.
I don’t have a delimiter so I don’t even need
to worry about step 1. I still don’t have a
delimiter so I don’t need to worry about step 2. It’s only step 3, and
here’s where I go to date and I’m going to select
year, month, day. I’m definitely going to keep
the destination as the top cell in the column I highlighted. That means it will replace
those dates when I click Finish. Boom! Just like that it
has converted those. Now, here’s text numbers. And the reason that
there is nothing that can be text-to-columns
is because if we know the keyboard,
Alt-A-E– and I’m going to do it slow
this first time. Remember, we don’t
have a delimiter here. We’re just going
to use that step 3. So I don’t need this step. I don’t need this step. And remember, General
is the default, which will convert text
numbers to proper numbers. So watch this. Escape Alt-A-E– Notice that
Alt-F will get us to finish. And the funny thing
here is that in order to get to that
underline F, we have to hit Alt-F a second time. So you’re ready? It’s Alt-A-E Alt-F and done. So text-to-columns
pretty amazing! If we want to split
a single column and we don’t need
it linked or we need to take text,
dates, or numbers and convert them back to proper
dates and proper numbers, text-to-columns rule. Now I want to go over
and talk about formulas. And I actually want to
start on formulas, too. Because we’ve done a lot
already in this class. In video number 8, we talked
about text formulas and text functions. And in video 11, we talked
about lookup functions. In both of these videos we saw
extensive examples of formulas to convert data sets. Now we had all sorts of
examples in video number 8, of extracting state with
left, zip with right, first name with left in search,
and a bunch of other examples. Including the ever
important Trim function for cleaning data. Each individual example is
an example of cleaning data. Together we, in essence,
transform this data set into a data set that we
can use for data analysis. We also very importantly
saw lookup functions. Now lookup was one
of our functions. But we saw many functions–
VLOOKUP, HLOOKUP, straight lookup. But we are converting this data
set into a different data set with expanded columns. We had looked up the price,
we calculated the revenue– just that alone is
transforming our data set into something we can use, for
example, for a pivot table. We also saw an example
where we use index and match to flip a table
upside down, which is another example
of transforming an original data set into some
new data set with formulas. But now, for our
example in this video, I want to go over to formula 1. What formulas can do? Well they can clean or
transform data sets. When to use them? When the data is
in Excel already. We don’t need to import it. If we needed to import it,
we’d be using Power Query. We don’t have big data. You don’t want to use formulas
on very large ranges of cells because then calculation
time slow down. Again, what would we use? We’d use Power Query
and PowerPivot data model when we have big data. And when do we use formulas? The hallmark for
formulas is when you want the solution
to automatically update without having to use
the Refresh button. So here’s an example. We have a template for
grading, and we always just dump our student names
right into the first column. And I want the formula
to automatically get first and last. And then I enter my
grades and so on. Here’s a perfect example,
where I want to, in essence, clean the data,
extract first and last, and have it always
automatically update. Well the first is easy. We simply say, hey, Left–
I’m going to look at that text right there. The pattern for
first name is easy, it’s everything before
the first space. So for comma,
number of characters were getting from
the left, I need to Search for a space in
double quotes within that cell. Remember back to video
8, Search tells us the position of that space. So counting from the left,
it would tell us the space is in the sixth position. That’s not how many characters
we want from the left, we want one less than
wherever the space is. So I subtract 1
close parentheses. And that’s our formula that
will always work for first name. Now a last name, when we do
not have a consistent pattern is a pretty tricky formula. Because notice, two words
with a space but sometimes we get one, two, three
words with two spaces. Now one option is to
use this huge formula. And it’s a fine formula, lots
of people use it for this. I’m going to show you a
slightly different version. Now this is going to
be conceptually kind of sideways and unusual. But it’s one of the
more efficient formulas and you can see it down here. It’s one of the more
efficient formulas I’ve seen for exactly this
situation, where we sometimes have a middle, sometimes
don’t have a middle. Now, the first thing
we’re going to do is we’re going to
use this substitute function to find the space and
insert a whole bunch of spaces. So I’m going to use substitute. The text I’m looking through
is that, comma, the old text is a space in double
quotes, comma. And now the new text– the
length of the text in the cell, but all as spaces. So I’m going to use
the Repeat function. The Repeat function can
repeat a single space, however many times you tell it. We’re going to tell it, please
look at the length of this. So it’ll always count
exactly how many characters that there are, that’s
what the length does. The repeat will then
repeat that space. And I’m going to hit
the F9 key just to look. Look at that! So now substitute will
find a space in here and always put a
count of spaces that is exactly equal to the length. Now I’m going to Control-Z. When we close parentheses
Control-Enter, we’re going to get
something ridiculous. Double click and send it down. I’m going to the second
one, F2 highlight F9. You could see that’s ridiculous! Why do we do that? Well, now we can
go from the right. And if I say Right, the
length of whatever that is, it will always end up somewhere
in this sea of spaces. And then it will have Pham
with a bunch of extra spaces and we can use
the Trim function. So Control-Z, right of all
of that text right there, comma– how many from the right? L-E-N. LEN close
parentheses close parentheses Control-Enter
double click and send it down. I actually forgot this. I should go up one. And now I have the last
name every single time. F2, I just need to give it a
haircut with the Trim function. No, no the trim function removes
all the extra spaces– close parentheses Control-Enter
double click and send it down. That is a pretty wild formula. Now if you count the
number of functions this is six functions. This one over here
is seven functions. I kind of like this one. But there it is. There’s an example of
formulas to clean our data. And were using the
formulas because we always want it to update. Next time we dump
our names here, instantly these will update. All right, in this video we
talked about cleaning mostly and a little bit
about transforming with either formulas,
in particular when we want everything
to automatically update. We saw text-to-columns,
in particular when we have text
dates or text numbers or we’re splitting data’s
based on a delimiter. We don’t need to link
like in Power Query. Flash Fill when
we’re cleaning data. And Replace when we
have a single character in a column or a range
that we need to replace. All right, now next
video we’re going to talk about the amazing
Advanced Filter feature for transforming data sets. All right, we’ll
see you next video.

14 thoughts on “Highline Excel 2016 Class 18: Clean & Transform Data: Replace, Flash Fill, Text To Columns, Formulas”

  1. The last formula I would describe as "von hinten durch die Brust ins Auge". 🙂

    https://dict.leo.org/forum/viewUnsolvedquery.php?idThread=199570

  2. just to share what trick i prefer to turn string numbers and string dates (in correct format) to their correct counterparts: you can copy an empty cell, highlight the range you want to convert and paste special while selecting add as your special. super neat upgrade to the "perform a mathematical operation / do a +0" that i learned from you 🙂

  3. Here is a problem i would love some help on. If you could tell me how or point me to the right formula to do this i would greatly appreciate it and owe you one. I've got a workbook that i've created for bidding jobs. I use a software that exports my take-off of a set of plans to an excel sheet, Examples include quanties of an item or lengths of an item. I have all this set up in a workbook so all i do is paste it on a sheet called data and it populates all my other lists for me. The problem is when i'm doing say, hours it takes to lay x length of pipe i have to type a formula in the hours column that says E3 * bla bla bla , gives you 20 hrs to lay E3 (120ft) of pipe which is than totaled at the bottom and populates my summary page where it does the rest of my bid calculations for me. I need a way to pull a formula from the data page that will then tell the hours column next to E3 how to calculate the hours it will take for a certain task. I do this manually now and it takes forever. i have a formula for the hours it takes to install x number of fittings, x length of pipe, etc. I have it as far as this right now, I can either use the index formula with the drop down lists i created to pull the answer of the formula i have typed in, which doesn't help since it's always 0 because its not calculating on the hours sheet but on the data sheet. Or i can use what i've currently got typed in to pull the formula but it won't let me add the = sign in front because it says it's an error. " =CONCATENATE("=",INDEX('VU360 Data'!L2:L6,MATCH(M3,'VU360 Data'!J2:J6,0))) " or at least anytime i try to take out the quotation marks from around the = sign it says error in formula. Tell me you know a way to accomplish this.

  4. in the column where you converted numbers to Text by using Flash Fill and putting in the tick ` mark, how would you have done this without Flash Fill?

  5. hello sir, I am from India. The trick @ 10:10 is still not working for me. It is showing few dates on the left side like @ 9:45 . Is it because of the region because we have different date format??

Leave a Reply

Your email address will not be published. Required fields are marked *