Articles Blog

Data Structures: Solve ‘Contacts’ Using Tries

Data Structures: Solve ‘Contacts’ Using Tries


Hi, I’m Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today I want to talk about the contacts problem. So in this
problem we’re asked to implement a very very simple contact list and the contacts
list will need to be two different, will need to do two things. It will
need to first of all, add things to it, but then it will need to, given a string,
figure out how many contacts start with that string. So of course one way we
could do this is we could just store this in a giant list or something like
that, maybe a sorted list, and walk through and for any given string
just walk through that whole list and see, do, ask each string do you start with
this sub string, do you start with this, do you start with this? But that’ll get pretty
inefficient. This is the kind of problem though that a try is perfect for. So all we need to do essentially is just build a try for ourselves. So store all of these
people into a try and then when we get a string just walk that, and then walk through that try with that substring and then when we get to that substring and say
okay how many children do you have? So that’s the basic implementation and
that’s what we have to do. Now one little optimization is, you know, one
thing we could do is when we get some substring, find that node, and then walk
through all of its children and count how many there are. But even easier actually
is just count, let each node in the try actually store how many children it
has. It’s just, you know, that’s actually a very easy thing to keep track of as we
go and we’ll make our code actually a lot more efficient, also shorter to write
as it turns out. So we’ll go ahead and take that approach. So we’ve already started
us off here with the basic outline of how the try will work. I’ve used here, I’ve
just created a node class, I don’t need for this problem an
overall try class, so I’m just going to node class and it will store each child, and
by the way, in this character, this problem we only have the characters a to z lowercase.
It’s very very simple. So what I’ve done is I have created a simple array that
has 26 slots for each possible letter. Of course if there is no next contact that starts with that letter then it’s going to be null. You could also if you wanted
to, use a hashmap instead. And that actually has some benefits it takes up
more memory but it’s more flexible if you need to change certain
things about it. So if you found this problem really overwhelming, here’s a good place to stop and retype
this code yourself and now that I’ve outlined the basic stuff see if you can
get the rest of it. But I’ll go ahead and start writing this. So get node is going
to need to get, is going to just get the node that’s at that value, so one
function that I bet will be useful for me, and this is actually going to be static, is given a character, return the index in that array. Because I think both
getting and setting will need that. So that’s going to be done it’s actually very simple it can
actually just be done like this. But it’ll be a little easier for me to do it
that way. Now get node is just going to return children of the char index and for
set node not too dissimilar get the char index of c and set that equal to
node. Alright so what add will do, as our
next function, what add will do, and we can actually add this to give ourselves a little
initial method but what I will do is we’ll walk through recursively, and it’ll
pull off the next character, pull off each character as s is walking through it,
and then once we get to the end the end of s insert a node there, actually it will insert nodes as we go along, as necessary. So, and again we’re going to start off with the
very first character of s so if I’m at the end, if index is at the end of s then I’m
done, nothing to do. Otherwise pull off the current character of s, char at of index,
and then I’ll call this char code actually and then I’ll do get char index
here current, spelled that wrong. Ok now I’m going to say, hey, check this
even exists so check if there is a child for this character and so I’m going to
get node of char code. Actually I guess I can just call current here. Now if
child was null then I need to create that child. So if so then set node of
current comma new node. Actually want to get the value here. The child is going to be new node child
and then go and recurse. Call child dot add of s comma index plus one. Now the
other, one other thing I have to do here I have to track how many characters, I
need to know how many there are and let me hold off on that and go down to this
find. method. Now let’s turn to how find count works. So what find counts going to do is it’s
going to also operate recursively and it’s going to walk through the nodes for each
character of s traversing this tree, for each character of s, and then if
there’s no next child with the right character then return 0 because that
string, that substring actually can’t be found, or if I get to the end of s then I
return the current count. So this is going to require me to actually keep the
current count in h node so I need to have this size variable here so I’m
gonna keep the size of each node. Now what’s going to happen is that every single
time I call add on a string, a string actually gets, something actually just gets
added beneath that. So every single time add gets called, I just have to increment the size
Then on this find count what I say is if on at the end of s then return the size,
otherwise go to my next node. So I have to first of all get my next node so node that equals get node of s dot char at of index. If the child is null
return 0 because there’s actually nothing with that substring then.
Otherwise the recurse and do find count of s comma index plus 1. Ok so that’s how I implement this, now
let’s run this and see how we did. Alright it’s looking good, so that’s,
this is a really really simplistic implementation of a try. We could if we wanted to do a hashmap
instead of a simple character array but for this problem because of the
simplicity of it a character array is sufficient. So it’s, it seems like
this really overwhelming thing to, oh my god I have to implement a try, but ultimately
it’s really not that bad, just think very clearly what does implementing a try really
mean? In this case it just means we need to have these, you know, these adds method,
this find count, and we need to do some basic getting and setting operations. It’s
really not that bad in the end. So keep tries in mind as a really really useful
data structure for interview questions What I’ve seen is a lot of times when
you see a problem where it’s framed, where there’s some sort of validation against
some other input that’s where you’d often end up using a try. So keep that mind as you do future problems. Good luck.

21 thoughts on “Data Structures: Solve ‘Contacts’ Using Tries”

  1. I don't think this will find the count of contacts that exist containing that substring or start with that substring

  2. At 2.24, for children instance variable's data type, she said though HashMap would take more memory but they(HM) would make certain operations easier. What does she mean by that? Wouldn't taking an array of length 26 every time is inefficient, since a node would not always have all 26 characters as children?

  3. If anyone managed to implement this in Swift let me know. Id love to take a look at it. Ive tried dictionaries with already implemented words for quick lookup .etc. but im still timing out on some of the tests

  4. The variable CharCode is not being used. By the way thanks a lot for your videos they are great and helping me a lot. =D

  5. question – at the add() function, shouldn't we check if the contact we try to add already in the trie tree? if so, the "size++" is wrong since there will be no adding to the number of contacts on that string route… what am I missing?

  6. What if I want to delete a contact from my contacts' directory? I've implemented deletion also – https://github.com/suyash248/ds_algo/blob/master/Trie/contacts_list.py

  7. This logic would not work with duplicates, correct? A Trie will not have duplicates but by this logic size still gets incremented. Am I missing something?

  8. Code would be clear if she actually walked through an example like "bat". Can someone walk through the code using this example?

  9. isn't it wrong you need to return the count of contacts that start with the string 's' and not all the strings which have prefixes as 's'.

  10. Is there any practical application where this solution is, in fact, the best solution? I simply cannot think of one.

Leave a Reply

Your email address will not be published. Required fields are marked *