Articles Blog

Trie Data Structure

Trie Data Structure


Hello friends my name is Tushar and today
I’m going to give an introduction to the Trie data structure and then we’re going
to talk about how to insert,search and delete in the trie data structure. So what is trie? Trie is a tree data
structure used for storing collections of strings. If 2 strings have a common prefix then
they will have a same ancestor in this tree. So if you have thousands of strings or
if you have hundreds of thousands of strings then you can use trie to store
all the strings and then it’s really easy to search if a string exist in this
collection of strings or not.So for example trie is an ideal data structure for
storing our dictionary.Trie can also be used ,used to do prefix based search and
you can sort the strings lexographically in the trie.Another
alternative is to use a hash table but with hash table you cannot do a prefix
based search and also a regular hash table takes more space than a trie. So next let’s talk about how to do insertion
into a trie.As I said before trie is a tree data structure and this tree
is made up of collections of trie node.So every trie node has 2,has 2 main components
a map where the key is character and value is trie node.This is
used to establish the parent-child relationship and as we look through the
examples it’ll be clear how it is being used and the second one is boolean
and endOfWord indicating that the character represented,representing this trie node
is a end of word or not. If your
character set consists of just lower cases or just upper cases then you could
replace this map with a array of size 26 but for generalisation purpose I’m using a
map so it also works with,for example unicode character set. So next what I’m going to do is,I’m gonna
insert this 5 words into a trie. So initially the trie,our trie is a root and root has a empty map so this map here
is empty and then boolean end of word is false. So then we’re going to take this word
abcd and insert into this trie.So index is pointing to a and we
are at the root of this trie node. So first we check is does a,this a is
present in the map of this root.So map of the root was empty so ‘a’ is not present so then we create a new trie node whose
end of word is false so this boolean is false for this trie node and
then we put ‘a’ into this root’s map and for the value we put the reference of this
newly created trie node.So effectively we have created a relationship like this.
Then we increment this to ‘b’ and then we move here. So here we check does ‘b’ belong,is ‘b’
present in this trie node,in the map of this trie node. So again since this was newly created
‘b’ is not present so we create another trie node,the value
false,endOfWord value false and put ‘b’ in the map of this trie node and point it
to here and then we move here and we increment this guy to ‘c’. So does ‘c’ exists here? So again this was a
newly created trie node so it’s map is empty so ‘c’ does not exist,so we
create ‘c’ and since ‘c’ is the last character of
this world,in this case this value here will be true and then,we are going
to put a ‘c’ in the map of this trie node and point it to,point to,point a reference
to this trie node. So this is how we inserted abc into an
empty trie. So all we did was we started with root,put ‘a’
there,then we went to ‘b’ put, then we went to this guy put ‘b’ there,
then we went to this here and put ‘c’ there and then finally we reached here and since this was the last,this was
the last,this represents the last character of this node,we mark this,
endOfWord is true. So next let’s try to insert this word ‘ab
gl’.So again we started with ‘a’ and we started root,does root have ‘a’. So root does have ‘a’. So we jump here. Does this trie node
have ‘b’ so this trie node does have ‘b’ so we jump here and our pointer is now at ‘g’ so
does this trie node have ‘g’,it does not have ‘g’ so we’re going
to create a new trie node whose endOf Word is false,it’s and then we put ‘g’
into the map and the reference of this in the value of the map here and
then we move 1 more. So this comes to ‘l’.Does ‘l’ exist
here? ‘l’ does not exist here so we are going to again create a new trie node and since
this is the last character of this word this value will be true and we’re gonna put
‘l’ here and point this way. So now we have done inserting ‘abgl’.So
next let’s try to insert ‘cdf’ So again we started root.Does root
have ‘c’,so root does not have ‘c’ so what I’m going to do is I’m going to insert ‘c’
create a new trie node whose endOfWord is false and point this here and then we
jump here and our index is now at ‘d’. Does this guy have ‘d’,it does not have ‘d’ so
we create a new trie node with the value false and point this to ‘d’ and then
put into the map ‘d’ and the reference of this trie node and then move to ‘f’.So f does
not exist here so I’m going to create a new trie node. It’s value will again be true
because this is the last character of this word and we’ll say that ‘f’ points this way. So finally we are done inserting ‘cdf’. Next let’s try to insert ‘abcd’.So again
we start from root again we start with root,’a’ is there so we
follow the pointer of ‘a’ and reach here so here b is also there so we follow the
pointer and reach ‘c’.So c is also there so we follow the pointer and reach this
point and in here,so now our pointer is pointing to ‘b’.So here ‘b’ does not exist.
So what we’re gonna do is we’re going to add ‘b’ here.So add ‘b’,create a new,create
a new trie node since this is a last character
mark this as true, add ‘d’ and reference of this guy in the
map here and this is how we added ‘d’.Finally let’s add this word ‘lmn’. So where,our root again we’ve come back to
root,’l’ does not exist so we’re gonna add ‘l’ and we are going to create a new
trie node for ‘l’,the value false and then we move here,’n’ does not exist so we’re going to add ‘m’ and follow
the trie node here,false and then we move this
to ‘n’ and ‘n’ does not exist here so we’re going to create ‘n’ and follow this
trie node and ‘n’ jumps here and since this is the last,’n’ is the last character,
this value becomes true. So this is how my trie looks after I’m
done inserting all this 5,5 words in my trie. So as you can see if two words have the
same prefix,for example this and this in this case they share the common ancestors. So here ‘abc’ and ‘abgl’ so they have
same, they have same prefix so they share
this common ancestor and from this point they bifurcate.So the time
complexity for this is pretty straightforward to see if the average
length of the word is ‘l’ and total number of words are ‘n’ then the time
complexity is O(lxn).So next let’s look at how
we’re gonna do a search in this trie. There are two kinds of searches,prefix
based search where we are checking if there is at least one word which starts with a
given prefix or not and then the whole word search where we’re checking if the entire word exists in the trie or not. So let’s
start with the prefix based search and see if there is any string and this
trie which starts with ‘ab’.So we are at root, we check if the map,if this map at the
root contains ‘a’ or not. So it does,it does contain ‘a’,so we jump to that
trie node so we reach here.Then here we check if the map in this trie node
contains ‘b’ or not.So it does contain ‘b’,so
what this means is there is at least 1 word which does start with ‘ab’.So this
should return true. There’s no reason to look forward into
this trie.Let’s look at ‘lo’.So again we start at root,we check is there,does ‘l’ exist in
the map of the root or not,so ‘l’ does exist so we follow it’s trie node and reach
here. Then we check does ‘o’ exist in this trie node
or not,so ‘o’ does not exist here so what it means is there is no word
which starts with prefix ‘lo’ so this should return false.Let’s try
some whole word search So let’s try and start with ‘lmn’.So again
we are at root.We check does root have ‘l’,so it does have ‘l’ so we follow here. Then we check does this trie node have ‘m’,
so it does have ‘m’ so we follow here and then we check does this trie node have
‘n’ so we,it does have ‘n’ so we follow here and reach this point. Once we reach this point,we check the,
is this node is the endOfWord value here is true or not and that is very important.So
endOfWord here is true,it means that ‘l’ ‘m’ and ‘n’ does exist in this trie so this should
return true.Let’s do ‘ab’. So again we start at root, we check ‘a’ exists or
not.So ‘a’ does exist so we follow here. Then we,then we are here and we
check if ‘b’ exists in this trie node or not.It does exist so we follow here and now we look at the
value here so this value,endOfWord is false. It means that there is no entire word
‘ab’ in this trie so this should return false. What about ‘cdf’, so we are at root does ‘c’ exist,it does so we move to,
we move to it’s forward pointer to reach ‘d’,does ‘d’ exist, it does,so we follow it’s pointer to
reach ‘f’,reach this node,does ‘f’ exist in this node, it does exist so we follow this
pointer and reach here and then the endOfWord is true,it means that
‘cdf’ does exist and finally ‘ghi’.Does ‘g’
exist in the root node? ‘g’ does not exist in the root node which means that word ‘ghi’ does not exist so this should
return false. So again the time complexity is very simple.
If the lenght of the word is ‘l’ then the time to search will be O(l).
Next let’s look at how we’re going to delete in this trie. So delete also
exists in two flavours,one is delete the entire word or second is delete all
the words which start with the given prefix. So we are trying to delete this world ‘abc’. We are at root,’a’ exist here,we follow the
link reach this point, this trie node.’b’ exist
here, follow the link reach this trie node,’c’ exist here,follow the link and reach this
trie node. So now we have to delete this word ‘abc’ and
we are here. So the thing is we cannot just delete
this node.Why,because if we remove this node from this trie node
then in that case we’ll also lose the word ‘abcd’ which is represented by this link here.So what we
do is since the map,since the map at this trie node is not empty,we cannot get
rid of this,this node so all we do is turn this into false,turn this
endOfWord into false so now if someone search ‘abc’ he would reach this point and since
this value is, endOfWord is false then they would say that ‘abc’ does not exist
in this trie and that’s how we deleted ‘abc’.Next we try to delete ‘abgl’ so we
start with root,follow link,reach this,this trie node,follow link reach
this trie node and then from here we follow this ‘g’ link and reach this trie node
and then we follow from here,we follow this ‘l’ link and reach this trie node.This
trie node has no children,why,because it’s map is empty so then we can safely
delete this guy and then try to delete the nodes and the higher level too. So now that this is deleted,we can
remove this,this character from it’s map and as soon as we remove this
character,now this map is empty so we can delete this as well and go one level up
here and delete this value ‘g’.Now we cannot delete this node because this has
another,another link this also has ‘c’,so at this point we’ll
stop.So we are done deleting ‘abgl’. Next we’ll delete ‘abcd’.So we follow
a,b,c,d and we reach this node.So since this map is empty we can just delete it and
then go 1 level up and delete this, this character and now this is empty so we
can delete it and reach one level up and remove ‘c’ here and now ‘c’ is empty so
we can delete this and similarly ‘b’ will be removed and now ‘b’ is empty so we can
also delete this and remove ‘a’ from here. So in this way we deleted all the
words, all the 3 words from this trie.
Again the time complexity will be if average length is ‘l’ and all the words
are,this number of words are ‘n’ then O(lxN) will be the, will be the
time complexity for deletion. So this is all I have to talk about trie. There are tons and tons of applications
of trie like doing prefix based searching if this is,if the values in this
map can be a values of character map can be stored in sorted format then we can
also sort this entire trie lexicographically and in general there
are many many applications which use trie node and probably we can
discuss that in another video. So next let’s look at the code
implementation for trie data structure. This is the trie node we talked about
before so this is, this is a map with the character and trie node and then
endOfWord. So first thing we do is we initialize
our root in the constructor of trie. Then we’re going to do insert. So this is
the iterative version of the insert and we’re
trying to insert a word into the trie. So first our current becomes root,then
we iterate through characters one at a time in the word and then we tried,we
see if the current,current has this character in it’s Map.So the children is
nothing but this map so we check if current has character in the map. If it does,if it does not so in that
case nodes will be null then we create a new trie node and put this
node and this character into the,into the current’s map and then we set current
equal to node and we keep doing it till we have characters in the word and
in the end we just mark current’s endOfWord as true as we
discussed before in the video and then there is the recursive version which is
pretty much doing the same thing as in the iterative version,getting the,getting the
node for the current’s,from the current’s children and
then checking if it is null.If it is then putting into the map and then calling
insertRecursive recursive function again. Let’s look at the search.Search also is
a search iterative and search recursive.In searchIterative we are looking if this
word exists in this trie or not. So again current becomes root,again
iterate through all the characters in the word,then we get the node for the current’s,
from the current’s map we check if this character exists or not. If this node
is null,it means that this character doesn’t exist in a map and
then we instantly return false otherwise current becomes node and then the end we
just return current endOfWord.So if this,if this word exists in the map then
this current endOfWord should be true. for this word to exist in the trie and
similarly the recursive version is exactly same as the iterative version
and finally we have delete. So we’re trying to delete this entire
word.So this is the recursive version so we first call delete(root,word,0).
So we check if index has reached the word length.If it has,then we
check if current is endOfWord.If current is not endOfWord it means
that this word does not exist in the trie and we’ll not delete anything so we
return false otherwise we set current endOfWord as false and then we return,
we return true if the current’s,current’s map is empty otherwise return false and this we are returning to give an
indication to the calling function that they can delete this reference from
their map. So this is the,this is if we reach
index’s word length otherwise we get the character and then we get the node for
that character from the map.If node is null it means that this word does not
exist in the dictionary,the word we’re trying to delete does not exist in
dictionary so we instantly return false otherwise we call this recursively and
should deleteCurrenNode will either return true or false.If it is true it
means that you got to remove this mapping from the children’s,from this current
map and then if and then return true if the size of the map for the current is
zero otherwise return false. So this is all I have to talk about trie.
Please like this video,share this video comment on this video.Check out my
facebook page and check out my github link. Thanks again for watching this video

100 thoughts on “Trie Data Structure”

  1. Hey Tushar i think there should be correction in if(shouldDeleteCurrentNode).The condition for the deletion of current node must also check if its field isEndOfWord is false.

  2. when u search if the word lmn exists, why do u have to start at the root, the letter l can be anywhere in the trie data structure ?

  3. Thanks for such a clear and to-the-point solution. Most of the solutions available in books/internet use arrays/list for storing children which less-flexible as compared to map/hashtable. I've impemented "Prefix Search" (i.e. "Autocomplete") also – https://github.com/suyash248/ds_algo/blob/master/Trie/trie.py

  4. Sir, is it possible implement autocorrect search using trie? I mean to say that can we implement a search with missplect search string?

  5. hmm 14:23 ??
    you just deleted "ab" too but why ??
    then you even deleted "a" too but why ? the work was already done before !

  6. Nice video !
    If we have words 'by' and 'bye' inserted in our Trie and use this DELETE method to remove word 'bye' then it will also remove work 'by'.

    Thanks

  7. Question: You had "abc" in there, and then you add "abcd". now in the "d" node, its children point to a end node. This means "abcd" is a word in your dictionary, but what about "abc" now? since "c" is no longer pointing to a end node, how can we tell "abc" is also a word in your dictionary?

  8. I cannot even say how thankful I am to you for making this video. Thank you very much! Great explaination.

  9. Great explanation! Since you do not check if a word already exists in the mapping, I suppose duplication is allowed.

  10. Hi,
    Why are you creating extra node when inserting new letter.
    Like when you inserted "a" .you are creating new node with end of word boolean as false.
    Because of this last node is empty .i dont understand this point.can you please explain?

  11. Hi Tushar, Thanks for explaining in detail. But I have one question. Lets say we inserted two words 'abc' and 'abcd' and according to your code if you delete 'abc' it is returning false because abcd is still there. But you are making endOfWord to false so that means you are basically deleting the word but still returning wrong value. Can you explain me?

  12. Sir in the code why do you have make the root final and private and also TrieNode class private?

  13. while inserting the second string, why did not you add the g to the map of the b node instead of the c node?

  14. Thanks for the video, but I do not understand why you need T/F and NULL-nodes at the same time. I would say we can use either of them, but why both? Seems redundant to me.

  15. in every word u are creating extra node poining it and making it tu instead u can do it in ending word nod itself…why aren't u doing that?

  16. A small confusion –> Don't you wanna append to children instead of putting and replacing old children? I'm just tryna run this example insert('help') and insert('hell') through your code and couldn't understand how are you handling children of node 'l' which is 'p' and 'l'. I maybe missing something (Not a typical Java guy)

  17. Thanks a lot , really wounderful , i have a DS question for you , please help me ……Have one DS question, need anser.
    Q : we have a city , in which we have locations, each location has few restaurents.
    One ( ex: KFC) restaurent might be in multiple locations( like HiTechcity, Nampally etc).

    To Given restaurent get all its locations, given location display all restaruents in that location.
    What Data Structure use this problem?

  18. Could you do a video on Succinct Tries?
    http://stevehanov.ca/blog/index.php?id=120
    BTW, These are great videos, and you do a great job of never glossing over steps, even if it is repetitive, making it easy to follow along.

  19. [I may be wrong] @14:30 when you are deleting abcd, I believe the node should get deleted only if the map is empty (as you mention) and endofword is false.

  20. good video although I have one question what about the case when we enter "abc" and "abcc" , if we use hashmap then it will will overwrite the previous reference value for c, so how do we handle this case here.

  21. Excellent Explanation. at 16:37 in video, in Insert method, node = current.children.get(ch) NOT current=node; this will fail at every first character.

  22. What will happen when we have "ask" and "ass" in trie, I want to delete "ass" but deleting is making current.endofword=false, then if someone search for ask, it will not be found. Please clear my confusion

  23. Wow, I know more about deep software system analysis in a few minutes of listening to this man than all my time interacting with devs in almost 6 years.

  24. Thanks for your useful labour here Tushar! I'd recommend getting a "lavalier" or just "lav" microphone; they're the ones that you clip onto your shirt's lapel. It would help the audio quality a _lot_, and make following the explanations a lot easier for international people who only know English as a second language. Lavalier mics are also relatively cheap compared to other kinds of microphone.

  25. So, the code is wrong for deleting. Example – in the current code if there is "ab" and "abc" and you remove "abc" then search for "ab" will return false. Correction in code is –
    if (shouldDeleteCurrentNode) {
    node.children.remove(c); // instead of current.children.remove
    return node.isEmpty(); // instead of current.children.size() == 0
    }

  26. Hey Tushar, If I'm not wrong I delete method is not working as expected. To test it:
    Insert word;
    cat
    card
    Then delete word catd

    Then search cat. It won't be found. Reason is the return statement in the second last line of your delete method.

    I think you should be returning

    current.children.size== 0 && !current.endofword

  27. Hey Tushar,
    Once again great video and thanks for sharing with all. I had one doubt in delete the word in Trie as you are checking for current.children.size() == 0 which is not going to be true when you have 2 or more characters starting at root node. Eg: 'Alpha' & 'Beta' root Trie node will have 'A' and 'B' both at the root which means the currentmap.childred.size() will not be 0. Please share your thoughts about this. Thanks.

  28. Kudos Tushar, you are an epitome of how to put up things in a concise and easy way , was really helpful, kindly upload more such videos

  29. There is a small bug in delete method.

    It should be "return !root.isEndOfWord && root.children.size() == 0;".

    Else that would delete all prefixed characters as well

    Thanks so much Tushor for everything that you do for us

Leave a Reply

Your email address will not be published. Required fields are marked *