Longest Substring Without Repeating Characters

Hey, so the whole idea of me posting leetcode solutions is not because I think I can explain better than thousands of others out there. Its just a place where I write down my thoughts and explanations so that I remember different techniques I worked through :)

Problem

Link: https://leetcode.com/problems/longest-substring-without-repeating-characters/

Statement: Given a string s, find the length of the longest substring without repeating characters.

Example:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length 3. 

Constraints:

0 <= s.length <= 5*1e4

Note: It is important to see that the question asks for substring (the characters must be continous) and not subsequence.

Time Complexity: O(\(nlogn\))

So I will just explain the solution as nicely as possible. I wont give code for this since I believe you should write it yourself for practice (there is no sense if once just googles code for a given problem). Although, for questions which are difficult I would probably provide some snippets to enable understanding.

Thought Process: Well, we need to find length of longest substring but I cannot go do O(\(n^2\)). O(\(n\)) seems nice but complicated, I will leave that for now. So, looking at the string I see that it would be nice if I could just linearly search through all of them and see which one is the answer, but issue is that I cannot search through all n possibilities that way. So, I only search through selected values of n so that the complexity does not rise too much. Now, one thing I know is if there is no answer for x length, there is no answer for any length >x. So, boom, this looks like a job for binary search.

Explanation So, we are doing binary search to find the required length of substring, and then doing linear search to confirm the condition (all characters must be unique). Naturally, if there exists even 1 substring which satisfies this condition, we say that the x satisfies the condition. Now, lets take our example of abcabcbb. Our first x=4, for which the answer is:

Now, our search range is between 0 -> 3. Our next x=1, and it is obvious that satisfies condition. So now our search range is between 1 -> 3, and our next x is x=2. Again, its obvious this satisfies the condition. So our final range is between 2 -> 3. Now, lets check for x=3.

As we can see, our answer is 3.

Solution 2: Pointer with Dictionary

Time Complexity: O(\(n\))

Thought Process: So we need O(\(n\)). One thing you must note is that when we are the ones trying to find the answer (as humans, unless you are a bot, I guess you are also welcome here? idk) is that we go linearly from left to right, and whenever a repeating character comes up, we cut off from where it previously occured, and continue. Neat. This will be our algorithm.

Explanation: Consider a dictionary m and a left pointer. Whenever I get a new character never seen before, I insert it into the dictionary with the index where it occured as its value. Whenever I see a character already in the dictionary, I make note of the current substring, and then update the value of key in dictionary with value = i. Simple enough?

A couple of things we do need to note is that a) Well, what about the end?: Simple enough, we add a line which goes from my left pointer (I will tell you the use of this soon) and current index. This will be my last substring which satisfies the condition b) Uh, what if I get something like xyyx, where I have to cutoff after the first index. Fret not, this has been taken care of. We just make sure that any index I consider has to be after left at all costs.

So, in this case I feel a bit of code is important.

# Left is simply an int which remembers where my substring is starting from
# m is my dictionary
# sollen stores the max answer (initialised to 0)
for i in range(len(s)):
    # check if character exists 
    if s[i] in m:
       # if it does, then check if it occurs before left
       # (before it was cutoff)
       if m[s[i]] < left:
          # if it was before it was cutoff, then my substring is valid 
          # I just need to update its value
          m[s[i]] = i
       else:
          # If not, then I need to cut my substring now
          # Store the length of substring
          sollen = max(sollen, i - left)
          # Update my left pointer to next occurance of repeated character
          left = m[s[i]] + 1
          # Update value of repeated character
          m[s[i]] = i

# Check if the last substring might be the answer?          
sollen = max(sollen, len(s) - left)
return sollen