Maximum Product of Word Lengths

Problem

Given a string array words, find the maximum value of length(words[i]) * length(words[j]) where the two words do not share common letters. You may assume that each word will contain only lower case letters. If no such two words exist, return 0.

Example 1:

Input: ["abcw","baz","foo","bar","xtfn","abcdef"]
Output: 16 
Explanation: The two words can be "abcw", "xtfn".

Example 2:

Input: ["a","ab","abc","d","cd","bcd","abcd"]
Output: 4 
Explanation: The two words can be "ab", "cd".

Example 3:

Input: ["a","aa","aaa","aaaa"]
Output: 0 
Explanation: No such pair of words.

Solution

集合暴力法

思路

看到这种需要需要判断有无相同元素的问题，第一时间就想到使用集合set()及其相关运算。题中给出的是字符串，遂先将字符串转成字母集合，然后就两层循环，找出两两对应集合没有交集的字符串的长度乘积最大值。

代码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class Solution:
    def maxProduct(self, words):
        """
        :type words: List[str]
        :rtype: int
        """
        max = 0
        set_words = [set(list(w)) for w in words]
        for i,word_1 in enumerate(words):
            for j,word_2 in list(enumerate(words))[i+1:]:
                if not (set_words[i] & set_words[j]):
                    l = len(word_1)*len(word_2)
                    max = l if l>max else max
                    
        return max

评价

速度太慢，用时1500ms，差不多处于后10%了，尽管在时间复杂度上已达到渐近的下限（似乎是 $O(n^2)$)，原因：

使用集合虽然方便但集合运算耗时还是有的（对比与下面的位位运算
在找最大值时没有好好利用Python自带的库函数和语法糖（自己写的就是比它慢！）

Mask位运算(200+ms

思路

将字符串中26个字母分别对应到二进制中的每一位上，然后将其通过或运算生成 mask，每个 mask 代表字符串含有的字母，然后可以通过和运算两两 mask 判断其是否有重复字母。

代码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Solution:
    def maxProduct(self, words):
        """
        :type words: List[str]
        :rtype: int
        """
        masks = {}
        start = ord('a')
        for w in words:
            mask = 0
            for c in set(w):
                mask|=(1<<ord(c)-start)
            masks[mask] = max(len(w),masks.get(mask,0))
        return max([masks[i]*masks[j] for i in masks for j in masks if not i&j]+[0])

评价

速度大大提升，用时200ms，到别人的提交里翻了一下，前面的都是选择用这种位运算的方法。对比上面自己的使用集合的暴力方法，速度的提升主要便是位运算代替集合运算所节省的时间，尤其是判断两者是否有共同元素，使用&运算只需极短的常数时间（一条指令），而集合交集还要两者搜索，时间差距主要在此。另外，还有使用列表生成式的语法糖，也是非常能够精简代码结构（甚至执行时间？

Summary

该题是一道中等难度的判断重复公共元素的题目，不算难，值得学习的是其中用到的位运算。判断重复公共元素，第一时间想到的便是转成集合去解决，当然这个方法具有普适性，但当出现的元素为确定的有限元素（如字母，数码），便可以考虑选择使用位运算，这好像也是一种普遍的解决方法？我在下面的相关话题也看到了位运算的tag。

顺便给出此前遇到的相关的使用位运算的一个题目:

Single Number