【ybt高效进阶2-5-3】前缀匹配-白红宇的个人博客

【ybt高效进阶2-5-3】前缀匹配

发布日期：2021-05-07 07:01:07 浏览次数：22 分类：精选文章

本文共 4366 字，大约阅读时间需要 14 分钟。

为了解决这个问题，我们需要找到每个小串的前缀在大串上的最长匹配长度。由于直接暴力匹配效率太低，我们使用Trie树（前缀树）和Aho-Corasick算法来优化查询过程。

方法思路

Trie树构建：将所有小串插入到Trie树中。每个节点有四个子节点（对应E、S、W、N）和一个失败指针。

失败指针设置：使用广度优先搜索设置失败指针，使得在处理大串时可以沿着失败指针跳转，找到最近的匹配。

处理大串：从大串的第一个字符开始，逐个字符处理，沿着Trie树遍历，遇到终止点时记录长度，沿着失败指针跳转直到找到有效节点。

记录最长长度：在遍历过程中，记录遇到的最长终止点长度。

解决代码


   
   
前缀匹配
   
题目链接：
   
题目大意
   
给定一个大串和一些小串，找出每个小串的前缀在大串上的最长匹配长度。每个字符只有E、S、W、N四个字符组成。
   
思路
   
由于字符串长度较大，使用Trie树和Aho-Corasick算法来优化查询过程。Trie树用于存储小串，失败指针帮助快速跳转找到最近的匹配，处理大串时记录最长匹配长度。
   
代码
   
#include 
    
     #include 
     
      #include 
      
       #include 
       
        using namespace std;struct Node {    int children[4]; // 对应E、S、W、N四个字符    int fail; // 失败指针    bool is_end; // 是否是某个小串的终止点};int main() {    // 读取输入    int n, m;    char s[10000001];    char c[100001][101];    // 初始化Trie树    Node tree[10000001];    int tot = 1; // 总节点数    int root = 0;    // 插入小串到Trie树中    for (int i = 1; i <= m; ++i) {        string str;        do {            char ch = getchar();            if (ch == 'E') str += ch;            else if (ch == 'S') str += ch;            else if (ch == 'W') str += ch;            else if (ch == 'N') str += ch;            else {                while (ch != '\n') {                    ch = getchar();                }                if (ch == '\n') break;            }        } while (true);        if (str.empty()) continue;        // 插入到Trie树中        int current = root;        for (char ch : str) {            int idx = (ch == 'E') ? 0 : ((ch == 'S') ? 1 : ((ch == 'W') ? 2 : 3));            if (!tree[current].children[idx]) {                tree[current].children[idx] = ++tot;            }            current = tree[current].children[idx];            tree[current].is_end = true;        }    }    // 设置失败指针    queue
        
          q;    for (int i = 0; i < 4; ++i) {        if (tree[root].children[i]) {            q.push(tree[root].children[i]);            tree[tree[root].children[i]].fail = root;        }    }    while (!q.empty()) {        int current = q.front();        q.pop();        for (int i = 0; i < 4; ++i) {            if (tree[current].children[i]) {                q.push(tree[current].children[i]);                tree[tree[current].children[i]].fail = tree[tree[current].fail].children[i] ? tree[tree[current].fail].children[i] : root;            } else {                tree[current].children[i] = tree[tree[current].fail].children[i] ? tree[tree[current].fail].children[i] : root;            }        }    }    // 处理大串    int max_len = 0;    int current = root;    for (int i = 1; i <= n; ++i) {        char ch = s[i];        int idx = (ch == 'E') ? 0 : ((ch == 'S') ? 1 : ((ch == 'W') ? 2 : 3));        do {            if (tree[current].children[idx]) {                current = tree[current].children[idx];                if (tree[current].is_end) {                    max_len = max(max_len, i);                }                break;            } else {                current = tree[current].fail;            }        } while (current != root);    }    // 输出结果    for (int i = 1; i <= m; ++i) {        string str;        do {            char ch = s[i];            if (ch == 'E') str += ch;            else if (ch == 'S') str += ch;            else if (ch == 'W') str += ch;            else if (ch == 'N') str += ch;            else {                while (ch != '\n') {                    ch = s[i];                }                if (ch == '\n') break;            }        } while (true);        if (str.empty()) continue;        int len = 0;        int current = root;        for (char ch : str) {            int idx = (ch == 'E') ? 0 : ((ch == 'S') ? 1 : ((ch == 'W') ? 2 : 3));            while (true) {                if (tree[current].children[idx]) {                    current = tree[current].children[idx];                    if (tree[current].is_end) {                        len = max(len, (int)(str.find(ch) - str.find('E') + 1));                    }                    break;                } else {                    current = tree[current].fail;                }            }        }        printf("%d\n", len);    }    return 0;}

代码解释

Trie树结构：每个节点包含四个子节点和一个失败指针，用于存储小串和处理大串。

插入小串：逐个字符插入到Trie树中，标记终止点。

设置失败指针：使用队列处理节点，设置每个节点的失败指针。

处理大串：从大串的第一个字符开始，沿着Trie树遍历，遇到终止点时记录长度，沿着失败指针跳转直到找到有效节点。

查询最长长度：遍历每个小串的前缀，找到最长匹配长度。

这种方法通过预处理小串，建立Trie树和失败指针，优化了大串的查询效率，能够高效处理大数据量的问题。

上一篇：【python】理解列表推导式以及列表推导式嵌套

下一篇：【python】使用pygame写的飞机大战游戏

发表评论

关于作者

喝酒易醉，品茶养心，人生如梦，品茶悟道，何以解忧？唯有杜康！

-- 愿君每日到此一游！

方法思路

解决代码

前缀匹配

题目链接：

题目大意

思路

代码

代码解释

发表评论

最新留言

关于作者

推荐文章