# HG changeset patch # User Eris Caffee # Date 1349244424 18000 # Node ID d11dfc49b5595b96bd60711d55229489ed701ef3 # Parent ef73b96fdcaefe85ac854dd21bb80e8851399c77 Seems to be working, but I need to prove the algorithms and also make ctrie_remove compress down, too. See the final output where the standalone 'n' node could be pushed into it's children diff -r ef73b96fdcae -r d11dfc49b559 src/ctrie.c --- a/src/ctrie.c Tue Oct 02 20:00:38 2012 -0500 +++ b/src/ctrie.c Wed Oct 03 01:07:04 2012 -0500 @@ -61,7 +61,7 @@ ctrie->root->data = NULL; ctrie->size = ctrie->count = 0; - printf("The root node is %p\n", ctrie->root); + // printf("The root node is %p\n", (void *) ctrie->root); return ctrie; } @@ -137,8 +137,7 @@ if ( (NULL == ctrie) || (NULL == key) ) return NULL; - printf("root node is "); - ctrie_print_node(ctrie->root); + // printf("root node is "); ctrie_print_node(ctrie->root); return ctrie_insert_node(ctrie, ctrie->root, key, data); } @@ -151,74 +150,81 @@ struct ctrie_node * p = ctrie_longest_matching_child(node, key); - /* - printf("Longest match is "); + + // printf("Longest match is "); if (NULL != p) ctrie_print_node(p); else printf("(nil)\n"); - */ + if (NULL == p) { // No matching child found. Create new(key, data) as child of node struct ctrie_node * new = ctrie_node_new(key, data); if (NULL == new) - return NULL; + return NULL; new->parent = node; if (NULL == node->child) - node->child = new; - else - { - struct ctrie_node * s = node->child; - while (NULL != s->next_sibling) - s = s->next_sibling; - s->next_sibling = new; - new->prev_sibling = s; - } + node->child = new; + else + { + struct ctrie_node * s = node->child; + while (NULL != s->next_sibling) + s = s->next_sibling; + s->next_sibling = new; + new->prev_sibling = s; + } ctrie->size++; ctrie->count++; - /* - printf("Added "); - ctrie_print_node(new); - */ + + // printf("Added "); ctrie_print_node(new); + return key; } else if (ctrie_str_common(p->key, key) == strlen(p->key)) { // Found a node whose key matches for it's entire length - if (0 == strcmp(p->key, key)) - { - // Key already exists. Error - return NULL; - } + // Does it match the entire rest of the key we are inserting? + if (ctrie_str_common(p->key, key) == strlen(key)) + { + if (p->data) + { + // printf("This node already exists\n"); + // Key already exists. Error + return NULL; + } + else + { + p->data = data; + ctrie->count++; + return key; + } + } else - { - // The node found has a key that is a prefix of the key we are inserting. - // Recurse with the remainder of the key - // printf("Recursing because node's entire key %s is a prefix of %s\n", p->key, key); - return ctrie_insert_node(ctrie, p, key+strlen(p->key), data); - } + { + // The node found has a key that is a prefix of the key we are inserting. + // Recurse with the remainder of the key + // printf("Recursing because node's entire key %s is a prefix of %s\n", p->key, key); + return ctrie_insert_node(ctrie, p, key+strlen(p->key), data); + } } else { // Found a node that matches for only part of it's length. // Split the node and then recurse. - /* - printf("splitting this node: "); - ctrie_print_node(p); - */ + // printf("splitting this node: "); ctrie_print_node(p); // Get the new shorter key for p. int matchlen = ctrie_str_common(p->key, key); char * prefix = malloc(matchlen+1); if (NULL == prefix) - return NULL; + return NULL; strncpy(prefix, key, matchlen); prefix[matchlen] = '\0'; @@ -249,8 +255,8 @@ */ // Now add the user requested new key as a child of p - // printf("Recursing with key %s as a prefix of %s\n", p->key, key); - return ctrie_insert_node(ctrie, p, key + matchlen, data); + // printf("Recursing with node %p key %s as a prefix of %s\n", p->parent, p->parent->key, key); + return ctrie_insert_node(ctrie, p->parent, key, data); } // should not get here @@ -265,13 +271,13 @@ int len = 0; int l; - printf("Searching for match in children of %p\n", node); + //printf("Searching for >%s< in children of %p (%s)\n", key, (void *) node, node->key); struct ctrie_node * p = node->child; while (NULL != p) { l = ctrie_str_common(p->key, key); - printf("common length is %d\n", l); + //printf(" Checking >%s< common length is %d\n", p->key, l); if (l > len) { len = l; @@ -279,6 +285,8 @@ } p = p->next_sibling; } + + //printf("Returning longest match %p\n", longest); return longest; } @@ -294,10 +302,14 @@ int ctrie_str_common(char * s1, char * s2) { + // printf("s1 >%s< s1 >%s<\n", s1, s2); int i = 0; - while ( (*s1 != '\0') && (*s2 != '\0') && + while ( (*(s1+i) != '\0') && (*(s2+i) != '\0') && (*(s1+i) == *(s2+i)) ) + { + // printf("*s1+%d %c *s2+%d %c\n", i, (*(s1+i) == '\0' ? "." : *(s1+i)), i, (*(s2+i) == '\0' ? "." : *(s2+i))); i++; + } return i; } @@ -309,7 +321,9 @@ if (NULL == node) printf("(nil node)\n"); else - printf("%s \t%p\t parent:%p \tchild:%p \tprev:%p \tnext:%p \tdata:%p\n", node->key, node, node->parent, node->child, node->prev_sibling, node->next_sibling, node->data); + printf("%s \t%p\t parent:%p \tchild:%p \tprev:%p \tnext:%p \tdata:%p\n", + node->key, (void *) node, (void *) node->parent, (void *) node->child, + (void *) node->prev_sibling, (void *) node->next_sibling, node->data); } @@ -376,8 +390,10 @@ void ctrie_visit_node(struct ctrie * ctrie, struct ctrie_node * node, void (*op)(char * k, void * d), char * prefix) { + // printf("Visiting node %p with prefix >%s<\n", node, prefix); if (NULL != node->data) { + // printf("node has data\n"); char * k = malloc(strlen(prefix) + strlen(node->key)); if (NULL == k) return; @@ -390,6 +406,7 @@ if(NULL != node->child) { + // printf("node has children\n"); char * k = malloc(strlen(prefix) + strlen(node->key)); if (NULL == k) return; @@ -426,17 +443,15 @@ size_t len = strlen(key); for (int i = 0; i <= len; i += strlen(node->key)) { - node = ctrie_search_children(node, key+i); + node = ctrie_longest_matching_child(node, key+i); if (NULL == node) return NULL; - if (NULL != node->data) + if (strlen(key+i) == strlen(node->key)) return node; } // Should never get here return NULL; - - struct ctrie_node * p = ctrie_longest_matching_child(node, key); } //////////////////////////////////////////////////////////////////////////////// @@ -453,73 +468,21 @@ return p; } - - - - - - - - - - - - - - - - - - - -//////////////////////////////////////////////////////////////////////////////// -void * -ctrie_remove(struct ctrie * ctrie, char * key) - { - if ( (NULL == ctrie) || (NULL == key) ) - return NULL; - - struct ctrie_node * node = ctrie_find_node(ctrie, key); - if (NULL == node) - return NULL; - - void * data = node->data; - - while ( (NULL == node->prev_sibling) && (NULL == node->prev_sibling) ) - { - struct ctrie_node * p = node; - node = node->parent; - free(p); - ctrie->size--; - } - - if (node == ctrie->root) - return data; - - if (node->prev_sibling) - node->prev_sibling->next_sibling = node->next_sibling; - else - node->parent->child = node->next_sibling; - - if (node->next_sibling) - node->next_sibling->prev_sibling = node->prev_sibling; - - free(node); - - ctrie->count--; - - return data; - } - //////////////////////////////////////////////////////////////////////////////// void -ctrie_subtree_list_push(char * k, void * d) +ctrie_subtree_list_push(char * key, void * data) { if (NULL == subkey_list) return; - + char * k = malloc(strlen(key)+1); + if (NULL == k) + return; + strcpy(k, key); if ( (0 == subkey_list_limit) || (list_size(subkey_list) < subkey_list_limit) ) + { list_push_front(subkey_list, k); + // printf("list push >%s<\n", k); + } } //////////////////////////////////////////////////////////////////////////////// @@ -532,25 +495,47 @@ if ( (NULL == ctrie) || (NULL == prefix) ) return NULL; - struct ctrie_node * node = ctrie_find_node(ctrie, prefix); + // struct ctrie_node * node = ctrie_find_node(ctrie, prefix); + + struct ctrie_node * p = NULL; + struct ctrie_node * node = ctrie->root; + size_t len = strlen(prefix); + // printf("len %d\n", len); + int index; + for (int i = 0; i < len; i += ctrie_str_common(prefix+i, node->key)) + { + // printf("i %d\n", i); + p = node; + node = ctrie_longest_matching_child(node, prefix+i); + // printf("longest returned %p %d\n", node, strlen(node->key)); + if (NULL == node) + { + // printf("breaking\n"); + break; + } + index = i; + } + if (NULL == node) return NULL; + // printf("Found longest match as %p >%s< index is %d\n", node, node->key, index); + subkey_list = list_new(); if (NULL == subkey_list) return NULL; subkey_list_limit = limit; - size_t len = strlen(prefix); + len = strlen(prefix); char * pref = malloc(len+1); if (NULL == pref) return NULL; - strncpy(pref, prefix, len-1); - pref[len-1] = '\0'; - - ctrie_visit_node(ctrie, node->parent, ctrie_subtree_list_push, pref); + strncpy(pref, prefix, index); + pref[index] = '\0'; + // printf("Built pref as >%s<\n", pref); + ctrie_visit_node(ctrie, node, ctrie_subtree_list_push, pref); free(pref); - + // printf("Returning\n"); return subkey_list; } @@ -568,3 +553,83 @@ list_delete(list); } + +//////////////////////////////////////////////////////////////////////////////// +void * +ctrie_remove(struct ctrie * ctrie, char * key) + { + if ( (NULL == ctrie) || (NULL == key) ) + return NULL; + + // printf("Looking for %s\n", key); + struct ctrie_node * node = ctrie_find_node(ctrie->root, key); + if (NULL == node) + return NULL; + + // printf("Found node %p ", node); ctrie_print_node(node); + + // We found a node, but is it an actual data node or just a prefix? + if (NULL == node->data) + return NULL; + + // OK, so it's a data node. If it has no children we can delete it. + // Otherwise we just remove the data and maybe collapse nodes + + void * data = node->data; + struct ctrie_node * parent = node->parent; + + if (NULL == node->child) + { + // Remove from sibling list. + if (node->prev_sibling) + node->prev_sibling->next_sibling = node->next_sibling; + if (node->next_sibling) + node->next_sibling->prev_sibling = node->prev_sibling; + + // Point the parent to the new first child, if needed. + if (parent->child == node) + { + if (node->prev_sibling) + parent->child = node->prev_sibling; + else + parent->child = node->next_sibling; + } + + free(node->key); + free(node); + ctrie->size--; + ctrie->count--; + } + else + { + node->data = NULL; + ctrie->count--; + } + + + // printf("merging nodes\n"); ctrie_dump_raw(ctrie); + + // At this point we may have nodes that need to be collapsed together. + // This happens when the parent has only one child. + if (NULL == parent->child->next_sibling) + { + char * k = malloc(strlen(parent->key) + strlen(parent->child->key) + 1); + // UGH! I need a real error signalling mechanism to handle this. + if (NULL == k) + return data; + k[0] = '\0'; + strcat(k, parent->key); + strcat(k, parent->child->key); + free(parent->key); + parent->key = k; + + node = parent->child; + parent->child = node->child; + free(node->key); + free(node); + ctrie->size--; + } + + return data; + } + diff -r ef73b96fdcae -r d11dfc49b559 src/ctrie.h --- a/src/ctrie.h Tue Oct 02 20:00:38 2012 -0500 +++ b/src/ctrie.h Wed Oct 03 01:07:04 2012 -0500 @@ -1,7 +1,7 @@ #ifndef CTRIE_H_ #define CTRIE_H_ -// Compact trie - internal nodes are collapsed together if they have no siblings. +// Compact trie - internal nodes are collapsed together if they have no siblings.F struct ctrie; diff -r ef73b96fdcae -r d11dfc49b559 src/ctrie_test.c --- a/src/ctrie_test.c Tue Oct 02 20:00:38 2012 -0500 +++ b/src/ctrie_test.c Wed Oct 03 01:07:04 2012 -0500 @@ -7,7 +7,7 @@ void print_val(char * key, void * data) { - printf("%s\t\t%s\n", key, data); + printf("%s\t\t%s\n", key, (char *) data); } int main(int argc, char ** argv) @@ -40,12 +40,16 @@ i += 2; } - ctrie_dump_raw(ctrie); ctrie_dump(ctrie); ctrie_walk_keys(ctrie, print_val); + ctrie_dump_raw(ctrie); printf("ctrie is storing %zd words in %zd keys\n", ctrie_count(ctrie), ctrie_size(ctrie)); - exit(EXIT_SUCCESS); + printf("\n\n\n"); + ctrie_insert(ctrie, "a", "indefinite article. Used before an initial consonant sound."); + ctrie_dump_raw(ctrie); + printf("ctrie is storing %zd words in %zd keys\n", ctrie_count(ctrie), ctrie_size(ctrie)); + //////////////////////////////////////////// printf("===================================================\n"); @@ -69,12 +73,12 @@ printf("Nothing found for '%s'\n", word); else { - printf("list size is %d\n", list_size(list)); + printf("list size is %zd\n", list_size(list)); struct list_iterator * it = list_begin(list); struct list_iterator * next = it; while (NULL != next) { - printf("%s\n", list_value(it)); + printf("%s\n", (char *) list_value(it)); next = list_next(it); } ctrie_free_subkey_list(list); @@ -89,23 +93,67 @@ printf("Nothing found for '%s'\n", word); else { - printf("list size is %d\n", list_size(list)); + printf("list size is %zd\n", list_size(list)); struct list_iterator * it = list_begin(list); struct list_iterator * next = it; while (NULL != next) { - printf("%s\n", list_value(it)); + printf("%s\n", (char *) list_value(it)); next = list_next(it); } ctrie_free_subkey_list(list); } + + //////////////////////////////////////////// + printf("===================================================\n"); + strcpy(word, "he"); + printf("Looking for completions of '%s':\n", word); + list = ctrie_get_subkeys(ctrie, word, 0); + if (NULL == list) + printf("Nothing found for '%s'\n", word); + else + { + printf("list size is %zd\n", list_size(list)); + struct list_iterator * it = list_begin(list); + struct list_iterator * next = it; + while (NULL != next) + { + printf("%s\n", (char *) list_value(it)); + next = list_next(it); + } + ctrie_free_subkey_list(list); + } + + + //////////////////////////////////////////// + printf("===================================================\n"); + strcpy(word, "helpe"); + printf("Looking for completions of '%s':\n", word); + list = ctrie_get_subkeys(ctrie, word, 0); + if (NULL == list) + printf("Nothing found for '%s'\n", word); + else + { + printf("list size is %zd\n", list_size(list)); + struct list_iterator * it = list_begin(list); + struct list_iterator * next = it; + while (NULL != next) + { + printf("%s\n", (char *) list_value(it)); + next = list_next(it); + } + ctrie_free_subkey_list(list); + } + + //////////////////////////////////////////// printf("===================================================\n"); strcpy(word, "anybody"); printf("remove '%s'\n", word); ctrie_remove(ctrie, word); + // remove a word that is not in the dictionary, but is a prefix of other words. Should do nothing. strcpy(word, "ann"); printf("remove '%s'\n", word); ctrie_remove(ctrie, word); @@ -118,9 +166,16 @@ printf("remove '%s'\n", word); ctrie_remove(ctrie, word); - ctrie_dump(ctrie); + strcpy(word, "an"); + printf("remove '%s'\n", word); + ctrie_remove(ctrie, word); + + ctrie_dump_raw(ctrie); printf("ctrie is storing %zd words in %zd keys\n", ctrie_count(ctrie), ctrie_size(ctrie)); + + exit(EXIT_SUCCESS); + //////////////////////////////////////////// printf("===================================================\n"); strcpy(word, "anybody");