# SEO Site Analysis


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L16"
target="_blank" style="float:right; font-size:smaller">source</a>

### detect_duplicate_content

``` python

def detect_duplicate_content(
    session:Session, # Active database session
    website_id:int, # Website to search within
    content:str, # Normalized page content
    file_path:str, # Current article path (to exclude self)
    similarity_threshold:float=0.8, # Minimum similarity to flag
)->dict:

```

*Find articles within a website whose content is similar to the given
content, excluding the article at `file_path`.*
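
As a rough illustration of the idea, the sketch below compares one article against the others and keeps those at or above the threshold. It is not the library's implementation: the in-memory `articles` dict stands in for the database session, `find_similar_articles` is a hypothetical name, the use of `difflib.SequenceMatcher` as the similarity measure is an assumption, and the shape of the returned dict is invented for the example.

```python
from difflib import SequenceMatcher


def find_similar_articles(
    articles: dict,  # hypothetical stand-in for the database: file path -> content
    content: str,
    file_path: str,
    similarity_threshold: float = 0.8,
) -> dict:
    """Return the articles whose content is close to `content`."""
    matches = []
    for other_path, other_content in articles.items():
        # Skip the article being checked so it never matches itself.
        if other_path == file_path:
            continue
        # Assumed similarity measure: character-level ratio in [0, 1].
        score = SequenceMatcher(None, content, other_content).ratio()
        if score >= similarity_threshold:
            matches.append({"file_path": other_path, "similarity": round(score, 3)})
    # Most similar articles first.
    matches.sort(key=lambda match: match["similarity"], reverse=True)
    return {"has_duplicates": bool(matches), "duplicates": matches}


articles = {
    "a.md": "same text here",
    "b.md": "same text here",
    "c.md": "zzzz qqqq",
}
result = find_similar_articles(articles, articles["a.md"], "a.md")
```

Here `b.md` is flagged because its content is identical to `a.md`, while `c.md` falls well below the default threshold of 0.8.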

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L52"
target="_blank" style="float:right; font-size:smaller">source</a>

### analyze_content_groups

``` python

def analyze_content_groups(
    session:Session, # Active database session
    website_id:int, # Website to search within
    similarity_threshold:float=0.8, # Minimum similarity to group
)->dict:

```

*Group similar articles together across a website.*
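
One possible way to form such groups is to compare every pair of articles and merge pairs that meet the threshold. The sketch below does this with exact Jaccard similarity over word sets and a small union-find structure. The function names, the in-memory `articles` dict, the word-set similarity measure, and the list-of-lists return value are all assumptions made for illustration, not the library's behavior.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar_articles(articles: dict, similarity_threshold: float = 0.8) -> list:
    """Group file paths whose word sets are similar (hypothetical helper)."""
    word_sets = {path: set(text.lower().split()) for path, text in articles.items()}
    parent = {path: path for path in articles}

    def find(path: str) -> str:
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[path] != path:
            parent[path] = parent[parent[path]]
            path = parent[path]
        return path

    # Compare every pair once and merge the groups of similar pairs.
    for left, right in combinations(articles, 2):
        if jaccard(word_sets[left], word_sets[right]) >= similarity_threshold:
            parent[find(right)] = find(left)

    groups = {}
    for path in articles:
        groups.setdefault(find(path), []).append(path)
    # Only groups with more than one article indicate similar content.
    return [members for members in groups.values() if len(members) > 1]


articles = {
    "a.md": "red green blue yellow orange",
    "b.md": "red green blue yellow orange purple",
    "c.md": "completely unrelated words",
}
groups = group_similar_articles(articles)
```

The pairwise loop is quadratic in the number of articles, which is the cost an approximate method such as MinHash is meant to reduce on larger sites.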

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L34"
target="_blank" style="float:right; font-size:smaller">source</a>

### analyze_keyword_cannibalization

``` python

def analyze_keyword_cannibalization(
    session:Session, # Active database session
    website_id:int, # Website to search within
    keyword:str, # Focus keyword to check
)->dict:

```

*Find articles within a website competing for the same focus keyword.*
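
The check can be pictured as filtering articles by their focus keyword and flagging the keyword when more than one article targets it. The sketch below is a minimal version under stated assumptions: articles are plain dicts with hypothetical `file_path` and `focus_keyword` fields rather than database rows, matching is case-insensitive, and the returned keys are invented for the example.

```python
def find_keyword_competitors(articles: list, keyword: str) -> dict:
    """List the articles that share a focus keyword (hypothetical helper)."""
    target = keyword.strip().lower()
    competing = [
        article["file_path"]
        for article in articles
        # Treat a missing or empty focus keyword as "no keyword".
        if (article.get("focus_keyword") or "").strip().lower() == target
    ]
    return {
        "keyword": keyword,
        "competing_articles": competing,
        # A single article targeting the keyword is not cannibalization.
        "is_cannibalized": len(competing) > 1,
    }


articles = [
    {"file_path": "guide.md", "focus_keyword": "python seo"},
    {"file_path": "tips.md", "focus_keyword": "Python SEO"},
    {"file_path": "news.md", "focus_keyword": None},
]
report = find_keyword_competitors(articles, "python seo")
```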

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L98"
target="_blank" style="float:right; font-size:smaller">source</a>

### analyze_content_groups_fast

``` python

def analyze_content_groups_fast(
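    session:Session, # Active database session
    website_id:int, # Website to search within
    similarity_threshold:float=0.8, # Minimum similarity to group
    num_hashes:int=128, # Number of hash functions per MinHash signature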
)->dict:

```

*Group similar articles using MinHash for fast approximate similarity.*
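
To show how the pieces could fit together, the sketch below turns each article into word shingles, reduces the shingles to a MinHash signature, and places an article in the first group whose representative signature is similar enough. Everything here is an assumption for illustration: the helper names, the in-memory `articles` dict in place of a database session, the BLAKE2b-based hashing, the greedy grouping strategy, and the returned dict layout.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set:
    """n-word windows; a text shorter than n words yields one shingle."""
    words = text.lower().split()
    if not words:
        return set()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def signature(shingle_set: set, num_hashes: int = 128) -> list:
    """One minimum hash value per seed; -1 marks an empty shingle set."""
    return [
        min(
            (
                int.from_bytes(
                    hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest(),
                    "big",
                )
                for s in shingle_set
            ),
            default=-1,
        )
        for seed in range(num_hashes)
    ]


def similarity(sig1: list, sig2: list) -> float:
    """Fraction of signature positions that agree."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)


def group_articles_fast(articles: dict, similarity_threshold: float = 0.8,
                        num_hashes: int = 128) -> dict:
    """Greedy grouping against each group's first member (hypothetical helper)."""
    signatures = {path: signature(shingles(text), num_hashes)
                  for path, text in articles.items()}
    groups = []  # each group is a list of paths; the first path is its representative
    for path, sig in signatures.items():
        for group in groups:
            if similarity(signatures[group[0]], sig) >= similarity_threshold:
                group.append(path)
                break
        else:
            groups.append([path])
    return {
        "total_articles": len(articles),
        "groups": [group for group in groups if len(group) > 1],
    }


articles = {
    "a.md": "how to bake sourdough bread at home with a simple starter",
    "b.md": "how to bake sourdough bread at home with a simple starter",
    "c.md": "a beginner guide to tuning postgres indexes for faster queries",
}
report = group_articles_fast(articles)
```

Signatures have a fixed length, so comparing two articles costs `num_hashes` comparisons regardless of article size. This sketch still compares each article with every group representative; it assumes `num_hashes` is at least 1.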

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L94"
target="_blank" style="float:right; font-size:smaller">source</a>

### minhash_similarity

``` python

def minhash_similarity(
    sig1:list, # MinHash signature of the first document
    sig2:list, # MinHash signature of the second document
)->float:

```

*Estimate Jaccard similarity from two MinHash signatures.*
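
The standard MinHash estimate is the fraction of signature positions at which the two signatures hold the same value. The sketch below shows that calculation. The name `estimate_jaccard` and the handling of empty or mismatched signatures are assumptions, and the library's own function may treat those cases differently.

```python
def estimate_jaccard(sig1: list, sig2: list) -> float:
    """Estimate Jaccard similarity as the share of matching signature slots."""
    if len(sig1) != len(sig2):
        # Signatures are only comparable when built with the same number of hashes.
        raise ValueError("signatures must have the same length")
    if not sig1:
        return 0.0
    matches = sum(1 for a, b in zip(sig1, sig2) if a == b)
    return matches / len(sig1)


# Three of the four positions agree, so the estimate is 0.75.
estimate = estimate_jaccard([1, 5, 9, 4], [1, 5, 2, 4])
```

Longer signatures give a less noisy estimate at the cost of more hashing work per document.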

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L83"
target="_blank" style="float:right; font-size:smaller">source</a>

### minhash_signature

``` python

def minhash_signature(
    shingles:set, # Set of shingles for one document
    num_hashes:int=128, # Number of hash functions in the signature
)->list:

```

*Compute the MinHash signature for a set of shingles.*
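
A MinHash signature keeps, for each of `num_hashes` hash functions, the smallest hash value seen across all shingles. The sketch below simulates the family of hash functions by prefixing a seed to each shingle before hashing with BLAKE2b. The hashing scheme, the sentinel used for an empty set, and the name `sketch_minhash_signature` are assumptions; the library may generate its hashes differently.

```python
import hashlib


def sketch_minhash_signature(shingles: set, num_hashes: int = 128) -> list:
    """Return one minimum hash value per simulated hash function."""
    signature = []
    for seed in range(num_hashes):
        # Prefixing the seed makes each pass behave like a different hash function.
        hashes = (
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{shingle}".encode("utf-8"), digest_size=8).digest(),
                "big",
            )
            for shingle in shingles
        )
        # An empty shingle set has no minimum, so fall back to a sentinel value.
        signature.append(min(hashes, default=2**64))
    return signature


sig = sketch_minhash_signature({"the quick brown", "quick brown fox"}, num_hashes=16)
```

A cryptographic digest is used here instead of Python's built-in `hash()` because string hashes from `hash()` vary between interpreter runs, which would make stored signatures incomparable.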

------------------------------------------------------------------------

<a
href="https://github.com/abdelkareemkobo/seo_rat/blob/main/seo_rat/seo_site_analysis.py#L78"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_shingles

``` python

def get_shingles(
    text:str, # Text to split into shingles
    n:int=3, # Number of words per shingle
)->set:

```

*Generate n-word shingles from text.*
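
An n-word shingle is a window of `n` consecutive words, and the shingle set contains every such window in the text. The sketch below is one minimal way to build it. Lowercasing, whitespace tokenization, the treatment of texts shorter than `n` words, and the name `word_shingles` are assumptions rather than documented behavior.

```python
def word_shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-word windows in `text`."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        # Too short for a full window: keep the whole text as a single shingle.
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


result = word_shingles("The quick brown fox jumps")
# result == {"the quick brown", "quick brown fox", "brown fox jumps"}
```

Larger values of `n` make shingles more specific, so two texts need longer shared passages before they count as similar.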
