SEO Site Analysis

Crawl and analyze entire sites for technical SEO issues and link structure.

detect_duplicate_content


def detect_duplicate_content(
    session:Session, # Active database session
    website_id:int, # Website to search within
    content:str, # Normalized page content
    file_path:str, # Current article path (to exclude self)
    similarity_threshold:float=0.8, # Minimum similarity to flag
)->dict:

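Find other articles within a website whose content is similar to the given content. The article at file_path is excluded from the results, and only matches at or above similarity_threshold are flagged.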
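The similarity measure and the shape of the returned dict are not documented here, so the following is only a minimal sketch of the idea. It assumes word-level Jaccard similarity, replaces the database session with a hypothetical in-memory mapping of path to normalized text, and uses made-up result keys (has_duplicates, matches).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_similar_articles(content, file_path, articles, similarity_threshold=0.8):
    """Hypothetical stand-in for detect_duplicate_content.

    `articles` maps path -> normalized text and replaces the database query.
    """
    target = set(content.split())
    matches = []
    for path, text in articles.items():
        if path == file_path:  # never report the article as a duplicate of itself
            continue
        score = jaccard(target, set(text.split()))
        if score >= similarity_threshold:
            matches.append({"file_path": path, "similarity": score})
    # Strongest matches first
    matches.sort(key=lambda m: m["similarity"], reverse=True)
    return {"has_duplicates": bool(matches), "matches": matches}
```

Calling the sketch with an article's own text and path returns only the other articles that clear the threshold.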


analyze_content_groups


def analyze_content_groups(
    session:Session, # Active database session
    website_id:int, # Website to search within
    similarity_threshold:float=0.8, # Minimum similarity to group
)->dict:

Group similar articles together across a website.
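One way to picture the grouping step is as connected components over pairwise similarity. The sketch below is an illustration under assumptions, not the library's implementation: it compares every pair with word-level Jaccard similarity, merges pairs at or above the threshold with a small union-find, and works on a hypothetical in-memory mapping instead of a database session. The result keys are invented for the example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar_articles(articles, similarity_threshold=0.8):
    """Hypothetical stand-in for analyze_content_groups.

    `articles` maps path -> normalized text.
    """
    paths = list(articles)
    word_sets = {p: set(articles[p].split()) for p in paths}
    parent = {p: p for p in paths}

    def find(p):
        # Follow parent links to the root, halving the path on the way
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge every pair whose similarity clears the threshold
    for a, b in combinations(paths, 2):
        if jaccard(word_sets[a], word_sets[b]) >= similarity_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for p in paths:
        clusters.setdefault(find(p), []).append(p)
    # Only clusters with more than one article are interesting
    groups = sorted(sorted(g) for g in clusters.values() if len(g) > 1)
    return {"groups": groups, "group_count": len(groups)}
```

Because every pair is compared, the work grows quadratically with the number of articles, which is the cost an approximate method is meant to reduce.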


analyze_keyword_cannibalization


def analyze_keyword_cannibalization(
    session:Session, # Active database session
    website_id:int, # Website to search within
    keyword:str, # Focus keyword to check
)->dict:

Find articles within a website competing for the same focus keyword.
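As a rough illustration of the check, the sketch below treats two articles as competing when their focus keywords match after trimming and lowercasing. The field names (file_path, focus_keyword), the in-memory list of article dicts, and the result keys are all assumptions made for the example; the real function reads from the database session and may match keywords differently.

```python
def find_keyword_cannibalization(articles, keyword):
    """Hypothetical stand-in for analyze_keyword_cannibalization.

    `articles` is a list of dicts with assumed keys
    "file_path" and "focus_keyword".
    """
    wanted = keyword.strip().lower()
    competing = [
        article["file_path"]
        for article in articles
        # Treat a missing or None keyword as an empty string
        if (article.get("focus_keyword") or "").strip().lower() == wanted
    ]
    return {
        "keyword": keyword,
        "competing_articles": competing,
        # A single article targeting the keyword is not cannibalization
        "is_cannibalized": len(competing) > 1,
    }
```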


analyze_content_groups_fast


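def analyze_content_groups_fast(
    session:Session, # Active database session
    website_id:int, # Website to search within
    similarity_threshold:float=0.8, # Minimum similarity to group
    num_hashes:int=128, # Number of hash functions in each MinHash signature
)->dict: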

Group similar articles using MinHash for fast approximate similarity.
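The sketch below shows how the pieces might fit together end to end: shingle each article, compute one signature per article, then compare signatures instead of full texts. It is a sketch under stated assumptions, not the actual implementation. The helper names, the salted BLAKE2b hashing, the in-memory articles mapping, and the greedy grouping against each group's first member are all choices made for the example.

```python
import hashlib


def shingles(text, n=3):
    """Set of n-word shingles; texts shorter than n words yield an empty set."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def signature(shingle_set, num_hashes=128):
    """One minimum per seeded hash function (salted BLAKE2b, an assumption)."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return sig


def estimate(sig1, sig2):
    """Fraction of positions where two equal-length signatures agree."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)


def group_articles_fast(articles, similarity_threshold=0.8, num_hashes=128):
    """Hypothetical stand-in for analyze_content_groups_fast."""
    sigs = {}
    for path, text in articles.items():
        shingle_set = shingles(text)
        if shingle_set:  # skip articles too short to shingle
            sigs[path] = signature(shingle_set, num_hashes)

    groups = []  # each group is a list of paths; the first is its representative
    for path, sig in sigs.items():
        for group in groups:
            if estimate(sigs[group[0]], sig) >= similarity_threshold:
                group.append(path)
                break
        else:
            groups.append([path])
    return {"groups": [g for g in groups if len(g) > 1]}
```

Greedy grouping against a representative is simpler than full connected components but depends on iteration order, so it is only one of several reasonable designs.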


minhash_similarity


def minhash_similarity(
    sig1:list, # MinHash signature of the first document
    sig2:list, # MinHash signature of the second document
)->float:

Estimate Jaccard similarity from two MinHash signatures.
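The standard MinHash estimator is the fraction of signature positions at which the two signatures hold the same value. The following sketch assumes that estimator; the function name and the handling of empty or mismatched signatures are choices made for the example and may differ from the actual function.

```python
def estimate_jaccard(sig1, sig2):
    """Estimate Jaccard similarity as the share of matching signature slots."""
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same length")
    if not sig1:
        return 0.0  # assumption: no evidence of similarity without any slots
    matches = sum(1 for a, b in zip(sig1, sig2) if a == b)
    return matches / len(sig1)
```

With independent hash functions, each slot matches with probability equal to the true Jaccard similarity J, so the standard error of the estimate is sqrt(J(1 - J) / k) for k slots. At k = 128 that is at most about 0.044.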


minhash_signature


def minhash_signature(
    shingles:set, # Shingles extracted from one document
    num_hashes:int=128, # Number of hash functions (signature length)
)->list:

Compute MinHash signature for a set of shingles.
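A MinHash signature keeps, for each of num_hashes hash functions, the smallest hash value seen across the set. The sketch below is one way to do that, assuming string shingles and using salted BLAKE2b from the standard library as the family of hash functions. The actual function may use a different hash family, and the sentinel returned for an empty set is an assumption.

```python
import hashlib

MAX_HASH = 2**64 - 1  # largest value an 8-byte digest can take


def sketch_minhash_signature(shingles, num_hashes=128):
    """Return a list of num_hashes minima, one per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        smallest = MAX_HASH  # stays at the sentinel if the set is empty
        for shingle in shingles:
            digest = hashlib.blake2b(
                shingle.encode("utf-8"), digest_size=8, salt=salt
            ).digest()
            smallest = min(smallest, int.from_bytes(digest, "big"))
        signature.append(smallest)
    return signature
```

A salted cryptographic hash is used here instead of Python's built-in hash() because string hashing in Python is randomized per process, which would make signatures stored in one run incomparable with those from another.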


get_shingles


def get_shingles(
    text:str, # Text to split into shingles
    n:int=3, # Number of words per shingle
)->set:

Generate n-word shingles from text.
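An n-word shingle is a run of n consecutive words, and sliding a window one word at a time yields the full set. The sketch below assumes whitespace tokenization, already normalized text, and shingles stored as space-joined strings; the real function may tokenize or represent shingles differently.

```python
def sketch_shingles(text, n=3):
    """Return the set of n-word shingles in text, as space-joined strings."""
    words = text.split()
    # A text with fewer than n words produces no shingles (empty range)
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
```

For example, "the quick brown fox" with n=3 contains two shingles: "the quick brown" and "quick brown fox".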