SEO Site Analysis
Crawl and analyze entire sites for technical SEO issues and link structure.
detect_duplicate_content
def detect_duplicate_content(
session:Session, # Active database session
website_id:int, # Website to search within
content:str, # Normalized page content
file_path:str, # Current article path (to exclude self)
similarity_threshold:float=0.8, # Minimum similarity to flag
)->dict:
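Find other articles within a website whose content is similar to the given content. The article at file_path is excluded from the comparison, and only matches at or above similarity_threshold are flagged.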
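The duplicate check can be pictured with a small in-memory sketch. It is not the library's implementation: the `articles` dict stands in for the database session and website lookup, `difflib.SequenceMatcher` is an assumed similarity measure, and the function name and returned keys (`has_duplicates`, `matches`) are hypothetical.

```python
from difflib import SequenceMatcher


def find_similar_articles(
    articles: dict,  # Hypothetical stand-in for the database: {file_path: normalized content}
    content: str,  # Normalized page content to compare
    file_path: str,  # Current article path (excluded from the comparison)
    similarity_threshold: float = 0.8,  # Minimum similarity to flag
) -> dict:
    """Flag articles whose content is at least `similarity_threshold` similar."""
    matches = []
    for path, other in articles.items():
        if path == file_path:
            continue  # Never compare an article with itself
        # Assumed measure: SequenceMatcher ratio in [0, 1]
        score = SequenceMatcher(None, content, other).ratio()
        if score >= similarity_threshold:
            matches.append({"file_path": path, "similarity": round(score, 3)})
    # Most similar articles first
    matches.sort(key=lambda m: m["similarity"], reverse=True)
    return {"has_duplicates": bool(matches), "matches": matches}


articles = {
    "a.md": "alpha beta gamma delta",
    "b.md": "alpha beta gamma delta",
    "c.md": "zzz yyy xxx",
}
result = find_similar_articles(articles, articles["a.md"], "a.md")
```

Here `a.md` is skipped as the current article, `b.md` is flagged as an exact copy, and `c.md` falls well below the threshold.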
analyze_content_groups
def analyze_content_groups(
session:Session, # Active database session
website_id:int, # Website to search within
similarity_threshold:float=0.8, # Minimum similarity to group
)->dict:
Group similar articles together across a website.
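One way grouping by similarity might work is sketched below. This is an illustration under assumptions, not the documented implementation: articles come from a plain dict rather than a session, similarity is exact Jaccard over word sets, grouping is greedy against each group's first member, and all names and returned keys are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar_articles(
    articles: dict,  # Hypothetical stand-in for the database: {file_path: content}
    similarity_threshold: float = 0.8,  # Minimum similarity to group
) -> dict:
    word_sets = {path: set(text.lower().split()) for path, text in articles.items()}
    groups = []
    for path in articles:
        for group in groups:
            # Greedy choice: compare only against the group's first member
            rep = group["representative"]
            if jaccard(word_sets[path], word_sets[rep]) >= similarity_threshold:
                group["members"].append(path)
                break
        else:
            # No existing group is close enough, so start a new one
            groups.append({"representative": path, "members": [path]})
    # Report only groups that actually contain near-duplicates
    duplicates = [g for g in groups if len(g["members"]) > 1]
    return {"total_articles": len(articles), "groups": duplicates}


articles = {
    "a.md": "red green blue yellow purple",
    "b.md": "red green blue yellow purple orange",
    "c.md": "one two three",
}
result = group_similar_articles(articles)
```

Greedy grouping keeps the sketch short but depends on iteration order; a transitive approach such as union-find would merge chains of similar articles instead.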
analyze_keyword_cannibalization
def analyze_keyword_cannibalization(
session:Session, # Active database session
website_id:int, # Website to search within
keyword:str, # Focus keyword to check
)->dict:
Find articles within a website competing for the same focus keyword.
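The check amounts to collecting every article that targets the same keyword. A minimal sketch follows, assuming each article record carries a `focus_keyword` field and that keywords are compared case-insensitively with whitespace collapsed. The record shape, function name, and returned keys are hypothetical, and the real function reads from the database session instead of a list.

```python
def normalize_keyword(keyword: str) -> str:
    """Lowercase and collapse whitespace so 'SEO  Audit' matches 'seo audit'."""
    return " ".join(keyword.lower().split())


def find_keyword_competitors(articles: list, keyword: str) -> dict:
    target = normalize_keyword(keyword)
    competing = [
        article["file_path"]
        for article in articles
        if normalize_keyword(article.get("focus_keyword", "")) == target
    ]
    return {
        "keyword": target,
        "competing_articles": competing,
        # More than one article on the same keyword means they compete
        "is_cannibalized": len(competing) > 1,
    }


articles = [
    {"file_path": "audit-guide.md", "focus_keyword": "SEO audit"},
    {"file_path": "audit-checklist.md", "focus_keyword": "seo  audit"},
    {"file_path": "link-building.md", "focus_keyword": "link building"},
]
result = find_keyword_competitors(articles, "SEO Audit")
```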
analyze_content_groups_fast
def analyze_content_groups_fast(
session:Session, website_id:int, similarity_threshold:float=0.8, num_hashes:int=128
)->dict:
Group similar articles using MinHash signatures, which approximate pairwise similarity quickly. num_hashes sets the signature length.
minhash_similarity
def minhash_similarity(
sig1:list, sig2:list
)->float:
Estimate Jaccard similarity from two MinHash signatures.
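The standard MinHash estimate is the fraction of signature positions where the two signatures agree. The sketch below shows that calculation. The function name is hypothetical, and the handling of empty or mismatched signatures is an assumption, not documented behavior.

```python
def estimate_jaccard(sig1: list, sig2: list) -> float:
    """Estimate Jaccard similarity as the share of matching signature slots."""
    if len(sig1) != len(sig2):
        # Assumption: signatures built with different num_hashes are not comparable
        raise ValueError("signatures must have the same length")
    if not sig1:
        return 0.0  # Assumption: empty signatures are treated as dissimilar
    matches = sum(1 for a, b in zip(sig1, sig2) if a == b)
    return matches / len(sig1)


score = estimate_jaccard([1, 2, 3, 4], [1, 2, 0, 4])
```

Three of the four positions agree, so `score` is 0.75.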
minhash_signature
def minhash_signature(
shingles:set, num_hashes:int=128
)->list:
Compute MinHash signature for a set of shingles.
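A MinHash signature keeps, for each of `num_hashes` hash functions, the smallest hash value seen over all shingles. The following sketch illustrates the idea with salted `hashlib.blake2b` as the hash family. That choice, the function name, and the sentinel used for an empty set are assumptions, not details taken from the library.

```python
import hashlib

MAX_HASH = 2**64 - 1  # Largest value an 8-byte digest can take


def sketch_minhash_signature(shingles: set, num_hashes: int = 128) -> list:
    signature = []
    for seed in range(num_hashes):
        # Each seed acts as a different hash function via the blake2b salt
        salt = seed.to_bytes(8, "big")
        smallest = min(
            (
                int.from_bytes(
                    hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                    "big",
                )
                for s in shingles
            ),
            default=MAX_HASH,  # Assumption: an empty set yields the sentinel value
        )
        signature.append(smallest)
    return signature


sig_a = sketch_minhash_signature({"the quick brown", "quick brown fox"})
sig_b = sketch_minhash_signature({"quick brown fox", "the quick brown"})
```

Because the result depends only on the set's contents, `sig_a` and `sig_b` are identical, and both have one entry per hash function.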
get_shingles
def get_shingles(
text:str, n:int=3
)->set:
Generate n-word shingles from text.
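An n-word shingle is a run of n consecutive words, so a text of w words yields at most w - n + 1 of them. A minimal sketch is shown here. Lowercasing, whitespace tokenization, and the handling of texts shorter than n words are assumptions, and the function name is hypothetical.

```python
def word_shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word shingles in `text` as space-joined strings."""
    words = text.lower().split()
    if len(words) < n:
        # Assumption: a short text becomes a single shingle; empty text gives none
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


shingles = word_shingles("The quick brown fox")
```

With four words and n=3, `shingles` contains "the quick brown" and "quick brown fox".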