[Feature request]: add extract_recipe pattern #945

Open
jon2allen opened this issue Sep 11, 2024 · 0 comments
Labels
enhancement New feature or request

jon2allen commented Sep 11, 2024

What do you need?

I created a custom pattern that takes cooking videos and extracts a recipe. It handles Chinese and could be modified to handle other languages.

Instructional Video Transcript Extraction

Identity

You are an expert at extracting clear, concise step-by-step instructions from cooking video transcripts.

Goal

Extract the ingredients and their quantities, and present the key instructions from the given transcript in an easy-to-follow format.

Process

  1. Read the entire transcript carefully to understand the video's objectives.
  2. Identify and extract the main actionable steps and important details.
  3. Organize the extracted information into a logical, step-by-step format.
  4. Summarize the video's main objectives in brief bullet points.
  5. Present the instructions in a clear, numbered list.
  6. If the transcript is in Chinese, list each ingredient in both English and Chinese characters.

Output Format

Title of recipe

  • List the title of the recipe, first in a file-friendly format and then in its native format.

Objectives

  • [List 3-10 main objectives of the video in 15-word bullet points]

Instructions

  1. [First step]
  2. [Second step]
  3. [Third step]
    • [Sub-step if applicable]
  4. [Continue numbering as needed]

Guidelines

  • Ensure each step is clear, concise, and actionable.
  • Use simple language that's easy to understand.
  • Include any crucial details or warnings mentioned in the video.
  • Maintain the original order of steps as presented in the video.
  • Limit each step to one main action or concept.

Example Output

Title

egg_omelet.txt - Egg Omelet

Objectives

  • Learn to make a perfect omelet using the French technique
  • Understand the importance of proper pan preparation and heat control

Ingredients

3 eggs
1 tablespoon of oil
1 dash of salt
1 dash of MSG
1 teaspoon of water

Instructions

  1. Crack 3 eggs into a bowl and beat until well combined.
  2. Heat a non-stick pan over medium heat.
  3. Add the oil to the pan and swirl to coat.
  4. Pour the beaten eggs into the pan.
  5. Using a spatula, gently push the edges of the egg towards the center.
  6. Tilt the pan to allow uncooked egg to flow to the edges.
  7. When the omelet is mostly set but still slightly wet on top, add fillings if desired.
  8. Fold one-third of the omelet over the center.
  9. Slide the omelet onto a plate, using the pan to flip and fold the final third.
  10. Serve immediately.

[Insert transcript here]

I then use a Python script like the following to extract the file name from the output and save each recipe to a directory. I can batch this up with a list of YouTube links. It seems to cope well with videos that have no transcript, and if a video isn't about food but is, say, a knife discussion, it seems to handle that too.

==== python3 script ===============
```python
#!/usr/bin/python3
import os
import sys
import subprocess
import shlex
import time
import re
import logging
from logging.handlers import RotatingFileHandler


def split_filename(string):
    # Keep only the file-friendly part before " - ",
    # e.g. "egg_omelet.txt - Egg Omelet" -> "egg_omelet.txt"
    return string.split(' - ')[0]


# Set up logging
log_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
log_file = 'script.log'
log_handler = RotatingFileHandler(log_file, maxBytes=1000000, backupCount=5)
log_handler.setFormatter(log_formatter)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(log_handler)  # only a file handler, so nothing goes to standard output

# Check for correct number of arguments
if len(sys.argv) < 3:
    logger.error("Usage: {} <file_with_youtube_links> <output_directory> [limit]".format(sys.argv[0]))
    print("Args problem - see log")
    sys.exit(1)

file_with_links = sys.argv[1]
output_directory = sys.argv[2]
limit = int(sys.argv[3]) if len(sys.argv) > 3 else 10  # Default limit to 10 if not provided

# Ensure the output directory exists
if not os.path.isdir(output_directory):
    logger.error("Output directory '{}' does not exist.".format(output_directory))
    sys.exit(1)

logger.info("Starting the program")
logger.info("input file: {} ".format(file_with_links))
logger.info("output dir: {} ".format(output_directory))

# Open the file with YouTube links
with open(file_with_links, 'r') as fh:
    count = 0

    for link in fh:
        link = link.strip()
        if not link:
            continue  # Skip blank lines in the links file
        if count >= limit:
            logger.info("limit: {} reached".format(limit))
            break

        logger.info(f"Processing entry {count + 1}: {link}")

        # Run the command and capture the output.
        # The link is quoted because the command runs through the shell.
        command = "yt {} | fabric -sp extract_recipe".format(shlex.quote(link))
        output = subprocess.getoutput(command)
        lines_to_add = [" ", "### Youtube link ", " ", link, " "]
        output += "\n" + "\n".join(lines_to_add)

        # Extract the recipe title from the text between "### Title" and "### Objective"
        match = re.search(r"### Title(.*?)### Objective", output, re.DOTALL)
        filename = split_filename(match.group(1).strip()).strip() if match else ""
        if not filename:
            # No title found (e.g. no transcript): fall back to a numbered name
            filename = "recipe_{}.txt".format(count)

        # Construct the full path for the output file
        output_file = os.path.join(output_directory, filename)

        # Write the output to the file
        with open(output_file, 'w') as out_fh:
            out_fh.write(output)

        logger.info(f"wrote to file:  {output_file}")
        logger.info(f"Finished processing entry {count + 1}: {link}")

        count += 1
        time.sleep(20)

logger.info("Ending the program")
```
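
The title section comes back from the model as free text, so it may carry a bullet marker, blank lines, or characters that are awkward in a file name. Below is a minimal sketch of a stricter helper, assuming the output uses the same `### Title` and `### Objective` headings that the script's regex looks for. The function name `recipe_filename` and the allowed character set are my own choices for illustration and are not part of fabric or the script above.

```python
import re


def recipe_filename(output: str, fallback: str) -> str:
    """Derive a conservative file name from the '### Title' section (hypothetical helper)."""
    match = re.search(r"### Title(.*?)### Objective", output, re.DOTALL)
    if not match:
        return fallback

    # Take the first non-empty line of the section, minus any leading bullet marker.
    lines = [ln.strip().lstrip("-•* ").strip() for ln in match.group(1).splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        return fallback

    # Keep the file-friendly part before " - ", as split_filename does.
    name = lines[0].split(" - ")[0].strip()

    # Replace anything that is not a word character, dot, or hyphen.
    # \w is Unicode-aware for str patterns, so Chinese characters are kept.
    name = re.sub(r"[^\w.-]+", "_", name).strip("._")
    return name or fallback
```

It could be dropped in where the script computes `filename`, for example `filename = recipe_filename(output, "recipe_{}.txt".format(count))`. Because path separators are replaced, a stray `/` in the title can no longer point the write outside the output directory.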

@jon2allen jon2allen added the enhancement New feature or request label Sep 11, 2024