I created a custom pattern to pull cooking videos and extract a recipe. It handles Chinese and could be modified to handle other languages.
Instructional Video Transcript Extraction
Identity
You are an expert at extracting clear, concise step-by-step instructions from cooking video transcripts.
Goal
Extract ingredients and quantities, and present the key instructions from the given transcript in an easy-to-follow format.
Process
Read the entire transcript carefully to understand the video's objectives.
Identify and extract the main actionable steps and important details.
Organize the extracted information into a logical, step-by-step format.
Summarize the video's main objectives in brief bullet points.
Present the instructions in a clear, numbered list.
If the transcript is in Chinese, list both the English name and the Chinese characters for each ingredient.
Output Format
Title of recipe
List the title of the recipe, first in a file-friendly format and then in its native format.
Objectives
[List 3-10 main objectives of the video in 15-word bullet points]
Instructions
[First step]
[Second step]
[Third step]
[Sub-step if applicable]
[Continue numbering as needed]
Guidelines
Ensure each step is clear, concise, and actionable.
Use simple language that's easy to understand.
Include any crucial details or warnings mentioned in the video.
Maintain the original order of steps as presented in the video.
Limit each step to one main action or concept.
Example Output
Title
egg_omelet.txt - Egg Omelet
Objectives
Learn to make a perfect omelet using the French technique
Understand the importance of proper pan preparation and heat control
Ingredients
3 eggs
1 tablespoon of oil
1 dash of salt
1 dash of MSG
1 teaspoon of water
Instructions
Crack 2-3 eggs into a bowl and beat until well combined.
Heat a non-stick pan over medium heat.
Add a small amount of butter to the pan and swirl to coat.
Pour the beaten eggs into the pan.
Using a spatula, gently push the edges of the egg towards the center.
Tilt the pan to allow uncooked egg to flow to the edges.
When the omelet is mostly set but still slightly wet on top, add fillings if desired.
Fold one-third of the omelet over the center.
Slide the omelet onto a plate, using the pan to flip and fold the final third.
Serve immediately.
[Insert transcript here]
Then I use a Python script like the following to extract the file name from the output and save the result to a directory. I can batch this up with a list of YouTube links. It seems to handle videos with no transcript well, and if the video isn't about food but is, say, a knife discussion, it seems to handle that too.
==== python3 script ===============
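```python
#!/usr/bin/python3
import os
import sys
import subprocess
import shlex
import time
import re
import logging
from logging.handlers import RotatingFileHandler


def split_filename(string):
    """Return the text before the first ' - ' (the file-friendly title)."""
    # str.split always returns at least one element, so [0] is safe
    return string.split(' - ')[0]


# Set up logging
log_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
log_file = 'script.log'
log_handler = RotatingFileHandler(log_file, maxBytes=1000000, backupCount=5)
log_handler.setFormatter(log_formatter)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(log_handler)
logger.propagate = False  # Prevent logging to standard output

# Check for correct number of arguments
if len(sys.argv) < 3:
    logger.error("Usage: {} <file_with_youtube_links> <output_directory> [limit]".format(sys.argv[0]))
    print("Args problem - see log")
    sys.exit(1)

file_with_links = sys.argv[1]
output_directory = sys.argv[2]
limit = int(sys.argv[3]) if len(sys.argv) > 3 else 10  # Default limit to 10 if not provided

# Ensure the output directory exists
if not os.path.isdir(output_directory):
    logger.error("Output directory '{}' does not exist.".format(output_directory))
    sys.exit(1)

logger.info("Starting the program")
logger.info("input file: {} ".format(file_with_links))
logger.info("output dir: {} ".format(output_directory))

# Open the file with YouTube links
with open(file_with_links, 'r') as fh:
    count = 0
    for link in fh:
        link = link.strip()
        if not link:
            continue  # Skip blank lines
        if count >= limit:
            logger.info("limit: {} reached".format(limit))
            break
        logger.info(f"Processing entry {count + 1}: {link}")

        # Run the command and capture the output (link is quoted for the shell)
        command = "yt {} | fabric -sp extract_recipe".format(shlex.quote(link))
        output = subprocess.getoutput(command)
        lines_to_add = [" ", "### Youtube link ", " ", link, " "]
        output += "\n" + "\n".join(lines_to_add)

        # Extract the recipe title from the text between ### Title and ### Objective
        match = re.search(r"### Title(.*?)### Objective", output, re.DOTALL)
        if match:
            filename = match.group(1).strip()  # Remove leading and trailing whitespace
            filename = split_filename(filename)
        else:
            filename = "recipe_{}.txt".format(count)

        # Construct the full path for the output file
        output_file = os.path.join(output_directory, filename)

        # Write the output to the file (UTF-8 so Chinese characters survive)
        with open(output_file, 'w', encoding='utf-8') as out_fh:
            out_fh.write(output)
        logger.info(f"wrote to file: {output_file}")
        logger.info(f"Finished processing entry {count + 1}: {link}")
        count += 1
        time.sleep(20)

logger.info("Ending the program")
```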
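One thing that might be worth hardening is the file name step, since the title comes straight from model output and could contain slashes or span several lines. Below is a minimal sketch, not part of my script: `recipe_filename` is a hypothetical helper, and it assumes the output uses the same `### Title` and `### Objective` headings that the regex in the script looks for.

```python
import re

# Same section markers the script's regex relies on (assumed output layout)
TITLE_RE = re.compile(r"### Title(.*?)### Objective", re.DOTALL)


def recipe_filename(output, fallback):
    """Hypothetical helper: derive a safe file name from pattern output.

    Returns `fallback` when no usable title is found.
    """
    match = TITLE_RE.search(output)
    if not match:
        return fallback
    # Use the first non-empty line of the Title section
    lines = [ln.strip() for ln in match.group(1).splitlines() if ln.strip()]
    if not lines:
        return fallback
    # Keep only the file-friendly part before ' - '
    candidate = lines[0].split(" - ")[0].strip()
    # Replace anything that is not a word character, dot, or hyphen.
    # \w matches Unicode letters, so Chinese characters are preserved.
    safe = re.sub(r"[^\w.-]+", "_", candidate).strip("._")
    return safe or fallback
```

In the loop this could stand in for the `re.search` / `split_filename` pair, with `"recipe_{}.txt".format(count)` passed as the fallback.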