Saturday, October 19, 2024

Using UI with AI

If your web or mobile app has multiple user interface (UI) commands, (such as log in, register, search, show products, change user settings), users might struggle to know exactly where to click. The UI would be much more user-friendly if an AI could interpret user speech and convert it into commands that can be handled by the backend. Today’s AI is robust enough to map different phrases that mean the same thing to a single command. For example, a user might say "register" or "create a new account," and both can be mapped to the command "sign_up." The AI can understand both English and Turkish, for example "bana yeni bir kullanıcı oluştur" correctly maps to "sign_up". Here is a demo in Python:
# Listen to user speech and convert it to UI commands
# Şamil Korkmaz, 20.10.2024
import speech_recognition as sr
import openai
import time
import getpass
from datetime import datetime
import json
def record_audio(duration):
"""Record audio from microphone for specified duration"""
recognizer = sr.Recognizer()
with sr.Microphone() as source:
print(f"Recording for {duration} seconds...")
audio = recognizer.record(source, duration=duration)
print("Recording complete!")
return audio
def transcribe_audio(audio):
"""Transcribe audio using OpenAI Whisper"""
try:
# Save audio to temporary file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
temp_filename = f"temp_audio_{timestamp}.wav"
with open(temp_filename, "wb") as f:
f.write(audio.get_wav_data())
# Use OpenAI client for Whisper
with open(temp_filename, "rb") as audio_file:
transcript = openai.audio.transcriptions.create(
model="whisper-1",
file=audio_file
)
return transcript.text
except Exception as e:
print(f"Error in transcription: {str(e)}")
return None
# Function to get the user command and process it using the chat endpoint
def get_user_command_action(user_command):
messages = [
{'role': 'system', 'content': 'You are an assistant that provides structured JSON responses based on user commands.'},
{'role': 'user', 'content': f"Interpret the following user command: '{user_command}' and provide the action as a structured JSON response. The available actions are:\n1. log_in\n2. sign_up\n3. view_products\n4. search_products\n5. add_to_cart\n6. checkout\n7. view_orders\n8. log_out\n9. contact_support\nRespond with the appropriate action as: {{\"action\": \"<action_name>\"}}"}
]
response = openai.chat.completions.create(
model='gpt-3.5-turbo',
messages=messages
)
action = response.choices[0].message.content
return action
def process_action(action_data):
action = json.loads(action_data).get('action', 'unknown')
# Define action handling logic
if action == 'log_in':
print("Redirecting to Log In page...")
elif action == 'sign_up':
print("Redirecting to Sign Up page...")
elif action == 'view_products':
print("Showing product listings...")
elif action == 'search_products':
print("Initiating product search...")
elif action == 'add_to_cart':
print("Adding product to cart...")
elif action == 'checkout':
print("Proceeding to checkout...")
elif action == 'view_orders':
print("Showing order history...")
elif action == 'log_out':
print("Logging out...")
elif action == 'contact_support':
print("Redirecting to support page...")
else:
print("Unknown action!")
def main():
openai.api_key = "YOUR OPEN AI KEY"
try:
audio = record_audio(5)
# Transcribe audio
print("\nTranscribing audio...")
transcript = transcribe_audio(audio)
if transcript:
print(f"\nTranscript: {transcript}")
print("\nGetting response from GPT...")
response = get_user_command_action(transcript)
if response:
print("\nGPT Response:")
print(response)
process_action(response)
except Exception as e:
print(f"An error occurred: {str(e)}")
if __name__ == "__main__":
main()
view raw ui_with_ai.py hosted with ❤ by GitHub
When you use an API, such as OpenAI, the main disadvantage is that you must pay for every API call. Therefore, using voice commands to control the UI should be limited to paying customers, and there should be rate limits in place to keep costs under control. You might use open-source models like LLaMA to run the AI on your own server, but that would require better computational and memory resources than you currently have.

17.02.2025: Open source AI models like DeepSeek open the door to self hosted AI. You will need a powerful server with lots of RAM and GPU. Such servers might cost more than AI API calls if you use a cloud server. One solution might be to have your own physical server to run the AI model and use the cloud server for the web app, which makes API calls to the AI on your server.

Sunday, October 13, 2024

Evaluating fairness of an Investment/Shareholders' Agreement

When an investor invests in a startup, they present the company with an Investor/Shareholder Agreement, which outlines the rights and obligations of the shareholders within the company, including voting rights, share transfer procedures, dividend policies, and protections for minority shareholders. These agreements safeguard the investor's financial interests and govern their relationship with the company and other shareholders.
It is common for the initial investor to have more rights. Normally, early investors expect the following protections:
  • Board representation (a seat on the board).
  • Veto rights on major financial decisions (e.g., capital raises, mergers, or asset sales).
  • Approval rights over key hires or changes in the business direction.
  • Liquidation preferences to get paid first in case of a company exit.
But sometimes their demands can be excessive. To evaluate the fairness of such agreements, you can upload the proposed agreement to chatGPT and use the following prompt:
Does the contract reflect a balanced distribution of power? If not, what share percentage would correspond to the class B shareholder's power? Is this normal for an initial investor in the startup, considering that convincing the first investor is often the most difficult?
If the agreement grants a 15% share in the company but provides 50% control, which is more aligned with a controlling or near-majority stake, we can say the agreement is not fair. Common unfair clauses in such agreements are:
  • Even though the class A shareholder (founder) appoints the board member, any change in the board representative requires class B shareholder (investor) approval.
  • Class A shareholder cannot transfer shares without class B consent for 3 years, while class B share holder is free to transfer to affiliates or related parties without restriction.
  • Important strategic decisions need approval from the Steering Committee, where both A and B share holders have one representative each. Any deadlock in this two-person committee could give the B shareholder veto power over important business decisions​.
You can use the following questions to persuade the investor to be more flexible:
  • Do we agree that the founder/CEO of an ambitious startup with rapid growth goals needs to be able to act quickly, requiring minimal approval/bureaucracy?
  • To fund rapid growth, we most probably will need other investors. Can we foresee that this agreement might irritate potential investors, lower the company's value, and lead them to request the same privileges? 
  • If the same privileges are granted to other investors, would reaching an agreement on any matter outside of routine business—especially considering the potential for irrational behavior (such as ego conflicts, etc.)—become practically impossible?
The goal is to help the investor see the mutual benefits of more flexible terms. You want to highlight that while protecting their investment is important, collaboration, agility, and attracting future investment will ultimately lead to better outcomes for both parties.