Overview

This chat focuses on your past experience, so there’s not much you need to do to prepare. However, it would be advisable to pull your thoughts together ahead of time, and here’s the structure you can expect:

  • Experience - covering the type of roles you’ve held, the breadth and depth of responsibility, and size of projects managed. This is also your opportunity to showcase your content expertise in your field of work
  • Technical breadth and depth - At FB, we emphasize collaboration across multiple teams and individuals, so be able to talk about how your work has spanned multiple teams and/or iterations
  • TL (project management) skills - including technical mentoring – think about your role in the setup, execution and delivery of a project
  • People management skills - including upward influencing, mentorship, empathy, people growth etc.
  • Agility - Indicator of your growth opportunities, as measured through your capability and willingness to absorb knowledge from your experience, and use these lessons to become even more effective in new and different situations
  • Motivation - What would you be interested in working on at Facebook, or why are you interested in working to make Facebook better? Do you exhibit a strong general drive and desire to progress as a leader?
  • Meta/Barlas things: IR, DPR, Llama 2, GenAI, LoRA

  • Also, don’t be afraid to be vulnerable and talk about difficult subjects; show senior-level leadership qualities, as this is a senior role.

Experience

Amazon Music

  • Currently, at Amazon, I lead the Music team for query understanding and personalization (we fine-tuned an LLM whose input is the query text and whose output is the API call; we started from a base LLM and fine-tuned it on API data with human annotations).
    • So as a user interacts with an Alexa device, say they say “play music”, we need to understand the request and personalize it with the customer’s details.
      • This involves entity recognition: identifying song names, artist names, and genres, which is important because we need to know which playlist/radio station to play (we don’t want to just play one song).
      • Intent classification: mapping the query to a function call (a play-music intent), resolving the query to match the function, building arguments, slot filling, and handling multi-step questions, e.g. “play songs by Adele” vs. “play Adele’s latest music”.
      • Context understanding: the user’s location, time of day, holidays, kid-friendly and explicit-content settings (e.g., finding a restaurant using latitude/longitude).
      1. Intent Recognition:
        • By analyzing the query, the system identifies that the user wants to “play music.” So, the main function to be called is playMusic.
      2. Slot Filling:
        • Song Name (track): Extract the song name from the query. In this case, it’s “Hello.”
        • Artist Name (artist): Extract the artist name. Here, it’s “Adele.”
        • Device (device): Extract where the user wants to play the music. Here, it’s “living room speaker.”
        • Volume (volume): Extract the desired volume. Here, it’s “60%.”
      3. Argument Building: Based on slot filling, construct the arguments for the function call.
        • track="Hello"
        • artist="Adele"
        • device="living room speaker"
        • volume=60
      4. Resolving the Query to Function Call: The LLM maps the extracted intent and slots to the function call: playMusic(track="Hello", artist="Adele", device="living room speaker", volume=60)

      5. Handling Multi-step Questions: Sometimes, a single query may not provide all the needed information. The system might need to ask follow-up questions. For instance, if the user just says “Play ‘Hello’”, the system might ask:
        • “Which artist’s ‘Hello’ would you like to play? Adele or Lionel Richie?”
        • Based on the user’s response, it can then construct the appropriate function call.
      6. Execution: Once the function call is constructed, the system will execute it, triggering the desired action.

In essence, using the LLM, you’re dynamically translating natural language instructions into structured function calls that the system can understand and act upon. This approach makes interactions intuitive for users while ensuring precise actions on the backend.
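
A minimal sketch of what that translation can look like, assuming a playMusic-style API as in the worked example above (the parse_with_llm helper, the stub play_music implementation, and the slot names are illustrative stand-ins, not the production system):

    # Hypothetical sketch of LLM-based function calling for the worked example above.
    def play_music(track: str, artist: str, device: str = "default", volume: int = 50):
        """Stub for the downstream action the assistant would trigger."""
        print(f"Playing '{track}' by {artist} on {device} at {volume}% volume")

    def parse_with_llm(utterance: str) -> dict:
        # In production this would be the fine-tuned LLM emitting a structured API call;
        # here we simply return the slots for the example query.
        return {
            "intent": "playMusic",
            "slots": {"track": "Hello", "artist": "Adele",
                      "device": "living room speaker", "volume": 60},
        }

    REQUIRED_SLOTS = {"track", "artist"}

    def handle(utterance: str):
        parsed = parse_with_llm(utterance)
        missing = REQUIRED_SLOTS - parsed["slots"].keys()
        if missing:
            # Multi-step handling: ask a follow-up question instead of guessing.
            return f"Which {', '.join(sorted(missing))} would you like?"
        if parsed["intent"] == "playMusic":
            play_music(**parsed["slots"])

    handle("Play 'Hello' by Adele on the living room speaker at 60% volume")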

  • Music retrieval process:
    • This process kicks in once we understand the user’s intent and want to retrieve music according to the query’s criteria
    • Indexing, ranking (deep learning, collaborative filtering), diversification, feedback loop
    • FAISS can be used for music retrieval by creating an index of high-dimensional vectors representing song features. When a user requests music, the query’s embedding is matched against the FAISS index to quickly retrieve similar songs. Over time, user preferences can be incorporated to refine recommendations, ensuring efficient and personalized music selections (a minimal FAISS sketch appears after this list)
  • Examples (main music recommender system, content-based, cold start):
    • If a user says, “Play relaxing jazz music,” the system understands the intent (Play music), the genre (Jazz), and the mood (Relaxing). The retrieval system then fetches a playlist of relaxing jazz tracks, ranked based on the user’s historical preferences and other contextual cues.
    • If a user inquires, “Play the latest song by Adele,” the query understanding phase extracts ‘latest song’ as the intent and ‘Adele’ as the entity. The retrieval system then looks for the most recent track by Adele in its database.
  • It’s about gauging both the implicit and explicit needs and delivering a seamless music experience.
  • Our team’s focus is customer growth, so we serve recommendations that will help grow our customer base
    • This includes Next Best Action via a multi-armed bandit, where we look to educate inactive users by giving them 3 personalized push notifications prompting them to perform different actions on the app (see the bandit sketch after this list)
      • The number 3 was decided after several rounds of experimentation; we didn’t want to bombard the user but still wanted to educate them
    • We also have a partnership with Amazon.com retail, where we find correlations between retail products and music latent factors and surface item-to-item recommendations on Amazon.com product pages
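
A minimal sketch of the FAISS retrieval step described above (the 128-dimensional embeddings, the flat L2 index, and the random vectors are assumptions for illustration; the real system’s embedding model and index type may differ):

    import numpy as np
    import faiss

    dim = 128                                                          # assumed embedding size
    song_embeddings = np.random.rand(10_000, dim).astype("float32")    # stand-in for real song vectors

    index = faiss.IndexFlatL2(dim)                                     # exact search; IVF/HNSW indexes scale better
    index.add(song_embeddings)

    query_embedding = np.random.rand(1, dim).astype("float32")         # e.g. the embedding of "relaxing jazz"
    distances, song_ids = index.search(query_embedding, 50)            # top-50 candidate set for ranking

    # Downstream: re-rank song_ids with user history and contextual signals, then diversify.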
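
And a toy sketch of the Next Best Action bandit (the three candidate notification actions and the Bernoulli/Thompson-sampling setup are assumptions for illustration, not the production design):

    import numpy as np

    actions = ["try_a_playlist", "create_a_station", "set_up_voice_profile"]  # hypothetical notification actions
    successes = np.ones(len(actions))   # Beta(1, 1) priors over each action's engagement rate
    failures = np.ones(len(actions))

    def pick_action() -> int:
        # Thompson sampling: sample an engagement rate per action, act greedily on the samples.
        samples = np.random.beta(successes, failures)
        return int(np.argmax(samples))

    def record_feedback(action_idx: int, engaged: bool) -> None:
        if engaged:
            successes[action_idx] += 1
        else:
            failures[action_idx] += 1

    # Each of the 3 personalized push notifications would pick an action this way,
    # send it, and update the posterior based on whether the user engaged.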

NuAIg

  • Spinoff from Oracle in the healthcare domain, automating administrative and operational tasks
  1. Creating a Clinical Documentation Tool:
    • Named Entity Recognition (NER): To identify specific entities in the text, such as patient names, medication names, diseases, procedures, dates, and other relevant medical terms (a small NER sketch appears after this list).
    • Information Extraction: Beyond just recognizing entities, this task involves extracting relationships and attributes associated with these entities. For instance, understanding that a specific drug was prescribed for a particular symptom or disease.
    • Text Classification: To categorize different parts of a clinical note (e.g., diagnosis section, treatment section, patient history).
    • Topic Modeling: To automatically identify the main topics covered in a clinical note, aiding in quick summarization.
  2. Designing an Information Retrieval System (using FAISS):
    • Document Indexing: Efficiently indexing medical guidelines, patient data, and treatment options for rapid retrieval.
    • Query Understanding: Interpreting what a user (possibly a healthcare professional) is looking for, even if their query is in natural, conversational language.
    • Document Ranking: Sorting the retrieved documents by relevance based on the user’s query and possibly other factors like patient specifics.
    • Semantic Search: Using embeddings and other advanced techniques to ensure the retrieval system understands the meaning and context, not just keyword matches.
  3. Automating Claims Processing:
    • Named Entity Recognition (NER): As mentioned earlier, this would be used to identify specific entities like patient names, diseases, treatments, amounts, dates, etc.
    • Text Classification: To categorize different sections of the claim form or to determine if a particular document is, in fact, a claim.
    • Relationship Extraction: To understand the relationships between entities. For instance, connecting a diagnosis with a specific treatment or procedure.
    • Automated Form Filling: Once relevant information is extracted, populating standardized forms or databases using the extracted data.
    • Error Detection: Using NLP to spot inconsistencies or errors in claims, ensuring higher accuracy.
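
A minimal sketch of the NER piece using an off-the-shelf HuggingFace pipeline (the default checkpoint is a general-purpose CoNLL model, not a clinical one, and the note text is made up; treat this purely as an illustration of the interface):

    from transformers import pipeline

    # General-purpose NER pipeline; a clinical/biomedical checkpoint would be swapped in
    # for real medical notes.
    ner = pipeline("ner", aggregation_strategy="simple")

    note = "Patient John Doe was prescribed 500mg Metformin on 2021-03-04 for type 2 diabetes."
    for entity in ner(note):
        print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))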

Oracle

  1. Modeling Server Capacity Data to Predict Outages:
    • ML Techniques:
      • Time Series Analysis & Forecasting: Methods like ARIMA, Prophet, or LSTM (Long Short-Term Memory networks) to predict server capacity based on historical data.
      • Regression Models: For predicting capacity, techniques like Linear Regression or Support Vector Regression might be relevant.
      • Random Forest & Gradient Boosting: Ensemble methods that can predict server outages based on a multitude of factors and historical data.
  2. Predicting Server Health Using LogBERT to Understand Anomalies:
    • NLP Techniques:
      • Transfer Learning: Using a pre-trained model like BERT (in this case, a variant called LogBERT) and fine-tuning it to analyze server logs.
      • Semantic Embeddings: Representing server logs as vectors in a high-dimensional space using embeddings derived from models like BERT.
    • ML Techniques:
      • Anomaly Detection: Techniques such as One-Class SVM, Isolation Forest, or Autoencoders can be employed to detect anomalies in the log embeddings.
      • Clustering: Using unsupervised algorithms like K-Means or DBSCAN to cluster similar logs and identify potential anomalous patterns.
  3. Outlier Detection for Current Latency and Storage Models:
    • ML Techniques:
      • Statistical Methods: Techniques like the Z-Score, Box-Plot, or IQR (Interquartile Range) for basic outlier detection.
      • Isolation Forest: A tree-based method designed specifically for anomaly and outlier detection (see the sketch after this list).
      • Density-Based Spatial Clustering (DBSCAN): Useful for detecting clusters in data and identifying points that do not belong to any cluster as potential outliers.
      • Autoencoders: Neural network-based approach where the network is trained to reproduce the input data, but anomalies produce higher reconstruction errors.
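
A minimal sketch of the Isolation Forest approach from item 3, on made-up latency data (the synthetic values and contamination rate are assumptions):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    latency_ms = rng.normal(loc=20, scale=3, size=(1000, 1))      # typical request latencies
    latency_ms = np.vstack([latency_ms, [[95.0], [120.0]]])       # injected anomalies

    model = IsolationForest(contamination=0.01, random_state=0)
    labels = model.fit_predict(latency_ms)                        # -1 = outlier, 1 = inlier

    print(latency_ms[labels == -1].ravel())                       # should surface the injected spikes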

Research

  • I am a research fellow at the University of South Carolina, where I collaborate on a few publications; I focus mostly on NLP, with a little vision and multimodality
  • EMNLP:
    • Counter Turing Test (CT2): AI-Generated Text Detection is Not as Easy as You May Think – Introducing AI Detectability Index (Best Paper Award)
      • Our definition of ADI is a composition of two linguistic measures: a lexical measure (burstiness) and a syntactic measure (perplexity). We combined them based on empirical observation, using a density function defined by Le Cam’s Lemma. We also discussed and self-criticized in Appendix F, pg. 28/33, that ADI could be reformulated using similar, alternative features. Finally, we discussed and hinted at how future researchers may extend the definition of ADI.
      • Additionally, regarding the novelty of the work, to the best of our knowledge, such a metric or definition has not previously been formally pursued in our research community or on this topic.
      • The research paper conducts a comprehensive examination of the effectiveness of AI-Generated Text Detection (AGTD) techniques across 15 Large Language Models (LLMs). It finds that existing methods of detection are not sufficiently robust against state-of-the-art models. In response to this, the authors present the AI Detectability Index (ADI), a new metric that fuses two linguistic measures - burstiness (lexical) and perplexity (syntactic). This metric is derived empirically using Le Cam’s Lemma, and its composition is also explored in depth within the paper’s appendices. The authors suggest that ADI can be further refined and expanded upon by future research. They emphasize the novelty of their work, asserting that this type of metric has not been previously explored in the scientific community, underscoring the paper’s innovative nature.
  • CONFLATOR: Code Mixing:
    • In simpler terms:
    1. Switching-Point Based Rotary Positional Encoding:
      • The authors introduce a new way to handle positional encoding in neural networks. Positional encoding is a technique used in Transformer architectures (a popular neural network model) to understand the position or order of words in a sentence.
      • The new technique revolves around the idea of “switching points.” Whenever a switch from one language to another occurs in a code-mixed sentence, they change the rotation (or tweak the positional encoding). This helps the model learn when and how languages are mixed within a sentence (a minimal RoPE sketch appears at the end of this Research section).
    2. CONFLATOR:
      • This is a new neural network model designed specifically for languages that are code-mixed, like Hinglish.
      • The primary innovation in CONFLATOR is its use of the aforementioned switching-point based rotary positional encoding. Initially, the model looks at each word individually to determine if a switch has occurred. Then, it examines pairs of words (bigrams) to refine its understanding.
    3. Empirical Evidence:
      • The authors claim to have evidence that CONFLATOR successfully learns the patterns of how languages are mixed together in Hinglish. They compare its performance to other models that use different methods to understand the order of words, and their findings (Figure 5 in the paper) suggest that CONFLATOR does a better job at this. In a nutshell, this paper introduces a new technique and model for understanding and processing sentences where two languages are mixed together, with a specific focus on the mix of Hindi and English known as “Hinglish.”
  • Textual Diffusion with Hallucination
    • Where we’re looking to incorporate factual ground truth during the denoising process to see if that can help mitigate hallucination.
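
Since the CONFLATOR discussion above centers on rotary positional encoding with switching points, here is a minimal sketch of plain rotary positional encoding (standard RoPE in its half-split form, not the paper’s switching-point variant; the paper’s version would additionally adjust the rotation wherever the language switches):

    import torch

    def rotary_positional_encoding(x: torch.Tensor) -> torch.Tensor:
        """Apply standard RoPE to x of shape (seq_len, dim), with dim even."""
        seq_len, dim = x.shape
        half = dim // 2
        # Position-dependent rotation angles: theta_i = 10000 ** (-i / half)
        freqs = 10000.0 ** (-torch.arange(half, dtype=torch.float32) / half)
        angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, :half], x[:, half:]
        # Rotate each (x1, x2) coordinate pair by its position-dependent angle.
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    # The switching-point idea would modify `angles` at token positions where the
    # language changes, giving the model an explicit signal for code-mixing.
    tokens = torch.randn(6, 64)            # 6 tokens, 64-dim embeddings (toy sizes)
    encoded = rotary_positional_encoding(tokens)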

Projects mentioning LoRA

Here are the procedures and intentions behind each of these tasks:

  1. Few-Shot Learning with Pre-trained Language Models:

    • Performed few-shot learning with pre-trained LLM: This means that a small amount of data was used to adapt (“fine-tune”) pre-existing language models (likely designed for broad tasks) to perform a more specific task. The fact that the models are pre-trained indicates that they already have a good grasp of the language due to previous training on large datasets.

    • such as GPT and BERT from HuggingFace’s libraries: The pre-trained models used were GPT and BERT, which are prominent models for understanding language context. These models were sourced from HuggingFace, a leading provider of state-of-the-art language models.

    • Experimented with more sophisticated fine-tuning methods such as LoRA: After starting with basic fine-tuning, more advanced methods were employed. LoRA (Low-Rank Adaptation) is one such method that adapts a pre-trained model to a new task by training small low-rank update matrices while keeping the original weights frozen, which works well with a limited amount of data (see the sketch at the end of this section).

    • Used PyTorch framework: All the experiments and model training were done using PyTorch, which is a popular deep learning framework. This gives information about the tools and libraries that might have been employed during the procedure.

  2. Multitask Training for Recommender Systems:

    • Implemented a multi-task movie recommender system: A recommender system was developed that can handle multiple tasks simultaneously. In this context, it might mean recommending various types of content or handling different aspects of recommendations concurrently.

    • based on the classic Matrix Factorization and Neural Collaborative Filtering algorithms: The foundational techniques used for this recommender system are:

      • Matrix Factorization: It’s a technique where user-item interactions are represented as a matrix, and then this matrix is decomposed into multiple matrices representing latent factors. This is a traditional technique often used in recommendation systems.
      • Neural Collaborative Filtering: This is a more modern technique that uses neural networks to predict user-item interactions, thus providing recommendations.

In summary, the first task involved adapting large, general-purpose language models for specific tasks using a small amount of data, while the second task was about building a multi-task recommendation system using traditional and neural techniques.
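
A minimal sketch of LoRA fine-tuning with HuggingFace’s peft library, as referenced in item 1 (the base model, target modules, and hyperparameters are illustrative, not the ones used in the project):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from peft import LoraConfig, TaskType, get_peft_model

    model_name = "bert-base-uncased"                          # illustrative base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # LoRA freezes the base weights and learns small low-rank updates on selected layers.
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                             lora_dropout=0.1, target_modules=["query", "value"])
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()                        # only a small fraction is trainable

    # Few-shot fine-tuning then runs a standard training loop (or the HF Trainer)
    # on the handful of labeled examples.
    batch = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
    loss = model(**batch, labels=torch.tensor([1, 0])).loss
    loss.backward()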
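
And a toy sketch of the matrix factorization idea from item 2 (the sizes, learning rate, and random ratings matrix are arbitrary; Neural Collaborative Filtering would replace the dot product P @ Q.T with a small neural network over the user and item embeddings):

    import numpy as np

    # Factor the user-item rating matrix R into user factors P and item factors Q
    # by gradient descent on the observed entries only.
    n_users, n_items, k = 50, 40, 8
    rng = np.random.default_rng(0)
    R = rng.integers(0, 6, size=(n_users, n_items)).astype(float)   # 0 = unobserved rating
    mask = R > 0

    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    lr, reg = 0.01, 0.05

    for _ in range(200):
        err = mask * (R - P @ Q.T)                  # error only where ratings exist
        P += lr * (err @ Q - reg * P)
        Q += lr * (err.T @ P - reg * Q)

    predicted = P @ Q.T                             # predicted scores for every user-item pair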

Technical Breadth

  • I love collaboration and thinking outside the box. For Amazon devices, the primary goal was for users to shop.
  • So what I’ve been trying to do is find correlations between retail items and songs, both for the website and for Alexa.
  • Item-to-item recommendations are the bread and butter of Amazon

People management

  • I like to lead with empathy
  • Mentorship: make sure everyone has a mentor, helping them find one if not
  • People growth
  • Upward influencing: offering solutions, understanding others’ perspectives and goals

Agility

Motivation

  • The research coming out of Meta is an inspiration in itself; Meta is a trailblazer in so many domains:
  • Text to speech: Voicebox, which is able to do speech generation tasks it was not necessarily trained on
  • Pure NLP: the No Language Left Behind project, with translation between 200 languages; the work on low-resource languages is something I really connect with
  • Recommender systems: embedding-based retrieval and so much more
  • And I imagine the Smart Glasses org to be a culmination of all of this research, so being given the opportunity to work there would be a true joy.

Questions for the manager

  • Team structure: I assume, since it’s lenses, there’s collaboration with a vision team. Are there other modalities at play?
  • Hallucination is the biggest problem with LLMs
  • Smart Glasses (SG) Language AI
  • We focus on Conversational AI, SG Input AI, Knowledge-enriched Discovery AI, Privacy ML and AI Efficiency. Our system powers critical SG launches driven by Meta leadership. We have strong scientists and engineers, solving challenging AI problems with both Cloud based large models and On-Device ML models. Join us if you are passionate about AI-centric next-gen computing platforms and pushing advanced AI at production scale!

  • Our team: The Smart Input team’s mission is to enhance the input and messaging experience on these smart glasses. Imagine being able to receive your WhatsApp messages, get more context (a summary), and respond in a natural way, just like how you would have a conversation with a human, all while wearing your glasses and not taking your attention off the things you are doing (like biking, cooking, or walking with your grocery bags).
  • Our tech: We want to bring ChatGPT capabilities on-device. We build the capabilities similar to what ChatGPT can do for messaging but with a model that is much smaller to be able to fit on the glasses. This is a niche tech space with big opportunities to innovate on LLMs, on-device ML, privacy ML such as Federated learning, on-device personalization. This team aims to ship these cool technologies to drive end user value.
  • While the rest of the world is going after making LLMs work on the servers, we are taking a bigger challenge to make LLMs work on-device.

  • In a more constrained use case, such as using natural language processing (NLP) to interpret voice commands for Amazon Music, the problem of hallucination might be less prominent. In this case, the system is less likely to “make things up” because it’s not primarily generating content but rather interpreting and executing commands based on user input. If the NLP system doesn’t understand a command, it’s more likely to ask for clarification or fall back to a default action rather than inventing an inappropriate response.
  • However, hallucination could still be an issue if the system uses language models to generate responses or explanations. For example, if you ask Alexa for information about a particular artist or song, and it uses a language model to generate the response, it could potentially hallucinate details that aren’t true.
  • In any case, whether hallucination is a problem depends on the specific application and how the system is designed and used. It’s an active area of research in AI to find ways to mitigate this issue, especially as language models are being used in more diverse and impactful applications. Techniques like fine-tuning the model on specific tasks or data, utilizing structured data sources to ground the responses, or using model validation to check the outputs could help to limit hallucination.