@StanChan03 StanChan03 commented Feb 27, 2025

Introduces chain-of-thought (CoT) for sem_extract and adds DeepSeek-R1 CoT support for the other semantic operators.

Example output with sem_map:

0        Probability and Random Processes  Applied Probability and Stochastic Processes  Okay, so I need to figure out what a similar c...
1     Optimization Methods in Engineering                      Engineering Optimization  Okay, so I need to figure out what a similar c...
2  Digital Design and Integrated Circuits                          Digital Logic Design  Okay, so I need to figure out what a similar c...
3                       Computer Security                                 Cybersecurity  Okay, so I need to figure out what a similar c...

Example output with sem_filter:

                                             Reviews  filter_label                                 explanation_filter
0  I absolutely love this product. It exceeded al...          True  Okay, so I need to figure out if the claim is ...
1  Terrible experience. The product broke within ...         False  Okay, so I need to determine whether the claim...
2           The quality is average, nothing special.         False  Okay, so I need to figure out whether the clai...
3                Fantastic service and high quality!          True  Okay, so I need to determine if the claim is t...
4              I would not recommend this to anyone.         False  Okay, so I need to determine whether the claim...

Users will need to specify the reasoning strategy through the strategy argument. Example:
df = df.sem_map(user_instruction, return_explanations=True, strategy=ReasoningStrategy.ZS_COT)

New Type for models: ReasoningStrategy(Enum)

StanChan03 commented Feb 27, 2025

extract_deepseek_cot.py is broken right now: it fails with an APIConnectionError. I'm running DeepSeek on Ollama, and the failure involves function_call['name']. It seems to be a LiteLLM error.

@StanChan03 StanChan03 changed the title DeepSeek CoT Support DeepSeek CoT Support for Filter and Map Mar 3, 2025
@StanChan03 StanChan03 marked this pull request as ready for review March 3, 2025 00:29
@StanChan03 StanChan03 requested review from liana313 and sidjha1 March 3, 2025 00:29
@liana313
Collaborator

The user shouldn't need to specify the type of reasoning -- we should set it based on the model, or just take the output string and check whether there is a token or something else
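
A rough sketch of what the output-based check could look like. It assumes the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1 commonly does; the tag format and the helper name `split_reasoning` are assumptions for illustration and are not taken from this PR.

```python
import re
from typing import Optional, Tuple

# Assumption: reasoning is wrapped in <think>...</think> (typical for DeepSeek-R1).
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (reasoning, answer) from a raw model output.

    reasoning is None when no think block is present, so callers can treat
    the output as a plain answer without knowing which model produced it.
    """
    match = _THINK_RE.search(output)
    if match is None:
        return None, output.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the answer.
    answer = (output[: match.start()] + output[match.end():]).strip()
    return reasoning, answer
```

With something along these lines, the postprocessor could fill the explanation column whenever a think block is found, without the user passing a model-specific flag.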


  # post process results
- postprocess_output = postprocessor(lm_output.outputs, strategy in ["cot", "zs-cot"])
+ postprocess_output = postprocessor(lm_output.outputs, model, strategy in ["cot", "zs-cot"])

Collaborator

We could consider adding a class ReasoningStrategy(Enum) to make the different methods clearer. As I read the code, things are getting a bit unwieldy.

Collaborator Author

Yea I agree
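
For reference, a minimal sketch of how such an enum could replace the `strategy in ["cot", "zs-cot"]` string check. The member names and values here are assumptions that mirror the strings in the reviewed line (only `ZS_COT` appears in the PR description), and `uses_cot` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class ReasoningStrategy(Enum):
    # Assumed members; the values mirror the strings used in the reviewed line.
    COT = "cot"
    ZS_COT = "zs-cot"


def uses_cot(strategy: Optional[ReasoningStrategy]) -> bool:
    """Hypothetical helper: True when the strategy asks for chain-of-thought output."""
    return strategy in (ReasoningStrategy.COT, ReasoningStrategy.ZS_COT)
```

An enum makes the set of supported strategies explicit in one place, and a misspelled strategy fails at lookup time (`ReasoningStrategy("zs_cot")` raises `ValueError`) instead of silently evaluating to False.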

@liana313 liana313 requested a review from harshitgupta412 April 2, 2025 23:30
@liana313 liana313 merged commit d90d861 into main Apr 9, 2025
13 of 14 checks passed